Skip to content

Discussion Request: Table naming strategy modification #35

@brendagutman

Description

@brendagutman

Is your feature request related to a problem? Please describe

Topic for discussion:

  • Should we modify our file naming system within the dbt repo for some of the pipeline stages? Namely the schemas that will be external, the access, and export models.

Background:
The current model naming convention uses prefixes to allow dbt to distinguish between dbt-references. dbt doesn't define a reference using the directory, only filename, so it would doesn't distinguish between access/accesspolicy.sql and export/fhir/accesspolicy.sql because they have the same filename, causing an error when dbt is run. We used prefixes to get around this, and have explicit table names. i.e. inc_brainpower_stb_accesspolicy vs access_accesspolicy vs fhir_accesspolicy

dbt functionality:

  • Model name (dbt internal) --> based on filename
  • Reference name (ref()) --> always the model name
  • Warehouse object --> model name, but can be overridden (alias, schema, database)

Issue:
dbt creates tables in the warehouse according to their filenames unless otherwise configured. These are not configured currently, resulting in long, unnecessarily annoying table names.

Proposal 1
Remove the prefixes from the tables altogether, if possible. Using prefixes may be an artifact of another pipeline and not be required for this one. I'd like to do a POC on this if we think it is worth it.

Proposal 2
Leave the prefixes, overwrite warehouse object names using the alias configuration. Do this for external schemas(access and export stages). We could do the same for study-specific stages, this would depend on harmonizer preference, but there should be a convention defined either way.

example: fhir access policy

  • Model name: no change - export/fhir/fhir_accesspolicy.sql
  • Reference name: no change - fhir_accesspolicy
  • Warehouse object: CHANGED - prod_fhir.fhir_accesspolicy --> prod_fhir.accesspolicy

Pros:

  1. Defining an alias for each table decouples the naming convention of model files in the repo and warehouse table names. The db schemas make it clear which table is being referenced, there is no need to restate the stage in the tablename.
  2. Users of the external models would appreciate it.

Describe the solution you'd like

  1. I'd like to do a POC to remove the use of table naming prefixes.
  2. If I cannot remove the prefixes due to a dbt or pipeline specific requirement then I want to use alias' to modify the warehouse object - table naming convention for the externally viewed schemas.

Describe alternatives you've considered

No response

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions