Merged
28 commits
0750e91
feature/union-data
fivetran-catfritz Oct 16, 2025
5e7b061
add new files
fivetran-catfritz Oct 16, 2025
8731319
fixes
fivetran-catfritz Oct 16, 2025
7996d58
adjust staging
fivetran-catfritz Oct 16, 2025
edb9fb5
update tests
fivetran-catfritz Oct 16, 2025
6e48fb4
fixes
fivetran-catfritz Oct 17, 2025
0847422
changelog
fivetran-catfritz Oct 17, 2025
b80a7d6
update docs
fivetran-catfritz Oct 17, 2025
5ee26fb
Generate dbt docs via GitHub Actions
github-actions[bot] Oct 17, 2025
66af67a
update tests
fivetran-catfritz Oct 20, 2025
d374b1c
update tests
fivetran-catfritz Oct 20, 2025
49031b5
changelog
fivetran-catfritz Oct 20, 2025
b866815
one more test update
fivetran-catfritz Oct 20, 2025
5393797
update union_connections
fivetran-catfritz Oct 21, 2025
fd38316
update union_connections
fivetran-catfritz Oct 21, 2025
1315d4a
update union_connections
fivetran-catfritz Oct 22, 2025
5efbf7a
update github_union_connections
fivetran-catfritz Oct 22, 2025
210b65c
put back the thing
fivetran-catfritz Oct 22, 2025
a66527a
Apply suggestions from code review
fivetran-catfritz Oct 22, 2025
453f703
update source enablement
fivetran-catfritz Oct 22, 2025
b3f000c
changelog
fivetran-catfritz Oct 23, 2025
7712c9c
changelog
fivetran-catfritz Oct 23, 2025
5aab18a
Apply suggestions from code review
fivetran-catfritz Oct 23, 2025
05653d2
update src configs
fivetran-catfritz Oct 23, 2025
6139bcd
Update CHANGELOG.md
fivetran-catfritz Oct 24, 2025
a5f4825
formatting
fivetran-catfritz Oct 27, 2025
b2ee9bf
Merge branch 'feature/union-data' of https://github.com/fivetran/dbt_…
fivetran-catfritz Oct 27, 2025
7ec576b
Generate dbt docs via GitHub Actions
github-actions[bot] Oct 27, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -71,3 +71,5 @@ env/
env.bak/
venv/
venv.bak/

CLAUDE.md
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,22 @@
# dbt_github v1.1.0

## Schema/Data Change
**1 total change • 0 possible breaking changes**

| Data Model(s) | Change type | Old | New | Notes |
| ------------- | ----------- | ----| --- | ----- |
| All models | New column | | `source_relation` | Identifies the source connection when using multiple GitHub connections |

## Feature Update
- **Union Data Functionality**: This release supports running the package on multiple GitHub source connections. See the [README](https://github.com/fivetran/dbt_github/tree/main?tab=readme-ov-file#step-3-define-database-and-schema-variables) for details on how to leverage this feature.

## Tests Update
- Removes uniqueness tests. The new unioning feature requires combination-of-column tests to consider the new `source_relation` column in addition to the existing primary key, but this is not supported across dbt versions.
- These tests will be reintroduced once a version-agnostic solution is available.
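
Until those tests return, a project that depends on this guarantee could add its own singular test. The following is a minimal sketch, not part of this release: the file name is hypothetical, and it assumes `issue_id` is the primary key of `github__issues` within a single connection.

```sql
-- tests/assert_github_issues_unique_per_source.sql (hypothetical file name)
-- A singular dbt test fails when the query returns any rows,
-- so this returns only key combinations that appear more than once.
-- Assumption: issue_id identifies an issue within one connection.
select
    source_relation,
    issue_id,
    count(*) as row_count
from {{ ref('github__issues') }}
group by 1, 2
having count(*) > 1
```

Because the check is plain SQL rather than a generic test configuration, it does not depend on the test-argument syntax of any particular dbt version.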

## Under the Hood
- Updates consistency tests to enable dynamic column exclusion.

# dbt_github v1.0.0

[PR #67](https://github.com/fivetran/dbt_github/pull/67) includes the following updates:
63 changes: 61 additions & 2 deletions README.md
@@ -60,19 +60,78 @@ Include the following github package version in your `packages.yml` file.
```yaml
packages:
  - package: fivetran/github
    version: [">=1.0.0", "<1.1.0"] # we recommend using ranges to capture non-breaking changes automatically
    version: [">=1.1.0", "<1.2.0"] # we recommend using ranges to capture non-breaking changes automatically
```

> All required sources and staging models are now bundled into this transformation package. Do not include `fivetran/github_source` in your `packages.yml` since this package has been deprecated.

### Step 3: Define database and schema variables

#### Option A: Single connection
By default, this package runs using your [destination](https://docs.getdbt.com/docs/running-a-dbt-project/using-the-command-line-interface/configure-your-profile) and the `github` schema. If this is not where your GitHub data is (for example, if your github schema is named `github_fivetran`), add the following configuration to your root `dbt_project.yml` file:

```yml
vars:
  github:
    github_database: your_database_name
    github_schema: your_schema_name
```

#### Option B: Union multiple connections
If you have multiple GitHub connections in Fivetran and would like to use this package on all of them simultaneously, we have provided functionality to do so. For each source table, the package will union all of the data together and pass the unioned table into the transformations. The `source_relation` column in each model indicates the origin of each record.

To use this functionality, you will need to set the `github_sources` variable in your root `dbt_project.yml` file:

```yml
# dbt_project.yml

vars:
  github:
    github_sources:
      - database: connection_1_destination_name # Required
        schema: connection_1_schema_name # Required
        name: connection_1_source_name # Required only if following the step in the following subsection

      - database: connection_2_destination_name
        schema: connection_2_schema_name
        name: connection_2_source_name
```
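
Once the unioned models are built, the `source_relation` column offers a quick way to confirm that every configured connection contributes rows. A minimal sketch, assuming it is compiled inside a dbt project that installs this package (for example as an analysis) and that the `github__issues` model is enabled:

```sql
-- Count issues per source connection.
-- Each distinct source_relation value corresponds to one connection's source.
select
    source_relation,
    count(*) as issue_count
from {{ ref('github__issues') }}
group by 1
order by 1
```

A connection listed in `github_sources` that is missing from the result may point to a wrong `database` or `schema` value, or to a source table that does not exist in that connection.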

##### Recommended: Incorporate unioned sources into DAG
> *If you are running the package through [Fivetran Transformations for dbt Core™](https://fivetran.com/docs/transformations/dbt#transformationsfordbtcore), the below step is necessary in order to synchronize model runs with your GitHub connections. Alternatively, you may choose to run the package through Fivetran [Quickstart](https://fivetran.com/docs/transformations/quickstart), which would create separate sets of models for each GitHub source rather than one set of unioned models.*

By default, this package defines one single-connection source, called `github`, which will be disabled if you are unioning multiple connections. This means that your DAG will not include your GitHub sources, though the package will run successfully.

To properly incorporate all of your GitHub connections into your project's DAG:
1. Define each of your sources in a `.yml` file in your project. Utilize the following template for the `source`-level configurations, and, **most importantly**, copy and paste the table and column-level definitions from the package's `src_github.yml` [file](https://github.com/fivetran/dbt_github/blob/main/models/staging/src_github.yml).

```yml
# a .yml file in your root project

version: 2

sources:
  - name: <name> # ex: Should match name in github_sources
    schema: <schema_name>
    database: <database_name>
    loader: fivetran
    config:
      loaded_at_field: _fivetran_synced
      freshness: # feel free to adjust to your liking
        warn_after: {count: 72, period: hour}
        error_after: {count: 168, period: hour}

    tables: # copy and paste from github/models/staging/src_github.yml - see https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/ for how to use anchors to only do so once
```

> **Note**: If there are source tables you do not have (see [Step 4](https://github.com/fivetran/dbt_github?tab=readme-ov-file#step-4-disable-models-for-non-existent-sources)), you may still include them, as long as you have set the right variables to `False`.

2. Set the `has_defined_sources` variable (scoped to the `github` package) to `True`, as follows:
```yml
# dbt_project.yml
vars:
  github:
    has_defined_sources: true
```

### Step 4: Disable models for non-existent sources
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,6 +1,6 @@
config-version: 2
name: 'github'
version: '1.0.0'
version: '1.1.0'
require-dbt-version: [">=1.3.0", "<2.0.0"]
models:
github:
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion integration_tests/dbt_project.yml
@@ -1,5 +1,5 @@
name: 'github_integration_tests'
version: '1.0.0'
version: '1.1.0'
config-version: 2
profile: 'integration_tests'
vars:
9 changes: 7 additions & 2 deletions integration_tests/tests/consistency_issues.sql
@@ -3,15 +3,20 @@
enabled=var('fivetran_validation_tests_enabled', false)
) }}

-- the differences in prod/dev run times will lead to discrepancies because it leverages current_timestamp, and string aggs don't always have the same order
{% set exclude_cols = ['days_issue_open', 'labels', 'repository_team_names', 'assignees']
+ var('gh_consistency_exclude_columns', [])
%}

-- this test ensures the github__issues end model matches the prior version
with prod as (
select * except(days_issue_open, labels, repository_team_names, assignees) --the differences in prod/dev run times will lead to discrepancies because it leverages current_timestamp, and string aggs don't always have the same order
select {{ dbt_utils.star(from=ref('github__issues'), except=exclude_cols) }}
from {{ target.schema }}_github_prod.github__issues
where date(updated_at) < date({{ dbt.current_timestamp() }})
),

dev as (
select * except(days_issue_open, labels, repository_team_names, assignees) --the differences in prod/dev run times will lead to slight discrepancies because it leverages current_timestamp, and string aggs don't always order the same
select {{ dbt_utils.star(from=ref('github__issues'), except=exclude_cols) }}
from {{ target.schema }}_github_dev.github__issues
where date(updated_at) < date({{ dbt.current_timestamp() }})
),
11 changes: 8 additions & 3 deletions integration_tests/tests/consistency_pull_requests.sql
@@ -3,16 +3,21 @@
enabled=var('fivetran_validation_tests_enabled', false)
) }}

-- the differences in prod/dev run times will lead to discrepancies because these fields leverage current_timestamp
{% set exclude_cols = ['days_issue_open', 'hours_request_review_to_first_review', 'hours_request_review_to_first_action', 'hours_request_review_to_merge', 'assignees']
+ var('gh_consistency_exclude_columns', [])
%}

-- this test ensures the github__pull_requests end model matches the prior version
with prod as (
select * except(days_issue_open, hours_request_review_to_first_review, hours_request_review_to_first_action, hours_request_review_to_merge) --the differences in prod/dev run times will lead to discrepancies because these fields leverages current_timestamp
select {{ dbt_utils.star(from=ref('github__pull_requests'), except=exclude_cols) }}
from {{ target.schema }}_github_prod.github__pull_requests
where date(updated_at) < date({{ dbt.current_timestamp() }})
),

dev as (
select * except(days_issue_open, hours_request_review_to_first_review, hours_request_review_to_first_action, hours_request_review_to_merge) --the differences in prod/dev run times will lead to discrepancies because these fields leverages current_timestamp
from {{ target.schema }}_github_prod.github__pull_requests
select {{ dbt_utils.star(from=ref('github__pull_requests'), except=exclude_cols) }}
from {{ target.schema }}_github_dev.github__pull_requests
where date(updated_at) < date({{ dbt.current_timestamp() }})
),

15 changes: 15 additions & 0 deletions macros/union/apply_source_relation.sql
@@ -0,0 +1,15 @@
{% macro apply_source_relation() -%}

{{ adapter.dispatch('apply_source_relation', 'github') () }}

{%- endmacro %}

{% macro default__apply_source_relation() -%}

{% if var('github_sources', []) != [] %}
, _dbt_source_relation as source_relation
{% else %}
, '{{ var("github_database", target.database) }}' || '.'|| '{{ var("github_schema", "github") }}' as source_relation
{% endif %}

{%- endmacro %}
94 changes: 94 additions & 0 deletions macros/union/github_union_connections.sql
@@ -0,0 +1,94 @@
{% macro github_union_connections(connection_dictionary, single_source_name, single_table_name, default_identifier=single_table_name) %}

{{ adapter.dispatch('github_union_connections', 'github') (connection_dictionary, single_source_name, single_table_name, default_identifier) }}

{%- endmacro %}

{% macro default__github_union_connections(connection_dictionary, single_source_name, single_table_name, default_identifier=single_table_name) %}

{%- set exception_warning = "\n\nPlease be aware: The " ~ single_source_name|upper ~ "." ~ single_table_name|upper ~ " table was not found in your schema(s). The Fivetran Data Model will create a completely empty staging model as to not break downstream transformations. To turn off these warnings, set the `fivetran__remove_empty_table_warnings` variable to TRUE (see https://github.com/fivetran/dbt_fivetran_utils/tree/releases/v0.4.latest#union_data-source for details).\n"%}
{%- set using_empty_table_warnings = (execute and not var('fivetran__remove_empty_table_warnings', false)) %}
{%- set connections = var(connection_dictionary, []) %}
{%- set using_unioning = connections | length > 0 %}

{%- if using_unioning %}
{# For unioning #}
{%- set relations = [] -%}
{%- for connection in connections -%}

{% if var('has_defined_sources', false) %}
{%- set database = source(connection.name, single_table_name).database %}
{%- set schema = source(connection.name, single_table_name).schema %}
{%- set identifier = source(connection.name, single_table_name).identifier %}
{%- else %}
{%- set database = connection.database if connection.database else target.database %}
{%- set schema = connection.schema if connection.schema else single_source_name %}
{%- set identifier = default_identifier %}
{%- endif %}

{%- set relation=adapter.get_relation(
database=database,
schema=schema,
identifier=identifier
)
-%}

{%- if relation is not none -%}
{%- do relations.append(relation) -%}
{%- endif -%}

-- ** Values passed to adapter.get_relation:
{{ '-- full-identifier_var: ' ~ identifier_var }}
{{ '-- database: ' ~ database }}
{{ '-- schema: ' ~ schema }}
{{ '-- identifier: ' ~ identifier ~ '\n' }}

{%- endfor -%}

{%- if relations != [] -%}
{{ github.github_union_relations(relations) }}

{%- else -%}
{{ exceptions.warn(exception_warning) if using_empty_table_warnings }}

select
cast(null as {{ dbt.type_string() }}) as _dbt_source_relation
limit {{ '0' if target.type != 'redshift' else '1' }}
{%- endif -%}

{% else %}
{# Not unioning #}

{% set identifier_var = single_source_name + "_" + single_table_name + "_identifier"%}
{%- set database = source(single_source_name, single_table_name).database %}
{%- set schema = source(single_source_name, single_table_name).schema %}
{%- set identifier = source(single_source_name, single_table_name).identifier %}

{%- set relation=adapter.get_relation(
database=database,
schema=schema,
identifier=identifier
)
-%}

-- ** Values passed to adapter.get_relation:
{{ '-- full-identifier_var: ' ~ identifier_var }}
{{ '-- database: ' ~ database }}
{{ '-- schema: ' ~ schema }}
{{ '-- identifier: ' ~ identifier ~ '\n' }}

{% if relation is not none -%}
select
{{ dbt_utils.star(from=source(single_source_name, single_table_name)) }}
from {{ source(single_source_name, single_table_name) }} as source_table

{% else %}
{{ exceptions.warn(exception_warning) if using_empty_table_warnings }}

select
cast(null as {{ dbt.type_string() }}) as _dbt_source_relation
limit {{ '0' if target.type != 'redshift' else '1' }}
{%- endif -%}
{% endif -%}

{%- endmacro %}
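
As a rough illustration of how the two macros above might fit together, a staging model could wrap `github_union_connections` in a CTE and append `apply_source_relation` to its select list. This is a sketch under stated assumptions, since the staging models are not shown in this diff: the `issue` table name and its `id` column are assumed, and the sketch would not compile against the empty-table fallback, which exposes only `_dbt_source_relation`.

```sql
-- Hypothetical staging model; table and column names are assumptions.
with base as (

    -- Pass the *name* of the variable: the macro looks it up with var().
    {{ github.github_union_connections(
        connection_dictionary='github_sources',
        single_source_name='github',
        single_table_name='issue'
    ) }}

),

final as (

    select
        id as issue_id
        -- Emits a leading comma plus the source_relation expression,
        -- so it must follow at least one selected column.
        {{ github.apply_source_relation() }}
    from base

)

select *
from final
```

When `github_sources` is set, `source_relation` comes from the `_dbt_source_relation` column of the unioned relations; otherwise it is the literal `<database>.<schema>` string built from `github_database` and `github_schema`.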