
fix: snapshot SaasTemplateDataset baseline before batch template update loop#7578

Open
Linker44 wants to merge 4 commits into main from fix/saas-template-dataset-merge-baseline-snapshot

Conversation


@Linker44 Linker44 commented Mar 5, 2026

Summary

  • SaasTemplateDataset stores one row per connector type as the merge baseline for the 3-way dataset merge (customer edits vs old template vs new template)
  • update_existing_connection_configs_for_connector_type loops over all connection configs of a given type and calls upsert_dataset_config_from_template for each — which both reads and writes SaasTemplateDataset on every iteration
  • The first iteration's write bumps the stored baseline to the new template, so all subsequent iterations use the new template as their baseline instead of the old one
  • Result: the 3-way merge incorrectly treats unchanged customer fields as customer modifications and silently skips applying template updates on those configs
  • Affects any deployment with multiple connection configs of the same SaaS connector type, whether on the same system or different systems

Fix: read SaasTemplateDataset once before the loop and pass it as an optional stored_dataset_json snapshot through update_saas_instance → upsert_dataset_config_from_template. All iterations then merge against the same correct pre-update baseline. Single-integration callers (instantiate_connection) are unchanged: they pass nothing, and the existing read-from-DB path applies.
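The failure mode and the snapshot fix can be sketched with a toy 3-way merge. All names below are illustrative stand-ins, not the actual fides code:

```python
def three_way_merge(customer, baseline, new_template):
    """Keep a customer value only where it differs from the baseline;
    otherwise take the new template's value."""
    merged = {}
    for key in set(customer) | set(new_template):
        if key in customer and customer.get(key) != baseline.get(key):
            merged[key] = customer[key]      # looks like a customer edit
        elif key in new_template:
            merged[key] = new_template[key]  # template update applies
    return merged

stored_baseline = {"field": "v1"}            # stands in for SaasTemplateDataset
configs = {"alpha": {"field": "v1"}, "beta": {"field": "v1"}}
new_template = {"field": "v2"}

# Buggy loop: each iteration reads the (possibly already bumped) baseline,
# and the write-back happens mid-loop.
db_baseline = dict(stored_baseline)
buggy = {}
for name, customer in configs.items():
    buggy[name] = three_way_merge(customer, db_baseline, new_template)
    db_baseline = dict(new_template)         # mid-loop write-back

# Fixed loop: snapshot once, merge every config against the same baseline.
snapshot = dict(stored_baseline)
fixed = {name: three_way_merge(customer, snapshot, new_template)
         for name, customer in configs.items()}

# Without the fix, the second config keeps its stale value.
assert buggy == {"alpha": {"field": "v2"}, "beta": {"field": "v1"}}
# With the snapshot, every config receives the template update.
assert fixed == {"alpha": {"field": "v2"}, "beta": {"field": "v2"}}
```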

Changes

  • upsert_dataset_config_from_template: new optional stored_dataset_json param; uses it as merge baseline when provided, otherwise reads SaasTemplateDataset as before
  • update_saas_instance: threads the optional stored_dataset_json param through
  • update_existing_connection_configs_for_connector_type: snapshots SaasTemplateDataset once before the loop, passes snapshot to each iteration

Test plan

  • New test test_update_existing_connection_configs_preserves_merge_baseline_across_multiple_configs — creates two connection configs of the same type, runs the batch update, and asserts both received the template update (fails without the fix on whichever config is processed second)
  • Existing tests test_upsert_dataset_with_merge and test_update_existing_connection_configs_case_insensitive continue to pass

Summary by CodeRabbit

  • Bug Fixes

    • Prevented earlier changes from affecting later configs during batch updates of SaaS connectors by ensuring all items merge against a consistent pre-update baseline.
  • Tests

    • Added a test verifying batch updates preserve the same merge baseline across multiple connector configurations.

…te loop

When update_existing_connection_configs_for_connector_type processes multiple
connection configs of the same type, each call to upsert_dataset_config_from_template
was reading and then writing SaasTemplateDataset mid-loop. The first iteration's write
would set the stored baseline to the new template, so subsequent iterations used the
new template as their merge baseline instead of the old one — causing the 3-way merge
to misidentify unchanged customer fields as customer modifications and silently skip
applying template updates on those configs.

Fix reads the stored baseline once before the loop and passes it as an optional
snapshot parameter through update_saas_instance → upsert_dataset_config_from_template.
All iterations now merge against the same correct pre-update baseline. The write-back
on each iteration is idempotent (all write the same value) so it is left in place.

Affects any deployment with multiple connection configs of the same SaaS connector type,
regardless of whether they belong to the same system or different systems.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project               Deployment  Actions           Updated (UTC)
fides-plus-nightly    Ready       Preview, Comment  Mar 6, 2026 7:28pm

1 Skipped Deployment

Project               Deployment  Updated (UTC)
fides-privacy-center  Ignored     Mar 6, 2026 7:28pm



coderabbitai bot commented Mar 5, 2026

📝 Walkthrough


Capture the stored dataset JSON snapshot from SaasTemplateDataset and thread it through ConnectionService update calls so batch updates use a consistent pre-update baseline for 3-way merges, preventing first-iteration changes from corrupting subsequent iterations.

Changes

  • Baseline snapshot extraction — src/fides/api/util/saas_config_updater.py: Compute stored_dataset_json from SaasTemplateDataset.dataset_json (a dict) and pass it into the update chain instead of letting later reads overwrite the baseline.
  • Snapshot threading & merge logic — src/fides/service/connection/connection_service.py: Add an optional stored_dataset_json parameter to update_saas_instance and upsert_dataset_config_from_template; prefer this snapshot when computing merges; snapshot the stored dataset once before batch loops and forward it to per-config updates.
  • Tests for baseline preservation — tests/service/test_connection_service.py: Add test_update_existing_connection_configs_preserves_merge_baseline_across_multiple_configs to verify multiple configs merge against the same pre-update baseline and do not leak mid-loop changes.

Sequence Diagram(s)

sequenceDiagram
    actor Scheduler
    participant ConnSvc as ConnectionService
    participant Updater as saas_config_updater.py
    participant DB as Database/SaasTemplateDataset
    participant Merger as merge_datasets

    Scheduler->>ConnSvc: update_existing_connection_configs_for_connector_type(connector_type, template)
    ConnSvc->>DB: read SaasTemplateDataset (snapshot)
    DB-->>ConnSvc: stored_dataset_json
    ConnSvc->>ConnSvc: for each config -> call update_saas_instance(..., stored_dataset_json)
    ConnSvc->>Updater: update_saas_instance(..., stored_dataset_json)
    Updater->>Merger: upsert_dataset_config_from_template(..., stored_dataset_json)
    Merger-->>Updater: merged dataset (uses provided stored_dataset_json)
    Updater->>DB: persist updated DatasetConfig
    DB-->>ConnSvc: ack

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I froze a dataset snapshot before the loop began,
So merges hop true across each config in the span.
No first-hop mess spreads — each update stays neat,
Baselines preserved, every merge complete. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check — ❓ Inconclusive: The description covers the problem, solution, code changes, and test plan, but the pre-merge checklist section is not filled out (checkboxes remain unchecked). Resolution: complete the pre-merge checklist items (CI pipelines, CHANGELOG.md update, UX feedback, followup issues, database migrations, documentation) by checking the appropriate boxes or marking items as not applicable.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: The title accurately and concisely describes the main fix: snapshotting SaasTemplateDataset before a batch loop to preserve the merge baseline.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@Linker44 Linker44 requested a review from galvana March 5, 2026 19:32
@Linker44 Linker44 self-assigned this Mar 5, 2026
…loop

update_saas_configs (run on every server startup) had the same mid-loop
SaasTemplateDataset write corruption as update_existing_connection_configs_for_
connector_type. Snapshot the stored baseline before the per-connector-type loop
and pass it through to update_saas_instance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Linker44 Linker44 marked this pull request as ready for review March 5, 2026 19:39
@Linker44 Linker44 requested a review from a team as a code owner March 5, 2026 19:39

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR fixes a real correctness bug in the SaaS template batch-update path: when multiple connection configs of the same connector type are updated together, upsert_dataset_config_from_template was writing the new template back to SaasTemplateDataset on every iteration, causing every iteration after the first to use the new template as the 3-way merge baseline instead of the old one. The fix — snapshot the old baseline once before the loop and thread it through update_saas_instance → upsert_dataset_config_from_template — is logically sound, and the docstrings explain the reason clearly.

Key issues found:

  • Weak regression test (tests/service/test_connection_service.py, lines 1697–1794): The new test uses stored_dataset (which already contains a "products" collection) as both the v1 baseline and the customer data, and _get_template_yaml() (also containing "products") as the v2 template. Because the customer data already has "products", the assertion assert "products" in collection_names is trivially satisfied on both configs even without the snapshot fix. The test should introduce a new collection (e.g., "orders") in v2 that is absent from v1 to actually detect the baseline corruption.

  • Redundant DB query on every batch iteration (connection_service.py, lines 738–744): upsert_dataset_config_from_template unconditionally reads SaasTemplateDataset from the database even when a non-None stored_dataset_json snapshot is already provided by the caller. On large batches this adds a pointless SELECT per connection config.

Confidence Score: 3/5

  • The production fix is logically correct, but the new test does not actually detect the bug it was written to guard against, leaving no regression coverage.
  • The snapshot mechanism in both saas_config_updater.py and connection_service.py is correct: merge_datasets deep-copies its stored_dataset argument so the shared snapshot cannot be mutated across iterations, and the write-back reassigns the attribute rather than mutating in place. The critical weakness is the test — since both v1 and v2 templates have "products" and the customer dataset also already has "products", the assertion trivially passes without the fix, providing false confidence. A proper test should demonstrate a collection (or field) that appears in v2 but not in v1, verifying that the second config in the batch receives the new template content.
  • tests/service/test_connection_service.py — the new test needs to be redesigned so that the assertion actually fails without the snapshot fix applied.
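The deep-copy property the review relies on — that merge_datasets cannot mutate the shared snapshot across iterations — can be illustrated in miniature. This is a hypothetical merge_datasets with an invented "collections" key, not the fides implementation:

```python
from copy import deepcopy

def merge_datasets(customer_dataset, stored_dataset, new_template):
    # Deep-copy the shared baseline so in-place merge work cannot
    # leak into later iterations that reuse the same snapshot.
    baseline = deepcopy(stored_dataset)
    baseline["collections"].extend(new_template.get("collections", []))
    return baseline

snapshot = {"collections": ["products"]}
merged = merge_datasets({}, snapshot, {"collections": ["orders"]})

assert merged == {"collections": ["products", "orders"]}
assert snapshot == {"collections": ["products"]}  # snapshot untouched
```

Without the deepcopy, the extend would append "orders" into the snapshot's own list, and every later config in the batch would see a corrupted baseline.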

Important Files Changed

  • src/fides/api/util/saas_config_updater.py — Correctly snapshots the SaasTemplateDataset baseline before the loop and passes it to each update_saas_instance call. The get_or_create return value is properly unpacked to retrieve the stored template for snapshotting.
  • src/fides/service/connection/connection_service.py — Core fix is sound: snapshots the baseline before the loop in update_existing_connection_configs_for_connector_type and threads it through update_saas_instance → upsert_dataset_config_from_template. Minor issue: upsert_dataset_config_from_template still performs a DB read of SaasTemplateDataset even when the caller provides a non-None snapshot, which is an unnecessary query on every batch iteration.
  • tests/service/test_connection_service.py — New test has a design flaw: the stored_dataset fixture already contains a "products" collection identical to what _get_template_yaml() produces at the collection level. The assertion "products" in collection_names passes trivially (the customer data already has "products") and does not actually detect the baseline corruption bug being fixed.

Comments Outside Diff (1)

  1. src/fides/service/connection/connection_service.py, line 738-744 (link)

    Unnecessary DB read when snapshot is already provided

    stored_dataset_template is always fetched from the database here, even when stored_dataset_json is a non-None pre-read snapshot and stored_dataset_template will not be used for the merge. On every iteration of a batch update this issues an extra SELECT that provides no benefit.

    Consider guarding the read behind the stored_dataset_json is None check, or only reading when connector_type is available and the snapshot was not provided:

    stored_dataset_template = (
        SaasTemplateDataset.get_by(
            self.db, field="connection_type", value=connector_type
        )
        if connector_type and stored_dataset_json is None
        else None
    )

    Or more simply, since stored_dataset_template is only used to (a) supply a fallback merge baseline and (b) write back the new template at the end, the read can be skipped for the merge path but is still needed for the write-back at line 812. A cleaner refactor would separate those two concerns.

Last reviewed commit: 0f552ab

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
tests/service/test_connection_service.py (1)

1680-1794: Good test coverage for the baseline snapshot fix.

The test correctly validates that both connection configs receive the template update when processed in a batch. This directly exercises the fix for the baseline corruption issue.

Consider adding cleanup for created test resources.

Other tests in this file (e.g., test_upsert_dataset_with_merge at line 926) explicitly clean up created resources. This test creates two ConnectionConfig and two DatasetConfig objects but doesn't delete them, which could leave test data in the database and potentially affect other tests.

🧹 Suggested cleanup addition
             assert "products" in collection_names, (
                 f"Template update did not land on {instance_key}: "
                 f"collections={collection_names}"
             )
+
+        # Clean up
+        for suffix in ("alpha", "beta"):
+            instance_key = f"multi_config_{suffix}"
+            connection_config = ConnectionConfig.get_by(
+                db, field="key", value=f"multi_config_connection_{suffix}"
+            )
+            if connection_config:
+                connection_config.delete(db)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/service/test_connection_service.py` around lines 1680 - 1794, This test
creates two ConnectionConfig and two DatasetConfig records (via
ConnectionConfig.create and DatasetConfig.create_or_update) and a
SaasTemplateDataset (via SaasTemplateDataset.get_or_create) but doesn't remove
them; add cleanup at the end of
test_update_existing_connection_configs_preserves_merge_baseline_across_multiple_configs
to delete the created resources by filtering for the instance fides_key values
("multi_config_alpha", "multi_config_beta") and the connector_type and removing
the matching DatasetConfig and ConnectionConfig records and the
SaasTemplateDataset so subsequent tests are not affected.
src/fides/service/connection/connection_service.py (1)

847-859: Add .lower() to snapshot lookup for consistency with case-insensitive connection config query.

The connection config query uses connector_type.lower() (line 837), but the snapshot retrieval uses connector_type directly (line 852). Although SaaSConfig enforces lowercase normalization on the type field, adding .lower() here improves consistency with the defensive case-insensitive pattern already established for the connection config query.

🔧 Suggested fix
         stored_dataset_template = SaasTemplateDataset.get_by(
-            self.db, field="connection_type", value=connector_type
+            self.db, field="connection_type", value=connector_type.lower()
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fides/service/connection/connection_service.py` around lines 847 - 859,
The snapshot lookup should use a lowercased connector_type to match the
case-insensitive query earlier: change the SaasTemplateDataset.get_by call that
assigns stored_dataset_template to pass connector_type.lower() instead of
connector_type, and keep the subsequent stored_dataset_json_snapshot logic
unchanged so the snapshot remains a dict when present; update references to
connector_type in that get_by call only (SaasTemplateDataset.get_by,
stored_dataset_template, stored_dataset_json_snapshot).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bdd5a19d-0fc1-417a-868e-d288488c4113

📥 Commits

Reviewing files that changed from the base of the PR and between e7fa470 and 0f552ab.

📒 Files selected for processing (3)
  • src/fides/api/util/saas_config_updater.py
  • src/fides/service/connection/connection_service.py
  • tests/service/test_connection_service.py

- Make test actually exercise the bug by introducing a new "orders"
  collection in v2 that is absent from v1, so the assertion is not
  trivially satisfied
- Add test cleanup for created ConnectionConfig, DatasetConfig, and
  SaasTemplateDataset resources
- Use connector_type.lower() in snapshot lookup for consistency with
  case-insensitive connection config query

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/fides/service/connection/connection_service.py (1)

701-824: ⚠️ Potential issue | 🟠 Major

Baseline can still advance on partial batch failures.

update_existing_connection_configs_for_connector_type continues after exceptions (Line 877), but upsert_dataset_config_from_template still persists SaasTemplateDataset on each successful iteration (Line 812). If one config fails mid-batch, the shared baseline is already moved forward, and retrying failed configs can merge against the wrong baseline.

🛠 Suggested fix direction
diff --git a/src/fides/service/connection/connection_service.py b/src/fides/service/connection/connection_service.py
@@
-    def upsert_dataset_config_from_template(
+    def upsert_dataset_config_from_template(
         self,
         connection_config: ConnectionConfig,
         template: ConnectorTemplate,
         template_values: SaasConnectionTemplateValues,
         stored_dataset_json: Optional[dict] = None,
+        persist_template_baseline: bool = True,
     ) -> DatasetConfig:
@@
-        if stored_dataset_template:
+        if persist_template_baseline and stored_dataset_template:
             stored_dataset_template.dataset_json = yaml.safe_load(template.dataset)[
                 "dataset"
             ][0]
             stored_dataset_template.save(db=self.db)
@@
-    def update_saas_instance(
+    def update_saas_instance(
         self,
         connection_config: ConnectionConfig,
         template: ConnectorTemplate,
         saas_config_instance: SaaSConfig,
         stored_dataset_json: Optional[dict] = None,
+        persist_template_baseline: bool = True,
     ) -> None:
@@
         self.upsert_dataset_config_from_template(
             connection_config,
             template,
             template_vals,
             stored_dataset_json=stored_dataset_json,
+            persist_template_baseline=persist_template_baseline,
         )
@@
-        for connection_config in connection_configs:
+        all_updates_succeeded = True
+        for connection_config in connection_configs:
             ...
             try:
                 self.update_saas_instance(
                     connection_config,
                     template,
                     saas_config_instance,
                     stored_dataset_json=stored_dataset_json_snapshot,
+                    persist_template_baseline=False,
                 )
             except Exception:
+                all_updates_succeeded = False
                 logger.exception(...)
+
+        if all_updates_succeeded and stored_dataset_template:
+            stored_dataset_template.dataset_json = yaml.safe_load(template.dataset)["dataset"][0]
+            stored_dataset_template.save(db=self.db)

Also applies to: 847-881

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fides/service/connection/connection_service.py` around lines 701 - 824,
The bug is that upsert_dataset_config_from_template persists SaasTemplateDataset
(stored_dataset_template.dataset_json and save) on each successful iteration,
advancing the shared baseline during batch updates in
update_existing_connection_configs_for_connector_type; to fix, avoid
mutating/persisting stored_dataset_template inside
upsert_dataset_config_from_template when a pre-read snapshot is supplied: if
stored_dataset_json is not None (the batch caller provided the baseline), skip
the block that assigns stored_dataset_template.dataset_json and calls
stored_dataset_template.save(db=self.db), so the shared baseline only changes
when callers intentionally allow a DB write (or add an explicit flag to control
persistence) — update the logic around stored_dataset_template and its save call
in upsert_dataset_config_from_template to respect stored_dataset_json.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9acbe206-0abe-4d5b-b980-b8cf454ce6cd

📥 Commits

Reviewing files that changed from the base of the PR and between 0f552ab and 61f70c6.

📒 Files selected for processing (2)
  • src/fides/service/connection/connection_service.py
  • tests/service/test_connection_service.py
