fix: snapshot SaasTemplateDataset baseline before batch template update loop#7578
Conversation
…te loop

When update_existing_connection_configs_for_connector_type processes multiple connection configs of the same type, each call to upsert_dataset_config_from_template was reading and then writing SaasTemplateDataset mid-loop. The first iteration's write set the stored baseline to the new template, so subsequent iterations used the new template as their merge baseline instead of the old one — causing the 3-way merge to misidentify unchanged customer fields as customer modifications and silently skip applying template updates on those configs.

Fix: read the stored baseline once before the loop and pass it as an optional snapshot parameter through update_saas_instance → upsert_dataset_config_from_template. All iterations now merge against the same correct pre-update baseline. The write-back on each iteration is idempotent (all iterations write the same value), so it is left in place.

Affects any deployment with multiple connection configs of the same SaaS connector type, regardless of whether they belong to the same system or different systems.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
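The misbehavior described above can be sketched in miniature. This is a toy stand-in for the real merge, not fides' `merge_datasets`; `three_way_merge` and the field values here are hypothetical:

```python
# Toy 3-way merge: keep a customer's value only where it differs from the
# stored baseline; otherwise take the new template's value.
def three_way_merge(customer: dict, baseline: dict, new_template: dict) -> dict:
    merged = dict(new_template)
    for key, value in customer.items():
        if baseline.get(key) != value:  # looks like a customer modification
            merged[key] = value         # preserve the customer edit
    return merged

old_template = {"collections": "users"}
new_template = {"collections": "users,products"}
customer = {"collections": "users"}  # customer never modified this field

# Correct baseline (snapshotted before the loop): template update applies.
print(three_way_merge(customer, old_template, new_template))
# {'collections': 'users,products'}

# Corrupted baseline (a prior iteration already wrote new_template): the
# unchanged customer value now differs from the baseline, is treated as a
# customer edit, and the template update is silently skipped.
print(three_way_merge(customer, new_template, new_template))
# {'collections': 'users'}
```

With the correct baseline the merge yields the new template's collections; with the corrupted baseline it keeps the stale value, which is exactly the silent skip the fix prevents.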
📝 Walkthrough

Capture the stored dataset JSON snapshot once before the batch loop and pass it through to each per-config update as the merge baseline.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Scheduler
    participant ConnSvc as ConnectionService
    participant Updater as saas_config_updater.py
    participant DB as Database/SaasTemplateDataset
    participant Merger as merge_datasets
    Scheduler->>ConnSvc: update_existing_connection_configs_for_connector_type(connector_type, template)
    ConnSvc->>DB: read SaasTemplateDataset (snapshot)
    DB-->>ConnSvc: stored_dataset_json
    ConnSvc->>ConnSvc: for each config -> call update_saas_instance(..., stored_dataset_json)
    ConnSvc->>Updater: update_saas_instance(..., stored_dataset_json)
    Updater->>Merger: upsert_dataset_config_from_template(..., stored_dataset_json)
    Merger-->>Updater: merged dataset (uses provided stored_dataset_json)
    Updater->>DB: persist updated DatasetConfig
    DB-->>ConnSvc: ack
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
…loop

update_saas_configs (run on every server startup) had the same mid-loop SaasTemplateDataset write corruption as update_existing_connection_configs_for_connector_type. Snapshot the stored baseline before the per-connector-type loop and pass it through to update_saas_instance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Greptile Summary

This PR fixes a real correctness bug in the SaaS template batch-update path: when multiple connection configs of the same connector type are updated together, each iteration's write-back advanced the stored merge baseline mid-loop, so later configs merged against the wrong baseline.

Key issues found:
Confidence Score: 3/5
Important Files Changed
🧹 Nitpick comments (2)
tests/service/test_connection_service.py (1)
1680-1794: Good test coverage for the baseline snapshot fix. The test correctly validates that both connection configs receive the template update when processed in a batch. This directly exercises the fix for the baseline corruption issue.
Consider adding cleanup for created test resources.
Other tests in this file (e.g., `test_upsert_dataset_with_merge` at line 926) explicitly clean up created resources. This test creates two `ConnectionConfig` and two `DatasetConfig` objects but doesn't delete them, which could leave test data in the database and potentially affect other tests.

🧹 Suggested cleanup addition

```diff
     assert "products" in collection_names, (
         f"Template update did not land on {instance_key}: "
         f"collections={collection_names}"
     )
+
+    # Clean up
+    for suffix in ("alpha", "beta"):
+        instance_key = f"multi_config_{suffix}"
+        connection_config = ConnectionConfig.get_by(
+            db, field="key", value=f"multi_config_connection_{suffix}"
+        )
+        if connection_config:
+            connection_config.delete(db)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/service/test_connection_service.py` around lines 1680 - 1794: This test creates two ConnectionConfig and two DatasetConfig records (via ConnectionConfig.create and DatasetConfig.create_or_update) and a SaasTemplateDataset (via SaasTemplateDataset.get_or_create) but doesn't remove them; add cleanup at the end of test_update_existing_connection_configs_preserves_merge_baseline_across_multiple_configs to delete the created resources by filtering for the instance fides_key values ("multi_config_alpha", "multi_config_beta") and the connector_type and removing the matching DatasetConfig and ConnectionConfig records and the SaasTemplateDataset so subsequent tests are not affected.

src/fides/service/connection/connection_service.py (1)
847-859: Add `.lower()` to snapshot lookup for consistency with the case-insensitive connection config query.

The connection config query uses `connector_type.lower()` (line 837), but the snapshot retrieval uses `connector_type` directly (line 852). Although `SaaSConfig` enforces lowercase normalization on the type field, adding `.lower()` here improves consistency with the defensive case-insensitive pattern already established for the connection config query.

🔧 Suggested fix
```diff
 stored_dataset_template = SaasTemplateDataset.get_by(
-    self.db, field="connection_type", value=connector_type
+    self.db, field="connection_type", value=connector_type.lower()
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fides/service/connection/connection_service.py` around lines 847 - 859, The snapshot lookup should use a lowercased connector_type to match the case-insensitive query earlier: change the SaasTemplateDataset.get_by call that assigns stored_dataset_template to pass connector_type.lower() instead of connector_type, and keep the subsequent stored_dataset_json_snapshot logic unchanged so the snapshot remains a dict when present; update references to connector_type in that get_by call only (SaasTemplateDataset.get_by, stored_dataset_template, stored_dataset_json_snapshot).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bdd5a19d-0fc1-417a-868e-d288488c4113
📒 Files selected for processing (3)

- src/fides/api/util/saas_config_updater.py
- src/fides/service/connection/connection_service.py
- tests/service/test_connection_service.py
- Make test actually exercise the bug by introducing a new "orders" collection in v2 that is absent from v1, so the assertion is not trivially satisfied
- Add test cleanup for created ConnectionConfig, DatasetConfig, and SaasTemplateDataset resources
- Use connector_type.lower() in snapshot lookup for consistency with case-insensitive connection config query

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
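The first bullet's reasoning (a v2-only collection makes the assertion non-trivial) can be illustrated with a toy merge; set union stands in for the real dataset merge, and the collection names follow the commit message:

```python
# Toy stand-in for applying a template update: union the new template's
# collections into the instance's. Not the real fides merge logic.
def apply_template_update(instance: set, new_template: set) -> set:
    return instance | new_template

v1_collections = {"users", "products"}
v2_collections = {"users", "products", "orders"}  # "orders" exists only in v2

updated = apply_template_update(v1_collections, v2_collections)
print("orders" in updated)         # True only if the update actually landed
print("orders" in v1_collections)  # False: v1 alone cannot satisfy the check
```

Had v2 merely repeated v1's collections, the assertion would pass even when the buggy merge silently kept the stale baseline.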
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/fides/service/connection/connection_service.py (1)
701-824: ⚠️ Potential issue | 🟠 Major: Baseline can still advance on partial batch failures.

`update_existing_connection_configs_for_connector_type` continues after exceptions (line 877), but `upsert_dataset_config_from_template` still persists `SaasTemplateDataset` on each successful iteration (line 812). If one config fails mid-batch, the shared baseline has already moved forward, and retrying the failed configs can merge against the wrong baseline.

🛠 Suggested fix direction
```diff
diff --git a/src/fides/service/connection/connection_service.py b/src/fides/service/connection/connection_service.py
@@
     def upsert_dataset_config_from_template(
         self,
         connection_config: ConnectionConfig,
         template: ConnectorTemplate,
         template_values: SaasConnectionTemplateValues,
         stored_dataset_json: Optional[dict] = None,
+        persist_template_baseline: bool = True,
     ) -> DatasetConfig:
@@
-        if stored_dataset_template:
+        if persist_template_baseline and stored_dataset_template:
             stored_dataset_template.dataset_json = yaml.safe_load(template.dataset)[
                 "dataset"
             ][0]
             stored_dataset_template.save(db=self.db)
@@
     def update_saas_instance(
         self,
         connection_config: ConnectionConfig,
         template: ConnectorTemplate,
         saas_config_instance: SaaSConfig,
         stored_dataset_json: Optional[dict] = None,
+        persist_template_baseline: bool = True,
     ) -> None:
@@
         self.upsert_dataset_config_from_template(
             connection_config,
             template,
             template_vals,
             stored_dataset_json=stored_dataset_json,
+            persist_template_baseline=persist_template_baseline,
         )
@@
+        all_updates_succeeded = True
         for connection_config in connection_configs:
             ...
             try:
                 self.update_saas_instance(
                     connection_config,
                     template,
                     saas_config_instance,
                     stored_dataset_json=stored_dataset_json_snapshot,
+                    persist_template_baseline=False,
                 )
             except Exception:
+                all_updates_succeeded = False
                 logger.exception(...)
+
+        if all_updates_succeeded and stored_dataset_template:
+            stored_dataset_template.dataset_json = yaml.safe_load(template.dataset)["dataset"][0]
+            stored_dataset_template.save(db=self.db)
```

Also applies to: 847-881
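The deferral this suggestion describes reduces to a small pattern: process the whole batch against one snapshot, and advance the stored baseline only if every iteration succeeded. A sketch with hypothetical `update_one` and `persist_baseline` callables standing in for the fides internals:

```python
# Sketch: advance the shared baseline only after the whole batch succeeds.
# update_one and persist_baseline are hypothetical stand-ins, not fides APIs.
def batch_update(configs, snapshot, new_baseline, update_one, persist_baseline):
    all_succeeded = True
    for config in configs:
        try:
            # every iteration merges against the same pre-update snapshot
            update_one(config, stored_dataset_json=snapshot)
        except Exception:
            all_succeeded = False  # keep going, but remember the failure
    if all_succeeded:
        persist_baseline(new_baseline)  # baseline moves forward exactly once
    return all_succeeded
```

A retry of the failed configs then still merges against the old baseline, rather than one a successful sibling already advanced.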
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fides/service/connection/connection_service.py` around lines 701 - 824, The bug is that upsert_dataset_config_from_template persists SaasTemplateDataset (stored_dataset_template.dataset_json and save) on each successful iteration, advancing the shared baseline during batch updates in update_existing_connection_configs_for_connector_type; to fix, avoid mutating/persisting stored_dataset_template inside upsert_dataset_config_from_template when a pre-read snapshot is supplied: if stored_dataset_json is not None (the batch caller provided the baseline), skip the block that assigns stored_dataset_template.dataset_json and calls stored_dataset_template.save(db=self.db), so the shared baseline only changes when callers intentionally allow a DB write (or add an explicit flag to control persistence) — update the logic around stored_dataset_template and its save call in upsert_dataset_config_from_template to respect stored_dataset_json.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9acbe206-0abe-4d5b-b980-b8cf454ce6cd
📒 Files selected for processing (2)
- src/fides/service/connection/connection_service.py
- tests/service/test_connection_service.py
Summary
- `SaasTemplateDataset` stores one row per connector type as the merge baseline for the 3-way dataset merge (customer edits vs old template vs new template)
- `update_existing_connection_configs_for_connector_type` loops over all connection configs of a given type and calls `upsert_dataset_config_from_template` for each — which both reads and writes `SaasTemplateDataset` on every iteration

Fix: read `SaasTemplateDataset` once before the loop and pass it as an optional `stored_dataset_json` snapshot through `update_saas_instance` → `upsert_dataset_config_from_template`. All iterations merge against the same correct pre-update baseline. Single-integration callers (`instantiate_connection`) are unchanged — they pass nothing and the existing read-from-DB path applies.

Changes

- `upsert_dataset_config_from_template`: new optional `stored_dataset_json` param; uses it as the merge baseline when provided, otherwise reads `SaasTemplateDataset` as before
- `update_saas_instance`: threads the optional `stored_dataset_json` param through
- `update_existing_connection_configs_for_connector_type`: snapshots `SaasTemplateDataset` once before the loop, passes the snapshot to each iteration

Test plan

- `test_update_existing_connection_configs_preserves_merge_baseline_across_multiple_configs` — creates two connection configs of the same type, runs the batch update, and asserts both received the template update (fails without the fix on whichever config is processed second)
- Existing `test_upsert_dataset_with_merge` and `test_update_existing_connection_configs_case_insensitive` continue to pass

Summary by CodeRabbit
Bug Fixes
Tests
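The optional-snapshot threading described in the PR summary can be sketched as follows. `read_baseline_from_db` and these signatures are illustrative stand-ins for the fides internals, not the actual API:

```python
from typing import Optional

def read_baseline_from_db(connector_type: str) -> dict:
    # stand-in for the SaasTemplateDataset lookup
    return {"connection_type": connector_type, "dataset_json": "stored"}

def upsert_dataset_config_from_template(
    connector_type: str, stored_dataset_json: Optional[dict] = None
) -> dict:
    if stored_dataset_json is None:
        # single-integration path (e.g. instantiate_connection):
        # read the baseline from the DB, exactly as before the fix
        stored_dataset_json = read_baseline_from_db(connector_type)
    # ...the 3-way merge would use stored_dataset_json as its baseline here
    return stored_dataset_json

# Batch caller: snapshot once, hand the same dict to every iteration.
snapshot = read_baseline_from_db("mailchimp")
for _config in range(2):
    assert upsert_dataset_config_from_template("mailchimp", snapshot) is snapshot

# Single-integration caller: passes nothing; the DB-read fallback applies.
assert upsert_dataset_config_from_template("mailchimp")["dataset_json"] == "stored"
```

Defaulting the parameter to `None` is what keeps the change backward compatible: only the batch caller opts into the shared snapshot.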