ENG-2635: Reconcile set_schema with namespace_meta for Postgres connectors#7510
ENG-2635: Reconcile set_schema with namespace_meta for Postgres connectors#7510JadeCara wants to merge 14 commits intoENG-2635-gcp-postgres-namespace-supportfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. 2 Skipped Deployments
|
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
When namespace_meta is present, set_schema() is now skipped because table names are already schema-qualified in the generated SQL. The namespace_meta state is stored on the connector instance via query_config() and checked in set_schema(). Also deduplicates get_qualified_table_name into the base SQLConnector. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
61a7a1d to
4c7cc49
Compare
Greptile SummaryThis PR correctly reconciles two conflicting schema-qualification mechanisms for Postgres connectors: the legacy Key changes:
Confidence Score: 4/5
Important Files Changed
|
| class TestPostgreSQLConnectorSetSchema: | ||
| """Tests for set_schema reconciliation with namespace_meta.""" | ||
|
|
||
| @pytest.fixture | ||
| def connector_with_db_schema(self): | ||
| """Connector with db_schema configured in secrets.""" | ||
| config = MagicMock() | ||
| config.secrets = { | ||
| "host": "localhost", | ||
| "port": 5432, | ||
| "dbname": "testdb", | ||
| "username": "user", | ||
| "password": "pass", | ||
| "db_schema": "billing", | ||
| } | ||
| connector = PostgreSQLConnector(configuration=config) | ||
| return connector | ||
|
|
||
| def test_set_schema_skipped_when_namespace_meta_present( | ||
| self, connector_with_db_schema | ||
| ): | ||
| """When namespace_meta is present, set_schema should be a no-op.""" | ||
| connector = connector_with_db_schema | ||
| connector._current_namespace_meta = {"schema": "billing"} | ||
|
|
||
| mock_connection = MagicMock() | ||
| connector.set_schema(mock_connection) | ||
|
|
||
| # Should NOT execute any SQL | ||
| mock_connection.execute.assert_not_called() | ||
|
|
||
| def test_set_schema_runs_when_namespace_meta_absent(self, connector_with_db_schema): | ||
| """When namespace_meta is absent and db_schema is configured, set_schema should run.""" | ||
| connector = connector_with_db_schema | ||
| connector._current_namespace_meta = None | ||
|
|
||
| mock_connection = MagicMock() | ||
| connector.set_schema(mock_connection) | ||
|
|
||
| # Should execute SET search_path | ||
| mock_connection.execute.assert_called_once() | ||
|
|
||
| def test_set_schema_noop_when_no_db_schema_and_no_namespace(self): | ||
| """When neither namespace_meta nor db_schema is configured, set_schema is a no-op.""" | ||
| config = MagicMock() | ||
| config.secrets = { | ||
| "host": "localhost", | ||
| "port": 5432, | ||
| "dbname": "testdb", | ||
| "username": "user", | ||
| "password": "pass", | ||
| } | ||
| connector = PostgreSQLConnector(configuration=config) | ||
| connector._current_namespace_meta = None | ||
|
|
||
| mock_connection = MagicMock() | ||
| connector.set_schema(mock_connection) | ||
|
|
||
| mock_connection.execute.assert_not_called() | ||
|
|
||
| @patch( | ||
| "fides.api.service.connectors.postgres_connector.SQLConnector.get_namespace_meta" | ||
| ) | ||
| def test_query_config_sets_namespace_meta_state(self, mock_get_ns): | ||
| """query_config() should store namespace_meta on the connector instance.""" | ||
| mock_get_ns.return_value = {"schema": "billing"} | ||
|
|
||
| config = MagicMock() | ||
| config.secrets = {"host": "localhost"} | ||
| connector = PostgreSQLConnector(configuration=config) | ||
| assert connector._current_namespace_meta is None | ||
|
|
||
| mock_node = MagicMock() | ||
| mock_node.address.dataset = "test_dataset" | ||
| mock_node.collection.name = "customer" | ||
|
|
||
| connector.query_config(mock_node) | ||
| assert connector._current_namespace_meta == {"schema": "billing"} | ||
|
|
||
| @patch( | ||
| "fides.api.service.connectors.postgres_connector.SQLConnector.get_namespace_meta" | ||
| ) | ||
| def test_query_config_clears_namespace_meta_state(self, mock_get_ns): | ||
| """query_config() should clear namespace_meta when dataset has none.""" | ||
| mock_get_ns.return_value = None | ||
|
|
||
| config = MagicMock() | ||
| config.secrets = {"host": "localhost"} | ||
| connector = PostgreSQLConnector(configuration=config) | ||
| connector._current_namespace_meta = {"schema": "old_value"} | ||
|
|
||
| mock_node = MagicMock() | ||
| mock_node.address.dataset = "test_dataset" | ||
| mock_node.collection.name = "customer" | ||
|
|
||
| connector.query_config(mock_node) | ||
| assert connector._current_namespace_meta is None |
There was a problem hiding this comment.
Missing unit tests for the consolidated get_qualified_table_name base logic
The new TestPostgreSQLConnectorSetSchema class provides solid coverage for the set_schema reconciliation, but there are no automated unit tests for the get_qualified_table_name consolidation in SQLConnector — specifically the three paths manually verified in the PR description:
- No namespace → returns plain
collection_name - Schema only → returns
schema.collection_name - Schema + database_name → returns
database_name.schema.collection_name
The PR description shows these were verified manually, but adding corresponding unit tests would prevent regressions if the base logic is changed in the future. These can be added to the existing TestPostgreSQLConnectorSetSchema class or a separate TestSQLConnectorGetQualifiedTableName class, mocking get_namespace_meta similarly to how the test_query_config_* tests already do.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
…set-schema-reconciliation
…set-schema-reconciliation
…set-schema-reconciliation
…set-schema-reconciliation
RDSPostgresQueryConfig now inherits from QueryStringWithoutTuplesOverrideQueryConfig to avoid wrapping single-value params in tuples. pg8000 (used for RDS IAM auth) does not auto-unpack tuples like psycopg2, causing equality comparisons to silently return 0 rows. Also fixes namespace_meta initialization in the connector's query_config() and registers RDS connectors in task_resources. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…set-schema-reconciliation
…set-schema-reconciliation
…set-schema-reconciliation
…set-schema-reconciliation
…set-schema-reconciliation
…set-schema-reconciliation
Ticket ENG-2635
Description Of Changes
Reconcile
set_schema()withnamespace_metafor Postgres connectors (PR 3 of 3 for ENG-2635).When
namespace_metais present, table names are already schema-qualified in the generated SQL (e.g."billing"."customer"). In that case,set_schema()— which sets PostgreSQL'ssearch_path— should be skipped to avoid conflicts. Previously both mechanisms could run simultaneously, which could cause unexpected behavior ifdb_schemawas also configured.This PR also consolidates the duplicated
get_qualified_table_name()override from three Postgres connectors into the baseSQLConnector, since all three had identical logic. The base implementation usesgetattrto check forschemaanddatabase_nameon the parsed namespace_meta, falling back to the plain collection name when no namespace is configured.RDS Postgres DSR Execution Fixes
This PR also includes fixes to enable end-to-end DSR execution for RDS Postgres connectors:
pg8000 tuple parameter compatibility —
RDSPostgresQueryConfignow inherits fromQueryStringWithoutTuplesOverrideQueryConfig(in addition toPostgresQueryConfig) because the pg8000 driver used by RDS IAM auth does not auto-unpack single-element tuple parameters the way psycopg2 does. Without this fix, equality comparisons likeemail = :emailsilently return 0 rows when the bound value is a tuple('val',). This is the same pattern used byBigQueryQueryConfig,MicrosoftSQLServerQueryConfig, andGoogleCloudSQLPostgresQueryConfig.namespace_meta initialization —
RDSPostgresConnector.query_config()now storesnamespace_metaon the connector instance (self.namespace_meta) before returning, matching the pattern inPostgreSQLConnectorandGoogleCloudSQLPostgresConnector. This ensurescreate_client()has access to namespace_meta when building the connection.RDS connector registration — Added
RDSPostgresConnectorandRDSMySQLConnectorto theConnectionsfactory intask_resources.pyso DSR graph execution can instantiate them.Stacks on top of:
Code Changes
src/fides/api/service/connectors/sql_connector.py- Added_current_namespace_metainstance variable in__init__; liftedget_qualified_table_name()from subclasses into the base with namespace-aware logicsrc/fides/api/service/connectors/postgres_connector.py-set_schema()skips when_current_namespace_metais set;query_config()stores namespace_meta on the instance; removed duplicateget_qualified_table_name()src/fides/api/service/connectors/google_cloud_postgres_connector.py- Sameset_schemareconciliation pattern; removed duplicateget_qualified_table_name()src/fides/api/service/connectors/rds_postgres_connector.py- Removed duplicateget_qualified_table_name()(inherits from base);query_config()now stores namespace_meta on the connector instancesrc/fides/api/service/connectors/query_configs/postgres_query_config.py-RDSPostgresQueryConfignow inherits from bothQueryStringWithoutTuplesOverrideQueryConfigandPostgresQueryConfigfor pg8000 compatibilitysrc/fides/api/task/task_resources.py- RegisteredRDSPostgresConnectorandRDSMySQLConnectorin theConnectionsfactorytests/ops/service/connectors/test_postgres_connector.py- AddedTestPostgreSQLConnectorSetSchemawith 5 unit teststests/ops/service/connectors/test_google_cloud_postgres_connector.py- AddedTestGoogleCloudSQLPostgresConnectorSetSchemawith 5 unit testsSteps to Confirm
These scenarios verify the three-way logic in
set_schema()and the call flow throughquery_config() → set_schema()during DSR execution.Scenario 1: namespace_meta present →
set_schemaskipped, SQL uses qualified table names{ "fides_key": "my_postgres_dataset", "fides_meta": { "namespace": { "schema": "billing" } }, "collections": [...] }SELECT ... FROM "billing"."customer" WHERE .... TheSET search_pathstatement is NOT executed (check logs — noSetting PostgreSQL search_path before retrieving datamessage). This is becausequery_config()stores the namespace_meta on the connector instance, andset_schema()sees it and returns early.Scenario 2: No namespace_meta,
db_schemain connection secrets →set_schemarunsfides_meta.namespace.db_schemain secrets:{ "host": "...", "dbname": "...", "db_schema": "billing", ... }SELECT ... FROM "customer" WHERE .... TheSET search_path to 'billing'statement IS executed before the query (check logs forSetting PostgreSQL search_path before retrieving data). This is the legacy path —_current_namespace_metaisNone, soset_schema()proceeds to checkdb_schema.Scenario 3: Neither namespace_meta nor
db_schema→ no-op (backward compatible)fides_meta.namespace.db_schemain secrets.SELECT ... FROM "customer" WHERE .... NoSET search_pathis executed. Queries run against the defaultpublicschema. This confirms backward compatibility — existing Postgres connections without any schema configuration continue to work unchanged.Scenario 4: RDS Postgres DSR execution with namespace_meta
rds_postgrestype) with IAM auth. AWS credentials can be found in 1Password. The test cluster isdatabase-4(Aurora PostgreSQL) inus-east-2, with databasepostgres_exampleand schemapublic.{ "fides_key": "rds_postgres_example", "fides_meta": { "namespace": { "connection_type": "rds_postgres", "database_instance_id": "database-4", "database_id": "postgres_example", "schema": "public" } }, "collections": [...] }customertable (e.g.customer-1@example.com)."public"."customer"are generated correctly.Code path reference (for reviewers reading the code):
sql_connector.py:retrieve_data()callsself.query_config(node)(line 184), which stores_current_namespace_metaon the connector instance, then callsself.set_schema(connection)(line 197), which checks the stored state.mask_data()(lines 229, 243) andexecute_standalone_retrieval_query()(lines 160, 171).RDSPostgresQueryConfiguses Python MRO to resolve query generation methods fromQueryStringWithoutTuplesOverrideQueryConfig(no tuple params) while keeping SQL formatting fromPostgresQueryConfig(double-quoted identifiers, schema-qualified table names).Manual verification completed:
namespace_metapresent →set_schemaskippeddb_schemaset →set_schemarunsdb_schema→ no-opnamespace_metapresent →set_schemaskippedget_qualified_table_nameno namespace → plain nameget_qualified_table_nameschema only →schema.tableget_qualified_table_nameschema+db →db.schema.tablePre-Merge Checklist
CHANGELOG.mdupdated🤖 Generated with Claude Code