Skip to content

ENG-2635: Reconcile set_schema with namespace_meta for Postgres connectors#7510

Open
JadeCara wants to merge 14 commits intoENG-2635-gcp-postgres-namespace-supportfrom
ENG-2635-set-schema-reconciliation
Open

ENG-2635: Reconcile set_schema with namespace_meta for Postgres connectors#7510
JadeCara wants to merge 14 commits intoENG-2635-gcp-postgres-namespace-supportfrom
ENG-2635-set-schema-reconciliation

Conversation

@JadeCara
Copy link
Contributor

@JadeCara JadeCara commented Feb 26, 2026

Ticket ENG-2635

Description Of Changes

Reconcile set_schema() with namespace_meta for Postgres connectors (PR 3 of 3 for ENG-2635).

When namespace_meta is present, table names are already schema-qualified in the generated SQL (e.g. "billing"."customer"). In that case, set_schema() — which sets PostgreSQL's search_path — should be skipped to avoid conflicts. Previously both mechanisms could run simultaneously, which could cause unexpected behavior if db_schema was also configured.

This PR also consolidates the duplicated get_qualified_table_name() override from three Postgres connectors into the base SQLConnector, since all three had identical logic. The base implementation uses getattr to check for schema and database_name on the parsed namespace_meta, falling back to the plain collection name when no namespace is configured.

RDS Postgres DSR Execution Fixes

This PR also includes fixes to enable end-to-end DSR execution for RDS Postgres connectors:

  • pg8000 tuple parameter compatibilityRDSPostgresQueryConfig now inherits from QueryStringWithoutTuplesOverrideQueryConfig (in addition to PostgresQueryConfig) because the pg8000 driver used by RDS IAM auth does not auto-unpack single-element tuple parameters the way psycopg2 does. Without this fix, equality comparisons like email = :email silently return 0 rows when the bound value is a tuple ('val',). This is the same pattern used by BigQueryQueryConfig, MicrosoftSQLServerQueryConfig, and GoogleCloudSQLPostgresQueryConfig.

  • namespace_meta initializationRDSPostgresConnector.query_config() now stores namespace_meta on the connector instance (self.namespace_meta) before returning, matching the pattern in PostgreSQLConnector and GoogleCloudSQLPostgresConnector. This ensures create_client() has access to namespace_meta when building the connection.

  • RDS connector registration — Added RDSPostgresConnector and RDSMySQLConnector to the Connections factory in task_resources.py so DSR graph execution can instantiate them.

Stacks on top of:

Code Changes

  • src/fides/api/service/connectors/sql_connector.py - Added _current_namespace_meta instance variable in __init__; lifted get_qualified_table_name() from subclasses into the base with namespace-aware logic
  • src/fides/api/service/connectors/postgres_connector.py - set_schema() skips when _current_namespace_meta is set; query_config() stores namespace_meta on the instance; removed duplicate get_qualified_table_name()
  • src/fides/api/service/connectors/google_cloud_postgres_connector.py - Same set_schema reconciliation pattern; removed duplicate get_qualified_table_name()
  • src/fides/api/service/connectors/rds_postgres_connector.py - Removed duplicate get_qualified_table_name() (inherits from base); query_config() now stores namespace_meta on the connector instance
  • src/fides/api/service/connectors/query_configs/postgres_query_config.py - RDSPostgresQueryConfig now inherits from both QueryStringWithoutTuplesOverrideQueryConfig and PostgresQueryConfig for pg8000 compatibility
  • src/fides/api/task/task_resources.py - Registered RDSPostgresConnector and RDSMySQLConnector in the Connections factory
  • tests/ops/service/connectors/test_postgres_connector.py - Added TestPostgreSQLConnectorSetSchema with 5 unit tests
  • tests/ops/service/connectors/test_google_cloud_postgres_connector.py - Added TestGoogleCloudSQLPostgresConnectorSetSchema with 5 unit tests

Steps to Confirm

These scenarios verify the three-way logic in set_schema() and the call flow through query_config() → set_schema() during DSR execution.

Scenario 1: namespace_meta present → set_schema skipped, SQL uses qualified table names

  1. Create or update a Postgres dataset with namespace metadata:
    {
      "fides_key": "my_postgres_dataset",
      "fides_meta": {
        "namespace": {
          "schema": "billing"
        }
      },
      "collections": [...]
    }
  2. Run an access DSR against this dataset.
  3. Expected: The generated SQL uses schema-qualified table names like SELECT ... FROM "billing"."customer" WHERE .... The SET search_path statement is NOT executed (check logs — no Setting PostgreSQL search_path before retrieving data message). This is because query_config() stores the namespace_meta on the connector instance, and set_schema() sees it and returns early.

Scenario 2: No namespace_meta, db_schema in connection secrets → set_schema runs

  1. Use a Postgres dataset without fides_meta.namespace.
  2. Configure the Postgres connection with db_schema in secrets:
    {
      "host": "...",
      "dbname": "...",
      "db_schema": "billing",
      ...
    }
  3. Run an access DSR against this dataset.
  4. Expected: The SQL uses unqualified table names like SELECT ... FROM "customer" WHERE .... The SET search_path to 'billing' statement IS executed before the query (check logs for Setting PostgreSQL search_path before retrieving data). This is the legacy path — _current_namespace_meta is None, so set_schema() proceeds to check db_schema.

Scenario 3: Neither namespace_meta nor db_schema → no-op (backward compatible)

  1. Use a Postgres dataset without fides_meta.namespace.
  2. Use a Postgres connection without db_schema in secrets.
  3. Run an access DSR.
  4. Expected: SQL uses unqualified table names like SELECT ... FROM "customer" WHERE .... No SET search_path is executed. Queries run against the default public schema. This confirms backward compatibility — existing Postgres connections without any schema configuration continue to work unchanged.

Scenario 4: RDS Postgres DSR execution with namespace_meta

  1. Set up an RDS Postgres connection (rds_postgres type) with IAM auth. AWS credentials can be found in 1Password. The test cluster is database-4 (Aurora PostgreSQL) in us-east-2, with database postgres_example and schema public.
  2. Create a dataset with RDS Postgres namespace metadata:
    {
      "fides_key": "rds_postgres_example",
      "fides_meta": {
        "namespace": {
          "connection_type": "rds_postgres",
          "database_instance_id": "database-4",
          "database_id": "postgres_example",
          "schema": "public"
        }
      },
      "collections": [...]
    }
  3. Create a system and link the connection config to it.
  4. Run an access DSR targeting an email identity present in the customer table (e.g. customer-1@example.com).
  5. Expected: Data is successfully retrieved from the RDS Aurora cluster. The pg8000 driver correctly binds parameters as scalars (not tuples), and schema-qualified table names like "public"."customer" are generated correctly.

Code path reference (for reviewers reading the code):

  • sql_connector.py:retrieve_data() calls self.query_config(node) (line 184), which stores _current_namespace_meta on the connector instance, then calls self.set_schema(connection) (line 197), which checks the stored state.
  • The same flow applies in mask_data() (lines 229, 243) and execute_standalone_retrieval_query() (lines 160, 171).
  • RDSPostgresQueryConfig uses Python MRO to resolve query generation methods from QueryStringWithoutTuplesOverrideQueryConfig (no tuple params) while keeping SQL formatting from PostgresQueryConfig (double-quoted identifiers, schema-qualified table names).

Manual verification completed:

# Scenario Connector Result
1 namespace_meta present → set_schema skipped PostgreSQL ✅ PASS
2 No namespace_meta, db_schema set → set_schema runs PostgreSQL ✅ PASS
3 No namespace_meta, no db_schema → no-op PostgreSQL ✅ PASS
4 namespace_meta present → set_schema skipped GCS Postgres ✅ PASS
5 get_qualified_table_name no namespace → plain name PostgreSQL ✅ PASS
6 get_qualified_table_name schema only → schema.table PostgreSQL ✅ PASS
7 get_qualified_table_name schema+db → db.schema.table PostgreSQL ✅ PASS
8 Non-postgres connector unaffected by changes MariaDB ✅ PASS
9 RDS Postgres DSR access with namespace_meta → rows returned RDS Postgres (Aurora, us-east-2) ✅ PASS
10 RDS connector registered in task_resources RDS Postgres ✅ PASS

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • No followup issues
  • Database migrations:
    • No migrations
  • Documentation:
    • No documentation updates required

🤖 Generated with Claude Code

@vercel
Copy link
Contributor

vercel bot commented Feb 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Mar 6, 2026 10:20pm
fides-privacy-center Ignored Ignored Mar 6, 2026 10:20pm

Request Review

JadeCara pushed a commit that referenced this pull request Feb 26, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 4, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be690212-cbb9-4697-a40e-2f3e85a1be9d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ENG-2635-set-schema-reconciliation

Comment @coderabbitai help to get the list of available commands and usage tips.

Jade Wibbels and others added 2 commits March 5, 2026 13:00
When namespace_meta is present, set_schema() is now skipped because
table names are already schema-qualified in the generated SQL. The
namespace_meta state is stored on the connector instance via
query_config() and checked in set_schema(). Also deduplicates
get_qualified_table_name into the base SQLConnector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JadeCara JadeCara force-pushed the ENG-2635-set-schema-reconciliation branch from 61a7a1d to 4c7cc49 Compare March 5, 2026 20:02
@JadeCara JadeCara changed the base branch from main to ENG-2635-gcp-postgres-namespace-support March 5, 2026 20:05
@JadeCara JadeCara marked this pull request as ready for review March 5, 2026 20:06
@JadeCara JadeCara requested a review from a team as a code owner March 5, 2026 20:06
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR correctly reconciles two conflicting schema-qualification mechanisms for Postgres connectors: the legacy db_schema/search_path path and the newer namespace_meta path. It stores the current namespace_meta as _current_namespace_meta on the connector instance after each query_config() call, allowing set_schema() to short-circuit when namespace-qualified table names are already present in the generated SQL. It also consolidates the three identical get_qualified_table_name() overrides into the base SQLConnector.

Key changes:

  • SQLConnector.__init__ now initialises _current_namespace_meta = None; the base get_qualified_table_name uses query_config().namespace_meta to build schema/database-qualified names
  • PostgreSQLConnector.query_config() and GoogleCloudSQLPostgresConnector.query_config() store namespace_meta on the instance; both set_schema() implementations return early when it is set
  • RDSPostgresConnector removes its duplicate get_qualified_table_name but its query_config() does not set _current_namespace_meta, creating an inconsistency with the other two Postgres connectors (see inline comment)
  • The base get_qualified_table_name calls self.query_config(node), which for Postgres connectors has the hidden side-effect of issuing a DB query and mutating _current_namespace_meta (see inline comment)
  • Unit tests cover all set_schema scenarios well; unit tests for the consolidated get_qualified_table_name paths are absent

Confidence Score: 4/5

  • Safe to merge with minor cleanup; the core reconciliation logic is correct and manually verified, with no regressions to existing paths.
  • The primary logic (skipping set_schema when namespace_meta is present) is correct and well-tested. The two flagged concerns — RDSPostgresConnector.query_config() not setting _current_namespace_meta (a latent inconsistency, not a current bug) and the hidden side-effect in the base get_qualified_table_name — do not cause incorrect behaviour today but reduce maintainability. No migrations, no API changes, and the backward-compatible path is confirmed by the existing test class.
  • src/fides/api/service/connectors/rds_postgres_connector.py (inconsistent _current_namespace_meta update) and src/fides/api/service/connectors/sql_connector.py (hidden side-effect in get_qualified_table_name).

Important Files Changed

Filename Overview
src/fides/api/service/connectors/sql_connector.py Added _current_namespace_meta instance variable and consolidated get_qualified_table_name logic. The consolidation is correct but the new base implementation calls self.query_config(node) as a side effect, which for Postgres subclasses issues a DB query AND mutates _current_namespace_meta — a hidden side effect not obvious from the method name.
src/fides/api/service/connectors/postgres_connector.py Correctly skips set_schema when _current_namespace_meta is set, and stores namespace_meta in query_config. Clean and correct implementation.
src/fides/api/service/connectors/google_cloud_postgres_connector.py Same reconciliation pattern as PostgreSQLConnector — correctly stores namespace_meta on the instance and skips set_schema when present. Clean implementation.
src/fides/api/service/connectors/rds_postgres_connector.py Removed duplicate get_qualified_table_name, now inheriting from the base. However, query_config() does NOT set _current_namespace_meta (unlike PostgreSQLConnector and GoogleCloudSQLPostgresConnector), creating an inconsistency that could cause a latent bug if a set_schema override is ever added to this class.
tests/ops/service/connectors/test_postgres_connector.py Good coverage for set_schema scenarios (namespace present, db_schema present, neither present, state set/clear via query_config). Missing unit tests for the consolidated base get_qualified_table_name logic.
tests/ops/service/connectors/test_google_cloud_postgres_connector.py Mirrors the Postgres test structure with 5 equivalent set_schema unit tests. Coverage is thorough for the GCS-specific behavior.

Comments Outside Diff (2)

  1. src/fides/api/service/connectors/rds_postgres_connector.py, line 86-91 (link)

    _current_namespace_meta not updated in RDSPostgresConnector.query_config

    Unlike PostgreSQLConnector.query_config() and GoogleCloudSQLPostgresConnector.query_config() (both updated in this PR to set self._current_namespace_meta), this method does not follow the same pattern.

    This is harmless today because RDSPostgresConnector doesn't override set_schema and the base implementation is a no-op. However, the inconsistency creates a latent bug: if a set_schema override is ever added to RDSPostgresConnector following the same pattern used by the other two Postgres connectors, _current_namespace_meta would always be None — meaning set_schema would never be skipped even when namespace_meta is present.

    To keep the three Postgres connectors consistent and guard against this future pitfall, consider adding the same assignment here:

    def query_config(self, node: ExecutionNode) -> RDSPostgresQueryConfig:
        """Query wrapper corresponding to the input execution_node."""
        db: Session = Session.object_session(self.configuration)
        namespace_meta = SQLConnector.get_namespace_meta(db, node.address.dataset)
        self._current_namespace_meta = namespace_meta
        return RDSPostgresQueryConfig(node, namespace_meta)
  2. src/fides/api/service/connectors/sql_connector.py, line 330-352 (link)

    Hidden side-effect: get_qualified_table_name mutates _current_namespace_meta

    The consolidated implementation calls self.query_config(node), which for PostgreSQLConnector and GoogleCloudSQLPostgresConnector has two implicit side effects:

    1. Issues a DB round-trip (get_namespace_meta → SQL query on ctl_dataset).
    2. Overwrites self._current_namespace_meta on the connector instance.

    This method is currently called only from the exception handlers in retrieve_data (line 209) and mask_data (line 252) — both paths where _current_namespace_meta was already set correctly by the earlier self.query_config(node) call. So there's no correctness bug today. But the mutation is surprising: reading a method named get_qualified_table_name wouldn't lead a maintainer to expect it to also update connector state and issue a DB query.

    Consider reading from the already-set _current_namespace_meta instead of calling query_config again, or at minimum calling query_config only to access the returned object without the state side-effect mattering (the current approach). A comment noting the side-effect would improve maintainability.

Last reviewed commit: 4c7cc49

Comment on lines +26 to +122
class TestPostgreSQLConnectorSetSchema:
"""Tests for set_schema reconciliation with namespace_meta."""

@pytest.fixture
def connector_with_db_schema(self):
"""Connector with db_schema configured in secrets."""
config = MagicMock()
config.secrets = {
"host": "localhost",
"port": 5432,
"dbname": "testdb",
"username": "user",
"password": "pass",
"db_schema": "billing",
}
connector = PostgreSQLConnector(configuration=config)
return connector

def test_set_schema_skipped_when_namespace_meta_present(
self, connector_with_db_schema
):
"""When namespace_meta is present, set_schema should be a no-op."""
connector = connector_with_db_schema
connector._current_namespace_meta = {"schema": "billing"}

mock_connection = MagicMock()
connector.set_schema(mock_connection)

# Should NOT execute any SQL
mock_connection.execute.assert_not_called()

def test_set_schema_runs_when_namespace_meta_absent(self, connector_with_db_schema):
"""When namespace_meta is absent and db_schema is configured, set_schema should run."""
connector = connector_with_db_schema
connector._current_namespace_meta = None

mock_connection = MagicMock()
connector.set_schema(mock_connection)

# Should execute SET search_path
mock_connection.execute.assert_called_once()

def test_set_schema_noop_when_no_db_schema_and_no_namespace(self):
"""When neither namespace_meta nor db_schema is configured, set_schema is a no-op."""
config = MagicMock()
config.secrets = {
"host": "localhost",
"port": 5432,
"dbname": "testdb",
"username": "user",
"password": "pass",
}
connector = PostgreSQLConnector(configuration=config)
connector._current_namespace_meta = None

mock_connection = MagicMock()
connector.set_schema(mock_connection)

mock_connection.execute.assert_not_called()

@patch(
"fides.api.service.connectors.postgres_connector.SQLConnector.get_namespace_meta"
)
def test_query_config_sets_namespace_meta_state(self, mock_get_ns):
"""query_config() should store namespace_meta on the connector instance."""
mock_get_ns.return_value = {"schema": "billing"}

config = MagicMock()
config.secrets = {"host": "localhost"}
connector = PostgreSQLConnector(configuration=config)
assert connector._current_namespace_meta is None

mock_node = MagicMock()
mock_node.address.dataset = "test_dataset"
mock_node.collection.name = "customer"

connector.query_config(mock_node)
assert connector._current_namespace_meta == {"schema": "billing"}

@patch(
"fides.api.service.connectors.postgres_connector.SQLConnector.get_namespace_meta"
)
def test_query_config_clears_namespace_meta_state(self, mock_get_ns):
"""query_config() should clear namespace_meta when dataset has none."""
mock_get_ns.return_value = None

config = MagicMock()
config.secrets = {"host": "localhost"}
connector = PostgreSQLConnector(configuration=config)
connector._current_namespace_meta = {"schema": "old_value"}

mock_node = MagicMock()
mock_node.address.dataset = "test_dataset"
mock_node.collection.name = "customer"

connector.query_config(mock_node)
assert connector._current_namespace_meta is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing unit tests for the consolidated get_qualified_table_name base logic

The new TestPostgreSQLConnectorSetSchema class provides solid coverage for the set_schema reconciliation, but there are no automated unit tests for the get_qualified_table_name consolidation in SQLConnector — specifically the three paths manually verified in the PR description:

  • No namespace → returns plain collection_name
  • Schema only → returns schema.collection_name
  • Schema + database_name → returns database_name.schema.collection_name

The PR description shows these were verified manually, but adding corresponding unit tests would prevent regressions if the base logic is changed in the future. These can be added to the existing TestPostgreSQLConnectorSetSchema class or a separate TestSQLConnectorGetQualifiedTableName class, mocking get_namespace_meta similarly to how the test_query_config_* tests already do.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

JadeCara and others added 6 commits March 5, 2026 16:27
RDSPostgresQueryConfig now inherits from QueryStringWithoutTuplesOverrideQueryConfig
to avoid wrapping single-value params in tuples. pg8000 (used for RDS IAM auth)
does not auto-unpack tuples like psycopg2, causing equality comparisons to
silently return 0 rows. Also fixes namespace_meta initialization in the
connector's query_config() and registers RDS connectors in task_resources.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Jade Wibbels and others added 2 commits March 6, 2026 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant