
ENG-2635: Support namespaces for Postgres DSR execution #7500

Open
JadeCara wants to merge 33 commits into main from ENG-2635-postgres-namespace-support

Conversation


JadeCara (Contributor) commented Feb 26, 2026

Ticket ENG-2635

Description Of Changes

Add namespace support for Postgres connectors, enabling a single connection to handle multiple schemas for DSR execution. This follows the established Snowflake/BigQuery pattern exactly.

Some deployments have 200-300 datasets representing different Postgres schemas. Previously, each schema required a separate integration for DSRs. With namespace support, a single Postgres connection can execute DSRs across all schemas by using fully-qualified table names ("schema"."table" or "database"."schema"."table").

This also removes the stubbed retrieve_data()/mask_data() from RDSPostgresConnector, unblocking DSR execution for RDS Postgres connections.

PR 1 of 3 — Core Postgres + RDS Postgres namespace support. Google Cloud SQL Postgres (PR 2) and set_schema() reconciliation (PR 3) follow separately.

Code Changes

  • src/fides/api/schemas/namespace_meta/postgres_namespace_meta.py — New PostgresNamespaceMeta schema with database_name (optional) and schema (required), auto-registered via NamespaceMeta.__init_subclass__. Returns empty fallback fields for backward compat (Postgres defaults to public schema).
  • src/fides/api/service/connectors/query_configs/postgres_query_config.py — Added namespace_meta_schema, generate_table_name() for schema-qualified SQL, updated get_formatted_query_string() and get_update_stmt() to use it
  • src/fides/api/service/connectors/postgres_connector.py — query_config() now fetches namespace_meta from DB, added get_qualified_table_name() for unquoted names in error handling
  • src/fides/api/service/connectors/rds_postgres_connector.py — Removed stubbed retrieve_data()/mask_data() (inherits SQLConnector's implementations), query_config() passes namespace_meta, added get_qualified_table_name()
  • src/fides/api/service/connectors/sql_connector.py — Added None guard in get_namespace_meta() for dry-run contexts where no DB session is available
  • src/fides/service/dataset/validation_steps/namespace_meta.py — Added None guard for connection secrets; skip namespace validation when namespace_meta fields don't overlap with the connection type's schema (cross-type dataset linking)
  • clients/admin-ui/src/features/integrations/integration-type-info/rdsPostgresInfo.tsx — Added "DSR Automation" to RDS Postgres tags
  • tests/ops/service/connectors/test_postgres_query_config.py — New tests mirroring Snowflake test structure: parametrized namespace query generation, invalid meta validation, namespaced update statements
  • tests/service/dataset_service/test_namespace_meta_validation.py — Updated for Postgres namespace support: 5 new Postgres-specific tests, cross-type namespace skip test, updated unsupported type test
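The schema-qualification behavior described above can be sketched roughly as follows. This is a minimal, hypothetical illustration — the class and field names follow the PR description (schema required, database_name optional, unqualified fallback for backward compatibility), not the actual fides implementation:

```python
from typing import Optional

# Hypothetical sketch of the namespace metadata described in the PR;
# the real PostgresNamespaceMeta is a pydantic schema.
class PostgresNamespaceMeta:
    def __init__(self, schema: str, database_name: Optional[str] = None):
        self.schema = schema                # required
        self.database_name = database_name  # optional

def generate_table_name(
    meta: Optional[PostgresNamespaceMeta], table: str
) -> str:
    """Build a quoted, schema-qualified Postgres table name."""
    if meta is None:
        # Backward compat: no namespace_meta -> unqualified name, so
        # Postgres resolves it against the default (public) schema.
        return f'"{table}"'
    qualified = f'"{meta.schema}"."{table}"'
    if meta.database_name:
        qualified = f'"{meta.database_name}".{qualified}'
    return qualified
```

With a namespace of {"schema": "billing"}, a collection named orders would render as "billing"."orders"; without namespace_meta it stays "orders", which is why existing datasets keep working unchanged.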

Steps to Confirm

1. Run namespace query config tests

pytest tests/ops/service/connectors/test_postgres_query_config.py -v

Expected: All 9 tests pass (5 parametrized query generation, 1 invalid meta, 3 update statements).

2. Run namespace validation tests

pytest tests/service/dataset_service/test_namespace_meta_validation.py -v

Expected: All 15 tests pass, including 5 Postgres-specific and 1 cross-type namespace skip test.

3. Verify backward compatibility (no namespace = no breakage)

pytest tests/ops/service/connectors/test_queryconfig.py -v

Expected: Existing Postgres query config tests pass unchanged — datasets without namespace_meta produce unqualified table names as before.

4. Verify dataset creation with namespace_meta via API

Create a Postgres connection, then create a dataset with namespace_meta, and link it:

# Create connection
curl -X PATCH "http://localhost:8080/api/v1/connection" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '[{"name":"Test Namespace PG","key":"test_ns_pg","connection_type":"postgres","access":"read"}]'

# Create dataset with namespace_meta
curl -X POST "http://localhost:8080/api/v1/dataset/upsert" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '[{
    "fides_key": "ns_test",
    "name": "Namespace Test",
    "collections": [{"name": "orders", "fields": [
      {"name": "id", "fides_meta": {"primary_key": true, "data_type": "integer"}},
      {"name": "email", "fides_meta": {"data_type": "string", "identity": "email"}}
    ]}],
    "fides_meta": {"namespace": {"schema": "billing"}}
  }]'

# Link dataset config to connection
curl -X PATCH "http://localhost:8080/api/v1/connection/test_ns_pg/datasetconfig" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '[{"fides_key": "ns_test", "ctl_dataset_fides_key": "ns_test"}]'

Expected: All return 200/201. The datasetconfig response shows "succeeded": [...] with 1 entry.

5. Verify namespace_meta is persisted and returned

curl -X GET "http://localhost:8080/api/v1/connection/test_ns_pg/datasetconfig" \
  -H "Authorization: Bearer $TOKEN"

Expected: The response includes "fides_meta": {"namespace": {"schema": "billing"}} in the ctl_dataset of the linked dataset config.

6. Verify backward compat via API (no namespace_meta required for Postgres)

# Create and link a dataset WITHOUT namespace_meta to the same Postgres connection
curl -X POST "http://localhost:8080/api/v1/dataset/upsert" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '[{
    "fides_key": "no_ns_test",
    "name": "No Namespace",
    "collections": [{"name": "users", "fields": [
      {"name": "id", "fides_meta": {"primary_key": true, "data_type": "integer"}}
    ]}]
  }]'

curl -X PATCH "http://localhost:8080/api/v1/connection/test_ns_pg/datasetconfig" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '[{"fides_key": "no_ns_test", "ctl_dataset_fides_key": "no_ns_test"}]'

Expected: Both return 200/201 — no validation error. Postgres connections do NOT require namespace_meta (backward compatible, defaults to public schema).

7. Check RDS Postgres Admin UI tag

Navigate to the Integrations page in the Admin UI and find "RDS Postgres".

Expected: RDS Postgres shows "DSR Automation", "Discovery", and "Detection" tags.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Postgres and RDS Postgres namespace/schema support added for database-qualified table naming and improved query/update generation.
    • Changelog entry added documenting Postgres namespace support.
  • Improvements

    • RDS Postgres integration now shows a "DSR Automation" tag in the admin UI.
    • Shared SQL namespace model standardizes database/schema handling; validation skips namespace checks when connection types mismatch.
  • Removed

    • Stubbed RDS Postgres connector retrieve_data()/mask_data() overrides removed (the connector now inherits SQLConnector's implementations).
  • Tests

    • Expanded tests for Postgres/RDS namespace handling and query/update generation.

Add PostgresNamespaceMeta schema and wire namespace support through
PostgresQueryConfig, PostgreSQLConnector, and RDSPostgresConnector.
Remove stubbed retrieve_data/mask_data from RDSPostgresConnector so
it inherits SQLConnector's implementations, enabling DSR execution.
Update RDS Postgres admin UI tags to include "DSR Automation".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel bot commented Feb 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments

Project               Deployment  Actions  Updated (UTC)
fides-plus-nightly    Ignored     Preview  Mar 6, 2026 9:48pm
fides-privacy-center  Ignored     —        Mar 6, 2026 9:48pm


Jade Wibbels and others added 2 commits February 26, 2026 08:52
Jade Wibbels and others added 7 commits February 26, 2026 09:17
Guard against None secrets in NamespaceMetaValidationStep. Update
test_validate_unsupported_connection_type to use mariadb (postgres is
now a supported namespace type). Add Postgres-specific validation tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Postgres defaults to the public schema when neither namespace_meta nor
db_schema is configured. Return empty fallback fields so validation
doesn't reject existing postgres connections that lack both.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove extra parentheses around bind parameters in expected query
strings to match current SQLAlchemy output format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The dry-run query test passes db=None to TaskResources, which flows
into query_config() -> get_namespace_meta(). Return None early when
no db session is available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a dataset's namespace_meta has no overlapping fields with the
connection type's namespace schema (e.g. BigQuery namespace_meta on a
Postgres connection), skip validation rather than raising an error.
This happens when datasets of mixed types are bulk-linked to a single
connection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JadeCara JadeCara marked this pull request as ready for review February 26, 2026 18:09
@JadeCara JadeCara requested review from a team as code owners February 26, 2026 18:09
@JadeCara JadeCara requested review from adamsachs and speaker-ender and removed request for a team February 26, 2026 18:09

@greptile-apps greptile-apps bot left a comment


10 files reviewed, 3 comments


JadeCara pushed a commit that referenced this pull request Feb 26, 2026
Add GoogleCloudSQLPostgresNamespaceMeta with database_name (optional)
and schema (required), following the same pattern as Snowflake/BigQuery.

Update GoogleCloudSQLPostgresQueryConfig with generate_table_name()
for schema-qualified SQL, and GoogleCloudSQLPostgresConnector to fetch
namespace_meta from DB and pass it to the query config.

Also includes shared fixes from PR #7500:
- Guard against None db session in get_namespace_meta
- Guard against None secrets in namespace validation
- Skip namespace validation for mismatched connection types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JadeCara JadeCara requested review from galvana February 26, 2026 22:12
@ethyca ethyca deleted a comment from greptile-apps bot Mar 3, 2026

JadeCara commented Mar 3, 2026

@greptile please review


@adamsachs adamsachs left a comment


thanks for getting this in place! generally, this is looking good.

there are a few stylistic/code concerns that likely don't have much of a functionality impact, but some of them are worth addressing because i think they're potential footguns for the future.

and then there's one meatier product/functionality question around the namespace field naming and its alignment with Helios. the comment should explain that reasonably well, but i think it will be a point for further discussion! we can sync up on that separately.


connection_type: Literal["postgres"] = "postgres"
database_name: Optional[str] = None
schema: str # type: ignore[assignment]
Contributor


what do you think about abstracting the schema here so that we can start trying to standardize more across our datastores? the same fields are used for snowflake and will be used for google cloud SQL based on your other PR, i'd like us to try to start encouraging/enforcing that for consistency.

we've actually already gone astray in this regard on the helios side, which is unfortunate (and will relate to other comments here), and i think we need to start correcting ourselves ASAP. you can see the divergence in naming conventions in this doc...

Contributor


whoops, just realized this was ambiguous with schema and keying specifically on that line...what i meant was the pydantic schema generally 😄

Contributor Author


Agreed on standardizing. I'll extract a shared SQLNamespaceMeta base class with database_name and schema fields that Postgres, Snowflake, and Cloud SQL can inherit from. BigQuery stays separate since its fields are different (project_id, dataset_id).

Helios can align its database_id → database_name naming in a follow-up migration.
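The inheritance proposed in this reply can be sketched roughly like so, using plain dataclasses rather than the project's pydantic models (names come from the discussion; this is illustrative, not the actual code):

```python
from dataclasses import dataclass
from typing import Optional

# Rough sketch of the proposed shared base for SQL datastores.
@dataclass
class SQLNamespaceMeta:
    schema: str                          # required across SQL datastores
    database_name: Optional[str] = None  # optional across SQL datastores

# Each SQL connector then only declares its discriminator value.
@dataclass
class PostgresNamespaceMeta(SQLNamespaceMeta):
    connection_type: str = "postgres"

@dataclass
class SnowflakeNamespaceMeta(SQLNamespaceMeta):
    connection_type: str = "snowflake"
```

BigQuery would stay outside this hierarchy since its fields (project_id, dataset_id) don't map onto schema/database_name.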

Generates SQL valid for Postgres
"""

namespace_meta_schema = PostgresNamespaceMeta
Contributor


OK, so i think this is generally on the right track in that we should be standardizing as much as possible between different 'flavors' of postgres integrations (and datastore integrations generally), but i want to call out that it has a few discrepancies with what Helios is generating for RDS postgres datasets, which we ultimately need to resolve (i think mainly on the helios-side!):

  1. helios uses database_id here to indicate the database name, instead of database_name. i honestly don't know why, but it seems like we're standardizing on database_name generally and that's good - let's keep as consistent a standard as possible across datastores
  2. helios doesn't include a schema attribute in its metadata. again, i'm not sure why we don't populate a schema attribute here, we have the schema available to us.
  3. we populate a database_instance_id with the RDS database instance ID where this was found.

on the first two points, i think Helios is just wrong, so i don't think that should stop us from doing the right thing here; we just need to ensure we make the proper adjustments on the fides side, and probably migrate over any existing datasets for safety.

on the third point...i'm a bit less sure, this area has always been a bit murky! do we need the RDS DB instance ID for DSR execution? do we assume that a given integration is tied to a specific DB instance? we don't explicitly have that as a field/parameter on the RDS Postgres integration, but is it implicitly defined in the credentials used?

this is probably where 90% of the complexity of this work is. it may require us to do some broader product (re-)thinking. or not! let's break out on this a bit.

Contributor Author


Agreed on all three points. For this PR:

  1. We're standardizing on database_name — consistent with Snowflake and the new SQLNamespaceMeta base class. Helios should migrate database_id → database_name in a follow-up.
  2. Helios should populate schema — the base class will make this a required field, so Helios-generated datasets will need to include it.
  3. database_instance_id isn't needed for DSR execution — the integration credentials already implicitly target a specific RDS instance. I think database_instance_id is a Helios/discovery concern (tracking where a dataset was found) rather than a DSR concern (how to query it). Happy to discuss further if you see it differently.

Want me to create a follow-up ticket for the Helios alignment work?

f"or the connection must have values for the following fields: {field_descriptions}"
)
if namespace_meta:
# Skip validation when the namespace_meta is clearly intended for a
Contributor


is this valid? is this a scenario we really expect? if it's only happening in tests, that seems like we should fix up the tests.

maybe i'm missing something, but i feel like this isn't something we really want to have... if it's a purely defensive check for some oddity in our test setup that's now being exposed and we don't want to make a bigger change to our tests, then i'd prefer it sit isolated in a helper function so that we can clearly mark it as a temporary workaround we can clearly remove. honestly that probably feels like the right thing to do here in any case, as this is effectively an 'add-on' to the core validation logic.

let's just say...i'm generally confused by this :)

Contributor


oh, i think maybe i see why this is needed - is it because we're not declaring a proper new namespace meta for RDS postgres? if that's the reason why, i think this is a poor workaround. shouldn't we instead just have something like the following?

class RDSPostgresNamespaceMeta(PostgresNamespaceMeta):
    connection_type: Literal["rds_postgres"] = "rds_postgres"

Contributor


fwiw i wrote all of this before looking at the claude PR review, but claude agrees that the namespace validation skip logic is a 'critical' finding for its fragility:

  1. Namespace validation skip logic could be fragile

namespace_meta.py:53-65 — The field-overlap heuristic that checks whether namespace_meta fields intersect with the schema's fields:

schema_field_names = {
    field_name
    for field_name in namespace_meta_class.model_fields
    if field_name != "connection_type"
}
if schema_field_names and not schema_field_names.intersection(
    namespace_meta.keys()
):
    return

This works today because Postgres (schema, database_name) and BigQuery (project_id, dataset_id) have non-overlapping field names. But if a future connector adds a schema or database_name field, this heuristic will silently break. The connection_type discriminator check above is more robust.

Recommendation: Consider requiring connection_type in namespace_meta going forward and document that the field-overlap fallback is a compatibility shim. A logger.debug when the fallback fires would help with debugging.

but if i'm reading between the lines correctly, i really think we should just be requiring connection_type, unless there's some reasoning here i'm missing!

Contributor Author


You're right — the field-overlap heuristic is fragile and unnecessary if we require connection_type. I'll:

  1. Add RDSPostgresNamespaceMeta(PostgresNamespaceMeta) with connection_type: Literal["rds_postgres"]
  2. Remove the field-overlap fallback and rely solely on connection_type for discriminating which validator applies

This also ties into the SQLNamespaceMeta base class from the earlier comment — RDS Postgres will inherit through PostgresNamespaceMeta → SQLNamespaceMeta → NamespaceMeta.
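The discriminator-based routing agreed on here can be sketched as follows. This is a hypothetical illustration — the registry name and required-field sets are made up for the example; the real validator lives in namespace_meta.py:

```python
from typing import Any, Dict, Set

# Hypothetical registry: connection_type -> expected namespace fields.
NAMESPACE_SCHEMAS: Dict[str, Set[str]] = {
    "postgres": {"schema", "database_name"},
    "rds_postgres": {"schema", "database_name"},
    "bigquery": {"project_id", "dataset_id"},
}

def should_validate(namespace_meta: Dict[str, Any], connection_type: str) -> bool:
    """Route on the connection_type discriminator instead of field overlap."""
    declared = namespace_meta.get("connection_type")
    if declared is not None and declared != connection_type:
        # Metadata explicitly targets a different connector type: skip
        # validation (mixed-type bulk linking scenario).
        return False
    return connection_type in NAMESPACE_SCHEMAS
```

Unlike the field-overlap heuristic, this cannot silently break when two connector types happen to share field names, because the decision keys only on the declared type.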

Comment on lines +135 to +149
def get_qualified_table_name(self, node: ExecutionNode) -> str:
    """Get fully qualified Postgres table name for table_exists() checks.

    Returns unquoted names (e.g. schema.table) because the generic
    SQLConnector.table_exists() uses split(".") + inspector.has_table().
    """
    query_config = self.query_config(node)
    if not query_config.namespace_meta:
        return node.collection.name

    meta = cast(PostgresNamespaceMeta, query_config.namespace_meta)
    qualified = f"{meta.schema}.{node.collection.name}"
    if meta.database_name:
        return f"{meta.database_name}.{qualified}"
    return qualified
Contributor


this comes from claude, and i think i largely agree:

  1. Duplicated get_qualified_table_name across both connectors

postgres_connector.py:97-113 and rds_postgres_connector.py:135-149 contain identical implementations of get_qualified_table_name. This is a DRY violation that will diverge over time.

The Snowflake and BigQuery connectors use a simpler pattern — they delegate to query_config.generate_table_name():

  # snowflake_connector.py:91-94
  def get_qualified_table_name(self, node: ExecutionNode) -> str:
      query_config = self.query_config(node)
      return query_config.generate_table_name()

However, this won't work here because PostgresQueryConfig.generate_table_name() returns quoted names ("schema"."table") while table_exists() needs unquoted names (schema.table). That's a real design tension.

Recommendation: Either:

  • Add an unquoted param to generate_table_name(), or
  • Extract the shared logic into a private method on PostgresConnector (and have RDSPostgresConnector inherit from it or call a shared utility), or
  • At minimum, add a comment explaining why this can't delegate like Snowflake does.

i think option #1 makes the most sense. ideally we have the 'qualified table name' logic defined in a single place, and the quote handling is done as part of that.

Contributor Author


Agreed — I'll add a quoted parameter to generate_table_name() (defaulting to True to preserve current behavior). Both connectors' get_qualified_table_name() will then just delegate: return self.query_config(node).generate_table_name(quoted=False), eliminating the duplication.
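The proposed quoted parameter could look roughly like this — a sketch of the single-source-of-truth idea, not the actual implementation (the real method lives on the query config and reads its namespace_meta):

```python
from typing import Optional

def generate_table_name(
    schema: str,
    table: str,
    database: Optional[str] = None,
    quoted: bool = True,
) -> str:
    """One place for qualified-name logic; quoted=False serves
    table_exists()-style checks that split the name on '.'."""
    parts = [p for p in (database, schema, table) if p]
    if quoted:
        return ".".join(f'"{p}"' for p in parts)
    return ".".join(parts)
```

Defaulting quoted=True preserves the current SQL-generation behavior, while get_qualified_table_name() becomes a one-line delegation with quoted=False.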

Comment on lines +109 to +110
if db is None:
    return None
Contributor


not a huge deal, but without this, does something break? at a minimum it feels like we should have db typed as Optional[Session] if this is possible; but really, it seems like we should just be avoiding this call altogether if we don't have a Session to pass when this is being called.

maybe it goes beyond the scope of this PR, but i also don't really get why we're adding this in the PR, unless we've just found some broken functionality!

Contributor Author


This guards against Session.object_session(self.configuration) returning None — which happens in tests when the config isn't bound to a DB session (e.g., PostgresQueryConfig unit tests that pass mock execution nodes without a real DB). Without it, those tests crash on db.scalar().

Agree the typing should reflect this — I'll update the signature to Optional[Session].
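The guarded, Optional-typed shape being agreed on here looks roughly like this. Purely illustrative: `db` stands in for Optional[sqlalchemy.orm.Session], and a plain dict plays the role of the session lookup so the sketch is self-contained:

```python
from typing import Any, Dict, Optional

def get_namespace_meta(
    db: Optional[Dict[str, Any]], dataset_key: str
) -> Optional[Dict[str, Any]]:
    """Return the dataset's namespace_meta, or None when no session exists.

    db is None in dry-run contexts (e.g. TaskResources built with db=None),
    so guard before performing any lookup/query.
    """
    if db is None:
        return None
    # Stand-in for the real query against the dataset config in the DB.
    return db.get(dataset_key)
```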

Contributor


k, got it. i do feel like when app code has specific checks to guard against test-only codepaths it's a bit of a code smell, but i also understand that the problem here may lie deeper in the code organization and it'd take too broad of a refactor to fix, or too much mocking in the tests.


@JadeCara JadeCara left a comment


Thanks for the thorough review! Addressing all the inline feedback — the main changes will be:

  1. New SQLNamespaceMeta base class with standardized database_name + schema fields
  2. RDSPostgresNamespaceMeta subclass with proper connection_type discriminator
  3. Remove fragile field-overlap validation heuristic in favor of connection_type check
  4. generate_table_name(quoted=False) to eliminate duplicated get_qualified_table_name logic
  5. Fix get_namespace_meta typing to Optional[Session]

Jade Wibbels and others added 2 commits March 5, 2026 12:23
- Extract SQLNamespaceMeta base class with shared database_name + schema fields
- Add RDSPostgresNamespaceMeta with connection_type discriminator
- Refactor Postgres/Snowflake namespace metas to inherit from SQLNamespaceMeta
- Add RDSPostgresQueryConfig for proper namespace validation routing
- Add quoted param to generate_table_name() to eliminate duplicated
  get_qualified_table_name() logic across both Postgres connectors
- Remove fragile field-overlap heuristic in namespace validation,
  rely solely on connection_type discriminator
- Fix get_namespace_meta typing to Optional[Session]
- Add RDS Postgres validation tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fides/service/dataset/validation_steps/namespace_meta.py`:
- Around line 43-48: The current early return when meta_connection_type !=
connection_type (using namespace_meta.get("connection_type") stored in
meta_connection_type) silently skips validation for misspelled or unknown
values; instead, validate meta_connection_type against the set of known/expected
connection types and only silently skip when it is a different but known type
(i.e., meta_connection_type in KNOWN_CONNECTION_TYPES and != connection_type),
otherwise raise a validation exception (or return an error) when
meta_connection_type is present but not recognized so invalid metadata fails
fast.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 327b1bec-9d6d-429c-badc-a7c80159ce62

📥 Commits

Reviewing files that changed from the base of the PR and between 68fa5b5 and 61031db.

📒 Files selected for processing (10)
  • src/fides/api/schemas/namespace_meta/postgres_namespace_meta.py
  • src/fides/api/schemas/namespace_meta/rds_postgres_namespace_meta.py
  • src/fides/api/schemas/namespace_meta/snowflake_namespace_meta.py
  • src/fides/api/schemas/namespace_meta/sql_namespace_meta.py
  • src/fides/api/service/connectors/postgres_connector.py
  • src/fides/api/service/connectors/query_configs/postgres_query_config.py
  • src/fides/api/service/connectors/rds_postgres_connector.py
  • src/fides/api/service/connectors/sql_connector.py
  • src/fides/service/dataset/validation_steps/namespace_meta.py
  • tests/service/dataset_service/test_namespace_meta_validation.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/service/dataset_service/test_namespace_meta_validation.py
  • src/fides/api/service/connectors/sql_connector.py

JadeCara pushed a commit that referenced this pull request Mar 5, 2026
Add GoogleCloudSQLPostgresNamespaceMeta with database_name (optional)
and schema (required), following the same pattern as Snowflake/BigQuery.

Update GoogleCloudSQLPostgresQueryConfig with generate_table_name()
for schema-qualified SQL, and GoogleCloudSQLPostgresConnector to fetch
namespace_meta from DB and pass it to the query config.

Also includes shared fixes from PR #7500:
- Guard against None db session in get_namespace_meta
- Guard against None secrets in namespace validation
- Skip namespace validation for mismatched connection types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/service/dataset_service/test_namespace_meta_validation.py (1)

295-326: Consider adding one regression test for namespace metadata missing connection_type.

You now test mismatched typed metadata well; adding a case with no connection_type would lock the intended compatibility behavior introduced in the validator and prevent accidental tightening/loosening later.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/service/dataset_service/test_namespace_meta_validation.py` around lines
295 - 326, Add a new regression test that mirrors
test_validate_mismatched_namespace_skipped but sets fides_meta.namespace without
a connection_type key to ensure validation is skipped rather than enforced;
create a FideslangDataset with fides_meta={"namespace": {"project_id": "...",
"dataset_id": "..."}} (no connection_type), use a ConnectionConfig with
ConnectionType.postgres, build the DatasetValidationContext, instantiate
NamespaceMetaValidationStep, and assert that validator.validate(context) does
not raise—this will lock in the intended compatibility behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fides/api/schemas/namespace_meta/rds_postgres_namespace_meta.py`:
- Around line 10-12: Update the module docstring in
src/fides/api/schemas/namespace_meta/rds_postgres_namespace_meta.py to remove
the internal system reference "fidesplus RDSPostgresMonitor.get_schemas()" and
instead use generic wording (e.g., "RDS monitor" or "monitoring service") and
likewise replace "RDS connector's create_client()" with a neutral phrase like
"RDS connector client creation" so the docstring describes expected identifiers
generically without referencing internal system names; keep the rest of the
explanatory text intact.

In `@src/fides/api/service/connectors/query_configs/postgres_query_config.py`:
- Around line 19-42: The generate_table_name method (and the analogous code at
lines 72-87) currently wraps collection_name, PostgresNamespaceMeta.schema, and
PostgresNamespaceMeta.database_name in double quotes but does not escape any
embedded double quotes inside those identifiers; update generate_table_name (and
the other identifier-composition function) to first escape any internal
double-quote characters by replacing " with "" for each identifier component,
then apply the quoted wrapper when quoted=True; preserve the unquoted behavior
when quoted=False so inspector/table_exists checks still receive the raw
identifier.
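
The escaping the bot describes is the standard SQL quoted-identifier rule: double any embedded `"` before wrapping. A minimal standalone sketch, with hypothetical helper names not taken from the Fides codebase:

```python
from typing import Optional


def quote_ident(identifier: str) -> str:
    # Standard SQL escaping: double any embedded double quote, then wrap.
    return '"' + identifier.replace('"', '""') + '"'


def qualified_table_name(
    table: str,
    schema: Optional[str] = None,
    database: Optional[str] = None,
    quoted: bool = True,
) -> str:
    # Compose database.schema.table, skipping parts that are not set.
    parts = [p for p in (database, schema, table) if p]
    if not quoted:
        # Raw identifiers so inspector/table_exists checks still work.
        return ".".join(parts)
    return ".".join(quote_ident(p) for p in parts)
```

For example, `qualified_table_name("users", schema='odd"schema')` produces `'"odd""schema"."users"'`, so a malicious or unusual identifier cannot break out of its quoting.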



📥 Commits

Reviewing files that changed from the base of the PR and between 241750e and 5d91d41.

📒 Files selected for processing (5)
  • src/fides/api/schemas/namespace_meta/__init__.py
  • src/fides/api/schemas/namespace_meta/rds_postgres_namespace_meta.py
  • src/fides/api/service/connectors/query_configs/postgres_query_config.py
  • src/fides/service/dataset/validation_steps/namespace_meta.py
  • tests/service/dataset_service/test_namespace_meta_validation.py

Jade Wibbels and others added 3 commits March 5, 2026 16:20
Legacy datasets may have namespace_meta without a connection_type
discriminator field. This adds a field-overlap heuristic: if namespace
fields have zero overlap with the expected schema, validation is
skipped (the namespace belongs to a different connection type). If
fields do overlap, connection_type is backfilled and validation runs
strictly so malformed namespaces are still caught.

Also adds connection_type: bigquery to the enterprise test dataset YAML
and 5 new unit tests covering the backward compat scenarios.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JadeCara JadeCara requested a review from adamsachs March 6, 2026 02:30
Contributor

@adamsachs adamsachs left a comment

OK, this is getting close - it's great that this is working with a proper RDS integration and i think we're basically using the 'right' mechanisms.

still just a few points that i think could use clean-up, let me know what you think!

from fides.api.schemas.namespace_meta.namespace_meta import NamespaceMeta


class SQLNamespaceMeta(NamespaceMeta):
Contributor
👍


connection_type: Literal["rds_postgres"] = "rds_postgres"
database_instance_id: str
database_id: str
Contributor
so i imagine we reverted this back to database_id so it'd work well with the existing code in rds_postgres_connector.py...but i do think we should be establishing our more consistent standard here. can we just update the code there to look for database_name on the meta? i don't think that's going to break anything downstream, either here or on the monitor side in fidesplus, i really think that was just stubbed out for future use.

we do still need to update the namespace meta populated by helios monitors if we make this change, but i do think that's a relatively simple adjustment that we should go ahead and make. if we do that, then we should be able to inherit from the new SQLNamespaceMeta here, right? that'll help maintain standardization :)

note: i also wonder while we're here whether renaming the namespace field here to database_instance_name instead of database_instance_id is wise. that just seems more accurate, and if we don't change it now, we're never gonna get to it 😅

Contributor Author

Done — RDSPostgresNamespaceMeta now inherits from SQLNamespaceMeta with standardized fields:

  • database_name (inherited, replaces database_id)
  • database_instance_name (replaces database_instance_id)
  • schema (inherited, optional)

Updated the connector and tests to match. The fidesplus monitor (rds_postgres_monitor.py) will need a matching update to emit the new field names.

Note: if any RDS Postgres datasets have already been created by Helios, we'll need a data migration to rename the JSON keys in ctl_datasets.fides_meta. Suggest we handle that in a separate PR to keep this one focused.
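
The actual schemas are Pydantic models; as a rough dataclass sketch of the hierarchy agreed above (field names per the discussion, everything else illustrative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SQLNamespaceMeta:
    # Shared across SQL connectors; database_name is optional because
    # Postgres can fall back to the connection's default database.
    schema: str
    database_name: Optional[str] = None


@dataclass
class RDSPostgresNamespaceMeta(SQLNamespaceMeta):
    # Renamed from database_instance_id per review feedback.
    database_instance_name: str = ""
    connection_type: str = "rds_postgres"
```

With this shape, RDS Postgres inherits `schema` and `database_name` from the shared base rather than re-declaring its own `database_id`.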

from fides.api.schemas.namespace_meta.sql_namespace_meta import SQLNamespaceMeta


class PostgresNamespaceMeta(SQLNamespaceMeta):
Contributor

"""
Prepends the schema to the base table name if RDS namespace meta
is provided. Unlike Postgres, RDS doesn't need database_name
qualification because the connection targets a specific database.
Contributor

OK this is clear - i see why this is branched off now again. i'll throw this out there as a suggestion, but it may be overkill: we could keep the logic centralized in a single method on the base class, PostgresQueryConfig.generate_table_name, by having it take an optional arg for whether to include the database name.

but i'm ambivalent...that may not even be clearer.

Contributor Author

Agreed it could go either way. I'll keep the current approach with the separate override — it's explicit about why RDS skips database qualification (the engine URL already targets the database), which I think is clearer than a boolean flag on the base class.
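
The two behaviors being weighed, as a hedged sketch (class and method names simplified from the real query configs):

```python
class PostgresTableNamer:
    """Base behavior: qualify with database and schema when namespace
    meta provides them."""

    def __init__(self, schema=None, database_name=None):
        self.schema = schema
        self.database_name = database_name

    def generate_table_name(self, table):
        parts = [self.database_name, self.schema, table]
        return ".".join(f'"{p}"' for p in parts if p)


class RDSPostgresTableNamer(PostgresTableNamer):
    """RDS override: skip database qualification because the engine URL
    already targets a specific database."""

    def generate_table_name(self, table):
        parts = [self.schema, table]
        return ".".join(f'"{p}"' for p in parts if p)
```

The override makes the divergence explicit at the subclass, rather than threading an include-database flag through the base method.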

Comment on lines +109 to +110
if db is None:
return None
Contributor
k, got it. i do feel like when app code has specific checks to guard against test-only codepaths it's a bit of a code smell, but i also understand that the problem here may lie deeper in the code organization and it'd take too broad of a refactor to fix, or too much mocking in the tests.

@@ -15,10 +15,9 @@ class SnowflakeNamespaceMeta(NamespaceMeta):

connection_type: Literal["snowflake"] = "snowflake"
database_name: str
Contributor

we should be able to remove this too, right? shouldn't it inherit it from SQLNamespaceMeta?

Contributor Author

SQLNamespaceMeta declares database_name as Optional[str] = None since Postgres doesn't always need it. Snowflake requires it, so the override here tightens the type from optional to required. Added a comment to make this clearer.

Comment on lines +46 to +48
# Skip validation when the namespace_meta connection_type doesn't
# match this connection. This happens when datasets of mixed types
# (e.g. BigQuery, Snowflake) are linked to a single connection.
Contributor

is this a valid scenario? i don't really think this can happen...

Contributor Author

You're right — this can't happen in normal app runtime. Namespace metadata is set by the type-specific Helios monitor, and datasets are always linked to a single connection via the scoped endpoint (PATCH /connection/{key}/dataset). There's no mechanism to re-link a dataset across connection types.

Removed the mismatch check and the legacy field-overlap/backfill block. Also removed the 5 tests that covered those code paths.

if not provided_fields & expected_fields:
return
# Fields overlap — backfill connection_type so it's persisted
namespace_meta["connection_type"] = connection_type
Contributor

i thought we'd be able to get rid of this block if we ensure that all of our namespace meta's have connection_type's, which they now seem to (🙏)?

maybe there's some other reason this is needed? but it just doesn't seem like a valid code path/scenario for normal app runtime. if it's just to make a test pass, i'd recommend we find an alternative, this just feels like too much fragile special-casing to have in our app code. the whole point of the connection_type mechanism is so that we don't have to do this!

i.e. are there really 'legacy' namespace_meta's lurking around...?

Contributor Author

Agreed — removed this entire block in the same change. All namespace metas now have connection_type via the __init_subclass__ registration mechanism, so this heuristic is no longer needed. The alembic migration (4ebe0766021b) already backfilled connection_type for existing datasets.
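
The registration mechanism referenced here can be sketched as follows (simplified; the real base class is a Pydantic model keyed by a Literal discriminator):

```python
class NamespaceMeta:
    # Registry mapping connection_type -> schema class, filled
    # automatically whenever a subclass is defined.
    registry: dict = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        connection_type = getattr(cls, "connection_type", None)
        if connection_type:
            NamespaceMeta.registry[connection_type] = cls


class PostgresNamespaceMeta(NamespaceMeta):
    connection_type = "postgres"


class RDSPostgresNamespaceMeta(NamespaceMeta):
    connection_type = "rds_postgres"
```

Because every subclass carries its own `connection_type`, a validator can look up the right schema directly instead of guessing from field overlap.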

Contributor Author
JadeCara commented Mar 6, 2026

Thanks for the second pass! All points addressed:

  1. RDSPostgresNamespaceMeta — now inherits from SQLNamespaceMeta, renamed database_id → database_name and database_instance_id → database_instance_name. Will need a fidesplus monitor update + potentially a data migration (separate PR).
  2. RDSPostgresQueryConfig — keeping the separate override; it's explicit about why RDS skips database qualification.
  3. SnowflakeNamespaceMeta.database_name — kept the override (tightens Optional→required from base class), added clarifying comment.
  4. Mismatch check + legacy block in validation — removed both. All namespace metas have connection_type now, so neither code path is needed.

- RDSPostgresNamespaceMeta: inherit from SQLNamespaceMeta, rename
  database_id → database_name and database_instance_id → database_instance_name
- Remove mismatch check and legacy field-overlap/backfill block from
  namespace validation (can't happen in normal app runtime)
- Add clarifying comment on SnowflakeNamespaceMeta.database_name override
- Update connector and tests to use new field names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor Author
JadeCara commented Mar 6, 2026

Created PRs to start the id → name migration as well: #7589 and https://github.com/ethyca/fidesplus/pull/3203

The NamespaceMetaValidationStep was validating dataset namespace metadata
against the connection config's type, causing failures when datasets with
BigQuery namespace metadata were submitted through a Postgres connection
(e.g. in the bulk create test). Now skips validation when the namespace's
connection_type doesn't match the connection config's type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>