ENG-2635: Support namespaces for Postgres DSR execution #7500
Conversation
Add PostgresNamespaceMeta schema and wire namespace support through PostgresQueryConfig, PostgreSQLConnector, and RDSPostgresConnector. Remove stubbed retrieve_data/mask_data from RDSPostgresConnector so it inherits SQLConnector's implementations, enabling DSR execution. Update RDS Postgres admin UI tags to include "DSR Automation". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Guard against None secrets in NamespaceMetaValidationStep. Update test_validate_unsupported_connection_type to use mariadb (postgres is now a supported namespace type). Add Postgres-specific validation tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Postgres defaults to the public schema when neither namespace_meta nor db_schema is configured. Return empty fallback fields so validation doesn't reject existing postgres connections that lack both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove extra parentheses around bind parameters in expected query strings to match current SQLAlchemy output format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The dry-run query test passes db=None to TaskResources, which flows into query_config() -> get_namespace_meta(). Return None early when no db session is available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a dataset's namespace_meta has no overlapping fields with the connection type's namespace schema (e.g. BigQuery namespace_meta on a Postgres connection), skip validation rather than raising an error. This happens when datasets of mixed types are bulk-linked to a single connection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add GoogleCloudSQLPostgresNamespaceMeta with database_name (optional) and schema (required), following the same pattern as Snowflake/BigQuery. Update GoogleCloudSQLPostgresQueryConfig with generate_table_name() for schema-qualified SQL, and GoogleCloudSQLPostgresConnector to fetch namespace_meta from DB and pass it to the query config. Also includes shared fixes from PR #7500:

- Guard against None db session in get_namespace_meta
- Guard against None secrets in namespace validation
- Skip namespace validation for mismatched connection types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile please review
adamsachs left a comment
thanks for getting this in place! generally, this is looking good.
there are a few stylistic/code concerns that likely don't have much of a functionality impact, but some of them are worth addressing because i think they're potential footguns for the future.
and then there's one meatier product/functionality question around the namespace field naming and its alignment with Helios. the comment should explain that reasonably well, but i think it will be a point for further discussion! we can sync up on that separately.
```python
connection_type: Literal["postgres"] = "postgres"
database_name: Optional[str] = None
schema: str  # type: ignore[assignment]
```
what do you think about abstracting the schema here so that we can start trying to standardize more across our datastores? the same fields are used for snowflake and will be used for google cloud SQL based on your other PR, i'd like us to try to start encouraging/enforcing that for consistency.
we've actually already gone astray in this regard on the helios side, which is unfortunate (and will relate to other comments here), and i think we need to start correcting ourselves ASAP. you can see the divergence in naming conventions in this doc...
whoops, just realized this was ambiguous with schema and keying specifically on that line...what i meant was the pydantic schema generally 😄
Agreed on standardizing. I'll extract a shared SQLNamespaceMeta base class with database_name and schema fields that Postgres, Snowflake, and Cloud SQL can inherit from. BigQuery stays separate since its fields are different (project_id, dataset_id).
Helios can align its database_id → database_name naming in a follow-up migration.
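For concreteness, a minimal sketch of what the proposed shared base could look like (illustrative pydantic models; the class names follow this thread, but the real fides module layout and base-class internals may differ):

```python
from typing import Literal, Optional

from pydantic import BaseModel


class NamespaceMeta(BaseModel):
    """Stand-in for the existing fides base namespace metadata class."""

    connection_type: str


class SQLNamespaceMeta(NamespaceMeta):
    """Proposed shared base for SQL-style datastores (Postgres, Snowflake, Cloud SQL)."""

    database_name: Optional[str] = None
    schema: str  # type: ignore[assignment]  # shadows BaseModel.schema()


class PostgresNamespaceMeta(SQLNamespaceMeta):
    connection_type: Literal["postgres"] = "postgres"


meta = PostgresNamespaceMeta(schema="billing")
print(meta.connection_type, meta.schema, meta.database_name)
```

BigQuery would stay outside this hierarchy since its fields (project_id, dataset_id) don't fit the shared shape.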
```python
    Generates SQL valid for Postgres
    """

namespace_meta_schema = PostgresNamespaceMeta
```
OK, so i think this is generally on the right track in that we should be standardizing as much as possible between different 'flavors' of postgres integrations (and datastore integrations generally), but i want to call out that it has a few discrepancies with what Helios is generating for RDS postgres datasets, which we ultimately need to resolve (i think mainly on the helios-side!):
- helios uses `database_id` here to indicate the database name, instead of `database_name`. i honestly don't know why, but it seems like we're standardizing on `database_name` generally and that's good - let's keep as consistent a standard as possible across datastores
- helios doesn't include a `schema` attribute in its metadata. again, i'm not sure why we don't populate a `schema` attribute here, we have the schema available to us.
- we populate a `database_instance_id` with the RDS database instance ID where this was found.
on the first two points, i think Helios is just wrong, so i don't think that should stop us from doing the right thing here; we just need to ensure we make the proper adjustments on the fides side, and probably migrate over any existing datasets for safety.
on the third point...i'm a bit less sure, this area has always been a bit murky! do we need the RDS DB instance ID for DSR execution? do we assume that a given integration is tied to a specific DB instance? we don't explicitly have that as a field/parameter on the RDS Postgres integration, but is it implicitly defined in the credentials used?
this is probably where 90% of the complexity of this work is. it may require us to do some broader product (re-)thinking. or not! let's break out on this a bit.
Agreed on all three points. For this PR:
- We're standardizing on `database_name` — consistent with Snowflake and the new `SQLNamespaceMeta` base class. Helios should migrate `database_id` → `database_name` in a follow-up.
- Helios should populate `schema` — the base class will make this a required field, so Helios-generated datasets will need to include it.
- `database_instance_id` isn't needed for DSR execution — the integration credentials already implicitly target a specific RDS instance. I think `database_instance_id` is a Helios/discovery concern (tracking where a dataset was found) rather than a DSR concern (how to query it). Happy to discuss further if you see it differently.
Want me to create a follow-up ticket for the Helios alignment work?
```python
    f"or the connection must have values for the following fields: {field_descriptions}"
)
if namespace_meta:
    # Skip validation when the namespace_meta is clearly intended for a
```
is this valid? is this a scenario we really expect? if it's only happening in tests, that seems like we should fix up the tests.
maybe i'm missing something, but i feel like this isn't something we really want to have... if it's a purely defensive check for some oddity in our test setup that's now being exposed and we don't want to make a bigger change to our tests, then i'd prefer it sit isolated in a helper function so that we can clearly mark it as a temporary workaround we can clearly remove. honestly that probably feels like the right thing to do here in any case, as this is effectively an 'add-on' to the core validation logic.
let's just say...i'm generally confused by this :)
oh, i think maybe i see why this is needed - is it because we're not declaring a proper new namespace meta for RDS postgres? if that's the reason why, i think this is a poor workaround. shouldn't we instead just have something like the following?
```python
class RDSPostgresNamespaceMeta(PostgresNamespaceMeta):
    connection_type: Literal["rds_postgres"] = "rds_postgres"
```
fwiw i wrote all of this before looking at the claude PR review, but claude agrees that the namespace validation skip logic is a 'critical' finding for its fragility:
- Namespace validation skip logic could be fragile
namespace_meta.py:53-65 — The field-overlap heuristic that checks whether namespace_meta fields intersect with the schema's fields:
```python
schema_field_names = {
    field_name
    for field_name in namespace_meta_class.model_fields
    if field_name != "connection_type"
}
if schema_field_names and not schema_field_names.intersection(
    namespace_meta.keys()
):
    return
```

This works today because Postgres (schema, database_name) and BigQuery (project_id, dataset_id) have non-overlapping field names. But if a future connector adds a schema or database_name field, this heuristic will silently break. The connection_type discriminator check above is more robust.

Recommendation: Consider requiring connection_type in namespace_meta going forward and document that the field-overlap fallback is a compatibility shim. A logger.debug when the fallback fires would help with debugging.
but if i'm reading between the lines correctly, i really think we should just be requiring connection_type, unless there's some reasoning here i'm missing!
You're right — the field-overlap heuristic is fragile and unnecessary if we require connection_type. I'll:
- Add `RDSPostgresNamespaceMeta(PostgresNamespaceMeta)` with `connection_type: Literal["rds_postgres"]`
- Remove the field-overlap fallback and rely solely on `connection_type` for discriminating which validator applies
This also ties into the SQLNamespaceMeta base class from the earlier comment — RDS Postgres will inherit through PostgresNamespaceMeta → SQLNamespaceMeta → NamespaceMeta.
```python
def get_qualified_table_name(self, node: ExecutionNode) -> str:
    """Get fully qualified Postgres table name for table_exists() checks.

    Returns unquoted names (e.g. schema.table) because the generic
    SQLConnector.table_exists() uses split(".") + inspector.has_table().
    """
    query_config = self.query_config(node)
    if not query_config.namespace_meta:
        return node.collection.name

    meta = cast(PostgresNamespaceMeta, query_config.namespace_meta)
    qualified = f"{meta.schema}.{node.collection.name}"
    if meta.database_name:
        return f"{meta.database_name}.{qualified}"
    return qualified
```
this comes from claude, and i think i largely agree:
- Duplicated get_qualified_table_name across both connectors
postgres_connector.py:97-113 and rds_postgres_connector.py:135-149 contain identical implementations of get_qualified_table_name. This is a DRY violation that will diverge over time.
The Snowflake and BigQuery connectors use a simpler pattern — they delegate to query_config.generate_table_name():
```python
# snowflake_connector.py:91-94
def get_qualified_table_name(self, node: ExecutionNode) -> str:
    query_config = self.query_config(node)
    return query_config.generate_table_name()
```

However, this won't work here because PostgresQueryConfig.generate_table_name() returns quoted names ("schema"."table") while table_exists() needs unquoted names (schema.table). That's a real design tension.
Recommendation: Either:
- Add an unquoted param to generate_table_name(), or
- Extract the shared logic into a private method on PostgresConnector (and have RDSPostgresConnector inherit from it or call a shared utility), or
- At minimum, add a comment explaining why this can't delegate like Snowflake does.
i think option #1 makes the most sense. ideally we have the 'qualified table name' logic defined in a single place, and the quote handling is done as part of that.
Agreed — I'll add a quoted parameter to generate_table_name() (defaulting to True to preserve current behavior). Both connectors' get_qualified_table_name() will then just delegate: return self.query_config(node).generate_table_name(quoted=False), eliminating the duplication.
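A rough sketch of how the quoted parameter could behave (a standalone stand-in, not the actual PostgresQueryConfig; the embedded-quote escaping shown is an assumption about the desired behavior):

```python
from typing import Optional


class PostgresTableNamer:
    """Illustrative stand-in for the table-name portion of a query config."""

    def __init__(self, collection: str, schema: str, database_name: Optional[str] = None):
        self.collection = collection
        self.schema = schema
        self.database_name = database_name

    def generate_table_name(self, quoted: bool = True) -> str:
        """quoted=True yields SQL-ready names; quoted=False suits inspector.has_table()."""

        def render(identifier: str) -> str:
            if not quoted:
                return identifier
            # Double any embedded quotes, per SQL quoted-identifier rules
            return '"' + identifier.replace('"', '""') + '"'

        parts = [self.schema, self.collection]
        if self.database_name:
            parts.insert(0, self.database_name)
        return ".".join(render(part) for part in parts)


namer = PostgresTableNamer("customer", "billing")
print(namer.generate_table_name())              # "billing"."customer"
print(namer.generate_table_name(quoted=False))  # billing.customer
```

With this in place, both connectors' `get_qualified_table_name()` reduce to a one-line delegation, as described above.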
```python
if db is None:
    return None
```
not a huge deal, but without this, does something break? at a minimum it feels like we should have db typed as Optional[Session] if this is possible; but really, it seems like we should just be avoiding this call altogether if we don't have a Session to pass when this is being called.
maybe it goes beyond the scope of this PR, but i also don't really get why we're adding this in the PR, unless we've just found some broken functionality!
This guards against Session.object_session(self.configuration) returning None — which happens in tests when the config isn't bound to a DB session (e.g., PostgresQueryConfig unit tests that pass mock execution nodes without a real DB). Without it, those tests crash on db.scalar().
Agree the typing should reflect this — I'll update the signature to Optional[Session].
k, got it. i do feel like when app code has specific checks to guard against test-only codepaths it's a bit of a code smell, but i also understand that the problem here may lie deeper in the code organization and it'd take too broad of a refactor to fix, or too much mocking in the tests.
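As a sketch, the guarded accessor discussed in this thread amounts to something like the following (simplified stand-ins; the real method resolves the session via Session.object_session() and queries the dataset config):

```python
from typing import Any, Optional


class FakeSession:
    """Minimal stand-in for sqlalchemy.orm.Session used in this sketch."""

    def __init__(self, namespace: Optional[dict]):
        self._namespace = namespace

    def scalar(self, query: Any) -> Optional[dict]:
        return self._namespace


def get_namespace_meta(db: Optional[FakeSession], dataset_key: str) -> Optional[dict]:
    """Return namespace metadata for a dataset, or None without a session.

    The early return mirrors the guard above: in dry-run/test contexts the
    session can be None, and calling db.scalar() would crash.
    """
    if db is None:
        return None
    return db.scalar(("namespace_for", dataset_key))  # hypothetical query


print(get_namespace_meta(None, "billing"))
print(get_namespace_meta(FakeSession({"schema": "billing"}), "billing"))
```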
JadeCara
left a comment
Thanks for the thorough review! Addressing all the inline feedback — the main changes will be:
- New `SQLNamespaceMeta` base class with standardized `database_name` + `schema` fields
- `RDSPostgresNamespaceMeta` subclass with proper `connection_type` discriminator
- Remove fragile field-overlap validation heuristic in favor of `connection_type` check
- `generate_table_name(quoted=False)` to eliminate duplicated `get_qualified_table_name` logic
- Fix `get_namespace_meta` typing to `Optional[Session]`
- Extract SQLNamespaceMeta base class with shared database_name + schema fields
- Add RDSPostgresNamespaceMeta with connection_type discriminator
- Refactor Postgres/Snowflake namespace metas to inherit from SQLNamespaceMeta
- Add RDSPostgresQueryConfig for proper namespace validation routing
- Add quoted param to generate_table_name() to eliminate duplicated get_qualified_table_name() logic across both Postgres connectors
- Remove fragile field-overlap heuristic in namespace validation, rely solely on connection_type discriminator
- Fix get_namespace_meta typing to Optional[Session]
- Add RDS Postgres validation tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fides/service/dataset/validation_steps/namespace_meta.py`:
- Around line 43-48: The current early return when meta_connection_type !=
connection_type (using namespace_meta.get("connection_type") stored in
meta_connection_type) silently skips validation for misspelled or unknown
values; instead, validate meta_connection_type against the set of known/expected
connection types and only silently skip when it is a different but known type
(i.e., meta_connection_type in KNOWN_CONNECTION_TYPES and != connection_type),
otherwise raise a validation exception (or return an error) when
meta_connection_type is present but not recognized so invalid metadata fails
fast.
📒 Files selected for processing (10)
- src/fides/api/schemas/namespace_meta/postgres_namespace_meta.py
- src/fides/api/schemas/namespace_meta/rds_postgres_namespace_meta.py
- src/fides/api/schemas/namespace_meta/snowflake_namespace_meta.py
- src/fides/api/schemas/namespace_meta/sql_namespace_meta.py
- src/fides/api/service/connectors/postgres_connector.py
- src/fides/api/service/connectors/query_configs/postgres_query_config.py
- src/fides/api/service/connectors/rds_postgres_connector.py
- src/fides/api/service/connectors/sql_connector.py
- src/fides/service/dataset/validation_steps/namespace_meta.py
- tests/service/dataset_service/test_namespace_meta_validation.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/service/dataset_service/test_namespace_meta_validation.py
- src/fides/api/service/connectors/sql_connector.py
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/service/dataset_service/test_namespace_meta_validation.py (1)
295-326: Consider adding one regression test for namespace metadata missing `connection_type`. You now test mismatched typed metadata well; adding a case with no `connection_type` would lock the intended compatibility behavior introduced in the validator and prevent accidental tightening/loosening later.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/service/dataset_service/test_namespace_meta_validation.py` around lines 295 - 326, Add a new regression test that mirrors test_validate_mismatched_namespace_skipped but sets fides_meta.namespace without a connection_type key to ensure validation is skipped rather than enforced; create a FideslangDataset with fides_meta={"namespace": {"project_id": "...", "dataset_id": "..."}} (no connection_type), use a ConnectionConfig with ConnectionType.postgres, build the DatasetValidationContext, instantiate NamespaceMetaValidationStep, and assert that validator.validate(context) does not raise—this will lock in the intended compatibility behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fides/api/schemas/namespace_meta/rds_postgres_namespace_meta.py`:
- Around line 10-12: Update the module docstring in
src/fides/api/schemas/namespace_meta/rds_postgres_namespace_meta.py to remove
the internal system reference "fidesplus RDSPostgresMonitor.get_schemas()" and
instead use generic wording (e.g., "RDS monitor" or "monitoring service") and
likewise replace "RDS connector's create_client()" with a neutral phrase like
"RDS connector client creation" so the docstring describes expected identifiers
generically without referencing internal system names; keep the rest of the
explanatory text intact.
In `@src/fides/api/service/connectors/query_configs/postgres_query_config.py`:
- Around line 19-42: The generate_table_name method (and the analogous code at
lines 72-87) currently wraps collection_name, PostgresNamespaceMeta.schema, and
PostgresNamespaceMeta.database_name in double quotes but does not escape any
embedded double quotes inside those identifiers; update generate_table_name (and
the other identifier-composition function) to first escape any internal
double-quote characters by replacing " with "" for each identifier component,
then apply the quoted wrapper when quoted=True; preserve the unquoted behavior
when quoted=False so inspector/table_exists checks still receive the raw
identifier.
---
Nitpick comments:
In `@tests/service/dataset_service/test_namespace_meta_validation.py`:
- Around line 295-326: Add a new regression test that mirrors
test_validate_mismatched_namespace_skipped but sets fides_meta.namespace without
a connection_type key to ensure validation is skipped rather than enforced;
create a FideslangDataset with fides_meta={"namespace": {"project_id": "...",
"dataset_id": "..."}} (no connection_type), use a ConnectionConfig with
ConnectionType.postgres, build the DatasetValidationContext, instantiate
NamespaceMetaValidationStep, and assert that validator.validate(context) does
not raise—this will lock in the intended compatibility behavior.
📒 Files selected for processing (5)
- src/fides/api/schemas/namespace_meta/__init__.py
- src/fides/api/schemas/namespace_meta/rds_postgres_namespace_meta.py
- src/fides/api/service/connectors/query_configs/postgres_query_config.py
- src/fides/service/dataset/validation_steps/namespace_meta.py
- tests/service/dataset_service/test_namespace_meta_validation.py
Legacy datasets may have namespace_meta without a connection_type discriminator field. This adds a field-overlap heuristic: if namespace fields have zero overlap with the expected schema, validation is skipped (the namespace belongs to a different connection type). If fields do overlap, connection_type is backfilled and validation runs strictly so malformed namespaces are still caught. Also adds connection_type: bigquery to the enterprise test dataset YAML and 5 new unit tests covering the backward compat scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
adamsachs left a comment
OK, this is getting close - it's great that this is working with a proper RDS integration and i think we're basically using the 'right' mechanisms.
still just a few points that i think could use clean-up, let me know what you think!
```python
from fides.api.schemas.namespace_meta.namespace_meta import NamespaceMeta


class SQLNamespaceMeta(NamespaceMeta):
```

```python
connection_type: Literal["rds_postgres"] = "rds_postgres"
database_instance_id: str
database_id: str
```
so i imagine we reverted this back to database_id so it'd work well with the existing code in rds_postgres_connector.py...but i do think we should be establishing our more consistent standard here. can we just update the code there to look for database_name on the meta? i don't think that's going to break anything downstream, either here or on the monitor side in fidesplus, i really think that was just stubbed out for future use.
we do still need to update the namespace meta populated by helios monitors if we make this change, but i do think that's a relatively simple adjustment that we should go ahead and make. if we do that, then we should be able to inherit from the new SQLNamespaceMeta here, right? that'll help maintain standardization :)
note: i also wonder while we're here whether renaming the namespace field here to database_instance_name instead of database_instance_id is wise. that just seems more accurate, and if we don't change it now, we're never gonna get to it 😅
Done — RDSPostgresNamespaceMeta now inherits from SQLNamespaceMeta with standardized fields:
- `database_name` (inherited, replaces `database_id`)
- `database_instance_name` (replaces `database_instance_id`)
- `schema` (inherited, optional)
Updated the connector and tests to match. The fidesplus monitor (rds_postgres_monitor.py) will need a matching update to emit the new field names.
Note: if any RDS Postgres datasets have already been created by Helios, we'll need a data migration to rename the JSON keys in ctl_datasets.fides_meta. Suggest we handle that in a separate PR to keep this one focused.
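The key rename itself is mechanical; a hedged sketch of the per-row transform such a migration could apply (field names are taken from this thread, but the actual migration would rewrite `ctl_datasets.fides_meta` rows in an alembic revision, not call a helper like this):

```python
def migrate_rds_namespace_keys(namespace: dict) -> dict:
    """Rename legacy RDS namespace keys to the standardized names."""
    renames = {
        "database_id": "database_name",
        "database_instance_id": "database_instance_name",
    }
    # Keys not in the rename map pass through unchanged
    return {renames.get(key, key): value for key, value in namespace.items()}


legacy = {
    "connection_type": "rds_postgres",
    "database_id": "app",
    "database_instance_id": "prod-instance-1",
}
print(migrate_rds_namespace_keys(legacy))
```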
```python
from fides.api.schemas.namespace_meta.sql_namespace_meta import SQLNamespaceMeta


class PostgresNamespaceMeta(SQLNamespaceMeta):
```

```python
"""
Prepends the schema to the base table name if RDS namespace meta
is provided. Unlike Postgres, RDS doesn't need database_name
qualification because the connection targets a specific database.
"""
```
OK this is clear - i see why this is branched off now again. i'll throw this out there as a suggestion, but it may be overkill: we could continue to keep the logic centralized in a single method on the base class, `PostgresQueryConfig.generate_table_name`, by having that take an optional arg for whether to include the database name or not.
but i'm ambivalent...that may not even be clearer.
Agreed it could go either way. I'll keep the current approach with the separate override — it's explicit about why RDS skips database qualification (the engine URL already targets the database), which I think is clearer than a boolean flag on the base class.
```python
# @@ -15,10 +15,9 @@ class SnowflakeNamespaceMeta(NamespaceMeta):
    connection_type: Literal["snowflake"] = "snowflake"
    database_name: str
```
we should be able to remove this too, right? shouldn't it inherit it from SQLNamespaceMeta?
SQLNamespaceMeta declares database_name as Optional[str] = None since Postgres doesn't always need it. Snowflake requires it, so the override here tightens the type from optional to required. Added a comment to make this clearer.
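In pydantic, re-declaring the field on the subclass without a default is what tightens it; a minimal illustration (stand-in classes for this sketch, not the actual fides modules):

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class SQLNamespaceMetaSketch(BaseModel):
    """Stand-in for the shared base: database_name is optional here."""

    database_name: Optional[str] = None


class SnowflakeNamespaceMetaSketch(SQLNamespaceMetaSketch):
    """Snowflake always needs a database, so the override drops the default."""

    connection_type: Literal["snowflake"] = "snowflake"
    database_name: str  # re-declared with no default: now required


try:
    SnowflakeNamespaceMetaSketch()
except ValidationError as exc:
    # The missing field is reported even though the base made it optional
    print(exc.errors()[0]["loc"])
```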
```python
# Skip validation when the namespace_meta connection_type doesn't
# match this connection. This happens when datasets of mixed types
# (e.g. BigQuery, Snowflake) are linked to a single connection.
```
is this a valid scenario? i don't really think this can happen...
You're right — this can't happen in normal app runtime. Namespace metadata is set by the type-specific Helios monitor, and datasets are always linked to a single connection via the scoped endpoint (PATCH /connection/{key}/dataset). There's no mechanism to re-link a dataset across connection types.
Removed the mismatch check and the legacy field-overlap/backfill block. Also removed the 5 tests that covered those code paths.
```python
if not provided_fields & expected_fields:
    return
# Fields overlap — backfill connection_type so it's persisted
namespace_meta["connection_type"] = connection_type
```
i thought we'd be able to get rid of this block if we ensure that all of our namespace meta's have connection_type's, which they now seem to (🙏)?
maybe there's some other reason this is needed? but it just doesn't seem like a valid code path/scenario for normal app runtime. if it's just to make a test pass, i'd recommend we find an alternative, this just feels like too much fragile special-casing to have in our app code. the whole point of the connection_type mechanism is so that we don't have to do this!
i.e. are there really 'legacy' namespace_meta's lurking around...?
Agreed — removed this entire block in the same change. All namespace metas now have connection_type via the __init_subclass__ registration mechanism, so this heuristic is no longer needed. The alembic migration (4ebe0766021b) already backfilled connection_type for existing datasets.
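The registration mechanism referenced here can be sketched in plain Python (an illustration of the `__init_subclass__` pattern only; the fides implementation builds on pydantic models, so the details differ):

```python
from typing import ClassVar, Dict, Type


class NamespaceMeta:
    """Subclasses self-register under their connection_type discriminator."""

    connection_type: ClassVar[str] = ""
    registry: ClassVar[Dict[str, Type["NamespaceMeta"]]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.connection_type:
            NamespaceMeta.registry[cls.connection_type] = cls


class PostgresNamespaceMeta(NamespaceMeta):
    connection_type = "postgres"


class RDSPostgresNamespaceMeta(PostgresNamespaceMeta):
    connection_type = "rds_postgres"


def schema_for(connection_type: str) -> Type[NamespaceMeta]:
    """Strict lookup: unknown types fail fast instead of being skipped."""
    try:
        return NamespaceMeta.registry[connection_type]
    except KeyError:
        raise ValueError(f"No namespace schema registered for {connection_type!r}")


print(schema_for("rds_postgres").__name__)  # RDSPostgresNamespaceMeta
```

The strict lookup is the point: once every meta carries a `connection_type`, the validator never needs field-overlap guessing.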
Thanks for the second pass! All points addressed:

- RDSPostgresNamespaceMeta: inherit from SQLNamespaceMeta, rename database_id → database_name and database_instance_id → database_instance_name
- Remove mismatch check and legacy field-overlap/backfill block from namespace validation (can't happen in normal app runtime)
- Add clarifying comment on SnowflakeNamespaceMeta.database_name override
- Update connector and tests to use new field names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Created PRs to start the id -> name migration as well: #7589 and https://github.com/ethyca/fidesplus/pull/3203
The NamespaceMetaValidationStep was validating dataset namespace metadata against the connection config's type, causing failures when datasets with BigQuery namespace metadata were submitted through a Postgres connection (e.g. in the bulk create test). Now skips validation when the namespace's connection_type doesn't match the connection config's type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ticket ENG-2635
Description Of Changes
Add namespace support for Postgres connectors, enabling a single connection to handle multiple schemas for DSR execution. This follows the established Snowflake/BigQuery pattern exactly.
Some deployments have 200-300 datasets representing different Postgres schemas. Previously, each schema required a separate integration for DSRs. With namespace support, a single Postgres connection can execute DSRs across all schemas by using fully-qualified table names (`"schema"."table"` or `"database"."schema"."table"`).

This also removes the stubbed `retrieve_data()`/`mask_data()` from `RDSPostgresConnector`, unblocking DSR execution for RDS Postgres connections.

PR 1 of 3 — Core Postgres + RDS Postgres namespace support. Google Cloud SQL Postgres (PR 2) and `set_schema()` reconciliation (PR 3) follow separately.

Code Changes

- `src/fides/api/schemas/namespace_meta/postgres_namespace_meta.py` — New `PostgresNamespaceMeta` schema with `database_name` (optional) and `schema` (required), auto-registered via `NamespaceMeta.__init_subclass__`. Returns empty fallback fields for backward compat (Postgres defaults to public schema).
- `src/fides/api/service/connectors/query_configs/postgres_query_config.py` — Added `namespace_meta_schema`, `generate_table_name()` for schema-qualified SQL, updated `get_formatted_query_string()` and `get_update_stmt()` to use it
- `src/fides/api/service/connectors/postgres_connector.py` — `query_config()` now fetches namespace_meta from DB, added `get_qualified_table_name()` for unquoted names in error handling
- `src/fides/api/service/connectors/rds_postgres_connector.py` — Removed stubbed `retrieve_data()`/`mask_data()` (inherits `SQLConnector`'s implementations), `query_config()` passes namespace_meta, added `get_qualified_table_name()`
- `src/fides/api/service/connectors/sql_connector.py` — Added `None` guard in `get_namespace_meta()` for dry-run contexts where no DB session is available
- `src/fides/service/dataset/validation_steps/namespace_meta.py` — Added `None` guard for connection secrets; skip namespace validation when namespace_meta fields don't overlap with the connection type's schema (cross-type dataset linking)
- `clients/admin-ui/src/features/integrations/integration-type-info/rdsPostgresInfo.tsx` — Added "DSR Automation" to RDS Postgres tags
- `tests/ops/service/connectors/test_postgres_query_config.py` — New tests mirroring Snowflake test structure: parametrized namespace query generation, invalid meta validation, namespaced update statements
- `tests/service/dataset_service/test_namespace_meta_validation.py` — Updated for Postgres namespace support: 5 new Postgres-specific tests, cross-type namespace skip test, updated unsupported type test

Steps to Confirm
1. Run namespace query config tests
Expected: All 9 tests pass (5 parametrized query generation, 1 invalid meta, 3 update statements).
2. Run namespace validation tests
Expected: All 15 tests pass, including 5 Postgres-specific and 1 cross-type namespace skip test.
3. Verify backward compatibility (no namespace = no breakage)
Expected: Existing Postgres query config tests pass unchanged — datasets without `namespace_meta` produce unqualified table names as before.

4. Verify dataset creation with namespace_meta via API

Create a Postgres connection, then create a dataset with `namespace_meta`, and link it.

Expected: All return 200/201. The `datasetconfig` response shows `"succeeded": [...]` with 1 entry.

5. Verify namespace_meta is persisted and returned

Expected: The response includes `"fides_meta": {"namespace": {"schema": "billing"}}` in the `ctl_dataset` of the linked dataset config.

6. Verify backward compat via API (no namespace_meta required for Postgres)

Expected: Both return 200/201 — no validation error. Postgres connections do NOT require namespace_meta (backward compatible, defaults to public schema).
7. Check RDS Postgres Admin UI tag
Navigate to the Integrations page in the Admin UI and find "RDS Postgres".
Expected: RDS Postgres shows "DSR Automation", "Discovery", and "Detection" tags.
Pre-Merge Checklist
- `CHANGELOG.md` updated
- `downgrade()` migration is correct and works

🤖 Generated with Claude Code