15 changes: 10 additions & 5 deletions transfers/minor_trace_chemistry_transfer.py
@@ -114,7 +114,8 @@ def _transfer_hook(self, session: Session) -> None:
"""
Override transfer hook to use batch upsert for idempotent transfers.

Uses ON CONFLICT DO UPDATE on nma_GlobalID (the legacy UUID PK, now UNIQUE).
Uses ON CONFLICT DO UPDATE on (chemistry_sample_info_id, analyte),
matching uq_minor_trace_chemistry_sample_analyte.
"""
df = self.cleaned_df

@@ -129,8 +130,12 @@ def _transfer_hook(self, session: Session) -> None:
logger.warning("No valid rows to transfer")
return

# Dedupe by nma_GlobalID to avoid PK conflicts.
rows = self._dedupe_rows(row_dicts)
# Dedupe by the same logical key used by the table unique constraint.
rows = self._dedupe_rows(
row_dicts,
key=["chemistry_sample_info_id", "analyte"],
include_missing=True,
)
Comment on lines +133 to +138
Copilot AI Feb 13, 2026

analyte is nullable on NMA_MinorTraceChemistry, but the new dedupe/upsert key includes it and include_missing=True keeps rows where analyte is NULL. In Postgres, ON CONFLICT (chemistry_sample_info_id, analyte) will not match existing rows when analyte is NULL, so re-running the transfer will keep inserting additional NULL-analyte rows (non-idempotent growth). Consider treating missing analyte as an error/skip (capture it in _row_to_dict), or use a different conflict key/fallback path (e.g., upsert on nma_GlobalID when analyte is missing), or enforce analyte as NOT NULL in the schema if that’s valid for the dataset.

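To make the "treat missing analyte as a skip" suggestion concrete: a minimal sketch (the helper name and row-dict shape are hypothetical, not from this PR) that partitions rows so NULL-analyte rows never reach the composite-key upsert, since Postgres treats NULL as never equal and `ON CONFLICT (chemistry_sample_info_id, analyte)` cannot fire for them.

```python
# Sketch of the skip option. Helper name and row shape are hypothetical.
def split_by_analyte(rows):
    """Partition row dicts: NULL-analyte rows bypass the composite-key
    upsert, because ON CONFLICT (chemistry_sample_info_id, analyte)
    never matches an existing row whose analyte is NULL."""
    upsertable = [r for r in rows if r.get("analyte") is not None]
    skipped = [r for r in rows if r.get("analyte") is None]
    return upsertable, skipped

rows = [
    {"chemistry_sample_info_id": 1, "analyte": "Fe"},
    {"chemistry_sample_info_id": 1, "analyte": None},
]
upsertable, skipped = split_by_analyte(rows)
```

The `skipped` partition could then be logged or routed to a fallback path instead of silently growing the table on each rerun.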
logger.info(f"Upserting {len(rows)} MinorTraceChemistry records")

insert_stmt = insert(NMA_MinorTraceChemistry)
@@ -139,9 +144,9 @@ def _transfer_hook(self, session: Session) -> None:
for i in range(0, len(rows), self.batch_size):
chunk = rows[i : i + self.batch_size]
logger.info(f"Upserting batch {i}-{i+len(chunk)-1} ({len(chunk)} rows)")
# Upsert on nma_GlobalID (legacy UUID PK, now UNIQUE)
# Upsert on unique logical key (chemistry_sample_info_id, analyte)
stmt = insert_stmt.values(chunk).on_conflict_do_update(
index_elements=["nma_GlobalID"],
index_elements=["chemistry_sample_info_id", "analyte"],
P1: Handle nma_GlobalID conflicts in this upsert

Switching ON CONFLICT to ("chemistry_sample_info_id", "analyte") leaves collisions on the still-unique nma_GlobalID unhandled, so reruns can fail with duplicate key instead of updating rows. This is reachable because _row_to_dict always supplies nma_GlobalID but allows analyte to be None (which will not match the composite unique constraint), and the model still declares nma_GlobalID as unique (db/nma_legacy.py), so idempotent transfers break when a record has null analyte (or when analyte changes for an existing GlobalID).

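One way to sketch the suggested fallback path: route NULL-analyte batches to the `nma_GlobalID` conflict target instead. This is an illustrative sketch only — the table definition below is a hypothetical minimal stand-in for `NMA_MinorTraceChemistry`, not the real model from `db/nma_legacy.py`, and a production version would need to split mixed batches rather than switch per-batch.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects.postgresql import insert

metadata = MetaData()
# Hypothetical minimal stand-in for NMA_MinorTraceChemistry.
chem = Table(
    "nma_minor_trace_chemistry",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("nma_GlobalID", String, unique=True),
    Column("chemistry_sample_info_id", Integer),
    Column("analyte", String),
    Column("value", String),
)

def upsert_stmt_for(rows):
    """Build an upsert whose conflict target depends on the batch:
    NULL-analyte rows fall back to the still-unique nma_GlobalID,
    others use the composite logical key."""
    stmt = insert(chem).values(rows)
    if any(r["analyte"] is None for r in rows):
        target = ["nma_GlobalID"]
    else:
        target = ["chemistry_sample_info_id", "analyte"]
    return stmt.on_conflict_do_update(
        index_elements=target,
        set_={"value": stmt.excluded.value},
    )
```

Postgres 15+ also offers `UNIQUE NULLS NOT DISTINCT` on the constraint itself, which would let NULL analytes participate in the composite target directly.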

Comment on lines +147 to +149
Copilot AI Feb 13, 2026

Because the conflict target is now (chemistry_sample_info_id, analyte), any insert row that violates another unique constraint (notably the nma_GlobalID unique constraint) will raise an error rather than being handled by the upsert. If the source extract can contain duplicate GlobalID values (even transiently), this will make the transfer fail. Consider either pre-deduping on nma_GlobalID as well, or adding a defensive validation that nma_GlobalID is unique within row_dicts before executing the batches.

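The suggested defensive validation could look like this minimal sketch (the function name is hypothetical, not from this PR): a pre-flight check that fails fast if the extract carries duplicate `nma_GlobalID` values, which the composite-key upsert would otherwise surface as unique-violation errors mid-batch.

```python
from collections import Counter

# Hypothetical pre-flight check: reject extracts with duplicate
# nma_GlobalID values before any batch is executed, so the error
# names the offending IDs instead of failing partway through.
def assert_unique_global_ids(row_dicts):
    counts = Counter(r["nma_GlobalID"] for r in row_dicts)
    dupes = sorted(gid for gid, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"Duplicate nma_GlobalID values in extract: {dupes}")
```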
Comment on lines 144 to +149
Copilot AI Feb 13, 2026

This change modifies the idempotency key for upserts, but there’s no test coverage asserting that re-running the transfer updates existing (chemistry_sample_info_id, analyte) rows rather than inserting duplicates (especially around NULL analytes). Consider adding a unit/integration test that runs _transfer_hook twice against a test DB and asserts row counts remain stable and values update as expected.

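The idempotency property such a test would assert can be sketched without a database: below, a plain dict keyed on `(chemistry_sample_info_id, analyte)` stands in for the table, and `upsert_batch` is a hypothetical emulation of `ON CONFLICT DO UPDATE`, not the real transfer code.

```python
# Emulate ON CONFLICT DO UPDATE against a dict keyed on the
# composite logical key; a second run must update in place,
# never add rows.
def upsert_batch(table, rows):
    for row in rows:
        key = (row["chemistry_sample_info_id"], row["analyte"])
        table[key] = dict(row)  # insert or overwrite, like DO UPDATE
    return table

rows = [{"chemistry_sample_info_id": 1, "analyte": "Pb", "value": 0.5}]
table = upsert_batch({}, rows)
count_after_first_run = len(table)
rows[0]["value"] = 0.7  # source value changes between runs
table = upsert_batch(table, rows)
```

A real integration test would run `_transfer_hook` twice against a test database and make the same two assertions: stable row count, updated values.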
set_={
"chemistry_sample_info_id": excluded.chemistry_sample_info_id,
"nma_chemistry_sample_info_uuid": excluded.nma_chemistry_sample_info_uuid,