
transfer-fix#501

Merged
jirhiker merged 1 commit into staging from transfer-fix
Feb 13, 2026

Conversation

@jirhiker
Member

Why

This PR addresses the following problem / context:

  • Use bullet points here

How

Implementation summary - the following was changed / added / removed:

  • Use bullet points here

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Use bullet points here

Copilot AI review requested due to automatic review settings February 13, 2026 06:05

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be78f15ba7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

```diff
 # Upsert on unique logical key (chemistry_sample_info_id, analyte)
 stmt = insert_stmt.values(chunk).on_conflict_do_update(
-    index_elements=["nma_GlobalID"],
+    index_elements=["chemistry_sample_info_id", "analyte"],
```


P1: Handle nma_GlobalID conflicts in this upsert

Switching ON CONFLICT to ("chemistry_sample_info_id", "analyte") leaves collisions on the still-unique nma_GlobalID unhandled, so reruns can fail with a duplicate-key error instead of updating rows. This is reachable because _row_to_dict always supplies nma_GlobalID but allows analyte to be None (which will never match the composite unique constraint), and the model still declares nma_GlobalID as unique (db/nma_legacy.py). As a result, idempotent transfers break when a record has a null analyte, or when analyte changes for an existing GlobalID.

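The failure mode Codex describes can be reproduced with a tiny in-memory model of Postgres's ON CONFLICT semantics, in which a NULL in the conflict target never matches an existing row. This is an illustrative sketch, not code from the PR; the row shape is assumed from the surrounding discussion:

```python
def upsert(table, row):
    """Toy model of ON CONFLICT (chemistry_sample_info_id, analyte) DO UPDATE.

    Postgres treats NULL as distinct from NULL, so a row with analyte=None
    can never match the composite conflict target and is always inserted.
    """
    key = (row["chemistry_sample_info_id"], row["analyte"])
    if row["analyte"] is not None and any(k == key for k, _ in table):
        # conflict matched: update in place
        table[:] = [(k, row if k == key else r) for k, r in table]
    else:
        # no match (including every NULL-analyte rerun): insert a new row
        table.append((key, row))

rows = [{"chemistry_sample_info_id": 1, "analyte": None, "nma_GlobalID": "g1"}]
table = []
for _ in range(2):          # two transfer runs over the same extract
    for r in rows:
        upsert(table, r)
# table now holds 2 rows: the NULL-analyte row was inserted on both runs
```

Running the same extract twice grows the table, which is exactly the non-idempotent behavior the review flags.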

@jirhiker merged commit ee62cea into staging Feb 13, 2026
11 checks passed
Contributor

Copilot AI left a comment


Pull request overview

Updates the Minor/Trace Chemistry transfer to align its deduplication and Postgres upsert conflict target with the table’s logical uniqueness constraint (chemistry_sample_info_id, analyte) rather than the legacy nma_GlobalID.

Changes:

  • Switches _transfer_hook upsert conflict target from nma_GlobalID to (chemistry_sample_info_id, analyte).
  • Updates row deduplication to use the same logical key as the table’s unique constraint.
  • Adjusts inline documentation/comments to reflect the new upsert key.
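The dedupe step named in the second bullet could look roughly like this standalone sketch. It is an assumption about `_dedupe_rows`' behavior inferred from the call site in the diff, not the actual implementation:

```python
def dedupe_rows(row_dicts, key, include_missing=True):
    """Keep the last row seen per logical key tuple.

    Rows missing any key column (e.g. analyte is None) are passed through
    untouched when include_missing=True, mirroring the include_missing=True
    argument in the quoted call.
    """
    seen, missing = {}, []
    for row in row_dicts:
        if any(row.get(k) is None for k in key):
            if include_missing:
                missing.append(row)
            continue
        seen[tuple(row[k] for k in key)] = row
    return list(seen.values()) + missing
```

Note that with `include_missing=True`, NULL-analyte rows survive dedupe, which is what makes them reach the upsert path discussed in the comments below.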

Comment on lines +133 to +138
```python
# Dedupe by the same logical key used by the table unique constraint.
rows = self._dedupe_rows(
    row_dicts,
    key=["chemistry_sample_info_id", "analyte"],
    include_missing=True,
)
```

Copilot AI Feb 13, 2026


analyte is nullable on NMA_MinorTraceChemistry, but the new dedupe/upsert key includes it, and include_missing=True keeps rows where analyte is NULL. In Postgres, ON CONFLICT (chemistry_sample_info_id, analyte) will never match existing rows when analyte is NULL, so re-running the transfer will keep inserting additional NULL-analyte rows (non-idempotent growth). Consider treating a missing analyte as an error/skip (capture it in _row_to_dict), using a different conflict key or fallback path (e.g., upsert on nma_GlobalID when analyte is missing), or enforcing analyte as NOT NULL in the schema if that's valid for the dataset.
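One of the suggested mitigations (a fallback path for NULL-analyte rows) could be sketched as follows; the function name and routing decision are illustrative, not code from the PR:

```python
def partition_by_analyte(row_dicts):
    """Split rows into a composite-key upsert path and a fallback path.

    ON CONFLICT (chemistry_sample_info_id, analyte) never matches when
    analyte IS NULL, so NULL-analyte rows would be re-inserted on every
    run; routing them to a separate upsert keyed on nma_GlobalID would
    keep the transfer idempotent. Illustrative sketch only.
    """
    composite, fallback = [], []
    for row in row_dicts:
        (composite if row.get("analyte") is not None else fallback).append(row)
    return composite, fallback
```

The caller would then run the existing composite-key upsert on the first list and a GlobalID-keyed upsert (or an error/skip, per the comment) on the second.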

Comment on lines +147 to +149
```diff
 # Upsert on unique logical key (chemistry_sample_info_id, analyte)
 stmt = insert_stmt.values(chunk).on_conflict_do_update(
-    index_elements=["nma_GlobalID"],
+    index_elements=["chemistry_sample_info_id", "analyte"],
```

Copilot AI Feb 13, 2026


Because the conflict target is now (chemistry_sample_info_id, analyte), any insert row that violates another unique constraint (notably the nma_GlobalID unique constraint) will raise an error rather than being handled by the upsert. If the source extract can contain duplicate GlobalID values (even transiently), this will make the transfer fail. Consider either pre-deduping on nma_GlobalID as well, or adding a defensive validation that nma_GlobalID is unique within row_dicts before executing the batches.
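The defensive validation suggested here might look like the following (the function name is assumed; only the column names come from the PR):

```python
from collections import Counter

def assert_unique_global_ids(row_dicts):
    """Fail fast if the extract carries duplicate nma_GlobalID values.

    With the conflict target now (chemistry_sample_info_id, analyte), a
    duplicate nma_GlobalID would trip the table's separate unique
    constraint mid-batch and abort the transfer, so it is cheaper to
    detect the duplication before executing any batches.
    """
    counts = Counter(row["nma_GlobalID"] for row in row_dicts)
    dupes = sorted(gid for gid, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate nma_GlobalID values in extract: {dupes}")
```

Calling this on `row_dicts` before the batch loop turns a mid-transfer IntegrityError into an upfront, actionable failure.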

Comment on lines 144 to +149
```diff
 for i in range(0, len(rows), self.batch_size):
     chunk = rows[i : i + self.batch_size]
     logger.info(f"Upserting batch {i}-{i+len(chunk)-1} ({len(chunk)} rows)")
-    # Upsert on nma_GlobalID (legacy UUID PK, now UNIQUE)
+    # Upsert on unique logical key (chemistry_sample_info_id, analyte)
     stmt = insert_stmt.values(chunk).on_conflict_do_update(
-        index_elements=["nma_GlobalID"],
+        index_elements=["chemistry_sample_info_id", "analyte"],
```

Copilot AI Feb 13, 2026


This change modifies the idempotency key for upserts, but there’s no test coverage asserting that re-running the transfer updates existing (chemistry_sample_info_id, analyte) rows rather than inserting duplicates (especially around NULL analytes). Consider adding a unit/integration test that runs _transfer_hook twice against a test DB and asserts row counts remain stable and values update as expected.
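A minimal shape for the missing test, using an in-memory dict as a stand-in for the Postgres table (the real test would run `_transfer_hook` twice against a test DB, as the comment suggests; names here are illustrative):

```python
def fake_upsert(table, rows):
    """Stand-in for the upsert: last write wins per (sample_id, analyte)."""
    for row in rows:
        table[(row["chemistry_sample_info_id"], row["analyte"])] = row

def test_transfer_is_idempotent():
    rows = [
        {"chemistry_sample_info_id": 1, "analyte": "As", "value": 10},
        {"chemistry_sample_info_id": 1, "analyte": "Pb", "value": 20},
    ]
    table = {}
    fake_upsert(table, rows)
    first_count = len(table)
    rows[0]["value"] = 11          # source value changes between runs
    fake_upsert(table, rows)       # second transfer run
    assert len(table) == first_count        # no duplicate rows
    assert table[(1, "As")]["value"] == 11  # existing row was updated

test_transfer_is_idempotent()
```

A companion case with `analyte=None` would capture the NULL-analyte regression flagged in the earlier comment.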

