Skip to content

feat: MinorTraceChemistry backfill job (#600)#640

Open
kbighorse wants to merge 27 commits intostagingfrom
600-minor-trace-chemistry-backfill
Open

feat: MinorTraceChemistry backfill job (#600)#640
kbighorse wants to merge 27 commits intostagingfrom
600-minor-trace-chemistry-backfill

Conversation

@kbighorse
Copy link
Copy Markdown
Contributor

Summary

  • Implement backfill job to migrate legacy NMA_MinorTraceChemistry records into the Ocotillo schema (Observation, Sample, Parameter, Notes tables)
  • Add BDD step definitions and feature scenarios for the backfill
  • Merge staging to pick up test database connection fix and latest changes
  • Merge alembic migration heads

Closes #600

Test plan

🤖 Generated with Claude Code

kbighorse and others added 21 commits February 27, 2026 19:37
Add chemistry backfill that maps legacy NMA_Radionuclides rows into
Observation records keyed on nma_pk_chemistryresults (idempotent upsert).

- Schema: add chemistry columns to Observation and Sample models/schemas
- Migration: add columns + unique constraints, drop stale NMA_Radionuclides.thing_id
- Lexicon: add pCi/L unit and Chemistry Observation note type
- Backfill: resolve Samples via nma_pk_chemistrysample, create Parameters,
  AnalysisMethods, Notes; detect_flag from legacy Symbol qualifier
- Tests: BDD step definitions for all 7 scenarios, feature file fixes
- Orchestrator: wire radionuclides step into backfill pipeline

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap cleanup chain in try/except to prevent orphan data on failure (#566)
- Scope AnalysisMethod cleanup to test-created IDs only (#567)

Also addresses #565 (scalar_one hardening) and #568 (remove unused
parameter_ids tracker) which were applied to the new files in the
preceding feature commit.

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add per-row savepoint isolation so one bad row doesn't abort the run
- Skip rows with missing AnalysisDate instead of inserting epoch sentinel
- Remove unused batch_size parameter from backfill signatures and CLI
- Fix migration downgrade to backfill thing_id before setting NOT NULL
- Track analysis_method_ids in test step defs for scoped cleanup

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…leanup

- Migration upgrade now checks information_schema before dropping thing_id
  column, guarding against environments where it was already removed
- run_backfill.sh no longer references removed --batch-size parameter
- Test observation cleanup scoped to scenario sample_ids instead of
  blanket nma_pk_chemistryresults IS NOT NULL query

Refs #558, #570

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tracking

- Migration upgrade dynamically discovers FK names via sa.inspect()
  with guard for missing name entries
- Per-row exception handler now logs full traceback via logger.exception()
  before appending summary to result.errors
- Analysis method tracking scoped to scenario sample_ids
- Removed incorrect issue reference from test comment

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ow catch

- Volume/volume_unit test assertions query by scenario sample_ids
- existing_keys set lowercases DB values to match global_id_str normalization
- Per-row catch broadened to Exception with logger.exception() for tracebacks
- Analysis method tracking scoped to scenario samples
- Migration FK drop guarded against missing name entries
- Resolve stash merge conflicts in sample volume steps

Refs #558

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snapshot AnalysisMethod IDs before/after backfill to track only
actually-created methods, preventing deletion of pre-existing rows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete context._backfill_created in finally block to prevent
ID accumulation across scenarios causing redundant deletes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ORDER BY id to legacy query so lowest-PK row wins when multiple
  radionuclide rows map to the same Sample (volume/volume_unit)
- Skip overwrites with warning when a conflict is detected
- Reject unknown CLI args in backfill entrypoint instead of silently
  ignoring them

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skip pg_cron extension creation gracefully when unavailable instead
of hard-failing, unblocking local dev and test environments.

Refs #576

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skip pg_cron extension creation gracefully when unavailable instead
of hard-failing, unblocking local dev and test environments.

Also skip search vector trigger sync for tables that don't exist
yet (e.g. asset) to avoid NoSuchTableError during test setup.

Refs #576

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolves test database connection issue (conftest load_dotenv override)
and merges alembic migration heads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add backfill logic and test scenarios for migrating legacy
NMA_MinorTraceChemistry records into the Ocotillo schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 3, 2026 17:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an idempotent backfill path for migrating legacy NMA chemistry results (Radionuclides + MinorTraceChemistry) into the Ocotillo Observation/Sample schema, along with BDD coverage and the supporting schema/lexicon/migrations.

Changes:

  • Implement chemistry backfill jobs that upsert Observation records keyed by nma_pk_chemistryresults and populate Sample chemistry audit/volume fields.
  • Add Behave step definitions + feature updates to validate idempotency, field mapping, orphan skipping, notes creation, and linkage integrity.
  • Extend DB models/schemas/lexicon and add Alembic migrations (including a migration-head merge).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
transfers/backfill/chemistry_backfill.py New backfill implementation for Radionuclides + MinorTraceChemistry into Observation/Sample, including upsert + notes creation.
tests/features/steps/chemistry-backfill.py New reusable Behave steps to create legacy fixtures, run backfills, and assert mappings/linkage.
tests/features/nma-chemistry-radionuclides-refactor.feature Scenario tweaks to ensure required fixtures/fields exist for radionuclides scenarios.
tests/features/nma-chemistry-minortracechemistry-refactor.feature Scenario tweaks to ensure required fixtures/fields exist for minor/trace chemistry scenarios.
tests/features/environment.py Adds per-scenario cleanup for chemistry-backfill-created data to support running without DB rebuild.
db/sample.py Adds nma_pk_chemistrysample, volume, volume_unit to Sample.
db/observation.py Adds nma_pk_chemistryresults, detect_flag, uncertainty, analysis_agency to Observation.
schemas/sample.py Exposes new sample chemistry/audit fields on the API response schema.
schemas/observation.py Exposes new observation chemistry/audit fields on the API response schema.
core/lexicon.json Adds lexicon terms for pCi/L and note type Chemistry Observation.
alembic/versions/545a5b77e5e8_add_chemistry_backfill_columns_to_.py Adds new columns/constraints + updates observation_version; drops stale legacy column.
alembic/versions/1f1bdf75e0c0_merge_migration_heads_after_staging_.py Merge migration heads.
run_backfill.sh Adjusts invocation of the backfill runner script.

…nalysisDate

- Add default_unit param to _get_or_create_parameter so minor trace
  uses ug/L instead of inheriting pCi/L from radionuclides
- Log warning when legacy row has NULL AnalysisDate

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 3, 2026 17:40
kbighorse and others added 2 commits April 3, 2026 10:40
- Fix Mapped types to use `| None` for nullable columns (observation, sample)
- Skip rows with NULL AnalysisDate (observation_datetime is NOT NULL)
- Default unit to ug/L when legacy row has no Units (unit is NOT NULL)
- Restore $@ forwarding in run_backfill.sh
- Verify well still exists in DB before reusing in BDD _ensure_well()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 8 radionuclides scenarios were missing AnalysisDate, which the
backfill requires. Now 16/16 scenarios pass across both chemistry
feature files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Copilot AI review requested due to automatic review settings April 3, 2026 17:53
@kbighorse kbighorse review requested due to automatic review settings April 3, 2026 17:53
kbighorse and others added 2 commits April 3, 2026 10:54
- Fix date→datetime promotion: use isinstance checks instead of
  hasattr(tzinfo) which doesn't work for plain date objects
- Use param.default_unit for unit fallback instead of hardcoded ug/L
- Parse BDD AnalysisDate as date (not datetime) to match DB column type
- Guard after_scenario cleanup to skip DB session when no fixtures created

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 3, 2026 17:57
…NULL AnalysisDate

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

with session_ctx() as session:
post_ids = {row[0] for row in session.execute(select(AnalysisMethod.id)).all()}
new_ids = post_ids - pre_ids
context._backfill_created["analysis_method_ids"] = list(new_ids)
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_track_created_analysis_methods overwrites context._backfill_created["analysis_method_ids"] each time a backfill is run. In scenarios that run the job twice (idempotency), the second run will typically create zero new AnalysisMethods and this assignment will clear the IDs from the first run, so after_scenario won’t delete the AnalysisMethod rows created earlier in the scenario. Consider appending/unioning into the existing list instead of replacing it (e.g., extend with new_ids while keeping prior values).

Suggested change
context._backfill_created["analysis_method_ids"] = list(new_ids)
existing_ids = set(context._backfill_created["analysis_method_ids"])
context._backfill_created["analysis_method_ids"] = list(existing_ids | new_ids)

Copilot uses AI. Check for mistakes.
{
"categories": [
"unit"
],
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ug/L is used as the default unit for MinorTraceChemistry Parameters/Observations, but it is not present in core/lexicon.json (and Parameter.default_unit / Observation.unit are lexicon-backed). This will cause FK/validation failures when running the backfill against a freshly initialized database (outside the BDD test seeding, which inserts ug/L manually). Add a ug/L unit term to the lexicon (or switch to an existing unit term already seeded).

Suggested change
],
],
"term": "ug/L",
"definition": "Micrograms per Liter"
},
{
"categories": [
"unit"
],

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor legacy MinorTraceChemistry into the Ocotillo schema via backfill job

3 participants