
kas-water-level-import#621

Merged
ksmuczynski merged 21 commits into staging from
kas-water-level-import
Mar 31, 2026

Conversation


ksmuczynski commented Mar 28, 2026

Summary

This updates the standalone groundwater-level CSV importer to support the new source CSV shape while keeping the current staging-compatible API and CLI entry points.

The work was implemented in stages to keep behavior changes isolated and reviewable:

  • schema and normalization
  • validation and measuring-point resolution
  • structured groundwater persistence and idempotency
  • best-effort row savepoints
  • API and CLI partial-success reporting
  • real-file cleanup and stabilization
  • BDD coverage alignment

Why

This PR addresses the following problem / context:

  • The standalone water-level importer needed to support the current source CSV shape and groundwater-specific persistence rules.
  • The original import flow did not handle partial-success behavior cleanly across service, API, CLI, and BDD coverage.
  • Real-file validation exposed production-relevant issues that were not covered by earlier test data, including a float vs Decimal crash and a nested savepoint warning.
  • Existing BDD coverage had drifted from current importer behavior and vocabulary expectations.

How

Implementation summary: the following was changed, added, or removed.

Schema And Validation

  • Added a dedicated standalone water-level CSV schema with header alias support and datetime normalization.
  • Updated validation to enforce current groundwater rules, including data_quality enum validation and measuring_person matching supplied field staff.
  • Resolved measuring point height from CSV mp_height when provided, otherwise from existing well measuring-point history.

Persistence And Import Behavior

  • Aligned persistence with groundwater semantics:
    • sample_matrix="groundwater"
    • groundwater parameter lookup by matrix
    • mapped CSV level_status into Observation.groundwater_level_reason
    • persisted groundwater-specific observation fields, including groundwater_level_reason and nma_data_quality
  • Made imports idempotent with deterministic sample naming.
  • Switched row persistence to best-effort per-row savepoints so valid rows import even when other rows fail.
  • Fixed issues found during real-file validation:
    • coerced DB-backed numeric values to avoid float vs Decimal failures
    • replaced nested savepoint context-manager flow to eliminate the SQLAlchemy warning

API And CLI Reporting

  • Updated API and CLI behavior so partial-success imports are reported as completed with issues rather than total failure.

BDD Coverage

  • Updated BDD coverage to reflect current importer behavior, canonical headers, current vocabularies, alias-header support, and measuring_person validation.
  • Fixed shared CSV-step compatibility by populating file_content in the water-level test context.

Related Fix

  • Fixed well transfer seeding by excluding computed monitoring_status from Thing creation.

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Real-file validation was performed with /Users/kelsey/Downloads/water_level_template_existing_well_20260327.csv.
  • Dev validation used a seeded subset of wells, and the real import correctly produced mixed results with 5 valid imported rows and many expected unknown-well failures.
  • Rerunning the same real file was confirmed to be idempotent.
  • The updated water-level BDD feature now passes with behave tests/features/water-level-csv.feature --tags=@backend.

Introduce a dedicated schema for the standalone water-level importer and move header/value normalization into the schema layer.

Changes:
- add WaterLevelCsvRow to schemas/water_level_csv.py
- support source CSV headers for field_staff_2, field_staff_3, water_level_date_time, and measuring_person
- preserve compatibility aliases for measurement_date_time, sampler, and mp_height_ft
- normalize blank optional values to None
- normalize naive water-level datetimes from America/Denver to UTC
- canonicalize sample method aliases such as electric tape and steel tape
- ignore source-only columns hold (not saved) and cut (not saved)
- wire services/water_level_csv.py to use the shared schema
- add focused schema tests for aliasing and datetime normalization

Impact:
- the standalone importer now understands the new source CSV shape without changing persistence behavior yet
- datetime handling is explicit and consistent instead of relying on mixed input formats downstream
- later stages can add validation and persistence rules on top of a stable, reusable normalization layer
Tighten standalone water-level validation and add well-backed measuring point height resolution ahead of the persistence refactor.

Changes:
- enforce measuring_person, datetime ordering, and conditional level_status validation in WaterLevelCsvRow
- reject negative depth_to_water_ft values while allowing negative mp_height values for surveyed reference points below land surface
- validate data_quality against the DataQuality enum and normalize shorthand level_status values to canonical groundwater terms
- resolve measuring_point_height from CSV input or the well's current MeasuringPointHistory-derived value
- reject rows where depth_to_water_ft implies a water level below the bottom of the well
- update schema, service, and API tests for the new rules
Switch the standalone importer from placeholder inserts to deterministic, structured groundwater persistence.

Changes:
- replace random UUID sample names with deterministic well-inventory-style sample names based on well name and measurement timestamp
- look up existing standalone groundwater-level samples by deterministic sample name and update them on rerun instead of inserting duplicates
- persist samples with sample_matrix="groundwater"
- resolve the groundwater-level parameter by name and groundwater matrix, creating it when missing
- write structured groundwater observation fields instead of relying on note strings
- persist resolved_mp_height to Observation.measuring_point_height
- persist level_status to Observation.groundwater_level_reason
- persist data_quality to Observation.nma_data_quality
- persist water_level_notes directly to sample and observation notes
- add tests for deterministic sample naming, idempotent reruns, and API-level structured groundwater persistence

Impact:
- rerunning the same standalone water-level CSV is now idempotent
- imported groundwater observations are stored with the intended groundwater semantics instead of the earlier placeholder behavior
- downstream queries can rely on structured observation fields rather than extracting meaning from generated note text
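The idempotency hinges on the sample name being a pure function of well name and measurement timestamp. The PR does not show the actual well-inventory-style format, so the format string below is hypothetical, and a dict stands in for the samples table:

```python
from datetime import datetime, timezone


def deterministic_sample_name(well_name: str, measured_at: datetime) -> str:
    """Hypothetical naming scheme: well name plus UTC timestamp."""
    return f"{well_name}-WL-{measured_at.strftime('%Y%m%d%H%M')}"


def import_sample(samples: dict, well_name: str, measured_at: datetime) -> dict:
    """Look up by deterministic name and update in place on rerun;
    duplicates are never inserted (dict stands in for the samples table)."""
    name = deterministic_sample_name(well_name, measured_at)
    return samples.setdefault(name, {"name": name, "sample_matrix": "groundwater"})
```

Running the same import twice resolves to the same record, which is exactly the rerun behavior the tests above assert.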
Change the standalone water-level importer from all-or-nothing writes to best-effort row imports using per-row savepoints.

Changes:
- stop skipping all persistence when some rows have validation errors
- wrap each standalone water-level row import in a nested transaction so one bad row does not roll back successful rows
- continue processing remaining rows after row-level persistence errors
- record row-specific persistence errors in the existing validation_errors output
- report total_rows_imported as the number of successfully written rows even when the file contains failures
- add mixed-row service coverage proving a valid row still imports when another row in the same file fails

Impact:
- standalone water-level CSV imports are now operationally resilient in the same way as well inventory imports
- users can import good rows from partially bad files without manually splitting the source CSV first
- later stages can refine API and CLI reporting without changing the core best-effort import behavior
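The per-row savepoint flow can be illustrated with sqlite3 as a stand-in for SQLAlchemy's `begin_nested()`: each row gets its own SAVEPOINT, a failing row rolls back only itself, and processing continues. Table and column names are illustrative:

```python
import sqlite3


def import_rows(conn: sqlite3.Connection, rows) -> tuple[int, list]:
    """Best-effort import: one savepoint per row, so a bad row never
    rolls back rows that already succeeded."""
    errors = []
    imported = 0
    for i, (name, depth) in enumerate(rows, start=1):
        conn.execute("SAVEPOINT row_import")
        try:
            if depth < 0:
                raise ValueError("depth_to_water_ft must be non-negative")
            conn.execute("INSERT INTO obs (name, depth) VALUES (?, ?)", (name, depth))
            conn.execute("RELEASE SAVEPOINT row_import")
            imported += 1
        except Exception as exc:
            # undo only this row, then keep going
            conn.execute("ROLLBACK TO SAVEPOINT row_import")
            conn.execute("RELEASE SAVEPOINT row_import")
            errors.append({"row": i, "error": str(exc)})
    return imported, errors
```

With a three-row file containing one bad row, two rows land and one row-level error is recorded, matching the mixed-row coverage described above.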
Update the standalone water-level API and CLI contract to reflect the best-effort row import behavior.

Changes:
- treat mixed-result standalone water-level imports as successful when at least one row imports
- keep total-failure imports as non-zero/400 outcomes when no rows are written
- preserve row-level validation and persistence errors in the existing validation_errors payload
- update the CLI status banner to distinguish clean success from completed-with-issues using summary data instead of exit code alone
- add API coverage for partial-success bulk uploads
- add CLI coverage for completed-with-issues reporting
- update end-to-end CLI water-level expectations to current importer semantics from earlier stages

Impact:
- API callers no longer see a hard failure when valid rows were imported successfully from a mixed file
- CLI users get more accurate status reporting for partial-success imports without losing warning visibility
- the external contract now matches the row-level atomicity and best-effort behavior of the importer itself
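The contract above reduces to a small decision table: no rows written and errors present is a hard failure; rows written with errors is partial success; otherwise clean success. A minimal sketch (the banner wording is assumed):

```python
def import_status(total_rows_imported: int, errors: list) -> tuple[int, str]:
    """Map import results to an exit code and a CLI status banner."""
    if total_rows_imported == 0 and errors:
        return 1, "failed"  # nothing written: non-zero exit / HTTP 400
    if errors:
        return 0, "completed with issues"  # partial success still exits 0
    return 0, "completed"
```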
Filtered well transfers failed while seeding dev data because CreateWell dumps included monitoring_status, but Thing exposes that as a computed status-history property rather than a writable ORM field.

Exclude it from Thing creation so the transfer can persist wells and let monitoring status continue to be recorded through StatusHistory.
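A minimal sketch of that exclusion, assuming the CreateWell dump arrives as a plain dict (the helper name is hypothetical):

```python
def thing_kwargs(create_well_dump: dict) -> dict:
    """Drop the computed monitoring_status property before building Thing
    creation kwargs; monitoring status stays recorded via StatusHistory."""
    data = dict(create_well_dump)
    data.pop("monitoring_status", None)
    return data
```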
Address the remaining issues found during validation against a real standalone water-level CSV.

Changes:
- coerce measuring point height and well depth values to float before depth-to-water validation so DB-backed Decimal values do not crash the importer
- replace the nested savepoint context-manager flow with explicit savepoint handling to avoid the SQLAlchemy "nested transaction already deassociated from connection" warning
- add regression coverage for measuring-point history values coming back as Decimal

Impact:
- real water-level CSV imports no longer fail on float-vs-Decimal depth checks
- best-effort row imports complete without the savepoint warning seen in mixed-result runs
- rerunning the same real file remains idempotent
…er behavior

- update standalone water-level BDD scenarios for partial-success imports
- switch step data to current canonical headers and groundwater vocabularies
- add coverage for legacy alias headers and measuring_person validation
- fix shared CSV step compatibility by populating file_content in test context

Copilot AI left a comment


Pull request overview

Updates the standalone groundwater-level CSV importer to support the new source CSV shape while preserving existing API/CLI entry points, with improved idempotency and partial-success behavior.

Changes:

  • Introduces a dedicated WaterLevelCsvRow schema (header aliases, datetime normalization, enum/lexicon canonicalization, row-level constraints).
  • Refactors persistence to be groundwater-specific, idempotent (deterministic sample naming), and best-effort per-row via savepoints with partial-success reporting.
  • Fixes well transfer seeding by excluding monitoring_status from Thing creation and aligns tests/BDD steps with current importer behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| transfers/well_transfer.py | Excludes monitoring_status from Thing creation inputs during well transfer seeding. |
| services/water_level_csv.py | Reworks CSV normalization/validation and implements idempotent, per-row savepoint persistence with partial-success exit codes. |
| schemas/water_level_csv.py | Adds a dedicated pydantic schema for the new CSV shape (aliases, normalization, validation). |
| cli/cli.py | Updates CLI output to report “completed with issues” on partial-success imports. |
| tests/test_well_transfer.py | Adds coverage ensuring monitoring_status is excluded from Thing kwargs. |
| tests/test_water_level_csv_service.py | Adds service-level tests for MP height resolution, depth validation, idempotency, and partial-success. |
| tests/test_water_level_csv_schema.py | Adds schema normalization/aliasing and validation tests. |
| tests/test_observation.py | Updates API bulk-upload expectations and adds a partial-success API test. |
| tests/test_cli_commands.py | Adds a CLI partial-success reporting test and aligns CLI E2E expectations with the new semantics. |
| tests/features/water-level-csv.feature | Updates BDD scenarios for new headers/aliases and partial-success behavior. |
| tests/features/steps/water-levels-csv.py | Updates BDD step implementation to generate/validate the new CSV shape and lexicon values. |

… during bulk upload

- Add test to validate that bulk water-level uploads do not remove existing unrelated observations
- Update testing utility with `get_parameter_id` for parameter resolution
- Ensure edge cases where new observations coexist with pre-existing observations function as expected
…efully

- Move `begin_nested` savepoint call inside the try block for proper error handling.
- Ensure session expiration when savepoint creation fails.
- Add test coverage for savepoint creation failure scenarios.
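The commit above can be sketched as follows: `begin_nested()` moves inside the try block, so a failure creating the savepoint is recorded like any other row failure, and the session is expired to discard stale state. Function and parameter names are assumptions:

```python
def import_row_with_savepoint(session, row, persist_row, errors, index) -> bool:
    """Row import where savepoint creation itself is inside the try block."""
    savepoint = None
    try:
        savepoint = session.begin_nested()  # may fail; handled below
        persist_row(session, row)
        savepoint.commit()
        return True
    except Exception as exc:
        if savepoint is not None:
            savepoint.rollback()  # persistence failed after the savepoint opened
        else:
            session.expire_all()  # savepoint creation failed: drop stale state
        errors.append({"row": index, "error": str(exc)})
        return False
```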
Avoid loading GCS-backed elevation cache and CSV-backed measuring point data during `WellTransferer` construction.

This prevents unit tests that only instantiate the transferer from requiring Google Application Default Credentials, while preserving existing behavior for real transfer paths that actually need those dependencies.
Copilot AI review requested due to automatic review settings March 30, 2026 21:36

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

…s and better error handling

- Use a nested transaction around parallel aquifer creation so a duplicate-key race only rolls back the aquifer insert attempt instead of the full in-flight well/location work.
- Catch IntegrityError for the expected unique-constraint race, re-query the existing aquifer, and let unexpected flush failures surface normally for debugging and error handling.
- Add unit coverage for the aquifer race path and update contact transfer tests to match the current helper contract.
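The race-handling pattern can be illustrated with sqlite3 standing in for the SQLAlchemy nested-transaction flow: the insert attempt runs under a savepoint, and on a unique-constraint IntegrityError only that attempt is rolled back before re-querying the winner. The table name is illustrative:

```python
import sqlite3


def get_or_create_aquifer(conn: sqlite3.Connection, name: str) -> int:
    """Get-or-create guarded against a duplicate-key race."""
    conn.execute("SAVEPOINT aquifer_insert")
    try:
        cur = conn.execute("INSERT INTO aquifer (name) VALUES (?)", (name,))
        conn.execute("RELEASE SAVEPOINT aquifer_insert")
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # expected unique-constraint race: roll back only the insert attempt
        conn.execute("ROLLBACK TO SAVEPOINT aquifer_insert")
        conn.execute("RELEASE SAVEPOINT aquifer_insert")
        row = conn.execute("SELECT id FROM aquifer WHERE name = ?", (name,)).fetchone()
        return row[0]
```

Unexpected flush failures (anything other than the constraint violation) propagate normally, as the commit message describes.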
…rkers

Load cached elevations once before starting parallel well transfer workers. This avoids multiple workers loading the same cache at the same time and makes sure they all use the same shared in-memory data during the transfer.
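A rough shape of that preload, with hypothetical function names standing in for the real transfer entry points:

```python
from concurrent.futures import ThreadPoolExecutor


def transfer_wells(wells, load_elevation_cache, transfer_one, workers=4):
    """Load the elevation cache exactly once before any worker starts,
    then share the in-memory result across all worker threads."""
    cache = load_elevation_cache()  # single load, no per-worker duplication
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda well: transfer_one(well, cache), wells))
```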
Copilot AI review requested due to automatic review settings March 31, 2026 03:58

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Accepted Codex's commit suggestion; it seems benign, mostly focused on cleanup.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 31, 2026 04:17

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Comment on lines 60 to +66

```diff
 field_event_dt: datetime
 measurement_dt: datetime
-mp_height: float
-depth_to_water_ft: float
 level_status: str
 data_quality: str
+mp_height: float | None
+resolved_mp_height: float | int | None
+existing_mp_height: float | int | None
+mp_height_differs_from_history: bool
+depth_to_water_ft: float | None
```

Copilot AI Mar 31, 2026


_ValidatedRow carries mp_height, existing_mp_height, and mp_height_differs_from_history, but those values aren’t referenced anywhere after row validation (only resolved_mp_height is used for persistence). If these are intended for warning/reporting, consider emitting them into validation_errors/payload; otherwise remove them from _ValidatedRow to reduce unused state and keep the importer logic easier to follow.

… history

Report a row-level warning when a CSV mp_height differs from the well's existing measuring point height while still importing the row and using the CSV value.

This makes the existing measuring point comparison visible to users instead of carrying unused state through validation. Update service and API tests to cover warning-only imports and mixed warning/error uploads.
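The warning-only behavior can be sketched as follows: a mismatch is reported per row but never blocks the import, and the CSV value is still the one used. The function and payload shape are assumptions:

```python
def resolve_mp_height_with_warning(csv_mp_height, history_mp_height, row_index, warnings):
    """Warn (not error) when the CSV mp_height differs from well history;
    the row still imports using the CSV value."""
    if (csv_mp_height is not None and history_mp_height is not None
            and float(csv_mp_height) != float(history_mp_height)):
        warnings.append({
            "row": row_index,
            "warning": f"mp_height {csv_mp_height} differs from history {history_mp_height}",
        })
    return csv_mp_height if csv_mp_height is not None else history_mp_height
```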
Copilot AI review requested due to automatic review settings March 31, 2026 04:38
ksmuczynski merged commit e4d94ae into staging Mar 31, 2026
10 checks passed
ksmuczynski deleted the kas-water-level-import branch March 31, 2026 04:41

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.
