Conversation
Introduce a dedicated schema for the standalone water-level importer and move header/value normalization into the schema layer.

Changes:
- add WaterLevelCsvRow to schemas/water_level_csv.py
- support source CSV headers for field_staff_2, field_staff_3, water_level_date_time, and measuring_person
- preserve compatibility aliases for measurement_date_time, sampler, and mp_height_ft
- normalize blank optional values to None
- normalize naive water-level datetimes from America/Denver to UTC
- canonicalize sample method aliases such as electric tape and steel tape
- ignore source-only columns hold (not saved) and cut (not saved)
- wire services/water_level_csv.py to use the shared schema
- add focused schema tests for aliasing and datetime normalization

Impact:
- the standalone importer now understands the new source CSV shape without changing persistence behavior yet
- datetime handling is explicit and consistent instead of relying on mixed input formats downstream
- later stages can add validation and persistence rules on top of a stable, reusable normalization layer
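The normalization rules above can be sketched in plain Python. The real change implements them inside a pydantic schema, so the alias map, helper names, and exact column labels below are illustrative, not the schema's actual API:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

DENVER = ZoneInfo("America/Denver")

# Legacy header -> canonical schema field; None marks source-only columns
# that are read but never persisted. (Map contents are illustrative.)
HEADER_ALIASES = {
    "measurement_date_time": "water_level_date_time",
    "sampler": "measuring_person",
    "mp_height_ft": "mp_height",
    "hold (not saved)": None,
    "cut (not saved)": None,
}

def canonical_header(header: str):
    """Resolve a source or legacy CSV header to its canonical field name."""
    key = header.strip().lower()
    return HEADER_ALIASES.get(key, key)

def blank_to_none(value):
    """Normalize blank optional values to None."""
    if value is None or not str(value).strip():
        return None
    return value

def to_utc(value: datetime) -> datetime:
    """Treat naive water-level datetimes as America/Denver wall time."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=DENVER)
    return value.astimezone(timezone.utc)
```

Doing this at the schema layer means the service sees only canonical field names and timezone-aware UTC datetimes, regardless of which header generation the CSV used.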
Tighten standalone water-level validation and add well-backed measuring point height resolution ahead of the persistence refactor.

Changes:
- enforce measuring_person, datetime ordering, and conditional level_status validation in WaterLevelCsvRow
- reject negative depth_to_water_ft values while allowing negative mp_height values for surveyed reference points below land surface
- validate data_quality against the DataQuality enum and normalize shorthand level_status values to canonical groundwater terms
- resolve measuring_point_height from CSV input or the well's current MeasuringPointHistory-derived value
- reject rows where depth_to_water_ft implies a water level below the bottom of the well
- update schema, service, and API tests for the new rules
Switch the standalone importer from placeholder inserts to deterministic, structured groundwater persistence.

Changes:
- replace random UUID sample names with deterministic well-inventory-style sample names based on well name and measurement timestamp
- look up existing standalone groundwater-level samples by deterministic sample name and update them on rerun instead of inserting duplicates
- persist samples with sample_matrix="groundwater"
- resolve the groundwater-level parameter by name and groundwater matrix, creating it when missing
- write structured groundwater observation fields instead of relying on note strings
- persist resolved_mp_height to Observation.measuring_point_height
- persist level_status to Observation.groundwater_level_reason
- persist data_quality to Observation.nma_data_quality
- persist water_level_notes directly to sample and observation notes
- add tests for deterministic sample naming, idempotent reruns, and API-level structured groundwater persistence

Impact:
- rerunning the same standalone water-level CSV is now idempotent
- imported groundwater observations are stored with the intended groundwater semantics instead of the earlier placeholder behavior
- downstream queries can rely on structured observation fields rather than extracting meaning from generated note text
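Deterministic naming is what makes the rerun-update behavior possible: the same well and timestamp always produce the same name, so a lookup by name finds the earlier sample. A minimal sketch, assuming UTC-normalized timestamps; the `{well}-WL-{stamp}` format is an invented placeholder, not the importer's actual pattern:

```python
from datetime import datetime, timezone

def standalone_sample_name(well_name: str, measured_at: datetime) -> str:
    """Deterministic, well-inventory-style sample name (format illustrative).

    Rerunning an import recomputes the same name, so the importer can look
    up an existing sample by name and update it instead of inserting a
    duplicate.
    """
    stamp = measured_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{well_name}-WL-{stamp}"
```

The upsert then reduces to "query sample by name; update if found, insert if not," which is what makes reruns idempotent.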
Change the standalone water-level importer from all-or-nothing writes to best-effort row imports using per-row savepoints.

Changes:
- stop skipping all persistence when some rows have validation errors
- wrap each standalone water-level row import in a nested transaction so one bad row does not roll back successful rows
- continue processing remaining rows after row-level persistence errors
- record row-specific persistence errors in the existing validation_errors output
- report total_rows_imported as the number of successfully written rows even when the file contains failures
- add mixed-row service coverage proving a valid row still imports when another row in the same file fails

Impact:
- standalone water-level CSV imports are now operationally resilient in the same way as well inventory imports
- users can import good rows from partially bad files without manually splitting the source CSV first
- later stages can refine API and CLI reporting without changing the core best-effort import behavior
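The per-row savepoint pattern can be illustrated with raw SQLite; the importer itself uses SQLAlchemy nested transactions, so the table, column, and error shapes below are stand-ins:

```python
import sqlite3

def import_rows(conn: sqlite3.Connection, rows):
    """Best-effort import: each row gets its own SAVEPOINT, so a failing
    row is rolled back alone while earlier and later rows still commit."""
    imported, errors = 0, []
    for i, depth in enumerate(rows):
        conn.execute("SAVEPOINT row_import")
        try:
            conn.execute("INSERT INTO levels (depth) VALUES (?)", (depth,))
            if depth < 0:  # stand-in for row-level validation/persistence errors
                raise ValueError("negative depth_to_water_ft")
            conn.execute("RELEASE SAVEPOINT row_import")
            imported += 1
        except Exception as exc:
            # Undo only this row's work, then keep going.
            conn.execute("ROLLBACK TO SAVEPOINT row_import")
            conn.execute("RELEASE SAVEPOINT row_import")
            errors.append({"row": i, "error": str(exc)})
    return imported, errors
```

`imported` maps to total_rows_imported and `errors` to the validation_errors output: the file-level transaction stays open while each row succeeds or fails independently.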
Update the standalone water-level API and CLI contract to reflect the best-effort row import behavior.

Changes:
- treat mixed-result standalone water-level imports as successful when at least one row imports
- keep total-failure imports as non-zero/400 outcomes when no rows are written
- preserve row-level validation and persistence errors in the existing validation_errors payload
- update the CLI status banner to distinguish clean success from completed-with-issues using summary data instead of exit code alone
- add API coverage for partial-success bulk uploads
- add CLI coverage for completed-with-issues reporting
- update end-to-end CLI water-level expectations to current importer semantics from earlier stages

Impact:
- API callers no longer see a hard failure when valid rows were imported successfully from a mixed file
- CLI users get more accurate status reporting for partial-success imports without losing warning visibility
- the external contract now matches the row-level atomicity and best-effort behavior of the importer itself
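The status decision described above reduces to a small function over the import summary; the function name and banner labels here are illustrative, not the CLI's actual strings:

```python
def import_status(total_rows_imported: int, error_count: int) -> str:
    """Map import summary data to an outcome banner (labels illustrative).

    Only a total failure (nothing written) is treated as a hard failure;
    a mixed result is success-with-warnings rather than a 400/non-zero exit.
    """
    if total_rows_imported == 0 and error_count > 0:
        return "failed"
    if error_count > 0:
        return "completed with issues"
    return "success"
```

Deciding from summary data rather than the exit code alone is what lets the CLI keep warning visibility while still reporting a mixed file as a completed import.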
Filtered well transfers failed while seeding dev data because CreateWell dumps included monitoring_status, but Thing exposes that as a computed status-history property rather than a writable ORM field. Exclude it from Thing creation so the transfer can persist wells and let monitoring status continue to be recorded through StatusHistory.
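A minimal sketch of that kind of exclusion, assuming the CreateWell payload is dumped to a plain dict before constructing the ORM object; `thing_kwargs` and the extra field names are hypothetical helpers, not the repo's actual code:

```python
# Fields that exist on the CreateWell schema but are computed properties on
# the ORM model (monitoring_status is derived from StatusHistory), so they
# must not be passed to the Thing constructor.
THING_EXCLUDED_FIELDS = {"monitoring_status"}

def thing_kwargs(create_well_dump: dict) -> dict:
    """Filter a CreateWell dump down to writable Thing fields."""
    return {
        k: v for k, v in create_well_dump.items()
        if k not in THING_EXCLUDED_FIELDS
    }
```

With a pydantic model the same effect can be had at dump time, e.g. `model_dump(exclude={"monitoring_status"})`; either way the status continues to flow through StatusHistory rather than the constructor.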
Address the remaining issues found during validation against a real standalone water-level CSV.

Changes:
- coerce measuring point height and well depth values to float before depth-to-water validation so DB-backed Decimal values do not crash the importer
- replace the nested savepoint context-manager flow with explicit savepoint handling to avoid the SQLAlchemy "nested transaction already deassociated from connection" warning
- add regression coverage for measuring-point history values coming back as Decimal

Impact:
- real water-level CSV imports no longer fail on float-vs-Decimal depth checks
- best-effort row imports complete without the savepoint warning seen in mixed-result runs
- rerunning the same real file remains idempotent
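The float-vs-Decimal fix comes down to coercing before arithmetic, since Python allows comparing but refuses mixed `Decimal`/`float` arithmetic with a TypeError. A sketch under that assumption; the depth-check geometry here is illustrative, not the importer's exact formula:

```python
from decimal import Decimal

def as_float(value):
    """Coerce DB-backed Decimal (or int) values to float before math."""
    return None if value is None else float(value)

def water_level_below_well_bottom(depth_to_water_ft, mp_height, well_depth_ft) -> bool:
    """Illustrative depth check: the subtraction below mixes CSV floats with
    DB-backed Decimals, which raises TypeError unless both sides are
    coerced first."""
    depth = as_float(depth_to_water_ft)
    mp = as_float(mp_height) or 0.0
    bottom = as_float(well_depth_ft)
    if depth is None or bottom is None:
        return False
    return depth - mp > bottom
```

Coercing once at the boundary keeps the rest of the validation logic free of Decimal-awareness, which is what the regression test for Decimal-backed measuring-point history locks in.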
…er behavior

- update standalone water-level BDD scenarios for partial-success imports
- switch step data to current canonical headers and groundwater vocabularies
- add coverage for legacy alias headers and measuring_person validation
- fix shared CSV step compatibility by populating file_content in test context
Pull request overview
Updates the standalone groundwater-level CSV importer to support the new source CSV shape while preserving existing API/CLI entry points, with improved idempotency and partial-success behavior.
Changes:
- Introduces a dedicated `WaterLevelCsvRow` schema (header aliases, datetime normalization, enum/lexicon canonicalization, row-level constraints).
- Refactors persistence to be groundwater-specific, idempotent (deterministic sample naming), and best-effort per-row via savepoints with partial-success reporting.
- Fixes well transfer seeding by excluding `monitoring_status` from `Thing` creation and aligns tests/BDD steps with current importer behavior.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| transfers/well_transfer.py | Excludes monitoring_status from Thing creation inputs during well transfer seeding. |
| services/water_level_csv.py | Reworks CSV normalization/validation and implements idempotent, per-row savepoint persistence with partial-success exit codes. |
| schemas/water_level_csv.py | Adds a dedicated pydantic schema for the new CSV shape (aliases, normalization, validation). |
| cli/cli.py | Updates CLI output to report “completed with issues” on partial-success imports. |
| tests/test_well_transfer.py | Adds coverage ensuring monitoring_status is excluded from Thing kwargs. |
| tests/test_water_level_csv_service.py | Adds service-level tests for MP height resolution, depth validation, idempotency, and partial-success. |
| tests/test_water_level_csv_schema.py | Adds schema normalization/aliasing and validation tests. |
| tests/test_observation.py | Updates API bulk-upload expectations and adds partial-success API test. |
| tests/test_cli_commands.py | Adds CLI partial-success reporting test and aligns CLI E2E expectations with new semantics. |
| tests/features/water-level-csv.feature | Updates BDD scenarios for new headers/aliases and partial-success behavior. |
| tests/features/steps/water-levels-csv.py | Updates BDD step implementation to generate/validate new CSV shape and lexicon values. |
… during bulk upload

- Add test to validate that bulk water-level uploads do not remove existing unrelated observations
- Update testing utility with `get_parameter_id` for parameter resolution
- Ensure edge cases where new observations coexist with pre-existing observations function as expected
…efully

- Move `begin_nested` savepoint call inside the try block for proper error handling.
- Ensure session expiration when savepoint creation fails.
- Add test coverage for savepoint creation failure scenarios.
Avoid loading GCS-backed elevation cache and CSV-backed measuring point data during `WellTransferer` construction. This prevents unit tests that only instantiate the transferer from requiring Google Application Default Credentials, while preserving existing behavior for real transfer paths that actually need those dependencies.
…s and better error handling

- Use a nested transaction around parallel aquifer creation so a duplicate-key race only rolls back the aquifer insert attempt instead of the full in-flight well/location work.
- Catch IntegrityError for the expected unique-constraint race, re-query the existing aquifer, and let unexpected flush failures surface normally for debugging and error handling.
- Add unit coverage for the aquifer race path and update contact transfer tests to match the current helper contract.
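The race-safe get-or-create pattern looks roughly like this, shown with raw SQLite savepoints rather than the SQLAlchemy session API; the table and helper names are stand-ins:

```python
import sqlite3

def get_or_create_aquifer(conn: sqlite3.Connection, name: str) -> int:
    """Race-safe get-or-create guarded by a savepoint (pattern sketch).

    If a parallel worker inserted the same name first, the unique-constraint
    violation rolls back only the insert attempt; the surrounding
    transaction's well/location work survives, and we re-query the winner.
    """
    conn.execute("SAVEPOINT aquifer_insert")
    try:
        cur = conn.execute("INSERT INTO aquifers (name) VALUES (?)", (name,))
        conn.execute("RELEASE SAVEPOINT aquifer_insert")
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Expected duplicate-key race: discard the insert, keep everything else.
        conn.execute("ROLLBACK TO SAVEPOINT aquifer_insert")
        conn.execute("RELEASE SAVEPOINT aquifer_insert")
        row = conn.execute(
            "SELECT id FROM aquifers WHERE name = ?", (name,)
        ).fetchone()
        return row[0]
```

Catching only the integrity error is the key design choice: unexpected flush failures still propagate, so genuine bugs stay visible instead of being swallowed by the race handler.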
…rs in CSV processing
…rkers

Load cached elevations once before starting parallel well transfer workers. This avoids multiple workers loading the same cache at the same time and ensures they all use the same shared in-memory data during the transfer.
Accepted codex's commit suggestion. Seems benign, mostly focused on cleanup.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```diff
 field_event_dt: datetime
 measurement_dt: datetime
-mp_height: float
-depth_to_water_ft: float
 level_status: str
 data_quality: str
+mp_height: float | None
+resolved_mp_height: float | int | None
+existing_mp_height: float | int | None
+mp_height_differs_from_history: bool
+depth_to_water_ft: float | None
```
_ValidatedRow carries mp_height, existing_mp_height, and mp_height_differs_from_history, but those values aren’t referenced anywhere after row validation (only resolved_mp_height is used for persistence). If these are intended for warning/reporting, consider emitting them into validation_errors/payload; otherwise remove them from _ValidatedRow to reduce unused state and keep the importer logic easier to follow.
… history

Report a row-level warning when a CSV mp_height differs from the well's existing measuring point height while still importing the row and using the CSV value. This makes the existing measuring point comparison visible to users instead of carrying unused state through validation. Update service and API tests to cover warning-only imports and mixed warning/error uploads.
…-water-level-import
Summary
This updates the standalone groundwater-level CSV importer to support the new source CSV shape while keeping the current staging-compatible API and CLI entry points.
The work was implemented in stages to keep behavior changes isolated and reviewable:
Why
This PR addresses the following problem / context:
- Validation against a real water-level CSV surfaced a `float` vs `Decimal` crash and a nested savepoint warning.

How
Implementation summary - the following was changed / added / removed:
Schema And Validation
- Enforces `data_quality` enum validation and `measuring_person` matching supplied field staff.
- Resolves measuring point height from `mp_height` when provided, otherwise from existing well measuring-point history.

Persistence And Import Behavior
- Persists samples with `sample_matrix="groundwater"`.
- Persists `level_status` into `Observation.groundwater_level_reason`.
- Writes structured observation fields such as `groundwater_level_reason` and `nma_data_quality` instead of note strings.
- Coerces DB-backed Decimal values to avoid `float` vs `Decimal` failures.

API And CLI Reporting
BDD Coverage
- Covers legacy alias headers and `measuring_person` validation.
- Fixes shared CSV step compatibility by populating `file_content` in the water-level test context.

Related Fix
- Excludes `monitoring_status` from `Thing` creation.

Notes
Any special considerations, workarounds, or follow-up work to note?
- Validated against `/Users/kelsey/Downloads/water_level_template_existing_well_20260327.csv`.
- Ran `behave tests/features/water-level-csv.feature --tags=@backend`.