
kas-water-level-import#621

Merged
ksmuczynski merged 21 commits into staging from
kas-water-level-import
Mar 31, 2026

Conversation


ksmuczynski commented Mar 28, 2026

Summary

This updates the standalone groundwater-level CSV importer to support the new source CSV shape while keeping the current staging-compatible API and CLI entry points.

The work was implemented in stages to keep behavior changes isolated and reviewable:

  • schema and normalization
  • validation and measuring-point resolution
  • structured groundwater persistence and idempotency
  • best-effort row savepoints
  • API and CLI partial-success reporting
  • real-file cleanup and stabilization
  • BDD coverage alignment

Why

This PR addresses the following problem / context:

  • The standalone water-level importer needed to support the current source CSV shape and groundwater-specific persistence rules.
  • The original import flow did not handle partial-success behavior cleanly across service, API, CLI, and BDD coverage.
  • Real-file validation exposed production-relevant issues that were not covered by earlier test data, including a float vs Decimal crash and a nested savepoint warning.
  • Existing BDD coverage had drifted from current importer behavior and vocabulary expectations.

How

Implementation summary: the following was changed, added, or removed.

Schema And Validation

  • Added a dedicated standalone water-level CSV schema with header alias support and datetime normalization.
  • Updated validation to enforce current groundwater rules, including data_quality enum validation and measuring_person matching supplied field staff.
  • Resolved measuring point height from CSV mp_height when provided, otherwise from existing well measuring-point history.

Persistence And Import Behavior

  • Aligned persistence with groundwater semantics:
    • sample_matrix="groundwater"
    • groundwater parameter lookup by matrix
    • mapped CSV level_status into Observation.groundwater_level_reason
    • persisted groundwater-specific observation fields, including groundwater_level_reason and nma_data_quality
  • Made imports idempotent with deterministic sample naming.
  • Switched row persistence to best-effort per-row savepoints so valid rows import even when other rows fail.
  • Fixed issues found during real-file validation:
    • coerced DB-backed numeric values to avoid float vs Decimal failures
    • replaced nested savepoint context-manager flow to eliminate the SQLAlchemy warning

API And CLI Reporting

  • Updated API and CLI behavior so partial-success imports are reported as completed with issues rather than total failure.

BDD Coverage

  • Updated BDD coverage to reflect current importer behavior, canonical headers, current vocabularies, alias-header support, and measuring_person validation.
  • Fixed shared CSV-step compatibility by populating file_content in the water-level test context.

Related Fix

  • Fixed well transfer seeding by excluding computed monitoring_status from Thing creation.

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Real-file validation was performed with /Users/kelsey/Downloads/water_level_template_existing_well_20260327.csv.
  • Dev validation used a seeded subset of wells, and the real import correctly produced mixed results with 5 valid imported rows and many expected unknown-well failures.
  • Rerunning the same real file was confirmed to be idempotent.
  • The updated water-level BDD feature now passes with behave tests/features/water-level-csv.feature --tags=@backend.

Introduce a dedicated schema for the standalone water-level importer and move header/value normalization into the schema layer.

Changes:
- add WaterLevelCsvRow to schemas/water_level_csv.py
- support source CSV headers for field_staff_2, field_staff_3, water_level_date_time, and measuring_person
- preserve compatibility aliases for measurement_date_time, sampler, and mp_height_ft
- normalize blank optional values to None
- normalize naive water-level datetimes from America/Denver to UTC
- canonicalize sample method aliases such as electric tape and steel tape
- ignore source-only columns hold (not saved) and cut (not saved)
- wire services/water_level_csv.py to use the shared schema
- add focused schema tests for aliasing and datetime normalization

Impact:
- the standalone importer now understands the new source CSV shape without changing persistence behavior yet
- datetime handling is explicit and consistent instead of relying on mixed input formats downstream
- later stages can add validation and persistence rules on top of a stable, reusable normalization layer
Tighten standalone water-level validation and add well-backed measuring point height resolution ahead of the persistence refactor.

Changes:
- enforce measuring_person, datetime ordering, and conditional level_status validation in WaterLevelCsvRow
- reject negative depth_to_water_ft values while allowing negative mp_height values for surveyed reference points below land surface
- validate data_quality against the DataQuality enum and normalize shorthand level_status values to canonical groundwater terms
- resolve measuring_point_height from CSV input or the well's current MeasuringPointHistory-derived value
- reject rows where depth_to_water_ft implies a water level below the bottom of the well
- update schema, service, and API tests for the new rules
Switch the standalone importer from placeholder inserts to deterministic, structured groundwater persistence.

Changes:
- replace random UUID sample names with deterministic well-inventory-style sample names based on well name and measurement timestamp
- look up existing standalone groundwater-level samples by deterministic sample name and update them on rerun instead of inserting duplicates
- persist samples with sample_matrix="groundwater"
- resolve the groundwater-level parameter by name and groundwater matrix, creating it when missing
- write structured groundwater observation fields instead of relying on note strings
- persist resolved_mp_height to Observation.measuring_point_height
- persist level_status to Observation.groundwater_level_reason
- persist data_quality to Observation.nma_data_quality
- persist water_level_notes directly to sample and observation notes
- add tests for deterministic sample naming, idempotent reruns, and API-level structured groundwater persistence

Impact:
- rerunning the same standalone water-level CSV is now idempotent
- imported groundwater observations are stored with the intended groundwater semantics instead of the earlier placeholder behavior
- downstream queries can rely on structured observation fields rather than extracting meaning from generated note text
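The idempotency hinges on the sample name being a pure function of well name and measurement timestamp. The PR does not show the actual well-inventory-style format, so the format string below is hypothetical, and a dict stands in for the samples table:

```python
from datetime import datetime, timezone


def deterministic_sample_name(well_name: str, measured_at: datetime) -> str:
    """Hypothetical naming scheme: well name plus UTC timestamp."""
    return f"{well_name}-WL-{measured_at.strftime('%Y%m%d%H%M')}"


def import_sample(samples: dict, well_name: str, measured_at: datetime) -> dict:
    """Look up by deterministic name and update in place on rerun;
    duplicates are never inserted (dict stands in for the samples table)."""
    name = deterministic_sample_name(well_name, measured_at)
    return samples.setdefault(name, {"name": name, "sample_matrix": "groundwater"})
```

Running the same import twice resolves to the same record, which is exactly the rerun behavior the tests above assert.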
Change the standalone water-level importer from all-or-nothing writes to best-effort row imports using per-row savepoints.

Changes:
- stop skipping all persistence when some rows have validation errors
- wrap each standalone water-level row import in a nested transaction so one bad row does not roll back successful rows
- continue processing remaining rows after row-level persistence errors
- record row-specific persistence errors in the existing validation_errors output
- report total_rows_imported as the number of successfully written rows even when the file contains failures
- add mixed-row service coverage proving a valid row still imports when another row in the same file fails

Impact:
- standalone water-level CSV imports are now operationally resilient in the same way as well inventory imports
- users can import good rows from partially bad files without manually splitting the source CSV first
- later stages can refine API and CLI reporting without changing the core best-effort import behavior
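The per-row savepoint flow can be illustrated with sqlite3 as a stand-in for SQLAlchemy's `begin_nested()`: each row gets its own SAVEPOINT, a failing row rolls back only itself, and processing continues. Table and column names are illustrative:

```python
import sqlite3


def import_rows(conn: sqlite3.Connection, rows) -> tuple[int, list]:
    """Best-effort import: one savepoint per row, so a bad row never
    rolls back rows that already succeeded."""
    errors = []
    imported = 0
    for i, (name, depth) in enumerate(rows, start=1):
        conn.execute("SAVEPOINT row_import")
        try:
            if depth < 0:
                raise ValueError("depth_to_water_ft must be non-negative")
            conn.execute("INSERT INTO obs (name, depth) VALUES (?, ?)", (name, depth))
            conn.execute("RELEASE SAVEPOINT row_import")
            imported += 1
        except Exception as exc:
            # undo only this row, then keep going
            conn.execute("ROLLBACK TO SAVEPOINT row_import")
            conn.execute("RELEASE SAVEPOINT row_import")
            errors.append({"row": i, "error": str(exc)})
    return imported, errors
```

With a three-row file containing one bad row, two rows land and one row-level error is recorded, matching the mixed-row coverage described above.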
Update the standalone water-level API and CLI contract to reflect the best-effort row import behavior.

Changes:
- treat mixed-result standalone water-level imports as successful when at least one row imports
- keep total-failure imports as non-zero/400 outcomes when no rows are written
- preserve row-level validation and persistence errors in the existing validation_errors payload
- update the CLI status banner to distinguish clean success from completed-with-issues using summary data instead of exit code alone
- add API coverage for partial-success bulk uploads
- add CLI coverage for completed-with-issues reporting
- update end-to-end CLI water-level expectations to current importer semantics from earlier stages

Impact:
- API callers no longer see a hard failure when valid rows were imported successfully from a mixed file
- CLI users get more accurate status reporting for partial-success imports without losing warning visibility
- the external contract now matches the row-level atomicity and best-effort behavior of the importer itself
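The contract above reduces to a small decision table: no rows written and errors present is a hard failure; rows written with errors is partial success; otherwise clean success. A minimal sketch (the banner wording is assumed):

```python
def import_status(total_rows_imported: int, errors: list) -> tuple[int, str]:
    """Map import results to an exit code and a CLI status banner."""
    if total_rows_imported == 0 and errors:
        return 1, "failed"  # nothing written: non-zero exit / HTTP 400
    if errors:
        return 0, "completed with issues"  # partial success still exits 0
    return 0, "completed"
```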
Filtered well transfers failed while seeding dev data because CreateWell dumps included monitoring_status, but Thing exposes that as a computed status-history property rather than a writable ORM field.

Exclude it from Thing creation so the transfer can persist wells and let monitoring status continue to be recorded through StatusHistory.
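A minimal sketch of that exclusion, assuming the CreateWell dump arrives as a plain dict (the helper name is hypothetical):

```python
def thing_kwargs(create_well_dump: dict) -> dict:
    """Drop the computed monitoring_status property before building Thing
    creation kwargs; monitoring status stays recorded via StatusHistory."""
    data = dict(create_well_dump)
    data.pop("monitoring_status", None)
    return data
```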
Address the remaining issues found during validation against a real standalone water-level CSV.

Changes:
- coerce measuring point height and well depth values to float before depth-to-water validation so DB-backed Decimal values do not crash the importer
- replace the nested savepoint context-manager flow with explicit savepoint handling to avoid the SQLAlchemy "nested transaction already deassociated from connection" warning
- add regression coverage for measuring-point history values coming back as Decimal

Impact:
- real water-level CSV imports no longer fail on float-vs-Decimal depth checks
- best-effort row imports complete without the savepoint warning seen in mixed-result runs
- rerunning the same real file remains idempotent
…er behavior

- update standalone water-level BDD scenarios for partial-success imports
- switch step data to current canonical headers and groundwater vocabularies
- add coverage for legacy alias headers and measuring_person validation
- fix shared CSV step compatibility by populating file_content in test context

Copilot AI left a comment


Pull request overview

Updates the standalone groundwater-level CSV importer to support the new source CSV shape while preserving existing API/CLI entry points, with improved idempotency and partial-success behavior.

Changes:

  • Introduces a dedicated WaterLevelCsvRow schema (header aliases, datetime normalization, enum/lexicon canonicalization, row-level constraints).
  • Refactors persistence to be groundwater-specific, idempotent (deterministic sample naming), and best-effort per-row via savepoints with partial-success reporting.
  • Fixes well transfer seeding by excluding monitoring_status from Thing creation and aligns tests/BDD steps with current importer behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| transfers/well_transfer.py | Excludes monitoring_status from Thing creation inputs during well transfer seeding. |
| services/water_level_csv.py | Reworks CSV normalization/validation and implements idempotent, per-row savepoint persistence with partial-success exit codes. |
| schemas/water_level_csv.py | Adds a dedicated pydantic schema for the new CSV shape (aliases, normalization, validation). |
| cli/cli.py | Updates CLI output to report “completed with issues” on partial-success imports. |
| tests/test_well_transfer.py | Adds coverage ensuring monitoring_status is excluded from Thing kwargs. |
| tests/test_water_level_csv_service.py | Adds service-level tests for MP height resolution, depth validation, idempotency, and partial-success. |
| tests/test_water_level_csv_schema.py | Adds schema normalization/aliasing and validation tests. |
| tests/test_observation.py | Updates API bulk-upload expectations and adds a partial-success API test. |
| tests/test_cli_commands.py | Adds a CLI partial-success reporting test and aligns CLI E2E expectations with the new semantics. |
| tests/features/water-level-csv.feature | Updates BDD scenarios for new headers/aliases and partial-success behavior. |
| tests/features/steps/water-levels-csv.py | Updates BDD step implementation to generate/validate the new CSV shape and lexicon values. |

… during bulk upload

- Add test to validate that bulk water-level uploads do not remove existing unrelated observations
- Update testing utility with `get_parameter_id` for parameter resolution
- Ensure edge cases where new observations coexist with pre-existing observations function as expected
…efully

- Move `begin_nested` savepoint call inside the try block for proper error handling.
- Ensure session expiration when savepoint creation fails.
- Add test coverage for savepoint creation failure scenarios.
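The commit above can be sketched as follows: `begin_nested()` moves inside the try block, so a failure creating the savepoint is recorded like any other row failure, and the session is expired to discard stale state. Function and parameter names are assumptions:

```python
def import_row_with_savepoint(session, row, persist_row, errors, index) -> bool:
    """Row import where savepoint creation itself is inside the try block."""
    savepoint = None
    try:
        savepoint = session.begin_nested()  # may fail; handled below
        persist_row(session, row)
        savepoint.commit()
        return True
    except Exception as exc:
        if savepoint is not None:
            savepoint.rollback()  # persistence failed after the savepoint opened
        else:
            session.expire_all()  # savepoint creation failed: drop stale state
        errors.append({"row": index, "error": str(exc)})
        return False
```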
Avoid loading GCS-backed elevation cache and CSV-backed measuring point data during `WellTransferer` construction.

This prevents unit tests that only instantiate the transferer from requiring Google Application Default Credentials, while preserving existing behavior for real transfer paths that actually need those dependencies.
Copilot AI review requested due to automatic review settings March 30, 2026 21:36

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

…s and better error handling

- Use a nested transaction around parallel aquifer creation so a duplicate-key race only rolls back the aquifer insert attempt instead of the full in-flight well/location work.
- Catch IntegrityError for the expected unique-constraint race, re-query the existing aquifer, and let unexpected flush failures surface normally for debugging and error handling.
- Add unit coverage for the aquifer race path and update contact transfer tests to match the current helper contract.
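The race-handling pattern can be illustrated with sqlite3 standing in for the SQLAlchemy nested-transaction flow: the insert attempt runs under a savepoint, and on a unique-constraint IntegrityError only that attempt is rolled back before re-querying the winner. The table name is illustrative:

```python
import sqlite3


def get_or_create_aquifer(conn: sqlite3.Connection, name: str) -> int:
    """Get-or-create guarded against a duplicate-key race."""
    conn.execute("SAVEPOINT aquifer_insert")
    try:
        cur = conn.execute("INSERT INTO aquifer (name) VALUES (?)", (name,))
        conn.execute("RELEASE SAVEPOINT aquifer_insert")
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # expected unique-constraint race: roll back only the insert attempt
        conn.execute("ROLLBACK TO SAVEPOINT aquifer_insert")
        conn.execute("RELEASE SAVEPOINT aquifer_insert")
        row = conn.execute("SELECT id FROM aquifer WHERE name = ?", (name,)).fetchone()
        return row[0]
```

Unexpected flush failures (anything other than the constraint violation) propagate normally, as the commit message describes.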
…rkers

Load cached elevations once before starting parallel well transfer workers. This avoids multiple workers loading the same cache at the same time and makes sure they all use the same shared in-memory data during the transfer.
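A rough shape of that preload, with hypothetical function names standing in for the real transfer entry points:

```python
from concurrent.futures import ThreadPoolExecutor


def transfer_wells(wells, load_elevation_cache, transfer_one, workers=4):
    """Load the elevation cache exactly once before any worker starts,
    then share the in-memory result across all worker threads."""
    cache = load_elevation_cache()  # single load, no per-worker duplication
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda well: transfer_one(well, cache), wells))
```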
Copilot AI review requested due to automatic review settings March 31, 2026 03:58

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Accepted Codex's commit suggestion; it seems benign, mostly focused on cleanup.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 31, 2026 04:17

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Comment on lines 60 to +66

```diff
 field_event_dt: datetime
 measurement_dt: datetime
-mp_height: float
-depth_to_water_ft: float
 level_status: str
 data_quality: str
+mp_height: float | None
+resolved_mp_height: float | int | None
+existing_mp_height: float | int | None
+mp_height_differs_from_history: bool
+depth_to_water_ft: float | None
```

Copilot AI Mar 31, 2026


_ValidatedRow carries mp_height, existing_mp_height, and mp_height_differs_from_history, but those values aren’t referenced anywhere after row validation (only resolved_mp_height is used for persistence). If these are intended for warning/reporting, consider emitting them into validation_errors/payload; otherwise remove them from _ValidatedRow to reduce unused state and keep the importer logic easier to follow.

… history

Report a row-level warning when a CSV mp_height differs from the well's existing measuring point height while still importing the row and using the CSV value.

This makes the existing measuring point comparison visible to users instead of carrying unused state through validation. Update service and API tests to cover warning-only imports and mixed warning/error uploads.
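The warning-only behavior can be sketched as follows: a mismatch is reported per row but never blocks the import, and the CSV value is still the one used. The function and payload shape are assumptions:

```python
def resolve_mp_height_with_warning(csv_mp_height, history_mp_height, row_index, warnings):
    """Warn (not error) when the CSV mp_height differs from well history;
    the row still imports using the CSV value."""
    if (csv_mp_height is not None and history_mp_height is not None
            and float(csv_mp_height) != float(history_mp_height)):
        warnings.append({
            "row": row_index,
            "warning": f"mp_height {csv_mp_height} differs from history {history_mp_height}",
        })
    return csv_mp_height if csv_mp_height is not None else history_mp_height
```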
Copilot AI review requested due to automatic review settings March 31, 2026 04:38
ksmuczynski merged commit e4d94ae into staging Mar 31, 2026
10 checks passed
ksmuczynski deleted the kas-water-level-import branch March 31, 2026 04:41

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.
