NO TICKET fix(importers): prevent duplicate well-name collisions in well inventory and water-level imports by ksmuczynski · Pull Request #633 · DataIntegrationGroup/OcotilloAPI

ksmuczynski · 2026-04-01T20:57:06Z

Summary

This PR hardens the CSV importers around duplicate water well names.

It updates the well inventory importer so it:

preserves the current re-import/idempotency behavior
reuses an existing well when the row clearly matches a prior import
fails the row if a water well with the same Thing.name already exists elsewhere in the database
avoids silently creating duplicate Thing records

It also keeps the water-level importer defensive so ambiguous well lookups no longer crash the CLI and instead return a row-level validation error.

Why

This PR addresses the following problem / context:

I found that duplicate Thing.name records for water well could exist in the database, and at least one duplicate appears to have been introduced by the well inventory import path.

That caused two issues:

the well inventory importer created a second Thing with the same well name
the water-level importer crashed with MultipleResultsFound when trying to resolve a well by name

How

Implementation summary - the following was changed / added / removed:

Added a database-wide duplicate-name check in the well inventory import flow after the existing re-import detection logic
Fail the row with a clear well_name_point_id validation error when the well already exists
Kept the water-level duplicate-well guard so ambiguous lookups fail gracefully instead of aborting the command
Added regression coverage for both behaviors

Why This Approach

The well inventory importer still needs to support safe reruns of the same import, so the existing _find_existing_imported_well() logic is evaluated first.

Only if the row does not match a prior import do we check whether a well with the same name already exists in the database and block creation.

That gives us:

idempotent re-imports
no silent duplicate Thing creation
clearer operator-facing errors

Notes

Any special considerations, workarounds, or follow-up work to note?

This change does not clean up existing duplicate Thing.name rows already in the database. Any pre-existing duplicates should be identified and resolved separately.
This PR intentionally treats well inventory as a create-oriented importer, not a general update/upsert workflow for existing wells. If updating existing wells via CSV is needed, that should be scoped as a separate feature with explicit merge rules.

…orts Preserve well inventory re-import idempotency while blocking creation of a new water well when the same `Thing.name` already exists in the database. Also keep the water-level importer defensive against ambiguous well lookups so duplicate well names produce row-level validation errors instead of crashing the CLI with `MultipleResultsFound`. This reduces the risk of creating duplicate `Thing` records and makes import failures clearer for operators.

Copilot

Pull request overview

Hardens the well inventory and water-level CSV import flows against duplicate Thing.name collisions for water well records, preventing silent duplicate creation and ensuring ambiguous well lookups fail as row-level validation errors instead of crashing.

Changes:

Well inventory import: after existing “re-import/idempotency” detection, block creation when a water well with the same Thing.name already exists elsewhere in the DB.
Water-level import: catch MultipleResultsFound during well resolution and surface a row-level validation error instead of aborting.
Adds regression tests covering both the “existing DB well name” inventory case and the “duplicate name matches” water-level case.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`services/well_inventory_csv.py`	Adds DB-wide duplicate-name guard and maps the resulting row error to `well_name_point_id`.
`services/water_level_csv.py`	Catches `MultipleResultsFound` from ambiguous well-name lookups and records a row-level validation error.
`tests/test_well_inventory.py`	Adds coverage asserting inventory import fails when the well name already exists in the DB.
`tests/test_water_level_csv_service.py`	Adds coverage ensuring water-level import reports duplicate well-name matches instead of crashing.

Copilot · 2026-04-01T21:01:26Z

tests/test_well_inventory.py

+
+        with session_ctx() as session:
+            session.add(Thing(name=row["well_name_point_id"], thing_type="water well"))
+            session.commit()
+
+        file_path = tmp_path / "well-inventory-existing-db-well.csv"
+        with file_path.open("w", encoding="utf-8", newline="") as f:
+            writer = csv.DictWriter(f, fieldnames=list(row.keys()))
+            writer.writeheader()
+            writer.writerow(row)
+
+        result = well_inventory_csv(file_path)
+
+        assert result.exit_code == 1, result.stderr
+        errors = result.payload.get("validation_errors", [])
+        assert errors
+        assert errors[0]["field"] == "well_name_point_id"
+        assert (
+            errors[0]["error"]
+            == "Well already exists in database for well_name_point_id 'TEST-0001'"
+        )
+
+        with session_ctx() as session:
+            things = (
+                session.query(Thing)
+                .filter(
+                    Thing.name == row["well_name_point_id"],
+                    Thing.thing_type == "water well",
+                )
+                .all()
+            )
+            assert len(things) == 1


This test inserts a Thing into the shared session-scoped test database but never deletes it. Because the test suite shares DB state across tests (see tests/conftest.py session fixture), leaving this row behind can make other well-inventory tests flaky by causing unexpected name collisions. Clean up the created Thing (e.g., capture its id and delete it in a finally, or use a fixture that yields and deletes).

Suggested change

with session_ctx() as session:

session.add(Thing(name=row["well_name_point_id"], thing_type="water well"))

session.commit()

file_path = tmp_path / "well-inventory-existing-db-well.csv"

with file_path.open("w", encoding="utf-8", newline="") as f:

writer = csv.DictWriter(f, fieldnames=list(row.keys()))

writer.writeheader()

writer.writerow(row)

result = well_inventory_csv(file_path)

assert result.exit_code == 1, result.stderr

errors = result.payload.get("validation_errors", [])

assert errors

assert errors[0]["field"] == "well_name_point_id"

assert (

errors[0]["error"]

== "Well already exists in database for well_name_point_id 'TEST-0001'"

)

with session_ctx() as session:

things = (

session.query(Thing)

.filter(

Thing.name == row["well_name_point_id"],

Thing.thing_type == "water well",

)

.all()

)

assert len(things) == 1

thing_name = row["well_name_point_id"]

try:

with session_ctx() as session:

session.add(Thing(name=thing_name, thing_type="water well"))

session.commit()

file_path = tmp_path / "well-inventory-existing-db-well.csv"

with file_path.open("w", encoding="utf-8", newline="") as f:

writer = csv.DictWriter(f, fieldnames=list(row.keys()))

writer.writeheader()

writer.writerow(row)

result = well_inventory_csv(file_path)

assert result.exit_code == 1, result.stderr

errors = result.payload.get("validation_errors", [])

assert errors

assert errors[0]["field"] == "well_name_point_id"

assert (

errors[0]["error"]

== "Well already exists in database for well_name_point_id 'TEST-0001'"

)

with session_ctx() as session:

things = (

session.query(Thing)

.filter(

Thing.name == thing_name,

Thing.thing_type == "water well",

)

.all()

)

assert len(things) == 1

finally:

# Clean up the Thing created for this test to avoid leaking state

with session_ctx() as session:

(

session.query(Thing)

.filter(

Thing.name == thing_name,

Thing.thing_type == "water well",

)

.delete(synchronize_session=False)

)

session.commit()

Copilot · 2026-04-01T21:01:26Z

services/well_inventory_csv.py

+                            if error_text.startswith(
+                                "Well already exists in database for well_name_point_id "
+                            ):
+                                field = "well_name_point_id"
+                            else:
+                                field = _extract_field_from_value_error(error_text)


Mapping the validation field by checking error_text.startswith(...) is brittle and couples control flow to an exact error-message string. Consider raising a structured/typed exception (or returning a structured error) that carries the target field explicitly, so future message wording changes don’t silently regress field attribution.

Suggested change

if error_text.startswith(

"Well already exists in database for well_name_point_id "

):

field = "well_name_point_id"

else:

field = _extract_field_from_value_error(error_text)

field = _extract_field_from_value_error(error_text)

ksmuczynski requested review from Copilot, jacob-a-brown and jirhiker April 1, 2026 20:57

Copilot started reviewing on behalf of ksmuczynski April 1, 2026 20:57 View session

Copilot AI reviewed Apr 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NO TICKET fix(importers): prevent duplicate well-name collisions in well inventory and water-level imports#633

NO TICKET fix(importers): prevent duplicate well-name collisions in well inventory and water-level imports#633
ksmuczynski wants to merge 1 commit intostagingfrom
kas-reinforce-duplicate-logic-check

ksmuczynski commented Apr 1, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 1, 2026

Uh oh!

Copilot AI Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ksmuczynski commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

How

Why This Approach

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ksmuczynski commented Apr 1, 2026 •

edited

Loading