Skip to content

Well Inventory Import Reference.md

Kelsey Smuczynski edited this page Mar 25, 2026 · 1 revision

Well Inventory Import Reference

This page documents the well inventory CSV ingestion behavior in detail, covering the import workflow, transaction model, field mappings, validation rules, and operational notes.

For a broader overview of all CSV import surfaces, see CSV Ingestion and Validation.


Workflow Overview

1. CSV read

  • The CLI reads the source file as bytes
  • UTF-8 with BOM is supported via utf-8-sig decoding so the first header is parsed correctly

2. File validation

  • File must be a .csv
  • File cannot be empty and must contain at least one data row
  • Delimiter must be a comma
  • Duplicate header names are rejected
  • Maximum row count: 2000

3. Row parsing and schema validation

  • Each row is parsed into WellInventoryRow in schemas/well_inventory.py
  • well_name_point_id supports auto-generation if blank or if a placeholder token such as NM-XXXX or WL-XXXX is provided
  • Validation is best-effort: valid rows continue, invalid rows are collected into validation_errors, and one bad row does not abort the full file

4. Import execution

  • The import runs inside one outer DB transaction
  • Each row is processed inside a nested transaction (savepoint)
  • A failed row rolls back only its own work; a successful row remains pending until the final commit
  • Monitoring Plan groups are created lazily — a fully failed project does not leave behind an empty group

5. Final commit and response

The outer transaction commits once after all rows are processed. The response includes:

  • summary.total_rows_processed
  • summary.total_rows_imported
  • summary.validation_errors_or_warnings
  • validation_errors
  • wells

6. CLI progress output

Interactive terminal sessions emit progress messages during validation and import (validation progress, project-level and per-row import starts, periodic counts, final completion counts). Progress output is suppressed for non-interactive sessions.


Transaction and Rerun Behavior

The importer is best-effort, single-commit, and idempotent on rerun.

Best-effort row handling — rows are imported independently via savepoints; a row-level failure does not abort other rows.

Single final commit — successful rows are not visible to other DB sessions until session.commit(). Long-running imports appear to "land all at once" from another DB connection.

Idempotent reruns — if a row appears to have been imported already, the importer skips recreating the related records. Rerun behavior is skip existing, not update existing. Re-importing a CSV with new values for already-imported rows does not backfill or overwrite those fields.


Field Mapping Reference

For fields that persist to multiple places, the same source column is repeated on multiple rows so each destination remains easy to scan.

Source CSV Column Schema Field DB Table DB Column Validation / Normalization Persistence Behavior
project project group name Required. Finds an existing Monitoring Plan group or creates one lazily when the first row for that project imports successfully.
well_name_point_id well_name_point_id thing name Required unless blank or placeholder token triggers auto-generation. Must be unique per file. Primary well identifier.
site_name site_name thing_id_link alternate_id Blank → None. Stored as NMBGMR alternate ID with relation same_as.
ose_well_record_id ose_well_record_id thing_id_link alternate_id Blank → None. Stored as NMOSE alternate ID with relation same_as.
date_time date_time thing first_visit_date Required. Must be timezone-naive ISO datetime. Converted to America/Denver, then stored as UTC-aware. Sets the well's first visit date.
date_time date_time field_event event_date Required. Used as the initial well inventory field-event timestamp.
utm_easting, utm_northing, utm_zone utm_easting, utm_northing, utm_zone location point Required. utm_zone must be 12N or 13N. Coordinates must resolve inside New Mexico. Transformed from UTM to WGS84 and stored as the location geometry.
elevation_ft elevation_ft location elevation Optional. Blank → None. Converted from feet to meters. Missing values currently store as 0.0.
public_availability_acknowledgement public_availability_acknowledgement location release_status Blank → None. True → public, False → private, blank/missing → draft.
elevation_method elevation_method data_provenance collection_method Lexicon-backed, case-insensitive, whitespace-tolerant. Blank → None. Creates location elevation provenance. Missing value falls back to Unknown.
total_well_depth_ft total_well_depth_ft thing well_depth Optional. Blank → None. Passed through CreateWell.
depth_source depth_source data_provenance origin_type Lexicon-backed, case-insensitive, whitespace-tolerant. Blank → None. Creates well depth provenance. Also used when building the historic depth note.
date_drilled date_drilled thing well_completion_date Optional. Blank → None. Must be past or today. Passed through CreateWell.
completion_source completion_source data_provenance origin_type Lexicon-backed, case-insensitive, whitespace-tolerant. Blank → None. Creates completion-date provenance.
measuring_point_height_ft measuring_point_height_ft measuring_point_history measuring_point_height Optional. Blank → None. If both measuring_point_height_ft and mp_height are present and unequal, the row fails. Primary well-level measuring point height.
measuring_point_description measuring_point_description measuring_point_history measuring_point_description Blank → None. Stored with the measuring-point history row.
well_purpose well_purpose well_purpose purpose Lexicon-backed. Blank → None. Adds a well purpose row.
well_purpose_2 well_purpose_2 well_purpose purpose Lexicon-backed. Blank → None. Adds a second well purpose row when present.
well_status / well_hole_status well_status status_history status_value Alias supported via well_hole_status. Lexicon-backed. Blank → None. Creates Well Status history.
monitoring_status monitoring_status status_history status_value Lexicon-backed. Blank → None. Creates Monitoring Status history.
monitoring_frequency monitoring_frequency monitoring_frequency_history monitoring_frequency Lexicon-backed. Blank → None. Legacy Complete normalized to monitoring_frequency = None and monitoring_status = Not currently monitored. Creates monitoring-frequency history only when a real frequency remains after normalization.
is_open is_open status_history status_value Blank → None. Creates Open Status history as Open or Closed.
datalogger_possible datalogger_possible status_history status_value Blank → None. Creates Datalogger Suitability Status history.
specific_location_of_well specific_location_of_well note content Blank → None. Stored as Access note on the well.
contact_special_requests_notes contact_special_requests_notes note content Blank → None. Stored as General note on the well and on each created contact.
sampling_scenario_notes sampling_scenario_notes note content Blank → None. Stored as Sampling Procedure note on the well.
well_measuring_notes well_measuring_notes note content Blank → None. Stored as Sampling Procedure note on the well.
well_notes well_notes note content Blank → None. Stored as General note on the well.
water_notes water_notes note content Blank → None. Stored as Water note on the well.
directions_to_site directions_to_site note content Blank → None. Stored as Directions note on the location.
historic_depth_to_water_ft historic_depth_to_water_ft note content Optional. Blank → None. When present, a formatted Historical note is created on the well using depth_source as the source text.
field_staff field_staff contact name Required. Reused or created as NMBGMR Field Event Participant contact.
field_staff field_staff field_event_participant contact_id Required. Linked to field event with participant role Lead.
field_staff_2 / field_staff_3 field_staff_2 / field_staff_3 field_event_participant contact_id Blank → None. Linked to field event with participant role Participant.
contact_n_name, contact_n_organization, contact_n_role, contact_n_type contact_n_* contact name, organization, role, contact_type If any contact data is present, at least one of name/organization is required, plus role and type. Organization is lexicon-backed. Creates or reuses a contact, then links it to the well.
contact_n_phone_i, contact_n_phone_i_type contact_n_phone_i, contact_n_phone_i_type phone phone_number, phone_type Phone format validated. Type required when phone is present. Added under the created contact.
contact_n_email_i, contact_n_email_i_type contact_n_email_i, contact_n_email_i_type email email, email_type Email format validated. Type required when email is present. Added under the created contact.
contact_n_address_i_* contact_n_address_i_* address address fields If any address fields are present, line 1, city, state, postal code, and type are all required. State and ZIP are validated. Added under the created contact.
result_communication_preference result_communication_preference note content Blank → None. Stored as Communication note on each created contact.
repeat_measurement_permission repeat_measurement_permission permission_history permission_allowed Blank → None. Requires a contact. Creates Water Level Sample permission.
sampling_permission sampling_permission permission_history permission_allowed Blank → None. Requires a contact. Creates Water Chemistry Sample permission.
datalogger_installation_permission datalogger_installation_permission permission_history permission_allowed Blank → None. Requires a contact. Creates Datalogger Installation permission.
measurement_date_time / water_level_date_time measurement_date_time sample sample_date Alias supported via water_level_date_time. Required only when depth_to_water_ft is provided. Creates a groundwater-level sample when present.
measurement_date_time / water_level_date_time measurement_date_time observation observation_datetime Alias supported. Creates a groundwater-level observation when present.
sample_method sample_method sample sample_method Lexicon-backed. Blank → None. Missing value falls back to Unknown.
water_level_notes water_level_notes sample notes Blank → None. Stored on the sample.
water_level_notes water_level_notes observation notes Blank → None. Stored on the observation.
depth_to_water_ft depth_to_water_ft observation value Blank → None. Requires measurement_date_time when provided. Persisted as the groundwater-level measurement. Blank values still allow attempted measurement records when measurement_date_time exists.
mp_height / mp_height_ft mp_height observation measuring_point_height Alias supported via mp_height_ft. Blank → None. Used for groundwater-level observation measuring point height.
level_status level_status observation groundwater_level_reason Lexicon-backed. Blank → None. Stored as the observation reason/status.
data_quality data_quality observation nma_data_quality Lexicon-backed. Blank → None. Stored on the observation when provided.

Additional tables touched indirectly by the importer:

  • thing_contact_association
  • group_thing_association
  • location_thing_association
  • field_activity (well inventory)
  • field_activity (groundwater level, when a measurement event exists)

Validation and Normalization Rules

Required fields

  • project
  • well_name_point_id (or a supported auto-generation placeholder)
  • date_time
  • field_staff
  • utm_easting, utm_northing, utm_zone

Fields such as site_name, elevation_ft, elevation_method, measuring_point_height_ft, and depth_to_water_ft are optional.

Blank and missing values

  • Many optional CSV fields normalize blank strings to None
  • This includes optional lexicon-backed fields, contact organization, well status, and depth-to-water
  • Whitespace-only lexicon values are treated as blank

Lexicon-backed fields

The following fields are validated against lexicon-backed enums. Matching is case-insensitive and whitespace-tolerant, but strict with respect to the underlying allowed terms:

  • elevation_method
  • depth_source
  • well_pump_type
  • monitoring_status
  • well_status
  • sample_method
  • data_quality
  • monitoring_frequency

Legacy monitoring frequency normalization

The legacy source value monitoring_frequency = Complete is special-cased before validation:

  • monitoring_frequencyNone
  • monitoring_statusNot currently monitored

This prevents Complete from being treated as a real lexicon term.

Contact rules

For each contact block (contact_1_*, contact_2_*):

  • If any contact data is provided, at least one of contact_n_name or contact_n_organization is required, along with contact_n_role and contact_n_type
  • If a phone is provided, its phone type is required
  • If an email is provided, its email type is required
  • If any address fields are provided, the full required set must be present: line 1, type, city, state, and postal code

Water level rules

  • measurement_date_time is required only when depth_to_water_ft is provided
  • Blank depth_to_water_ft is treated as missing, not invalid
  • A row may still create sample/observation records when the attempted depth_to_water_ft is blank, as long as a measurement event exists

Coordinates

  • utm_zone must be 12N or 13N
  • UTM coordinates must resolve to a location within New Mexico

Database and Lexicon Considerations

The source CSV can validate successfully and still fail during persistence if the target DB is missing required lexicon terms. This is especially relevant for contact organizations, status values, and monitoring frequencies. If the DB has not been initialized or updated with the current lexicon set, imports can fail with foreign key errors such as contact_organization_fkey.

For nullable lexicon-backed fields, the importer prefers None over a default value unless the business explicitly wants a specific stored term.


Operational Notes

Test database behavior — BDD tests default to ocotilloapi_test. After schema or lexicon changes, rebuilding the test DB may be necessary to avoid stale state.

Expected summary semantics — the final summary reports processed rows, imported rows, and rows with issues. For reruns, a row may count as successfully handled even when the importer skips recreating already-imported records.


Known Gaps

  • Reruns do not backfill newly introduced fields onto already-imported rows — existing data is skipped, not updated
  • Monitoring-status reasons and other future field additions may require one-time backfill logic, read-path updates, and explicit rerun/update policy decisions

Clone this wiki locally