Skip to content

fix: improve error handling and logging for recording interval estimation feat: add auto-generation prefix extraction for well IDs with new regex support#531

Merged
jirhiker merged 46 commits intostagingfrom
transfer-fix-review-feedback
Feb 16, 2026
Merged

fix: improve error handling and logging for recording interval estimation feat: add auto-generation prefix extraction for well IDs with new regex support#531
jirhiker merged 46 commits intostagingfrom
transfer-fix-review-feedback

Conversation

@jirhiker
Copy link
Copy Markdown
Member

@jirhiker jirhiker commented Feb 16, 2026

Summary

This PR improves well inventory auto-generation prefix handling and hardens transfer logging/error behavior for sensors and wells.

Changes

  1. Well inventory auto-generation prefix extraction
  • Added _extract_autogen_prefix support for:
    • direct prefixes like XY-
    • placeholder tokens like WL-XXXX, SAC-XXXX
    • optional whitespace normalization around hyphen (e.g. ABC -xxxx -> ABC-)
    • blank input defaulting to NM-
  • Expanded tests to cover supported/unsupported forms, including upper-bound rejection for 4-letter prefixes (ABCD- -> None).
  1. ConstructionMethod normalization in well transfer
  • In /Users/jross/Programming/DIG/OcotilloAPI/transfers/well_transfer.py, row.ConstructionMethod is now stripped before lexicon lookup.
  • This prevents lookup misses caused by leading/trailing whitespace in legacy CSV values.
  1. Sensor recording interval error handling/logging
  • In /Users/jross/Programming/DIG/OcotilloAPI/transfers/sensor_transfer.py, recording interval estimation failure handling was tightened:
    • clearer capture/log message when interval estimation fails
    • error context now explicitly states estimation failure and includes estimator error details
  • Improves observability when RecordingInterval is invalid/missing and fallback estimation cannot derive a value.

Why

  • Reduce false negatives from messy source data (ConstructionMethod whitespace).
  • Improve diagnostics for sensor transfer failures.
  • Make well ID auto-generation rules explicit, testable, and resilient to input variation.

Validation

  • Added/updated unit coverage in /Users/jross/Programming/DIG/OcotilloAPI/tests/test_well_inventory.py.
  • Local targeted test run was attempted but blocked in this environment by missing utm dependency during test collection.

jirhiker and others added 30 commits February 14, 2026 23:19
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…tion

feat: add auto-generation prefix extraction for well IDs with new regex support
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c11d05927

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request aims to improve error handling and logging for recording interval estimation in sensor transfers, and add auto-generation prefix extraction for well IDs with new regex support. However, the implementation is incomplete and contains critical bugs.

Changes:

  • Added error message improvements for recording interval estimation failures in sensor transfers
  • Introduced new autogen prefix extraction function for well IDs with support for 2-3 letter uppercase prefixes
  • Added .strip() call for ConstructionMethod to handle whitespace in well transfer processing
  • Commented out the old _step method in well_transfer.py (145 lines)

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
transfers/sensor_transfer.py Enhanced error logging for recording interval estimation with more detailed error messages, but introduced inverted conditional logic
services/well_inventory_csv.py Added incomplete autogen prefix extraction function with new regex patterns and duplicate imports
tests/test_well_inventory.py Added comprehensive test cases for the new prefix extraction feature that will fail due to incomplete implementation
transfers/well_transfer.py Commented out old _step method and added whitespace handling for ConstructionMethod field
Comments suppressed due to low confidence (1)

transfers/sensor_transfer.py:215

  • The condition logic appears to be inverted. The code logs success and captures an "estimated" error when recording_interval is None, but logs a critical error when recording_interval has a value. This is backwards - when recording_interval is None (the error case), the else block at line 217 should execute, and when recording_interval has a value (success), the if block should execute. The condition should be "if recording_interval is not None:" instead of "if recording_interval is None:".
            if recording_interval is None:
                recording_interval_unit = unit
                logger.info(
                    f"name={sensor.name}, serial_no={sensor.serial_no}. "
                    f"estimated recording interval: {recording_interval} {unit}"
                )
                self._capture_error(
                    pointid,
                    f"Estimated recording interval={recording_interval} {unit}. Is this correct?",
                    "RecordingInterval",
                )

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 16, 2026 19:34
jirhiker and others added 2 commits February 16, 2026 12:37
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

transfers/well_transfer.py:420

  • This large block of commented-out code (142 lines) should be removed rather than left in the codebase. Commented-out code creates maintenance burden and confusion. If this code needs to be preserved for reference, it should be retrievable from version control history.
    # def _step(self, session: Session, df: pd.DataFrame, i: int, row: pd.Series):
    #
    #     try:
    #         first_visit_date = get_first_visit_date(row)
    #         well_purposes = (
    #             [] if isna(row.CurrentUse) else self._extract_well_purposes(row)
    #         )
    #         well_casing_materials = (
    #             [] if isna(row.CasingDescription) else extract_casing_materials(row)
    #         )
    #         well_pump_type = extract_well_pump_type(row)
    #
    #         wcm = None
    #         if notna(row.ConstructionMethod):
    #             wcm = self._get_lexicon_value(
    #                 row, f"LU_ConstructionMethod:{row.ConstructionMethod}", "Unknown"
    #             )
    #
    #         mpheight = row.MPHeight
    #         mpheight_description = row.MeasuringPoint
    #         if mpheight is None:
    #             mphs = self._measuring_point_estimator.estimate_measuring_point_height(
    #                 row
    #             )
    #             if mphs:
    #                 try:
    #                     mpheight = mphs[0][0]
    #                     mpheight_description = mphs[1][0]
    #                 except IndexError:
    #                     if self.verbose:
    #                         logger.warning(
    #                             f"Measuring point height estimation failed for well {row.PointID}, {mphs}"
    #                         )
    #
    #         data = CreateWell(
    #             location_id=0,
    #             name=row.PointID,
    #             first_visit_date=first_visit_date,
    #             hole_depth=row.HoleDepth,
    #             well_depth=row.WellDepth,
    #             well_casing_diameter=(
    #                 row.CasingDiameter * 12 if row.CasingDiameter else None
    #             ),
    #             well_casing_depth=row.CasingDepth,
    #             release_status="public" if row.PublicRelease else "private",
    #             measuring_point_height=mpheight,
    #             measuring_point_description=mpheight_description,
    #             notes=(
    #                 [{"content": row.Notes, "note_type": "General"}]
    #                 if row.Notes
    #                 else []
    #             ),
    #             well_completion_date=row.CompletionDate,
    #             well_driller_name=row.DrillerName,
    #             well_construction_method=wcm,
    #             well_pump_type=well_pump_type,
    #         )
    #
    #         CreateWell.model_validate(data)
    #     except ValidationError as e:
    #         self._capture_validation_error(row.PointID, e)
    #         return
    #
    #     well = None
    #     try:
    #         well_data = data.model_dump(exclude=EXCLUDED_FIELDS)
    #         well_data["thing_type"] = "water well"
    #         well_data["nma_pk_welldata"] = row.WellID
    #         well_data["nma_pk_location"] = row.LocationId
    #
    #         well = Thing(**well_data)
    #         session.add(well)
    #
    #         if well_purposes:
    #             for wp in well_purposes:
    #                 # TODO: add validation logic here
    #                 if wp in WellPurposeEnum:
    #                     wp_obj = WellPurpose(thing=well, purpose=wp)
    #                     session.add(wp_obj)
    #                 else:
    #                     logger.critical(f"{well.name}. Invalid well purpose: {wp}")
    #
    #         if well_casing_materials:
    #             for wcm in well_casing_materials:
    #                 # TODO: add validation logic here
    #                 if wcm in WellCasingMaterialEnum:
    #                     wcm_obj = WellCasingMaterial(thing=well, material=wcm)
    #                     session.add(wcm_obj)
    #                 else:
    #                     logger.critical(
    #                         f"{well.name}. Invalid well casing material: {wcm}"
    #                     )
    #     except Exception as e:
    #         if well is not None:
    #             session.expunge(well)
    #
    #         self._capture_error(row.PointID, str(e), "UnknownField")
    #
    #         logger.critical(f"Error creating well for {row.PointID}: {e}")
    #         return
    #
    #     try:
    #         location, elevation_method, notes = make_location(
    #             row, self._cached_elevations
    #         )
    #         session.add(location)
    #         # session.flush()
    #         self._added_locations[row.PointID] = (elevation_method, notes)
    #     except Exception as e:
    #         import traceback
    #
    #         traceback.print_exc()
    #         self._capture_error(row.PointID, str(e), str(e), "Location")
    #         logger.critical(f"Error making location for {row.PointID}: {e}")
    #
    #         return
    #
    def _extract_well_purposes(self, row) -> list[str]:
        cu = row.CurrentUse

        if isna(cu):
            return []
        else:
            purposes = []
            for cui in cu:
                if cui == "A":
                    # skip "Open, unequipped well" as that gets mapped to the status_history table
                    continue
                p = self._get_lexicon_value(row, f"LU_CurrentUse:{cui}")
                if p is not None:
                    purposes.append(p)
            return purposes

    def _add_formation_zone(self, row, well, formations):
        # --- Set Formation Completion (NOT depth-based stratigraphy) ---
        # This simply records which formation the well was completed in.
        # For detailed depth-interval stratigraphy, see stratigraphy_transfer.py

        formation_code = row.FormationZone

transfers/well_transfer.py:649

  • The PR description states that ConstructionMethod normalization prevents lookup misses caused by whitespace, but the implementation only strips leading/trailing whitespace. It does not handle internal whitespace that might exist in legacy CSV values (e.g., "Hand Dug" vs "HandDug"). Consider whether internal whitespace normalization is also needed for robust matching.
                name=row.PointID,
                first_visit_date=first_visit_date,
                hole_depth=row.HoleDepth,
                well_depth=row.WellDepth,

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 16, 2026 19:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

transfers/sensor_transfer.py:209

  • The success/failure branches for recording interval estimation are inverted. When estimate_recording_interval fails it appears to return recording_interval=None (with an error), but the current code treats None as a successful estimate and treats non-None as a failure. Swap the condition to check for a non-None interval (and only log/capture the "Estimated recording interval" message on success).
            if recording_interval is not None:
                recording_interval_unit = unit
                logger.info(
                    f"name={sensor.name}, serial_no={sensor.serial_no}. "
                    f"estimated recording interval: {recording_interval} {unit}"

jirhiker and others added 9 commits February 16, 2026 12:44
Improve well-inventory CLI feedback and validation handling; add real-user CSV feature coverage
…tion

feat: add auto-generation prefix extraction for well IDs with new regex support
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jirhiker jirhiker merged commit db0dc8f into staging Feb 16, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants