Skip to content

transfer-fix#520

Merged
jirhiker merged 1 commit intostagingfrom
transfer-fix
Feb 15, 2026
Merged

transfer-fix#520
jirhiker merged 1 commit intostagingfrom
transfer-fix

Conversation

@jirhiker
Copy link
Copy Markdown
Member

Why

This PR addresses the following problem / context:

  • Use bullet points here

How

Implementation summary - the following was changed / added / removed:

  • Use bullet points here

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Use bullet points here

Copilot AI review requested due to automatic review settings February 15, 2026 04:31
@jirhiker jirhiker merged commit 95bfb60 into staging Feb 15, 2026
11 checks passed
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 35b15e4af5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

field_event_participants.append(contact)
self._last_contacts_reused_count += 1
else:
owner_contact_id = self._owner_contact_id_by_pointid.get(row.PointID)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid static owner-contact lookup during parallel transfer

This lookup now depends on self._owner_contact_id_by_pointid, which is built once in __init__ and never refreshed, so owner fallbacks can become stale while the transfer is running. In this repo, Contacts and WaterLevels are launched together in parallel (transfers/transfer.py lines 509-513), so if contact/association rows are committed after WaterLevels starts, owner-measured rows will be treated as missing contacts and skipped even though the owner association exists later in the same run.

Useful? React with 👍 / 👎.

Comment on lines +103 to +106
self._thing_id_by_pointid = {
name: thing_id
for name, thing_id in session.query(Thing.name, Thing.id).all()
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Detect duplicate Thing names before caching IDs

Building self._thing_id_by_pointid with a dict silently overwrites earlier rows when multiple Thing records share the same name; unlike the previous per-group one_or_none() query, this no longer fails fast on ambiguous matches and can attach all water-level rows for that PointID to an arbitrary Thing. Since Thing.name is not unique in the model, this is a real data-integrity regression whenever duplicate names exist.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the WaterLevels transfer to reduce per-row ORM overhead by caching lookup data and using batched inserts for FieldEvent → FieldActivity → Sample → Observation creation.

Changes:

  • Add in-memory caches for Thing IDs (by PointID) and fallback “owner” contact IDs to avoid repeated DB queries.
  • Replace per-row ORM object creation with batched INSERT ... RETURNING flows to create FieldEvents/Participants/Activities/Samples/Observations more efficiently.
  • Refactor observation construction by extracting shared parsing into _get_observation_parts and adding an insert-row builder.

Comment on lines +456 to 461
def _get_observation_parts(
self, row: pd.Series
) -> tuple[float | None, float | None, str | None]:
if pd.isna(row.MPHeight):
if pd.notna(row.DepthToWater) and pd.notna(row.DepthToWaterBGS):
logger.warning(
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_observation_parts can compute measuring_point_height = None (e.g., when MPHeight and DepthToWater are NULL). Later in this helper the code adds DepthToWaterBGS + measuring_point_height, which will raise a TypeError when DepthToWaterBGS is present but measuring_point_height is None. Consider treating missing MP height as 0 for that calculation (as the log message suggests), or guard the addition when measuring_point_height is None.

Copilot uses AI. Check for mistakes.
Comment on lines 404 to 406
session.commit()
session.expunge_all()
stats["groups_processed"] += 1
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contacts created during a group are cached in self._created_contacts before the group transaction is committed. If any of the subsequent batch inserts fail and the transaction is rolled back, those cached Contact objects/IDs may not exist in the DB anymore, but later rows/groups can still reuse them and attempt to insert FieldEventParticipant rows referencing missing contact_id values (FK failures or confusing errors). Consider only adding contacts to the cache after a successful commit, or tracking which contacts were created in the current group and removing them from the cache on rollback.

Copilot uses AI. Check for mistakes.
Comment on lines +456 to 461
def _get_observation_parts(
self, row: pd.Series
) -> tuple[float | None, float | None, str | None]:
if pd.isna(row.MPHeight):
if pd.notna(row.DepthToWater) and pd.notna(row.DepthToWaterBGS):
logger.warning(
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper now centralizes the value/measuring_point_height derivation and has a couple of tricky NULL combinations (e.g., MPHeight NULL + DepthToWater NULL + DepthToWaterBGS present). There are existing unit tests for _make_observation, but none that cover these edge cases in _get_observation_parts; adding a focused test here would help prevent regressions and catch the TypeError scenario.

Copilot uses AI. Check for mistakes.
@TylerAdamMartinez TylerAdamMartinez deleted the transfer-fix branch February 26, 2026 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants