refactor: consolidate transferer classes to use ChemistryTransferer and remove unused caching logic #478

Merged: marissafichera merged 4 commits into staging from transfer-fix on Feb 10, 2026

Conversation

jirhiker (Member) commented Feb 9, 2026

Why

This PR addresses the following problem / context:

  • Use bullet points here

How

Implementation summary - the following was changed / added / removed:

  • Use bullet points here

Notes

Any special considerations, workarounds, or follow-up work to note?

  • Use bullet points here

Copilot AI review requested due to automatic review settings February 9, 2026 10:23
Copilot AI (Contributor) left a comment

Pull request overview

Refactors multiple transferers to centralize row de-duplication and chemistry-specific transfer behavior, while also updating lexicon initialization to use bulk database operations.

Changes:

  • Consolidates per-transferer _dedupe_rows implementations into Transferer._dedupe_rows with support for composite keys and optional pass-through of missing keys (a minimal usage sketch follows this list).
  • Introduces ChemistryTransferer base class to share ChemistrySampleInfo FK caching, filtering, and type-safe row parsing helpers across chemistry transferers.
  • Reworks lexicon initialization to upsert categories/terms and create associations using bulk SQL operations.
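
For concreteness, here is a minimal, self-contained sketch of how a transferer might call the consolidated helper. The keyword names (key, include_missing) and the dedupe loop mirror the diff excerpt quoted later in this thread; the wrapper class, the assumed OBJECTID default, the return value, and the sample rows are illustrative guesses rather than the repository's actual code.

# Sketch only: the wrapper class, default key, sample data, and return value are
# assumptions; the keyword names and the dedupe loop follow the diff excerpt below.
from typing import Any

import pandas as pd


class Transferer:
    def _dedupe_rows(
        self,
        rows: list[dict[str, Any]],
        key: str | list[str] = "OBJECTID",   # assumed default key
        include_missing: bool = False,
    ) -> list[dict[str, Any]]:
        deduped: dict[Any, dict[str, Any]] = {}
        passthrough: list[dict[str, Any]] = []
        key_list = key if isinstance(key, list) else [key]
        for row in rows:
            row_key = (
                row.get(key_list[0])
                if len(key_list) == 1
                else tuple(row.get(k) for k in key_list)
            )
            missing = (
                any(pd.isna(k) for k in row_key)
                if isinstance(row_key, tuple)
                else pd.isna(row_key)
            )
            if missing:
                if include_missing:
                    passthrough.append(row)  # keep key-less rows instead of dropping them
                continue
            deduped[row_key] = row           # later rows win on value
        return list(deduped.values()) + passthrough  # return shape is assumed


rows = [
    {"PointID": "MG-001", "DateMeasured": "2024-01-01", "value": 1.0},
    {"PointID": "MG-001", "DateMeasured": "2024-01-01", "value": 2.0},  # duplicate composite key
    {"PointID": None, "DateMeasured": "2024-01-02", "value": 3.0},      # missing key component
]
# Composite conflict key plus include_missing=True, as described for ngwmn_views.py.
print(Transferer()._dedupe_rows(rows, key=["PointID", "DateMeasured"], include_missing=True))

Per the file summary below, weather_data.py and surface_water_data.py pass include_missing=True so rows lacking an OBJECTID still transfer, while ngwmn_views.py supplies a composite conflict key.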

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Show a summary per file

  • transfers/weather_photos.py: removes the local _dedupe_rows in favor of the base Transferer implementation.
  • transfers/weather_data.py: uses the centralized _dedupe_rows with include_missing=True for OBJECTID-less rows.
  • transfers/waterlevelscontinuous_pressure_daily.py: removes the local _dedupe_rows in favor of the base Transferer implementation.
  • transfers/transferer.py: adds the generalized _dedupe_rows and introduces the shared ChemistryTransferer utilities and caching.
  • transfers/surface_water_photos.py: removes the local _dedupe_rows in favor of the base Transferer implementation.
  • transfers/surface_water_data.py: switches to the centralized _dedupe_rows with include_missing=True.
  • transfers/radionuclides.py: migrates to ChemistryTransferer and tuple-based iteration with shared safe converters.
  • transfers/ngwmn_views.py: uses the centralized _dedupe_rows with a composite conflict key and include_missing=True.
  • transfers/minor_trace_chemistry_transfer.py: removes the local _dedupe_rows in favor of the base Transferer implementation.
  • transfers/major_chemistry.py: migrates to ChemistryTransferer, removing duplicated caching/filtering/safe-parsing logic (a rough sketch of this shared-base pattern follows this list).
  • transfers/hydraulicsdata.py: switches to the centralized _dedupe_rows default key behavior and removes the local dedupe.
  • transfers/field_parameters_transfer.py: migrates to ChemistryTransferer, removing duplicated caching/filtering/safe-parsing logic.
  • transfers/chemistry_sampleinfo.py: removes the local _dedupe_rows in favor of the base Transferer implementation.
  • transfers/associated_data.py: removes the local _dedupe_rows in favor of the base Transferer implementation.
  • core/initializers.py: refactors lexicon initialization to bulk upsert categories/terms and bulk insert associations.
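
Several entries above reference the new shared base class. As a rough idea of the pattern being described (a cached ChemistrySampleInfo foreign-key lookup plus type-safe converters), here is a hypothetical sketch; none of the method or attribute names below appear in this thread, and the real ChemistryTransferer in transfers/transferer.py will differ.

# Illustrative only: every method and attribute name here is a guess at the shape
# such a base class could take, not the repository's implementation.
from typing import Any, Iterable, Optional


class ChemistryTransferer:
    """Hypothetical shared base for chemistry transferers (sketch, not the repo code)."""

    def __init__(self) -> None:
        self._sample_info_cache: dict[str, int] = {}

    def _build_sample_info_cache(self, pairs: Iterable[tuple[str, int]]) -> None:
        # In the real class this would presumably be populated from ChemistrySampleInfo
        # query results; here it simply takes (sample_id, primary_key) pairs.
        self._sample_info_cache = {sid: pk for sid, pk in pairs}

    def _sample_info_id(self, sample_id: str) -> Optional[int]:
        # Rows whose sample id is not present in ChemistrySampleInfo can be filtered out.
        return self._sample_info_cache.get(sample_id)

    @staticmethod
    def _safe_float(value: Any) -> Optional[float]:
        # Type-safe parsing: empty or unparseable values become None instead of raising.
        try:
            return None if value in (None, "") else float(value)
        except (TypeError, ValueError):
            return None


t = ChemistryTransferer()
t._build_sample_info_cache([("AB-0001", 1), ("AB-0002", 2)])
print(t._sample_info_id("AB-0001"), t._safe_float("3.5"), t._safe_float("n/a"))
# -> 1 3.5 None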

Comment on lines +170 to +190
association_rows = []
for term_dict in terms:
    term_id = existing_terms.get(term_dict["term"])
    if term_id is None:
        continue
    for category in term_dict["categories"]:
        category_id = existing_categories.get(category)
        if category_id is None:
            continue
        key = (term_id, category_id)
        if key in existing_links:
            continue
        association_rows.append(
            {"term_id": term_id, "category_id": category_id}
        )

session.rollback()
if association_rows:
    session.execute(
        insert(LexiconTermCategoryAssociation).values(association_rows)
    )
    session.commit()
Copilot AI commented Feb 9, 2026

association_rows can contain duplicate (term_id, category_id) pairs within the same run (e.g., if the JSON repeats a category for a term, or terms are duplicated), and this insert has no conflict handling—potentially causing a unique-constraint failure. Consider deduping in-memory with a seen set while building association_rows, or using an ON CONFLICT DO NOTHING strategy if the association table has a unique constraint.
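
To make the two suggested remedies concrete, here is a runnable sketch against an in-memory SQLite database. The model and column names come from the excerpt above; the stand-in table definition, the unique constraint on (term_id, category_id), the candidate_pairs input, and the SQLite dialect are assumptions (the project may target another backend, where e.g. the PostgreSQL insert() offers the same on_conflict_do_nothing() method).

# Sketch of the two fixes suggested above; LexiconTermCategoryAssociation is a
# stand-in model with an assumed unique constraint on (term_id, category_id).
from sqlalchemy import Column, Integer, UniqueConstraint, create_engine
from sqlalchemy.dialects.sqlite import insert as sqlite_insert
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class LexiconTermCategoryAssociation(Base):
    __tablename__ = "lexicon_term_category_association"
    id = Column(Integer, primary_key=True)
    term_id = Column(Integer, nullable=False)
    category_id = Column(Integer, nullable=False)
    __table_args__ = (UniqueConstraint("term_id", "category_id"),)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

candidate_pairs = [(1, 10), (1, 10), (2, 20)]  # note the in-run duplicate
existing_links: set[tuple[int, int]] = set()   # pairs already present in the DB

# Fix 1: dedupe in memory with a `seen` set while building association_rows.
seen: set[tuple[int, int]] = set()
association_rows = []
for term_id, category_id in candidate_pairs:
    key = (term_id, category_id)
    if key in existing_links or key in seen:
        continue
    seen.add(key)
    association_rows.append({"term_id": term_id, "category_id": category_id})

# Fix 2: additionally let the database skip conflicts (needs the unique constraint).
with Session(engine) as session:
    stmt = (
        sqlite_insert(LexiconTermCategoryAssociation)
        .values(association_rows)
        .on_conflict_do_nothing(index_elements=["term_id", "category_id"])
    )
    session.execute(stmt)
    session.commit()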

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c8163c75bf

Copilot AI review requested due to automatic review settings February 9, 2026 10:30
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings February 9, 2026 10:32
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Comment on lines +152 to +174
"""Dedupe rows by unique key(s) to avoid ON CONFLICT loops. Later rows win."""
deduped: dict[Any, dict[str, Any]] = {}
passthrough: list[dict[str, Any]] = []
key_list = key if isinstance(key, list) else [key]

for row in rows:
if len(key_list) == 1:
row_key = row.get(key_list[0])
else:
row_key = tuple(row.get(k) for k in key_list)

# Treat None and any pd.isna(...) value (e.g., NaN) as missing keys
if isinstance(row_key, tuple):
is_missing = any(pd.isna(k) for k in row_key)
else:
is_missing = pd.isna(row_key)

if is_missing:
if include_missing:
passthrough.append(row)
continue

deduped[row_key] = row
Copilot AI commented Feb 9, 2026

The docstring says “Later rows win”, which is true for the value, but dict overwrites do not change insertion order—so the returned list order for duplicated keys corresponds to the first occurrence, not the last. If downstream batching/logging depends on row order, consider rebuilding the insertion order for overwritten keys (e.g., delete+reinsert the key) or clarifying in the docstring that only the values are from the latest row.
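
If last-occurrence ordering matters downstream, the delete-and-reinsert idea looks like the tiny snippet below; the OBJECTID field name is only borrowed from the file summary above for illustration.

# Popping an existing key before reassigning moves it to the end of the dict, so
# list(deduped.values()) reflects last-occurrence order as well as last-occurrence values.
deduped: dict[str, dict] = {}
rows = [
    {"OBJECTID": "A", "v": 1},
    {"OBJECTID": "B", "v": 2},
    {"OBJECTID": "A", "v": 3},  # later duplicate of key "A"
]
for row in rows:
    key = row["OBJECTID"]
    if key in deduped:          # delete + reinsert so "later rows win" in order too
        del deduped[key]
    deduped[key] = row
print(list(deduped.values()))   # [{'OBJECTID': 'B', 'v': 2}, {'OBJECTID': 'A', 'v': 3}]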

Contributor left a comment

can you add the SamplePointID to this transfer?

marissafichera (Contributor) left a comment

approving but it'd be great if you went ahead and transferred the SamplePointID from the minor and trace chemistry table. can happen later though

marissafichera merged commit 3889fc1 into staging on Feb 10, 2026
6 checks passed