refactor: consolidate transferer classes to use ChemistryTransferer a… by jirhiker · Pull Request #478 · DataIntegrationGroup/OcotilloAPI

jirhiker · 2026-02-09T10:23:24Z

…nd remove unused caching logic

Copilot

Pull request overview

Refactors multiple transferers to centralize row de-duplication and chemistry-specific transfer behavior, while also updating lexicon initialization to use bulk database operations.

Changes:

- Consolidates per-transferer `_dedupe_rows` implementations into `Transferer._dedupe_rows`, with support for composite keys and optional pass-through of rows with missing keys.
- Introduces a `ChemistryTransferer` base class to share `ChemistrySampleInfo` FK caching, filtering, and type-safe row parsing helpers across chemistry transferers.
- Reworks lexicon initialization to upsert categories/terms and create associations using bulk SQL operations.
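
As a rough illustration of the bulk upsert idea in the last bullet, here is a minimal sketch using the standard-library `sqlite3` module rather than the project's SQLAlchemy models. The table `lexicon_category`, its columns, and the helper `bulk_upsert_categories` are hypothetical names chosen for this example and are not taken from the PR.

```python
import sqlite3


def bulk_upsert_categories(conn, categories):
    """Insert or update many category rows with one batched call.

    Hypothetical helper: `lexicon_category` and its columns are
    assumptions made for this sketch, not the project's schema.
    """
    conn.executemany(
        """
        INSERT INTO lexicon_category (name, description)
        VALUES (:name, :description)
        ON CONFLICT(name) DO UPDATE SET description = excluded.description
        """,
        categories,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lexicon_category ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, description TEXT)"
)

# First run inserts two categories.
bulk_upsert_categories(
    conn,
    [
        {"name": "units", "description": "old text"},
        {"name": "status", "description": "Well status"},
    ],
)
# Second run hits the unique constraint on `name` and updates in place.
bulk_upsert_categories(
    conn, [{"name": "units", "description": "Measurement units"}]
)

rows = conn.execute(
    "SELECT name, description FROM lexicon_category ORDER BY name"
).fetchall()
```

Re-running the initializer with this pattern should leave one row per category name, with the most recent description, instead of failing on the unique constraint.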

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| transfers/weather_photos.py | Removes local `_dedupe_rows` in favor of base `Transferer` implementation. |
| transfers/weather_data.py | Uses centralized `_dedupe_rows` with `include_missing=True` for OBJECTID-less rows. |
| transfers/waterlevelscontinuous_pressure_daily.py | Removes local `_dedupe_rows` in favor of base `Transferer` implementation. |
| transfers/transferer.py | Adds generalized `_dedupe_rows` and introduces shared `ChemistryTransferer` utilities and caching. |
| transfers/surface_water_photos.py | Removes local `_dedupe_rows` in favor of base `Transferer` implementation. |
| transfers/surface_water_data.py | Switches to centralized `_dedupe_rows` with `include_missing=True`. |
| transfers/radionuclides.py | Migrates to `ChemistryTransferer` and tuple-based iteration with shared safe converters. |
| transfers/ngwmn_views.py | Uses centralized `_dedupe_rows` with composite conflict key and `include_missing=True`. |
| transfers/minor_trace_chemistry_transfer.py | Removes local `_dedupe_rows` in favor of base `Transferer` implementation. |
| transfers/major_chemistry.py | Migrates to `ChemistryTransferer`, removing duplicated caching/filtering/safe parsing logic. |
| transfers/hydraulicsdata.py | Switches to centralized `_dedupe_rows` default key behavior and removes local dedupe. |
| transfers/field_parameters_transfer.py | Migrates to `ChemistryTransferer`, removing duplicated caching/filtering/safe parsing logic. |
| transfers/chemistry_sampleinfo.py | Removes local `_dedupe_rows` in favor of base `Transferer` implementation. |
| transfers/associated_data.py | Removes local `_dedupe_rows` in favor of base `Transferer` implementation. |
| core/initializers.py | Refactors lexicon initialization to bulk upsert categories/terms and bulk insert associations. |

Copilot · 2026-02-09T10:26:02Z

core/initializers.py

+        association_rows = []
+        for term_dict in terms:
+            term_id = existing_terms.get(term_dict["term"])
+            if term_id is None:
+                continue
+            for category in term_dict["categories"]:
+                category_id = existing_categories.get(category)
+                if category_id is None:
+                    continue
+                key = (term_id, category_id)
+                if key in existing_links:
+                    continue
+                association_rows.append(
+                    {"term_id": term_id, "category_id": category_id}
                )

-                session.rollback()
+        if association_rows:
+            session.execute(
+                insert(LexiconTermCategoryAssociation).values(association_rows)
+            )
+            session.commit()


association_rows can contain duplicate (term_id, category_id) pairs within the same run (e.g., if the JSON repeats a category for a term, or terms are duplicated), and this insert has no conflict handling—potentially causing a unique-constraint failure. Consider deduping in-memory with a seen set while building association_rows, or using an ON CONFLICT DO NOTHING strategy if the association table has a unique constraint.
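
The in-memory option suggested here could look roughly like the sketch below. It mirrors the loop from the diff but wraps it in a hypothetical function, `build_association_rows`, and takes plain dicts and a set in place of the database lookups; those names and the sample data are assumptions made for illustration.

```python
def build_association_rows(terms, term_ids, category_ids, existing_links):
    """Collect (term_id, category_id) rows, skipping pairs already seen.

    `seen` starts with the links already in the database and grows as
    rows are queued, so repeats within the same run are dropped too.
    """
    seen = set(existing_links)
    rows = []
    for term_dict in terms:
        term_id = term_ids.get(term_dict["term"])
        if term_id is None:
            continue
        for category in term_dict["categories"]:
            category_id = category_ids.get(category)
            if category_id is None:
                continue
            key = (term_id, category_id)
            if key in seen:
                continue
            seen.add(key)  # remember the pair so a repeat is not queued again
            rows.append({"term_id": term_id, "category_id": category_id})
    return rows


# Illustrative input: a repeated category and a repeated term.
terms = [
    {"term": "ft", "categories": ["units", "units"]},
    {"term": "ft", "categories": ["units", "length"]},
    {"term": "m", "categories": ["units", "missing"]},
]
term_ids = {"ft": 1, "m": 2}
category_ids = {"units": 10, "length": 11}
existing_links = {(2, 10)}

association_rows = build_association_rows(
    terms, term_ids, category_ids, existing_links
)
```

With this input only the pairs `(1, 10)` and `(1, 11)` are queued, each exactly once, so the bulk insert should not see a duplicate pair from within the same run.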

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c8163c75bf

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Copilot · 2026-02-09T10:35:13Z

transfers/transferer.py

+        """Dedupe rows by unique key(s) to avoid ON CONFLICT loops. Later rows win."""
+        deduped: dict[Any, dict[str, Any]] = {}
+        passthrough: list[dict[str, Any]] = []
+        key_list = key if isinstance(key, list) else [key]
+
+        for row in rows:
+            if len(key_list) == 1:
+                row_key = row.get(key_list[0])
+            else:
+                row_key = tuple(row.get(k) for k in key_list)
+
+            # Treat None and any pd.isna(...) value (e.g., NaN) as missing keys
+            if isinstance(row_key, tuple):
+                is_missing = any(pd.isna(k) for k in row_key)
+            else:
+                is_missing = pd.isna(row_key)
+
+            if is_missing:
+                if include_missing:
+                    passthrough.append(row)
+                continue
+
+            deduped[row_key] = row


The docstring says “Later rows win”, which is true for the value, but dict overwrites do not change insertion order—so the returned list order for duplicated keys corresponds to the first occurrence, not the last. If downstream batching/logging depends on row order, consider rebuilding the insertion order for overwritten keys (e.g., delete+reinsert the key) or clarifying in the docstring that only the values are from the latest row.

marissafichera · 2026-02-10T00:17:00Z

transfers/minor_trace_chemistry_transfer.py

Can you add the SamplePointID to this transfer?

marissafichera

Approving, but it'd be great if you went ahead and transferred the SamplePointID from the minor and trace chemistry table. That can happen later, though.

refactor: consolidate transferer classes to use ChemistryTransferer a…

c8163c7

…nd remove unused caching logic

Copilot AI review requested due to automatic review settings February 9, 2026 10:23

Copilot AI reviewed Feb 9, 2026

chatgpt-codex-connector bot reviewed Feb 9, 2026

Update transfers/transferer.py

1349df4

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings February 9, 2026 10:30

Copilot AI reviewed Feb 9, 2026

Update transfers/radionuclides.py

3620703

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings February 9, 2026 10:32

Update transfers/transferer.py

decf815

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI reviewed Feb 9, 2026

jirhiker requested a review from marissafichera February 9, 2026 23:56

marissafichera reviewed Feb 10, 2026

marissafichera approved these changes Feb 10, 2026

marissafichera merged commit 3889fc1 into staging Feb 10, 2026
6 checks passed

3 participants