fix(transfers): normalize OwnerKey joins with mapper and collision guard #482
Conversation
- add owners_ownerkey_mapper.json for canonical OwnerKey mapping
- apply canonicalization + casefold normalization before OwnerLink join
- fail fast on normalization collisions with actionable logging
- document the mapping file in README
Pull request overview
Adds case-insensitive, collision-guarded OwnerKey joining for contact transfers by introducing an optional canonicalization mapper and normalizing keys prior to joining.
Changes:
- Introduced `owners_ownerkey_mapper.json` and a loader hook in `ContactTransfer.__init__`.
- Normalized `OwnerKey` values with canonicalization + `strip().casefold()` and added a collision guard.
- Updated the join to use `ownerkey_norm` and documented mapper usage in the README.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| transfers/data/owners_ownerkey_mapper.json | Adds a canonicalization mapping to resolve known OwnerKey case/spelling variants. |
| transfers/contact_transfer.py | Loads mapper, canonicalizes + normalizes OwnerKeys, checks collisions, and joins on normalized key. |
| README.md | Documents remediation steps when collisions are detected. |
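Pulling the pieces quoted in this review together, the canonicalize → normalize → guard → join pipeline can be sketched end to end. The frame contents, mapper entries, and the exact collision-guard shape below are illustrative assumptions, not the PR's actual code:

```python
import pandas as pd

# Hypothetical sample frames; the real ones are the owners and owner-link extracts.
odf = pd.DataFrame({"OwnerKey": ["Acme Corp", "ACME CORP ", "Beta LLC"]})
ldf = pd.DataFrame({"ownerkey": ["acme corp", "beta llc"], "contact_id": [101, 202]})

# Same shape the mapper file is described to have: known variant -> canonical spelling.
mapper = {"ACME CORP ": "Acme Corp"}

def canonicalize(s: pd.Series) -> pd.Series:
    # Vectorized dict lookup; unmapped keys fall back to the original value.
    return s.map(mapper).fillna(s)

def normalize(s: pd.Series) -> pd.Series:
    # strip + casefold, with empty strings treated as missing keys.
    return s.astype(str).str.strip().str.casefold().replace({"": pd.NA})

odf["ownerkey_canonical"] = canonicalize(odf["OwnerKey"])
odf["ownerkey_norm"] = normalize(odf["ownerkey_canonical"])
ldf["ownerkey_canonical"] = canonicalize(ldf["ownerkey"])
ldf["ownerkey_norm"] = normalize(ldf["ownerkey_canonical"])

# Collision guard: two distinct canonical keys must not normalize to one value.
per_norm = odf.groupby("ownerkey_norm")["ownerkey_canonical"].nunique()
collisions = per_norm[per_norm > 1]
if not collisions.empty:
    raise ValueError(f"OwnerKey normalization collisions: {list(collisions.index)}")

merged = odf.merge(ldf[["ownerkey_norm", "contact_id"]], on="ownerkey_norm", how="left")
```

Note how the mapper resolves the `"ACME CORP "` variant before casefolding, so both owner rows join to the same link row without tripping the guard.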
transfers/contact_transfer.py
Outdated
```python
odf["ownerkey_canonical"] = odf[owner_key_col].map(
    lambda v: self._ownerkey_mapper.get(v, v)
)
ldf["ownerkey_canonical"] = ldf[link_owner_key_col].map(
    lambda v: self._ownerkey_mapper.get(v, v)
)
```
Using .map(lambda ...) applies a Python callback per row, which is slower on large DataFrames. Prefer a vectorized approach such as mapping with the dict directly and filling missing values (e.g., s.map(mapper).fillna(s)), applied to both series.
Suggested change:

```diff
-odf["ownerkey_canonical"] = odf[owner_key_col].map(
-    lambda v: self._ownerkey_mapper.get(v, v)
-)
-ldf["ownerkey_canonical"] = ldf[link_owner_key_col].map(
-    lambda v: self._ownerkey_mapper.get(v, v)
-)
+mapped_odf = odf[owner_key_col].map(self._ownerkey_mapper)
+odf["ownerkey_canonical"] = mapped_odf.fillna(odf[owner_key_col])
+mapped_ldf = ldf[link_owner_key_col].map(self._ownerkey_mapper)
+ldf["ownerkey_canonical"] = mapped_ldf.fillna(ldf[link_owner_key_col])
```
- Drop overlapping OwnerLink columns before joining on normalized OwnerKey to prevent “columns overlap” errors during contact transfer.
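The "columns overlap" error this commit guards against comes from pandas' `DataFrame.join`, which raises when both frames carry same-named non-key columns and no suffix is specified. A minimal sketch of the drop-before-join approach, with invented column names:

```python
import pandas as pd

# Placeholder frames indexed on the normalized key; both carry a "Name" column.
odf = pd.DataFrame(
    {"ownerkey_norm": ["a", "b"], "Name": ["A", "B"]}
).set_index("ownerkey_norm")
ldf = pd.DataFrame(
    {"ownerkey_norm": ["a", "b"], "Name": ["ignored", "ignored"], "ContactId": [1, 2]}
).set_index("ownerkey_norm")

# DataFrame.join raises "columns overlap but no suffix specified" if both
# frames share non-key columns; drop the duplicates from the link frame first.
overlap = odf.columns.intersection(ldf.columns)
joined = odf.join(ldf.drop(columns=overlap))
```

Dropping from the link side keeps the owners frame authoritative for any shared columns.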
…ation
- Replaced `next(...endswith())` logic with explicit name matching and count validation for OwnerKey.
- Impact: prevents silent data corruption caused by non-deterministic column selection when multiple similar keys exist.
Logs the expected path on FileNotFoundError to make missing mappings visible and easier to diagnose.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…tact' into kas-bdms-537-resolve-missing-contact
transfers/contact_transfer.py
Outdated
```python
owner_key_col = next(
    col for col in odf.columns if col.lower().endswith("ownerkey")
)
link_owner_key_col = next(
    col for col in ldf.columns if col.lower().endswith("ownerkey")
)
```
next(...) without a default will raise StopIteration with a confusing traceback if no matching column exists. Consider providing a default (e.g., None) and raising a ValueError that clearly explains which expected column pattern was missing and what columns were found.
Suggested change:

```diff
-owner_key_col = next(
-    col for col in odf.columns if col.lower().endswith("ownerkey")
-)
-link_owner_key_col = next(
-    col for col in ldf.columns if col.lower().endswith("ownerkey")
-)
+owner_key_col = next(
+    (col for col in odf.columns if col.lower().endswith("ownerkey")),
+    None,
+)
+if owner_key_col is None:
+    raise ValueError(
+        "Expected a column whose name ends with 'ownerkey' "
+        "(case-insensitive) in OwnersData, but found columns: "
+        f"{list(odf.columns)}"
+    )
+link_owner_key_col = next(
+    (col for col in ldf.columns if col.lower().endswith("ownerkey")),
+    None,
+)
+if link_owner_key_col is None:
+    raise ValueError(
+        "Expected a column whose name ends with 'ownerkey' "
+        "(case-insensitive) in OwnerLink, but found columns: "
+        f"{list(ldf.columns)}"
+    )
```
```python
odf["ownerkey_norm"] = (
    odf["ownerkey_canonical"]
    .fillna("")
    .astype(str)
    .str.strip()
    .str.casefold()
    .replace({"": pd.NA})
)
ldf["ownerkey_norm"] = (
    ldf["ownerkey_canonical"]
    .fillna("")
    .astype(str)
    .str.strip()
    .str.casefold()
    .replace({"": pd.NA})
)
```
The helper columns ownerkey_canonical and ownerkey_norm are added to odf and can leak into subsequent processing/output. If they’re only intended for joining, consider dropping them after the join (or renaming to clearly-internal names like _ownerkey_norm) to keep the resulting dataframe schema stable.
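A sketch of the reviewer's suggestion, dropping the join-only helper columns after the merge so the output schema stays stable; the frame contents and `ContactId` column are placeholders:

```python
import pandas as pd

# Placeholder frames; the real ones come out of the OwnerKey canonicalization step.
odf = pd.DataFrame({
    "OwnerKey": ["Acme Corp"],
    "ownerkey_canonical": ["Acme Corp"],  # join-only helper
    "ownerkey_norm": ["acme corp"],       # join-only helper
})
ldf = pd.DataFrame({"ownerkey_norm": ["acme corp"], "ContactId": [7]})

merged = odf.merge(ldf, on="ownerkey_norm", how="left")
# Drop the helpers after the join so downstream consumers see a stable schema.
merged = merged.drop(columns=["ownerkey_canonical", "ownerkey_norm"])
```

Renaming to underscore-prefixed names like `_ownerkey_norm`, as the comment also suggests, is the lighter alternative when the columns must survive for debugging.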
Use vectorized replacement instead of per-row lambdas. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Stops if multiple case-variant OwnerKey columns are present. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```python
with open(co_to_org_mapper_path, "r") as f:
    self._co_to_org_mapper = json.load(f)

ownerkey_mapper_path = get_transfers_data_path("owners_ownerkey_mapper.json")
try:
    with open(ownerkey_mapper_path, "r") as f:
        self._ownerkey_mapper = json.load(f)
except FileNotFoundError:
    logger.warning(
        "Owner key mapper file not found at '%s'; proceeding with empty owner key mapping.",
        ownerkey_mapper_path,
    )
    self._ownerkey_mapper = {}
```
JSON files should be opened with an explicit encoding to avoid platform-dependent defaults. Use `encoding="utf-8"` for both mapper reads.
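A small sketch of the suggested fix, wrapping the mapper read with an explicit encoding; the function name `load_mapper` is an invented stand-in for the inline `__init__` logic:

```python
import json

def load_mapper(path: str) -> dict:
    # Explicit encoding avoids locale-dependent defaults (e.g. cp1252 on Windows),
    # which would garble non-ASCII owner names in the mapping file.
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        # Mirrors the PR's fallback: proceed with an empty mapping.
        return {}
```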
```python
odf["ownerkey_norm"] = (
    odf["ownerkey_canonical"]
    .fillna("")
    .astype(str)
    .str.strip()
    .str.casefold()
    .replace({"": pd.NA})
)
ldf["ownerkey_norm"] = (
    ldf["ownerkey_canonical"]
    .fillna("")
    .astype(str)
    .str.strip()
    .str.casefold()
    .replace({"": pd.NA})
)
```
The normalization logic is duplicated for odf and ldf, which increases the chance of future drift (e.g., one side adding a transform but not the other). Consider extracting this into a small helper (e.g., _normalize_ownerkey(series)) and using it for both frames.
Why
This PR addresses the following problem / context:
How
Implementation summary - the following was changed / added / removed:
Notes
Any special considerations, workarounds, or follow-up work to note?
- `transfers/data/owners_ownerkey_mapper.json` — update when new OwnerKey collisions are discovered.
- Related issue: “OwnerKey case doesn’t match OwnerLink” #479.
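For reference, the mapper file described above is presumably a flat variant-to-canonical JSON object, matching how the loader feeds it into `Series.map`; the entries below are invented placeholders, not real data:

```json
{
  "ACME CORP": "Acme Corp",
  "acme corporation": "Acme Corp"
}
```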