Skip to content

fix(transfers): normalize OwnerKey joins with mapper and collision guard#482

Merged
jirhiker merged 7 commits intostagingfrom
kas-bdms-537-resolve-missing-contact
Feb 12, 2026
Merged

fix(transfers): normalize OwnerKey joins with mapper and collision guard#482
jirhiker merged 7 commits intostagingfrom
kas-bdms-537-resolve-missing-contact

Conversation

@ksmuczynski
Copy link
Copy Markdown
Contributor

@ksmuczynski ksmuczynski commented Feb 10, 2026

Why

This PR addresses the following problem / context:

  • Contact transfers can silently drop rows when OwnerKey casing differs between OwnersData and OwnerLink
  • We need a safe, traceable way to resolve case-only collisions without editing source CSVs

How

Implementation summary - the following was changed / added / removed:

  1. Added a mapping file to unify/standardize known OwnerKey spelling variants
  2. Made the OwnersData ↔ OwnerLink match tolerant to differences in letter casing
  3. Added a safety check that stops the transfer and shows examples when matches are ambiguous
  4. Documented the mapping file in README.md

Notes

Any special considerations, workarounds, or follow-up work to note?

- add owners_ownerkey_mapper.json for canonical OwnerKey mapping
- apply canonicalization + casefold normalization before OwnerLink join
- fail fast on normalization collisions with actionable logging
- document the mapping file in README
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds case-insensitive, collision-guarded OwnerKey joining for contact transfers by introducing an optional canonicalization mapper and normalizing keys prior to joining.

Changes:

  • Introduced owners_ownerkey_mapper.json and loader hook in ContactTransfer.__init__.
  • Normalized OwnerKey values with canonicalization + strip().casefold() and added a collision guard.
  • Updated join to use ownerkey_norm and documented mapper usage in README.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
transfers/data/owners_ownerkey_mapper.json Adds a canonicalization mapping to resolve known OwnerKey case/spelling variants.
transfers/contact_transfer.py Loads mapper, canonicalizes + normalizes OwnerKeys, checks collisions, and joins on normalized key.
README.md Documents remediation steps when collisions are detected.

Comment on lines +96 to +101
odf["ownerkey_canonical"] = odf[owner_key_col].map(
lambda v: self._ownerkey_mapper.get(v, v)
)
ldf["ownerkey_canonical"] = ldf[link_owner_key_col].map(
lambda v: self._ownerkey_mapper.get(v, v)
)
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using .map(lambda ...) applies a Python callback per row, which is slower on large DataFrames. Prefer a vectorized approach such as mapping with the dict directly and filling missing values (e.g., s.map(mapper).fillna(s)), applied to both series.

Suggested change
odf["ownerkey_canonical"] = odf[owner_key_col].map(
lambda v: self._ownerkey_mapper.get(v, v)
)
ldf["ownerkey_canonical"] = ldf[link_owner_key_col].map(
lambda v: self._ownerkey_mapper.get(v, v)
)
mapped_odf = odf[owner_key_col].map(self._ownerkey_mapper)
odf["ownerkey_canonical"] = mapped_odf.fillna(odf[owner_key_col])
mapped_ldf = ldf[link_owner_key_col].map(self._ownerkey_mapper)
ldf["ownerkey_canonical"] = mapped_ldf.fillna(ldf[link_owner_key_col])

Copilot uses AI. Check for mistakes.
- Drop overlapping OwnerLink columns before joining on normalized OwnerKey
to prevent “columns overlap” errors during contact transfer.
@ksmuczynski ksmuczynski requested a review from Copilot February 11, 2026 00:18
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

ksmuczynski and others added 2 commits February 11, 2026 11:01
…ation

- Replaced next(...endswith()) logic with explicit name matching and count validation for OwnerKey.
- Impact: Prevents silent data corruption caused by non-deterministic column selection when multiple similar keys exist.
Logs the expected path on FileNotFoundError to make missing mappings visible and easier to diagnose.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 11, 2026 18:09
…tact' into kas-bdms-537-resolve-missing-contact
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment on lines +93 to +97
col for col in odf.columns if col.lower().endswith("ownerkey")
)
link_owner_key_col = next(
col for col in ldf.columns if col.lower().endswith("ownerkey")
)
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next(...) without a default will raise StopIteration with a confusing traceback if no matching column exists. Consider providing a default (e.g., None) and raising a ValueError that clearly explains which expected column pattern was missing and what columns were found.

Suggested change
col for col in odf.columns if col.lower().endswith("ownerkey")
)
link_owner_key_col = next(
col for col in ldf.columns if col.lower().endswith("ownerkey")
)
(col for col in odf.columns if col.lower().endswith("ownerkey")),
None,
)
if owner_key_col is None:
raise ValueError(
"Expected a column whose name ends with 'ownerkey' "
"(case-insensitive) in OwnersData, but found columns: "
f"{list(odf.columns)}"
)
link_owner_key_col = next(
(col for col in ldf.columns if col.lower().endswith("ownerkey")),
None,
)
if link_owner_key_col is None:
raise ValueError(
"Expected a column whose name ends with 'ownerkey' "
"(case-insensitive) in OwnerLink, but found columns: "
f"{list(ldf.columns)}"
)

Copilot uses AI. Check for mistakes.
Comment on lines +110 to +125
odf["ownerkey_norm"] = (
odf["ownerkey_canonical"]
.fillna("")
.astype(str)
.str.strip()
.str.casefold()
.replace({"": pd.NA})
)
ldf["ownerkey_norm"] = (
ldf["ownerkey_canonical"]
.fillna("")
.astype(str)
.str.strip()
.str.casefold()
.replace({"": pd.NA})
)
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helper columns ownerkey_canonical and ownerkey_norm are added to odf and can leak into subsequent processing/output. If they’re only intended for joining, consider dropping them after the join (or renaming to clearly-internal names like _ownerkey_norm) to keep the resulting dataframe schema stable.

Copilot uses AI. Check for mistakes.
Use vectorized replacement instead of per-row lambdas.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 11, 2026 18:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Stops if multiple case-variant OwnerKey columns are present.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 11, 2026 18:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment on lines 82 to +94
with open(co_to_org_mapper_path, "r") as f:
self._co_to_org_mapper = json.load(f)

ownerkey_mapper_path = get_transfers_data_path("owners_ownerkey_mapper.json")
try:
with open(ownerkey_mapper_path, "r") as f:
self._ownerkey_mapper = json.load(f)
except FileNotFoundError:
logger.warning(
"Owner key mapper file not found at '%s'; proceeding with empty owner key mapping.",
ownerkey_mapper_path,
)
self._ownerkey_mapper = {}
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON files should be opened with an explicit encoding to avoid platform-dependent defaults. Use encoding=\"utf-8\" for both mapper reads.

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +146
odf["ownerkey_norm"] = (
odf["ownerkey_canonical"]
.fillna("")
.astype(str)
.str.strip()
.str.casefold()
.replace({"": pd.NA})
)
ldf["ownerkey_norm"] = (
ldf["ownerkey_canonical"]
.fillna("")
.astype(str)
.str.strip()
.str.casefold()
.replace({"": pd.NA})
)
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The normalization logic is duplicated for odf and ldf, which increases the chance of future drift (e.g., one side adding a transform but not the other). Consider extracting this into a small helper (e.g., _normalize_ownerkey(series)) and using it for both frames.

Copilot uses AI. Check for mistakes.
@jirhiker jirhiker merged commit 5e5ec61 into staging Feb 12, 2026
6 checks passed
@TylerAdamMartinez TylerAdamMartinez deleted the kas-bdms-537-resolve-missing-contact branch February 26, 2026 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Contact transfer drops rows when OwnerKey case doesn’t match OwnerLink

3 participants