Skip to content

fix: resolve duplicate uniqueId for identical and pending transactions#840

Open
Saharariel wants to merge 1 commit intodaniel-hauser:mainfrom
Saharariel:fix/#827
Open

fix: resolve duplicate uniqueId for identical and pending transactions#840
Saharariel wants to merge 1 commit intodaniel-hauser:mainfrom
Saharariel:fix/#827

Conversation

@Saharariel
Copy link
Copy Markdown

@Saharariel Saharariel commented Mar 28, 2026

Problem

Two distinct bugs both cause a:

⚠️ Duplicate uniqueId detected warning

and result in transactions being silently dropped or merged:

  1. Pending transactions (e.g. Max) : pending transactions report chargedAmount: 0 regardless of the actual amount.
    Two pending transactions from the same merchant on the same day, even with different amounts, would produce identical uniqueIds.

  2. Legitimately identical transactions : buying the same item twice on the same day (same merchant, same amount, no bank-assigned identifier) produces identical uniqueIds.
    The second transaction gets dropped in Google Sheets or overwrites the first in SQL.

Both issues only occur when identifier is not set by the bank. When identifier is present, the existing logic works correctly.

Fix

For pending transactions: fall back to originalAmount when chargedAmount is 0, since originalAmount reflects the real transaction amount even while pending.

For identical-looking transactions: add disambiguateDuplicateIds at the batch level - after all transactions are assembled, any colliding uniqueIds get a _1, _2 suffix appended.
The first occurrence keeps its original uniqueId (backwards-compatible with existing stored data).

Reproduction

// Before fix — both produce the same uniqueId:
pending ₪50 → 2024-11-01_max_1234_0_COFFEE SHOP_
pending ₪30 → 2024-11-01_max_1234_0_COFFEE SHOP_ ← collision

// After fix:
pending ₪50 → 2024-11-01_max_1234_50_COFFEE SHOP_
pending ₪30 → 2024-11-01_max_1234_30_COFFEE SHOP_ ← distinct

Notes

  • The _1 suffix for identical transactions is assigned by position within the scraped batch. If the bank returns the same transactions in a different order on a re-run, the suffix could shift.
    This is an inherent limitation when the bank provides no identifier.
  • Pending transactions are skipped by YNAB and Google Sheets, so the pending fix only affects SQL storage.

Fixes #827

Summary by CodeRabbit

  • Bug Fixes

    • Fixed transaction ID generation for pending items with zero charged amount so distinct original amounts no longer collide.
    • Added batch-level duplicate-ID disambiguation to ensure distinct transactions are preserved and uniquely identified.
  • Tests

    • Added tests covering duplicate-ID resolution, edge cases with pre-suffixed IDs, and charged vs. original amount behavior.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b264b3ca-e823-4538-b1c7-492295645284

📥 Commits

Reviewing files that changed from the base of the PR and between 304dc15 and 63a80b8.

⛔ Files ignored due to path filters (1)
  • src/bot/__snapshots__/messages.test.ts.snap is excluded by !**/*.snap
📒 Files selected for processing (3)
  • src/bot/storage/index.ts
  • src/bot/storage/utils.test.ts
  • src/bot/storage/utils.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/bot/storage/index.ts
  • src/bot/storage/utils.ts
  • src/bot/storage/utils.test.ts

📝 Walkthrough

Walkthrough

Added batch-level duplicate-ID disambiguation and adjusted unique-ID generation: for pending transactions with chargedAmount === 0 the ID uses originalAmount; after transaction creation batches are post-processed to append numeric suffixes to duplicate uniqueIds.

Changes

Cohort / File(s) Summary
Deduplication & ID logic
src/bot/storage/utils.ts
Added disambiguateDuplicateIds(txns) to ensure batch-unique TransactionRow.uniqueId by appending _1, _2, ... to later collisions; changed transactionUniqueId(...) to use tx.originalAmount for pending transactions when tx.chargedAmount === 0 (otherwise keeps chargedAmount).
Storage pipeline integration
src/bot/storage/index.ts
resultsToTransactions now calls disambiguateDuplicateIds(txns) before returning transaction list.
Tests
src/bot/storage/utils.test.ts
New tests for disambiguateDuplicateIds (preserve uniques, suffix collisions, interleaved duplicates, avoid clashing with pre-suffixed IDs) and expanded transactionUniqueId coverage including pending zero-amount and non-pending zero-amount cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibbled logic, found the clue,

Pending amounts now show what’s true.
Duplicates wear tiny tags—_1, _2—
Hopping safe through every queue,
A tidy trail of IDs anew.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: fixing duplicate uniqueId issues for identical and pending transactions, which is the core objective of this PR.
Linked Issues check ✅ Passed The PR implementation fully addresses issue #827: pending transactions now use originalAmount when chargedAmount is 0, and disambiguateDuplicateIds resolves collisions while preserving first occurrence.
Out of Scope Changes check ✅ Passed All changes are directly scoped to resolving duplicate uniqueId issues as specified in #827; no unrelated modifications are present.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bot/storage/utils.test.ts`:
- Around line 304-356: Add tests covering pre-suffixed collisions and the
non-pending zero-amount boundary: add a case that calls disambiguateDuplicateIds
with transactionRow items having uniqueId values ["a", "a_1", "a"] and assert
the resulting uniqueId array is fully unique (e.g. ["a","a_1","a_2"]) to ensure
pre-suffixed IDs don’t cause collisions; also add a test that constructs a
transactionRow with transactionUniqueId set and chargedAmount: 0 with pending:
false and assert the resulting uniqueId behavior matches the intended contract
(use transactionUniqueId in that non-pending zero-amount case). Ensure you
reference the existing helpers and function names (disambiguateDuplicateIds,
transactionRow, transactionUniqueId, chargedAmount, pending) when adding the two
tests.

In `@src/bot/storage/utils.ts`:
- Around line 67-77: The disambiguation in disambiguateDuplicateIds can still
produce collisions (e.g., ["coffee","coffee_1","coffee"] -> duplicate
"coffee_1"); fix it by making the function track already emitted uniqueIds (a
Set) and, when a base repeats, loop/increment the suffix (using counts.get(base)
as the starting counter) until the candidate `${base}_${n}` is not present in
the seen set (and also not equal to any original tx.uniqueId that has already
been emitted), then emit that candidate and update both counts and seen; keep
returning the original tx when no suffix is needed.
- Around line 44-47: The current amount fallback always replaces a zero
tx.chargedAmount with tx.originalAmount, which changes IDs for non-pending rows;
update the logic where amount is computed so the fallback to tx.originalAmount
only occurs for pending transactions (e.g., check tx.status === 'pending' or a
pending flag such as tx.isPending) and preserve chargedAmount==0 for non-pending
transactions. Locate the amount assignment (const amount = tx.chargedAmount ||
tx.originalAmount) and change it to conditionally use tx.originalAmount only
when the transaction is pending and chargedAmount is zero, otherwise keep
tx.chargedAmount.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4bddfc4b-2443-4165-b785-f15551872d55

📥 Commits

Reviewing files that changed from the base of the PR and between 4457866 and 304dc15.

📒 Files selected for processing (3)
  • src/bot/storage/index.ts
  • src/bot/storage/utils.test.ts
  • src/bot/storage/utils.ts

Comment thread src/bot/storage/utils.test.ts
Comment thread src/bot/storage/utils.ts
Comment thread src/bot/storage/utils.ts
- Use originalAmount as fallback when chargedAmount is 0 only for pending
  transactions, preserving chargedAmount for non-pending rows
- Add disambiguateDuplicateIds to append _1, _2 suffixes for transactions
  that are legitimately identical within a batch (e.g. buying coffee twice
  on the same day with no bank-assigned identifier)
- Pre-populate seen set to avoid collisions with pre-suffixed natural IDs

Fixes daniel-hauser#827
@daniel-hauser
Copy link
Copy Markdown
Owner

Thanks for the PR, i prefer fixing the identifier in the scraper code (e.g. eshaham/israeli-bank-scrapers#1052) rather than here.

Specifically for Max, i added the arn as the identifier long time ago (eshaham/israeli-bank-scrapers#687). can you confirm in max devtools you are getting the same arn for different max transactions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

transactionUniqueId is not unique for double transaction

2 participants