Skip to content

diff: count version multiplicity instead of set-flattening#61

Open
Bisa wants to merge 1 commit into
manic-systems:mainfrom
Bisa:fix/multiset-diff
Open

diff: count version multiplicity instead of set-flattening#61
Bisa wants to merge 1 commit into
manic-systems:mainfrom
Bisa:fix/multiset-diff

Conversation

@Bisa
Copy link
Copy Markdown

@Bisa Bisa commented May 22, 2026

Fixes #60.

generate_diffs_from_paths built HashMap<Version, usize> counts via
count_versions, then immediately discarded the counts by collecting
just the keys into a HashSet. From there, unique_old / unique_new
were computed as set differences, so {v ×1} vs {v ×2} flattened to
{v} vs {v} and the entry hit the continue branch — making the
package disappear from the diff even though the closure genuinely
gained (or lost) a copy.

In practice this triggers when an overlay derivation produces the
same (pname, version) as an existing one in the closure — e.g. a
locally pinned package shadowing upstream. The SIZE/DIFF lines
(which come from the closure walk, not the deduped name set) still
update correctly, so the report shows a size delta with no
corresponding package row.

Fix

Replace the HashSet-based diff with multiplicity-aware accounting
over the union of versions. For each Version seen in either map,
common_count accumulates min(old, new) and the surplus on each
side goes into unique_old / unique_new. The downstream
classification branches (Added / Removed / Changed) keep
working unchanged.

Tests

Three new tests cover the regression:

  • generate_diffs_duplicate_added(old=[v], new=[v, v]) must
    produce a diff entry.
  • generate_diffs_duplicate_removed(old=[v, v], new=[v]) must
    produce a diff entry.
  • generate_diffs_duplicate_alongside_existing_versions
    (old=[v1, v2], new=[v1, v2, v2]), the real-world reproducer.

All three fail on main (result.len() == 0) and pass after the fix.

One existing test changes its expected status:
generate_diffs_version_deduplication, input
(old=[1.0.0 ×3], new=[2.0.0]). It previously asserted
Changed(Upgraded) because the set-flatten reduced the input to
[1.0.0] vs [2.0.0] before the matcher saw it — only the clean
upgrade was visible to determine_change_status. With counts
preserved, the Hungarian matcher pairs one 1.0.0 with 2.0.0
(upgrade) and the two unmatched 1.0.0s register as downgrades,
so the entry is now Changed(UpgradeDowngrade) — i.e. what
actually happens to the closure. The test wasn't wrong before;
its input was lossy by the time it reached the classifier.

Verification

$ nix develop -c cargo test
test result: ok. 94 passed; 0 failed; ...
$ nix develop -c cargo fmt --check
(clean)

cargo clippy was run too; no new lints in the touched code
(existing warnings in src/version.rs etc. are untouched).

Disclosure

This PR (and the linked issue) were drafted with Claude Opus 4.7
acting as a writing / implementation assistant. The bug was real
(hit on my own NixOS system); the analysis, fix design, and tests
were derived from reading src/diff.rs at 1.4.2 and verifying the
failure mode with cargo test before applying the change.

generate_diffs_from_paths built HashMap<Version, usize> counts via
count_versions, then discarded the counts by collecting just the keys
into a HashSet. {v ×1} vs {v ×2} flattened to {v} vs {v}, the
difference came up empty, and the entry hit the `continue` branch —
so a package with an added (or dropped) duplicate disappeared from
the diff even though the closure genuinely gained or lost a copy.

Replace the HashSet diff with multiplicity-aware accounting over the
union of versions: for each Version seen in either map, common_count
accumulates min(old, new) and the surplus on each side goes into
unique_old / unique_new. Downstream classification branches keep
working unchanged.

Three new tests cover the regression. One existing test,
generate_diffs_version_deduplication (input old=[1.0.0 ×3],
new=[2.0.0]), flips from Upgraded to UpgradeDowngrade: the old
assertion held because the set-flatten reduced the input to
[1.0.0] vs [2.0.0] before the matcher saw it — only the clean
upgrade was visible. With counts preserved the matcher sees one
paired upgrade plus two unmatched 1.0.0 removals, which is what
the closure actually did.

Fixes manic-systems#60

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dix collapses same-(name, version) duplicates, hiding genuine closure changes

1 participant