diff: count version multiplicity instead of set-flattening#61
Open
Bisa wants to merge 1 commit into
Open
Conversation
generate_diffs_from_paths built HashMap<Version, usize> counts via
count_versions, then discarded the counts by collecting just the keys
into a HashSet. {v ×1} vs {v ×2} flattened to {v} vs {v}, the
difference came up empty, and the entry hit the `continue` branch —
so a package with an added (or dropped) duplicate disappeared from
the diff even though the closure genuinely gained or lost a copy.
Replace the HashSet diff with multiplicity-aware accounting over the
union of versions: for each Version seen in either map, common_count
accumulates min(old, new) and the surplus on each side goes into
unique_old / unique_new. Downstream classification branches keep
working unchanged.
Three new tests cover the regression. One existing test,
generate_diffs_version_deduplication (input old=[1.0.0 ×3],
new=[2.0.0]), flips from Upgraded to UpgradeDowngrade: the old
assertion held because the set-flatten reduced the input to
[1.0.0] vs [2.0.0] before the matcher saw it — only the clean
upgrade was visible. With counts preserved the matcher sees one
paired upgrade plus two unmatched 1.0.0 removals, which is what
the closure actually did.
Fixes manic-systems#60
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #60.
generate_diffs_from_pathsbuiltHashMap<Version, usize>counts viacount_versions, then immediately discarded the counts by collectingjust the keys into a
HashSet. From there,unique_old/unique_newwere computed as set differences, so
{v ×1}vs{v ×2}flattened to{v}vs{v}and the entry hit thecontinuebranch — making thepackage disappear from the diff even though the closure genuinely
gained (or lost) a copy.
In practice this triggers when an overlay derivation produces the
same
(pname, version)as an existing one in the closure — e.g. alocally pinned package shadowing upstream. The
SIZE/DIFFlines(which come from the closure walk, not the deduped name set) still
update correctly, so the report shows a size delta with no
corresponding package row.
Fix
Replace the
HashSet-based diff with multiplicity-aware accountingover the union of versions. For each
Versionseen in either map,common_countaccumulatesmin(old, new)and the surplus on eachside goes into
unique_old/unique_new. The downstreamclassification branches (
Added/Removed/Changed) keepworking unchanged.
Tests
Three new tests cover the regression:
generate_diffs_duplicate_added—(old=[v], new=[v, v])mustproduce a diff entry.
generate_diffs_duplicate_removed—(old=[v, v], new=[v])mustproduce a diff entry.
generate_diffs_duplicate_alongside_existing_versions—(old=[v1, v2], new=[v1, v2, v2]), the real-world reproducer.All three fail on
main(result.len() == 0) and pass after the fix.One existing test changes its expected status:
generate_diffs_version_deduplication, input(old=[1.0.0 ×3], new=[2.0.0]). It previously assertedChanged(Upgraded)because the set-flatten reduced the input to[1.0.0]vs[2.0.0]before the matcher saw it — only the cleanupgrade was visible to
determine_change_status. With countspreserved, the Hungarian matcher pairs one
1.0.0with2.0.0(upgrade) and the two unmatched
1.0.0s register as downgrades,so the entry is now
Changed(UpgradeDowngrade)— i.e. whatactually happens to the closure. The test wasn't wrong before;
its input was lossy by the time it reached the classifier.
Verification
cargo clippywas run too; no new lints in the touched code(existing warnings in
src/version.rsetc. are untouched).Disclosure
This PR (and the linked issue) were drafted with Claude Opus 4.7
acting as a writing / implementation assistant. The bug was real
(hit on my own NixOS system); the analysis, fix design, and tests
were derived from reading
src/diff.rsat 1.4.2 and verifying thefailure mode with
cargo testbefore applying the change.