Add path-aware matching for duplicate package names #10
+1,235
−60
When comparing SBOMs, packages with the same name at different filesystem locations were being deduplicated, making it impossible to track version changes for packages embedded in multiple binaries (e.g., Go stdlib in 5 different executables).
Instead of using just the package name as a key, I changed the code to use `(name, path)` tuples. For CycloneDX files, it looks for any property that has both "location" and "path" in the name, so it works with syft's `syft:location:0:path` or similar conventions from other tools. When displaying differences, if a path exists, it shows the binary name like `stdlib (service-a)` so you know which binary it's in. SPDX formats don't have path metadata, so they just use an empty string to keep everything consistent.

You can now track the same package appearing in different binaries independently. If you have Go stdlib embedded in 5 executables and one gets updated, you'll see exactly which one changed. It works with any SBOM generation tool that puts path info in properties (syft, grype, trivy, etc), but also handles SBOMs that don't have paths at all, in which case it falls back gracefully to name-only matching. The output only shows the binary name when it's actually useful, keeping things clean.
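For reviewers who want the idea in miniature, here is a rough sketch of the matching described above. It is not the code from this PR: the function names (`component_path`, `package_key`, `display_name`, `diff_versions`) are hypothetical, and it assumes components are plain dicts shaped like CycloneDX JSON entries with `name`, `version`, and an optional `properties` list.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (package name, filesystem path or "")


def component_path(component: dict) -> str:
    """Return the value of the first property whose name contains both
    'location' and 'path' (e.g. syft:location:0:path), or '' if none."""
    for prop in component.get("properties") or []:
        name = str(prop.get("name", "")).lower()
        if "location" in name and "path" in name:
            return str(prop.get("value", ""))
    return ""  # no path metadata: fall back to name-only matching


def package_key(component: dict) -> Key:
    """Build the (name, path) key used to tell duplicates apart."""
    return (component.get("name", ""), component_path(component))


def display_name(key: Key) -> str:
    """Show 'name (binary)' only when a path is available."""
    name, path = key
    if not path:
        return name
    return f"{name} ({PurePosixPath(path).name})"


def index_components(components: List[dict]) -> Dict[Key, str]:
    """Map each (name, path) key to its version."""
    return {package_key(c): c.get("version", "") for c in components}


def diff_versions(old: List[dict], new: List[dict]) -> List[Tuple[str, str, str]]:
    """List (display name, old version, new version) for keys present
    in both SBOMs whose version differs."""
    old_idx, new_idx = index_components(old), index_components(new)
    changes = []
    for key in sorted(old_idx.keys() & new_idx.keys()):
        if old_idx[key] != new_idx[key]:
            changes.append((display_name(key), old_idx[key], new_idx[key]))
    return changes
```

With two `stdlib` components located at `/usr/bin/service-a` and `/usr/bin/service-b`, bumping only the first one yields a single change labeled `stdlib (service-a)`, while a component with no matching property keys on `(name, "")` and is displayed by name alone.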
Added 42 tests.
Fixes #8