feat(contrib/trivy): preserve JAR Digest in Library conversion#2535

Draft
kotakanbe wants to merge 1 commit into master from fix/contrib-trivy-preserve-jar-digest

@kotakanbe
Member

Problem

contrib/trivy/pkg/converter.go converts Trivy native JSON into Vuls's models.ScanResult. For language packages (types.ClassLangPkg), it constructs models.Library from each input ftypes.Package:

libScanner.Libs = append(libScanner.Libs, models.Library{
    Name:     p.Name,
    Version:  p.Version,
    PURL:     getPURL(p),
    FilePath: p.FilePath,
    Dev:      p.Dev,
    // p.Digest is dropped
})

Trivy's java jar analyzer (pkg/fanal/analyzer/language/java/jar/jar.go) sets ftypes.Package.Digest to the JAR's SHA-1 hash ("sha1:<hex>") when scanning JAR files. The native Vuls LibraryScanner path already preserves this on models.Library.Digest at scanner/library.go:

libs = append(libs, models.Library{
    ...
    Digest:   string(lib.Digest),
    ...
})

The contrib converter was the last hop that received the Digest in its input but discarded it. Any consumer that takes Vuls's ScanResult JSON via the trivy-to-vuls route (rather than the native LibraryScanner path) loses access to the hash.

Why it matters

JAR SHA-1 has multiple downstream uses, all currently disabled for trivy-to-vuls-derived ScanResult JSON:

  1. Maven Central canonicalization: https://search.maven.org/solrsearch/select?q=1:<sha1> deterministically resolves a SHA-1 to (groupId, artifactId, version). This is useful when the consumer is correlating against a PURL that arrived without a groupId from a different SBOM tool.
  2. Tamper detection — comparing observed SHA-1 with the Maven Central record verifies the JAR hasn't been re-packaged.
  3. De-duplication / artifact identity — same JAR bytes across hosts/images can be canonicalized.
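As a rough illustration of the first two uses, the sketch below computes a digest in the "sha1:<hex>" form and builds the lookup URL in the form quoted above. The helper names (digestOf, mavenSearchURL, sameArtifact) are hypothetical and exist in neither Vuls nor Trivy; the URL shape is taken from the text as-is, and no request is actually sent.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// digestOf returns the SHA-1 of data in the "sha1:<hex>" form.
func digestOf(data []byte) string {
	sum := sha1.Sum(data)
	return "sha1:" + hex.EncodeToString(sum[:])
}

// mavenSearchURL (hypothetical helper) turns a "sha1:<hex>" digest into the
// Maven Central search URL form quoted in this PR description.
func mavenSearchURL(digest string) (string, error) {
	if !strings.HasPrefix(digest, "sha1:") {
		return "", errors.New("digest does not start with sha1:")
	}
	hexPart := strings.TrimPrefix(digest, "sha1:")
	raw, err := hex.DecodeString(hexPart)
	if err != nil || len(raw) != sha1.Size {
		return "", errors.New("digest is not a 40-character hex SHA-1")
	}
	return "https://search.maven.org/solrsearch/select?q=1:" + hexPart, nil
}

// sameArtifact (hypothetical helper) compares an observed digest with a
// recorded one; an empty observed digest never matches.
func sameArtifact(observed, recorded string) bool {
	return observed != "" && strings.EqualFold(observed, recorded)
}

func main() {
	d := digestOf([]byte("abc")) // stand-in bytes, not a real JAR
	u, err := mavenSearchURL(d)
	fmt.Println(d, u, err)
	fmt.Println(sameArtifact(d, strings.ToUpper(d)))
}
```

None of this is possible for a consumer when Library.Digest arrives empty, which is the situation the trivy-to-vuls path is in today.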

The Vuls LibraryScanner path supports all of these because it preserves Digest. The trivy-to-vuls path silently doesn't.

Fix

One field on the models.Library struct literal:

libScanner.Libs = append(libScanner.Libs, models.Library{
    Name:     p.Name,
    Version:  p.Version,
    PURL:     getPURL(p),
    FilePath: p.FilePath,
+   Digest:   string(p.Digest),
    Dev:      p.Dev,
})
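For readers without the repository at hand, here is a minimal self-contained sketch of the same mapping. Package, Digest, and Library are local stand-ins for ftypes.Package, its digest type, and models.Library (only the fields relevant here, with PURL omitted), toLibrary is a hypothetical helper, and the digest value is a placeholder rather than a real log4j-core hash.

```go
package main

import "fmt"

// Digest is a stand-in for the string-based digest type on ftypes.Package.
type Digest string

// Package is a stand-in for the ftypes.Package fields used by the converter.
type Package struct {
	Name     string
	Version  string
	FilePath string
	Digest   Digest
	Dev      bool
}

// Library is a stand-in for the models.Library fields touched by this PR.
type Library struct {
	Name     string
	Version  string
	FilePath string
	Digest   string
	Dev      bool
}

// toLibrary (hypothetical helper) mirrors the struct literal in the diff,
// copying Digest through instead of dropping it.
func toLibrary(p Package) Library {
	return Library{
		Name:     p.Name,
		Version:  p.Version,
		FilePath: p.FilePath,
		Digest:   string(p.Digest),
		Dev:      p.Dev,
	}
}

func main() {
	jar := Package{
		Name:     "org.apache.logging.log4j:log4j-core",
		Version:  "2.14.1",
		FilePath: "app/lib/log4j-core-2.14.1.jar",
		Digest:   "sha1:0000000000000000000000000000000000000000", // placeholder
	}
	npm := Package{Name: "lodash", Version: "4.17.21"} // no Digest supplied

	fmt.Printf("%q\n", toLibrary(jar).Digest)
	fmt.Printf("%q\n", toLibrary(npm).Digest) // stays empty
}
```

A package that arrives without a Digest converts to a Library with an empty Digest, so the extra field only has an effect when the input actually carries a hash.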

The vulnerability-derived Library construction at lines 167-172 is left unchanged because types.DetectedVulnerability has no Digest source. The flatten-and-unique libraries loop at line 277 deduplicates by Name+Version, so when the same library appears in both Vulnerabilities and Packages, the Package-derived entry (with Digest) wins.

Test

New TestConvert/JAR_digest_is_preserved_into_models.Library exercises ClassLangPkg with a log4j-core package carrying a representative SHA-1 and asserts the digest flows through to the output Library.

go test ./contrib/... and go vet ./contrib/... pass.

Compatibility

  • No behavioral change for non-JAR scanners — Trivy doesn't set Package.Digest for npm/gem/pypi/etc., so string("") == "" and Library.Digest stays empty.
  • No change for callers that don't read Library.Digest — the field already existed; this only populates it.

Related

  • Companion PR for the native LibraryScanner path bug: fix(detector/library): regenerate PURL after JAR SHA1 canonicalization #2533 (improveJARInfo updates Name/Version but not PURL after SHA-1 canonicalization).
  • Downstream tracking: vuls-saas/futurevuls-backend#1748 (consumes both Vuls ScanResult paths and SBOM-direct paths; needs DB columns and decoder logic to actually use the Digest).

Trivy's `ftypes.Package.Digest` carries the SHA-1 hash for JAR files
(set by Trivy's java jar analyzer alongside its own Maven Central
canonicalization). The trivy-to-vuls converter was the last hop in
this chain that already received Digest in its input but dropped it
when constructing models.Library, leaving downstream consumers (e.g.
the vuls-saas catalog enrichment pipelines, FutureVuls supply-chain
verification) without access to the hash.

The Vuls native LibraryScanner path already preserves Digest at
scanner/library.go convertLibWithScanner. This change brings the
contrib converter to the same level so all three intake paths
(LibraryScanner, contrib/trivy, SBOM ingestion in downstream
consumers) can rely on Digest being present whenever Trivy supplied it.

Test:
- New TestConvert/JAR_digest_is_preserved_into_models.Library covers
  the ClassLangPkg branch with a representative log4j-core sha1.
- The vulnerability-derived branch (lines 167-172) still has no Digest
  source on types.DetectedVulnerability, so it remains unchanged; the
  flatten/unique loop deduplicates by Name+Version so Package-derived
  entries (with Digest) win when both branches produce the same lib.
Contributor

Copilot AI left a comment

Pull request overview

Updates the Trivy JSON → Vuls models.ScanResult converter to preserve Trivy-provided JAR SHA-1 digests in models.Library, aligning the contrib trivy-to-vuls path with the native LibraryScanner behavior so downstream consumers can reliably use artifact hashes.

Changes:

  • Preserve ftypes.Package.Digest when converting ClassLangPkg packages into models.Library.
  • Add a unit test ensuring a JAR package’s Digest flows through conversion into the output Library.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| contrib/trivy/pkg/converter.go | Threads Package.Digest into models.Library.Digest for language package conversion. |
| contrib/trivy/pkg/converter_test.go | Adds a regression test for JAR digest preservation in conversion output. |

@kotakanbe kotakanbe marked this pull request as draft May 1, 2026 03:33
Collaborator

@shino shino left a comment

Trivy does not include the SHA-1 (or any other digest of a JAR file) in its results. A PR to Trivy would improve the situation.
The diff in this PR looks good, but I am leaving it open because it currently makes no actual difference.

4 participants