refactor(scan): extract shared pipeline helpers into tasks/_scan_pipeline #405
Merged
…line
Pull the self-contained terminal-state writers and the per-stage progress writer out of scan_source into a new tasks/_scan_pipeline module, so the upcoming SBOM-ingest Celery task can reuse them through a public seam instead of reaching into a sibling task module's privates.
Behaviour-preserving, with no functional change to the source scan:
- mark_failed / record_terminal_failure / mark_succeeded moved verbatim
- set_stage generalised to take an explicit `percent` (the source pipeline still owns _STAGE_PROGRESS and passes .get(stage); None keeps the prior percent, matching the original .get(stage, prior) fallback exactly)
- scan_source keeps thin _mark_*/_set_stage aliases so its own call sites and the monkeypatch-based tests stay unchanged
- _persist_components renamed to public persist_sbom_components in place (the ~700-line sub-helper cluster stays private, not relocated)
make_line_callback was already public in tasks/_progress (extracted in a prior PR), so it is reused from there, not duplicated. No import cycle.
haksungjang added a commit that referenced this pull request on Jun 13, 2026
image-scan kept HARD-failing on lodash 4.17.19 (CVE-2021-23337, CVE-2026-4800) and minimist 1.2.5 (CVE-2021-44906) even after the cdxgen 12.3.3→12.5.1 bump, which only rebuilt the cdxgen layer. A fresh local install of cdxgen 12.5.1 and of npm 11.14.1 — the image's only two npm-package installers — pulls neither package, and these CVEs were never in .trivyignore, yet image-scan passed on #404/#405. The vulnerable copies therefore live in a stale, earlier `scope=worker` cache layer (a non-deterministic npm-install resolution cached long ago), not in anything the current Dockerfile produces. Bumping the buildx GHA cache scope (worker → worker-v2) abandons the poisoned cache and forces a single clean rebuild; the new namespace caches the clean tree. Keeps the cdxgen 12.5.1 bump (latest 12.x, verified lodash/minimist-free).
haksungjang added a commit that referenced this pull request on Jun 14, 2026
…rker image-scan (#407)
A no-cache linux/amd64 rebuild of the worker image (image-scan gate) HARD-fails on three node-pkg findings vendored under cdxgen's global install tree:
- lodash 4.17.19 CVE-2021-23337 (HIGH), CVE-2026-4800 (HIGH)
- minimist 1.2.5 CVE-2021-44906 (CRITICAL)
These are pulled by a platform-gated (cpu=x64/os=linux) transitive of cdxgen's dependency graph: a fresh `npm install -g @cyclonedx/cdxgen@12.3.3` on linux/amd64 resolves them, while the same install on arm64/macOS resolves neither. They were therefore masked by the cached worker layer (image-scan passed on #404/#405) and surfaced only once that GHA cache was evicted and CI did a clean amd64 rebuild. It is a pre-existing, main-wide latent issue, unrelated to any one feature PR.
Add .trivyignore entries following the file's policy (CVE + target + CVSS + reach analysis + re-evaluate date). All three are UNREACHED: cdxgen is invoked only for dependency enumeration with a fixed argv, never calls lodash.template on scanned-repo input, and the worker never invokes lodash/minimist directly. Re-evaluate when cdxgen ships a fixed vendored tree.
haksungjang added a commit that referenced this pull request on Jun 14, 2026
* feat(scan): external CycloneDX SBOM ingest endpoint
Add POST /v1/projects/{id}/sbom-ingest so external tools (CI, cdxgen-based
scanners) can upload an already-generated CycloneDX SBOM; TRUSCA runs the
back half of the scan pipeline against it — persist components → trivy sbom
matching → findings — reusing the Scan model so ingested scans get ref-keyed
retention, the per-project active-scan guard, and the existing
Components/Vulnerabilities/Licenses UI and build gate for free.
This is NOT a Dependency-Track compatible surface: it is a TRUSCA-native
endpoint (Authorization: Bearer, field `sbom`, no autoCreate), not DT's
/api/v1/bom + X-Api-Key.
Endpoint / service (services/sbom_ingest_service.py, api/v1/sbom.py):
- multipart sbom + ref + release; 202 ScanPublic (kind="sbom").
- require_role_or_api_key("developer"); project-scoped key must match.
- Reuses trigger_scan's guards via an extracted prepare_scan_target
(existence/team 404/403 before archived 409 / cap 429 — authz before state).
- Synchronous adversarial validation of untrusted input: bounded read
(SBOM_INGEST_MAX_BYTES, 32 MiB → 413), content-type/filename allow-list
(415), JSON + CycloneDX structure whitelist (422), component cap
(SBOM_INGEST_MAX_COMPONENTS, 50k → 422), and an O(n) string-aware byte
nesting-depth pre-check so a deeply nested document is a clean 422 instead
of a RecursionError → 500 from json.loads. RFC 7807 throughout.
- Atomic: flush wins the active-scan race before the file is written; a 409
loser writes no file; commit-race deletes the file; enqueue failure → 503.
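One way the string-aware nesting-depth pre-check mentioned above could work is sketched below. It is a rough illustration under assumptions, not the validator from this commit: the function names and the `MAX_DEPTH` value are hypothetical, and the commit message does not state the real limit.

```python
def max_nesting_depth(raw: bytes) -> int:
    """Return the deepest [...] / {...} nesting seen in a JSON byte string.

    Single O(n) pass with no recursion. Brackets inside string literals are
    ignored, and backslash escapes are honoured so an escaped quote does not
    end a string. Malformed input is not rejected here; that is left to the
    JSON parser that runs afterwards.
    """
    depth = 0
    deepest = 0
    in_string = False
    escaped = False
    for byte in raw:
        if in_string:
            if escaped:
                escaped = False
            elif byte == 0x5C:  # backslash starts an escape
                escaped = True
            elif byte == 0x22:  # unescaped quote closes the string
                in_string = False
        elif byte == 0x22:  # quote opens a string
            in_string = True
        elif byte in (0x5B, 0x7B):  # '[' or '{'
            depth += 1
            deepest = max(deepest, depth)
        elif byte in (0x5D, 0x7D):  # ']' or '}'
            depth -= 1
    return deepest


MAX_DEPTH = 64  # hypothetical limit, chosen only for this example


def depth_ok(raw: bytes, limit: int = MAX_DEPTH) -> bool:
    """True when the document is shallow enough to hand to json.loads."""
    return max_nesting_depth(raw) <= limit


print(depth_ok(b'{"components": [{"name": "lodash"}]}'))
print(depth_ok(b"[" * 100_000))
```

Scanning bytes rather than decoded text is safe for UTF-8 input because every byte of a multi-byte sequence is 0x80 or above, so it can never be mistaken for an ASCII bracket, quote, or backslash.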
Celery task (tasks/ingest_sbom.py, enqueue branch + include):
- ingest_sbom_task reuses persist_sbom_components → run_trivy_sbom →
persist_trivy_findings → mark_succeeded (ref-keyed supersede). Preserves the
uploaded SBOM as a durable sbom_cyclonedx ScanArtifact for the signature
surface; containment-guards the path under workspace_root().
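A containment guard of the kind described above might look like the following sketch. It assumes nothing about the real task code: `resolve_contained` is a hypothetical helper, and the root passed to it stands in for whatever `workspace_root()` returns.

```python
from pathlib import Path


def resolve_contained(root: Path, candidate: Path) -> Path:
    """Resolve candidate against root and refuse anything that escapes it.

    Both paths are fully resolved first, so `..` segments and symlinks are
    collapsed before the comparison. An absolute candidate replaces the root
    when joined and is therefore rejected unless it already lies under it.
    Requires Python 3.9+ for Path.is_relative_to.
    """
    resolved_root = root.resolve()
    resolved = (resolved_root / candidate).resolve()
    if not resolved.is_relative_to(resolved_root):
        raise ValueError(f"path escapes workspace root: {candidate}")
    return resolved


# Hypothetical root used only for the demonstration below.
demo_root = Path("/tmp/ws-root-example")
for name in ("scans/abc/sbom.json", "../outside.json", "/etc/passwd"):
    try:
        print("accepted:", resolve_contained(demo_root, Path(name)))
    except ValueError as exc:
        print("rejected:", exc)
```

Comparing resolved paths rather than string prefixes avoids the classic mistake where `/srv/workspace-evil` passes a `startswith("/srv/workspace")` check.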
Security (Producer-Reviewer findings addressed):
- bind_audit_team before the scan INSERT so the audit row carries team_id.
- disk-write failure → 503 SbomIngestStorageError (retryable), not 422.
- release / original_filename length-capped + control-byte stripped.
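The length cap and control-byte stripping for `release` and `original_filename` could be approximated as below. The helper name, the 255-character cap, and the choice to strip before capping are all assumptions made for this sketch; the commit message does not give the actual limits or ordering.

```python
import re

# C0 control characters plus DEL; hypothetical definition of "control byte".
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")

MAX_LABEL_LEN = 255  # hypothetical cap


def sanitize_label(value: str, max_len: int = MAX_LABEL_LEN) -> str:
    """Drop control characters, then truncate to max_len characters."""
    cleaned = _CONTROL_CHARS.sub("", value)
    return cleaned[:max_len]


print(repr(sanitize_label("v1.2.3\r\n\x00")))
print(len(sanitize_label("a" * 1000)))
```

Stripping before truncating means the cap applies to the characters that are actually stored, so a value padded with control bytes cannot crowd out the meaningful part.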
Tests: pure adversarial validator unit suite (incl. depth-bomb regression),
endpoint permission×state matrix + new existence-hide-state 409 rows,
realistic multi-CVE fixture pipeline test. Docs: EN/KO ci-integration/sbom-upload.
* test(scan): regenerate OpenAPI snapshot for sbom-ingest endpoint
The OpenAPI contract snapshot test (test_openapi_no_drift) flagged the new
POST /v1/projects/{project_id}/sbom-ingest path. Add it to the committed
snapshot — path param project_id only (sbom/ref/release are requestBody).
* fix(worker): bump cdxgen 12.3.3 → 12.5.1 to bust stale image-scan node-pkg layer
image-scan (worker) HARD-failed on 3 node-pkg findings — lodash 4.17.19
(CVE-2021-23337, CVE-2026-4800) and minimist 1.2.5 (CVE-2021-44906) — that
live under @cyclonedx/cdxgen/node_modules. Reproduction in node:20-bookworm
shows cdxgen 11.x bundles both, while 12.3.3 AND 12.5.1 ship neither: a clean
build already lacks them, so the failure was a stale type=gha scope=worker
cache layer serving the pre-12.x install tree (same class as the earlier
php-symfony image-scan incident).
Bumping the version interpolated into the global npm install changes that
layer's cache key, forcing a fresh (clean) install — root-cause removal, not
a .trivyignore suppression (suppressing a package absent from a clean build
would wrongly mute a future regression). cdxgen invocation is unchanged across
12.3.3→12.5.1 and engines.node still allows ^20, so no scan regression. Fixes
main too (shared cache) once merged.
* fix(ci): bump worker image-scan GHA cache scope to force a clean rebuild
image-scan kept HARD-failing on lodash 4.17.19 (CVE-2021-23337, CVE-2026-4800)
and minimist 1.2.5 (CVE-2021-44906) even after the cdxgen 12.3.3→12.5.1 bump,
which only rebuilt the cdxgen layer. A fresh local install of cdxgen 12.5.1
and of npm 11.14.1 — the image's only two npm-package installers — pulls
neither package, and these CVEs were never in .trivyignore, yet image-scan
passed on #404/#405. The vulnerable copies therefore live in a stale, earlier
`scope=worker` cache layer (a non-deterministic npm-install resolution cached
long ago), not in anything the current Dockerfile produces.
Bumping the buildx GHA cache scope (worker → worker-v2) abandons the poisoned
cache and forces a single clean rebuild; the new namespace caches the clean
tree. Keeps the cdxgen 12.5.1 bump (latest 12.x, verified lodash/minimist-free).
* Revert "fix(ci): bump worker image-scan GHA cache scope to force a clean rebuild"
This reverts commit a17e5fa.
* Revert "fix(worker): bump cdxgen 12.3.3 → 12.5.1 to bust stale image-scan node-pkg layer"
This reverts commit 20a3040.
What
Behavior-preserving extraction of the source-scan pipeline's self-contained orchestration helpers into a new `tasks/_scan_pipeline.py`, so the upcoming external SBOM-ingest Celery task can reuse them through a clean public seam, without `from tasks.scan_source import _private_name` cross-module reach.
Prereq refactor for the SBOM ingest feature (#404 added the `sbom` scan kind; the ingest endpoint/task land next).
Changes
- `tasks/_scan_pipeline.py` with public `mark_failed`, `record_terminal_failure`, `mark_succeeded` (moved verbatim) and a generalized `set_stage(scan_uuid, stage, percent)`.
- `set_stage` takes `percent` explicitly instead of reaching into the source-only `_STAGE_PROGRESS`. `scan_source._set_stage` stays as a thin wrapper passing `_STAGE_PROGRESS.get(stage)`; `None` keeps the row's prior percent, matching the original `.get(stage, prior)` fallback exactly.
- `scan_source` keeps thin `_mark_*` / `_set_stage` aliases so its own call sites and the existing `monkeypatch.setattr(scan_source, "_record_terminal_failure", …)` tests stay unchanged.
- `_persist_components` → public `persist_sbom_components` (in-place rename; the ~700-line sub-helper cluster stays private, not relocated).
- `tasks._scan_pipeline`).
Behavior preservation
- `completed_at`, `progress_percent or 0` snapshot, `supersede_prior_ref_scans` call, succeeded `percent=100`, `succeeded`/`failed` step strings, commit-then-publish ordering, and the `scan_stage` log event/fields are byte-identical.
- `set_stage`: known stage → mapped int (DB+log+publish); unknown stage → `percent=None` → prior percent kept, log carries `None`; identical to the original on both paths.
- `make_line_callback` was already public in `tasks/_progress` (prior extraction), so it is reused from there, not duplicated.
Scope notes
- `scan_container.py` has its own copies of these helpers: out of scope here, a natural follow-up consumer of the shared module (its `_STAGE_PROGRESS` differs).
Verification
- `mypy .` (full, 442 files): clean.
- `ruff check` on changed files: clean.
- `test(backend)` is the gate; behavior preservation argued above.