fix(python): deterministic pip/poetry scans + move smoke fixtures to bomly-dev#177
Conversation
The pip detector resolves graphs from `pip inspect --local`, which reads the active interpreter's site-packages. Running --install-first as a plain `pip install` into the ambient interpreter captured unrelated runner tooling (poetry, build, keyring, virtualenv, date-versioned helpers), so smoke output drifted daily. Poetry's lock fast-path was also bypassed by --install-first, dropping transitive edges (depends_on became empty). - Isolate pip --install-first into a clean, project-scoped virtualenv (internal/detectors/python/venv.go) so pip install and pip inspect run against only the declared deps, not the ambient environment. - Add a pip requirements.lock fast-path (internal/detectors/python/piplock.go) mirroring poetry.lock: build the graph directly from a committed, fully-pinned, `# via`-annotated lock with no install/inspect. - Drop --install-first from the scan-python-poetry target so the committed poetry.lock fast-path runs (stable versions + transitive edges). - Document the decision in docs/ARCHITECTURE.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds two Python detector determinism mechanisms: a project-scoped virtualenv (keyed by working directory hash) for isolated ChangesPython Detector Determinism: Venv Isolation + Lock Fast-Path
Test Infrastructure and Documentation Updates
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Bomly Diff SummaryCompared Overview
Dependency Changes✅ No dependency changes. Vulnerabilities✅ No vulnerability changes. License Changes✅ No license changes. Project Posture✅ No project posture changes (or Policy Findings✅ No policy differences were identified. |
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
example-python-pip now ships a fully-pinned requirements.lock, so the pip detector resolves a deterministic graph via its lock fast-path: - scan-python-pip: pin to the lock commit and drop --install-first (the fast-path needs no install, and installing into the ambient interpreter was the source of the drift). - scan-python-pip-reachability: repoint to bomly-dev/example-python-pip at the same commit; its main.py keeps the jwt/django/rsa/requests call paths reachability needs, and the committed lock makes the graph stable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Repoint every smoke/audit target that used an external demo repo to the
equivalent bomly-dev fixture (pinned by commit; all carry committed
lockfiles / go.sum so resolution is deterministic):
- scan-go-reachability -> example-go-gomod (x/text language.Parse
reached via main -> sub3.Baz)
- scan-java-maven-reachability -> example-java-maven (Main.java imports
commons-fileupload / xmlsec / jbcrypt / spring-web)
- scan-npm-reachability -> example-javascript-npm (js-yaml.load,
direct + transitive via `to`)
- scan-npm-scope-runtime -> example-javascript-npm (dev dep mocha
excluded under --scope runtime)
- scan-npm-audit -> example-javascript-npm (vulnerable deps)
Also scrub residual third-party branding from a maven detector test
fixture (com.srcclr -> com.bomly) and a prose template note.
Goldens for the repointed targets need regeneration.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Repoint the audit and lite go targets (scan-go-enrich/audit/audit-high, diff-go-audit, explain-go-enrich, lite-scan-go, lite-diff-go, lite-explain-go) from google/uuid to bomly-dev/example-go-gomod, matching the convention already used by the basic scan-go/diff-go/explain-go cases (tags v1.0.0 / v0.9.0..v1.0.0; explain targets golang.org/x/text). Every smoke/audit scan --url target now resolves to a bomly-dev example repo. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mirror the node detector's lockfile_integration_test.go: drive each
Python lock fast-path end-to-end through ResolveGraph against real
manifest fixtures under testdata/lockfiles/{pip,poetry,uv,pipenv}, with
node-style helpers (requirePyPackage / requirePyEdge / requirePyScope /
requirePySingleRoot). Covers the binary-free parsers — requirements.lock
(the new pip fast-path), poetry.lock, uv.lock, Pipfile.lock — asserting
package set, transitive edges, single application root, and
runtime/development scope.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (3)
internal/detectors/python/lockfile_integration_test.go (3)
111-115: ⚡ Quick winAssert the missing fourth transitive edge in the pip fixture test.
Line 111 says requests has four transitives, but Line 112-Line 115 only assert three edges. Add
requests -> charset-normalizerso the lock fast-path edge contract is fully covered.Suggested patch
// requests pulls its four transitive deps via "# via requests". requirePyEdge(t, g, "requests", "2.32.3", "certifi", "2024.8.30") + requirePyEdge(t, g, "requests", "2.32.3", "charset-normalizer", "3.4.0") requirePyEdge(t, g, "requests", "2.32.3", "idna", "3.10") requirePyEdge(t, g, "requests", "2.32.3", "urllib3", "2.2.3")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/detectors/python/lockfile_integration_test.go` around lines 111 - 115, The test comment indicates that requests has four transitive dependencies, but the test only verifies three edges using requirePyEdge calls for certifi, idna, and urllib3. Add a fourth requirePyEdge call to assert the missing edge from requests version 2.32.3 to charset-normalizer to ensure the lock file fast-path edge contract is fully tested.
100-176: ⚡ Quick winAdd doc comments for exported test functions to satisfy repository Go standards.
TestPipRequirementsLockFixture,TestPoetryLockFixture,TestUVLockFixture, andTestPipenvLockFixtureare exported and should each have a short doc comment.As per coding guidelines, `**/*.go`: Every exported type and function must have a doc comment.Suggested patch
+// TestPipRequirementsLockFixture validates requirements.lock fast-path graph construction. func TestPipRequirementsLockFixture(t *testing.T) { @@ +// TestPoetryLockFixture validates poetry.lock + pyproject.toml fast-path graph construction. func TestPoetryLockFixture(t *testing.T) { @@ +// TestUVLockFixture validates uv.lock fast-path graph construction. func TestUVLockFixture(t *testing.T) { @@ +// TestPipenvLockFixture validates Pipfile.lock fast-path graph construction. func TestPipenvLockFixture(t *testing.T) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/detectors/python/lockfile_integration_test.go` around lines 100 - 176, Add documentation comments to the four exported test functions TestPipRequirementsLockFixture, TestPoetryLockFixture, TestUVLockFixture, and TestPipenvLockFixture. Each function requires a doc comment line placed directly above its declaration that begins with the function name followed by a brief description of what the test validates. Follow Go's standard documentation format where the comment starts with the function name as it appears in the code.Source: Coding guidelines
190-196: ⚡ Quick winMake the pipenv test enforce the documented flat-graph behavior and pluggy scope.
The comments on Line 190-Line 193 describe flat lock-only behavior and pluggy’s resulting scope, but the assertions do not verify either. Add explicit checks so this known limitation is regression-tested.
Suggested patch
// Scope is re-derived from the Pipfile's [packages] / [dev-packages]: // requests is runtime, pytest is development. pluggy is only a transitive // dependency of pytest, but Pipfile.lock is flat (no edge records it), so it // stays runtime — a known limitation of the lock-only fast-path. requirePyScope(t, g, "requests", "2.32.3", sdk.ScopeRuntime) requirePyScope(t, g, "pytest", "8.3.3", sdk.ScopeDevelopment) + requirePyScope(t, g, "pluggy", "1.5.0", sdk.ScopeRuntime) + + requestsDeps, err := g.DirectDependencies(pyStableID("requests", "2.32.3")) + if err != nil { + t.Fatalf("dependencies(requests@2.32.3): %v", err) + } + if len(requestsDeps) != 0 { + t.Errorf("expected flat Pipfile.lock graph; requests should have no direct deps, got %d", len(requestsDeps)) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/detectors/python/lockfile_integration_test.go` around lines 190 - 196, The test comments in this function describe how pluggy is a transitive dependency of pytest that stays runtime due to the flat lock-only behavior, but there is no assertion actually verifying pluggy's scope. Add an additional requirePyScope call (similar to the existing ones for requests and pytest) that explicitly checks pluggy's scope is sdk.ScopeRuntime to regression-test this documented limitation and prevent accidental changes to this behavior in the future.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/detectors/python/lockfile_integration_test.go`:
- Around line 111-115: The test comment indicates that requests has four
transitive dependencies, but the test only verifies three edges using
requirePyEdge calls for certifi, idna, and urllib3. Add a fourth requirePyEdge
call to assert the missing edge from requests version 2.32.3 to
charset-normalizer to ensure the lock file fast-path edge contract is fully
tested.
- Around line 100-176: Add documentation comments to the four exported test
functions TestPipRequirementsLockFixture, TestPoetryLockFixture,
TestUVLockFixture, and TestPipenvLockFixture. Each function requires a doc
comment line placed directly above its declaration that begins with the function
name followed by a brief description of what the test validates. Follow Go's
standard documentation format where the comment starts with the function name as
it appears in the code.
- Around line 190-196: The test comments in this function describe how pluggy is
a transitive dependency of pytest that stays runtime due to the flat lock-only
behavior, but there is no assertion actually verifying pluggy's scope. Add an
additional requirePyScope call (similar to the existing ones for requests and
pytest) that explicitly checks pluggy's scope is sdk.ScopeRuntime to
regression-test this documented limitation and prevent accidental changes to
this behavior in the future.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ef902472-9262-4b00-b3c8-65bf6f4b5e41
⛔ Files ignored due to path filters (7)
internal/detectors/python/testdata/lockfiles/pip/requirements.lockis excluded by!**/*.lock,!**/testdata/**internal/detectors/python/testdata/lockfiles/pipenv/Pipfileis excluded by!**/testdata/**internal/detectors/python/testdata/lockfiles/pipenv/Pipfile.lockis excluded by!**/*.lock,!**/testdata/**internal/detectors/python/testdata/lockfiles/poetry/poetry.lockis excluded by!**/*.lock,!**/testdata/**internal/detectors/python/testdata/lockfiles/poetry/pyproject.tomlis excluded by!**/testdata/**internal/detectors/python/testdata/lockfiles/uv/pyproject.tomlis excluded by!**/testdata/**internal/detectors/python/testdata/lockfiles/uv/uv.lockis excluded by!**/*.lock,!**/testdata/**
📒 Files selected for processing (1)
internal/detectors/python/lockfile_integration_test.go
Problem
The scheduled Smoke workflow fails daily on the Python jobs, and the suite still depended on third-party demo repos (Veracode, Snyk, ljharb, google/uuid). This PR fixes the Python determinism root causes, moves every external scan target onto bomly-dev's own example repos, and adds fixture-based detector tests.
Python drift (distinct from the EPSS fix in #176)
scan-python-pip— the pip detector resolves frompip inspect --local;--install-firstinstalled into the ambient CI interpreter, sopip inspectcaptured the runner's own tooling plus latest-resolved transitive deps → daily drift.scan-python-poetry—--install-firstbypassed the committedpoetry.lockfast-path intopoetry run pip inspect, dropping transitive edges (depends_onbecame[]).scan-python-pip-reachability— same ambient-capture cause in the dependency graph (e.g.idna 3.18instead of pinned2.8).Changes
Detector
--install-firstin a venv (internal/detectors/python/venv.go): clean, project-scoped virtualenv for bothpip installandpip inspect; ambient site-packages no longer leak in.requirements.lockfast-path (internal/detectors/python/piplock.go): build the graph from a committed, pinned,# via-annotated lock — no install, no inspect — mirroringpoetry.lock.Tests
internal/detectors/python/lockfile_integration_test.gomirrors the node detector's fixture-based integration tests: drives each binary-free lock fast-path end-to-end throughResolveGraphagainst real manifests undertestdata/lockfiles/{pip,poetry,uv,pipenv}, asserting package set, transitive edges, single application root, and runtime/development scope. Plus the existing unit tests for the lock parser and venv helpers.All smoke/audit scan targets → bomly-dev (pinned; each carries a committed lock / go.sum):
scan-python-poetrydrops--install-first;scan-python-pip+-reachability→example-python-pip(committedrequirements.lock).scan-go*/diff-go*/explain-go*/lite-*-go→example-go-gomod(wasgoogle/uuid).scan-java-maven-reachability→example-java-maven.scan-npm-reachability/-scope-runtime/-audit→example-javascript-npm.Cleanup: scrub residual third-party branding (
com.srcclr→com.bomlyin a maven detector test; a prose-template note).git grepforveracode|sourceclear|srcclr|snyk|ljharb|nodejs-goofis clean; no scan--urltarget points outsidebomly-dev.Docs: decision-log entry in
docs/ARCHITECTURE.md.Remaining
make smoke ARGS="-update") — one pass also folds in the GitHub Actionslocationsstaleness from Annotate SARIF diff output and GitHub Actions locations #168 and the EPSS normalizer from test(smoke): normalize volatile EPSS fields in golden comparison #176.test/smoke/testdata/sboms/go.{cdx,spdx}.jsonstill listgoogle/uuidas synthetic SBOM content (not repo scans) — intentionally left as-is.🤖 Generated with Claude Code
Summary by CodeRabbit
requirements.lockfast-path to build dependency graphs directly from pinned lockfiles.