Statistics provided by pypi and pepy.tech
Tests are performed against Python 3.11, 3.12, and 3.13
Container builds from 0.5.0 onward are multi-arch, digest-addressable, signed, attested, and reproducible for supported platforms.
A zero-dependency, lightweight static analyzer for adversarially shaped Python code, designed to detect supply-chain attacks before they reach your interpreter.
pydepgate inspects Python packages and environments for code that executes silently at interpreter startup. This was the attack class used by the March 2026 LiteLLM supply-chain compromise and catalogued as MITRE ATT&CK T1546.018.
Available on PyPI as pydepgate.
To sponsor this project, see the Sponsors button or FUNDING.md.
Python's interpreter runs several kinds of code automatically at startup, before any user script executes:
- .pth files in site-packages/. Any line beginning with import is passed to exec() by site.py during interpreter initialization.
- sitecustomize.py and usercustomize.py. Imported automatically if present.
- __init__.py top-level code in any imported package.
- setup.py. Executed during pip install for source distributions.
- Console-script entry points. Generated and executed by pip install.
Each of these is a legitimate Python feature. Each has been used in
real-world supply-chain attacks. Existing Python security tooling
(pip-audit, safety, bandit) does not inspect these startup vectors.
The .pth vector in particular has been acknowledged as a security gap
in CPython issue #113659
but has no patch.
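To make the .pth vector concrete, here is a minimal sketch of the line-selection rule described above. It does not call site.py. `executable_pth_lines` is a hypothetical helper name, and the prefix check (the word import followed by a space or tab) is an assumption about how CPython's site module decides which lines to execute.

```python
def executable_pth_lines(pth_text: str) -> list[str]:
    """Return the lines of a .pth file that would be handed to exec()."""
    hits = []
    for line in pth_text.splitlines():
        # Comment lines are skipped; every other non-import line is a path entry.
        if line.startswith("#"):
            continue
        # Assumption: only an "import" prefix followed by whitespace triggers exec().
        if line.startswith(("import ", "import\t")):
            hits.append(line)
    return hits


sample = "\n".join([
    "# a comment",
    "../vendored/libs",
    "import base64; exec(base64.b64decode('cHJpbnQoMSkK'))",
])
print(executable_pth_lines(sample))
```

Only the third line of the sample is selected, which is why a single innocuous-looking .pth file is enough to run code on every interpreter start.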
pip install pydepgate
Requires Python 3.11 or later. No third-party runtime dependencies.
docker pull ghcr.io/nuclear-treestump/pydepgate:latest
docker pull ghcr.io/nuclear-treestump/pydepgate:0.X.Y
docker pull ghcr.io/nuclear-treestump/pydepgate:0.X
The official image is published for linux/amd64 and linux/arm64.
For CI and production package-intake workflows, prefer pinning by digest
rather than relying on a mutable tag.
From 0.5.0 onward, container releases are signed by digest, GitHub-attested, emitted with BuildKit provenance and SBOM attestations, built from verified PyPI wheel inputs, and reproducible for supported platforms.
Container tags, digests, verification commands, runtime properties, and local invocation patterns are documented in the Docker image guide.
CI-specific examples are in the CI integration guide.
# Scan a wheel
pydepgate scan some-package-1.0.0-py3-none-any.whl
# Scan an installed package by name
pydepgate scan litellm
# Scan a single file
pydepgate scan --single suspicious_module.py
# Look up what a signal means
pydepgate explain DENS010
Exit code 0 is clean, 2 means at least one HIGH or CRITICAL finding.
The full exit code contract is in
docs/reference/exit-codes.md.
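A wrapper script can branch on these two documented codes. The sketch below is one possible shape, not part of pydepgate: `describe_exit_code` and `gate` are hypothetical names, and any code other than 0 or 2 is deliberately left unclassified because its meaning is defined only in the exit code reference.

```python
import subprocess

CLEAN = 0
BLOCKING = 2  # at least one HIGH or CRITICAL finding


def describe_exit_code(code: int) -> str:
    """Map a pydepgate exit code onto a coarse label."""
    if code == CLEAN:
        return "clean"
    if code == BLOCKING:
        return "blocked"
    # Other codes are documented in docs/reference/exit-codes.md; not guessed here.
    return "other"


def gate(artifact: str) -> str:
    """Run `pydepgate scan` on one artifact and classify the result."""
    proc = subprocess.run(["pydepgate", "scan", artifact], check=False)
    return describe_exit_code(proc.returncode)
```

Calling `gate("some-package-1.0.0-py3-none-any.whl")` requires pydepgate on PATH; `describe_exit_code` can be used on its own with a return code obtained elsewhere.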
pydepgate cvedb update
pydepgate cvescan some-package.whl
pydepgate cvescan --save-to-db some-package.whl
pydepgate scan looks for suspicious startup-vector behavior.
pydepgate cvescan checks package identity against the local OSV-backed
CVE database. Use both when you want behavioral and known-vulnerability
coverage.
Wheels, source distributions, and installed packages all use the same
positional scan invocation:
pydepgate scan some-package-1.0.0-py3-none-any.whl
pydepgate scan some-package-1.0.0.tar.gz
pydepgate scan litellm
The wheel and sdist paths read directly from disk. The installed-package
form is resolved via importlib.metadata against the active environment.
--single bypasses wheel/sdist/installed-package dispatch and analyzes
the file directly. Useful for iterating on test fixtures, ad-hoc
inspection of a suspicious file, or reproducing a finding without
restructuring the file into a package:
pydepgate scan --single suspicious_module.py
pydepgate scan --single fixture.pth
pydepgate scan --single garbage.py --as init_py
The file kind is auto-detected from the filename. .pth files are
treated as pth; files named setup.py, __init__.py,
sitecustomize.py, or usercustomize.py are classified as their natural
kind; anything else defaults to setup_py (the most permissive context).
Override with --as: setup_py / init_py / pth / sitecustomize /
usercustomize / library_py.
The peek enricher attempts safe partial decoding of large encoded literals so you can see what's actually inside a flagged blob without ever executing it:
pydepgate scan some-package.whl --peek
pydepgate scan some-package.whl --peek --peek-chain
Peek handles base64, hex, zlib, gzip, bzip2, and lzma chains up to a configurable depth, classifies the terminal payload, and emits ENC002 when the unwrap chain is nested. Pickle data is detected but never deserialized; decompression bombs are bounded by an in-flight byte budget.
Full peek flag reference: docs/cli/index.md.
--decode-payload-depth=N runs a recursive re-scan over decoded
payloads, catching the multi-layer attack shape used by LiteLLM 1.82.8
(a base64 outer payload whose decoded source contains a second base64
payload).
pydepgate scan --deep some-package.whl --peek \
--decode-payload-depth=3 \
--decode-iocs=full \
--decode-location ./forensics
Output goes to a directory chosen with --decode-location (default
./decoded/). With --decode-iocs=full, the run produces an encrypted
ZIP archive (default password infected, the malware-research
convention) plus a plaintext IOC sidecar for grep-friendly hash
extraction.
Full decode pipeline reference, including the IOC mode matrix and end-to-end forensic example: docs/guides/decode-payloads.md.
--ci forces machine-readable JSON output and disables ANSI color. It
does not change --min-severity; combine the two when CI should block
only on HIGH and CRITICAL findings:
pydepgate scan --ci --min-severity high some-package.whl
Full CI guide (GitHub Actions, GitLab CI, Docker, pre-commit hooks): docs/guides/ci-integration.md.
pydepgate scan some-package.whl --rules-file company-rules.gate
Auto-discovery checks ./pydepgate.gate then <venv>/pydepgate.gate
when the flag is not set. The rule file format is TOML or JSON,
auto-detected. Full format spec:
docs/reference/rules-file.md.
pydepgate scan --save-to-db some-package.whl
pydepgate db list-runs
pydepgate db explain --run-id <run-id>
pydepgate can store scan runs in a local SQLite evidence database, including artifact hashes, active findings, finding locations, decoded payload trees, and CVE matches. This makes findings reproducible after the terminal session is gone.
pydepgate explain STDLIB001
pydepgate explain DENS010
pydepgate explain --rule default_stdlib001_in_pth
pydepgate explain --list
pydepgate scan --deep some-package.whl --min-severity high
--deep runs the density analyzer over ordinary library .py files in
addition to startup vectors. The density layer produces enough
informational signals at library scope that the --min-severity high
filter is strongly recommended.
pydepgate scan some-package.whl --format sarif > findings.sarif
Emits a SARIF 2.1.0 document with codeFlows on decoded-payload findings, 24-character partial fingerprints for cross-run deduplication, and content-blind message text (no payload bytes leak into the document). For GitHub Code Scanning integration: docs/guides/sarif-integration.md.
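For orientation, a SARIF 2.1.0 result with a partial fingerprint has roughly the shape built below. The field layout follows the SARIF specification, but everything pydepgate-specific is an assumption: the fingerprint key name, the fingerprint inputs (the real algorithm is in the Finding Fingerprint v1 document), the severity-to-level mapping, and the helper names.

```python
import hashlib
import json


def partial_fingerprint(rule_id: str, uri: str, line: int) -> str:
    """Hypothetical 24-character fingerprint built from location data only."""
    digest = hashlib.sha256(f"{rule_id}|{uri}|{line}".encode()).hexdigest()
    return digest[:24]


def to_sarif(findings: list[dict]) -> dict:
    """Build a minimal SARIF 2.1.0 document from simple finding dicts."""
    results = []
    for f in findings:
        results.append({
            "ruleId": f["signal_id"],
            "level": "error" if f["severity"] in ("high", "critical") else "warning",
            # Content-blind message: names the signal and file, never the payload.
            "message": {"text": f"{f['signal_id']} in {f['path']}"},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": f["path"]},
                "region": {"startLine": f["line"]},
            }}],
            "partialFingerprints": {
                "hypotheticalFingerprint/v1":
                    partial_fingerprint(f["signal_id"], f["path"], f["line"]),
            },
        })
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "pydepgate"}}, "results": results}],
    }


doc = to_sarif([{"signal_id": "ENC001", "severity": "critical",
                 "path": "evil.pth", "line": 1}])
print(json.dumps(doc, indent=2))
```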
The current analyzer set covers five major classes of suspicious behavior in startup vectors. Each analyzer emits raw signals; the rules engine maps signals to severity-rated findings based on file kind and context.
Encoding abuse (ENC001, ENC002). Patterns where encoded content is
decoded and executed in a single chain, for example
exec(base64.b64decode(payload)). Catches base64, hex, codec-based,
zlib, bz2, lzma, and gzip variants. With --peek enabled, ENC002 fires
when the partial-decoder unwrap loop reaches 2+ chain layers or
exhausts its configured depth, strong evidence that a literal is
intentionally obfuscated rather than a benign encoded blob.
Dynamic execution (DYN001-007). Direct calls to exec, eval,
compile, or __import__; access to exec primitives via getattr,
globals(), locals(), vars(), or __builtins__ subscripts;
compile-then-exec across the file; and aliased call shapes that catch
e = exec; e(...) evasions.
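The aliased call shape can be illustrated with a two-pass walk: first record simple assignments of an exec primitive to a name, then report calls through that name. This is a deliberately small sketch with hypothetical names; it ignores scoping and reassignment, which a real analyzer would need to handle.

```python
import ast

EXEC_PRIMITIVES = {"exec", "eval", "compile", "__import__"}


def find_aliased_exec_calls(source: str) -> list[tuple[str, int]]:
    """Return (primitive, line) for each call made through a simple alias."""
    tree = ast.parse(source)
    aliases: dict[str, str] = {}
    # Pass 1: record `alias = exec` style assignments.
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Name)
                and node.value.id in EXEC_PRIMITIVES):
            aliases[node.targets[0].id] = node.value.id
    # Pass 2: report calls whose callee is a recorded alias.
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in aliases):
            hits.append((aliases[node.func.id], node.lineno))
    return hits


print(find_aliased_exec_calls("e = exec\ne('print(1)')"))
```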
String obfuscation (STR001-004). Obfuscated string expressions
that resolve to the names of exec primitives or dangerous stdlib
functions, computed by a safe partial evaluator that never executes
user code. Catches concatenation ('ev' + 'al'), character codes
(chr(101) + chr(118) + chr(97) + chr(108)), slicing ('lave'[::-1]),
str.join of literal pieces, bytes.fromhex(...).decode(), f-string
assembly, and single-assignment variables containing obfuscated values.
The "harder they hide it the stronger the signal" model is realized
through operation counting.
Suspicious stdlib usage (STDLIB001-003). Calls to stdlib functions
that are highly unusual in startup vectors: process spawn
(STDLIB001: os.system, subprocess.Popen, subprocess.run,
os.exec*), network operations (STDLIB002: urllib.request.urlopen,
socket.socket, http.client), and native code loading (STDLIB003:
ctypes.CDLL, ctypes.WinDLL). The rules engine promotes these to
CRITICAL when they appear in setup.py or .pth files.
Code density (DENS001-051). A broad layer covering the things
obfuscated code looks like even when no single primitive call is
suspicious on its own: high-entropy string literals, base64-alphabet
strings, machine-generated identifiers, confusable single-character
names, invisible Unicode characters, Unicode homoglyphs in identifiers,
disproportionate AST depth, deeply nested lambdas, byte-range integer
arrays, high-entropy docstrings, and dynamic __doc__ references
passed to a callable. Calibrated so the same content scans differently
depending on file kind: a high-entropy base64 literal in .pth is
CRITICAL, in __init__.py is MEDIUM, anywhere else is LOW.
Complete signal reference with severity tables per file kind: docs/reference/signals.md.
The LiteLLM 1.82.8 .pth payload is a single line:
import base64; exec(base64.b64decode('cHJpbnQoMSkK'))
A scanner that grepped for exec would catch it. A scanner that
grepped for base64.b64decode would catch it. But an attacker who
knew about either of those checks could trivially evade both.
pydepgate fires five separate findings on this line from three
independent analyzers:
- ENC001 (encoding_abuse): decode-then-execute pattern
- DYN002 (dynamic_execution): exec() with non-literal argument at module scope
- DENS001 (code_density): token-dense single line
- DENS010 (code_density): high-entropy string literal
- DENS011 (code_density): base64-alphabet string literal
Plus the rule layer promotes all of them to CRITICAL because the
file is a .pth. To evade pydepgate, an attacker has to defeat every
analyzer simultaneously while still producing a working .pth
payload. Each evasion narrows what's possible; the intersection of all
evasions is the empty set for any shape that could realistically
execute on Python startup.
Analyzers emit raw signals. The rules engine maps signals to
severity-rated findings using a data-driven rule set. Default rules
are built into pydepgate; users can override or augment them with a
pydepgate.gate file (TOML or JSON, auto-detected) in the project
root, the venv root, or specified via --rules-file.
A rule has three parts: identity, match conditions, and an effect:
[[rule]]
id = "litellm-pth-stdlib"
signal_id = "STDLIB001"
file_kind = "pth"
action = "set_severity"
severity = "critical"
explain = "subprocess calls in .pth files have no legitimate use case."
Three actions are supported: set_severity, suppress, and
set_description. User rules always take precedence over default
rules, regardless of specificity. Suppressed findings are tracked
separately so users can see what would have fired and why it didn't.
Run pydepgate explain --list to see all default rules and signals
with descriptions. Complete rules file specification:
docs/reference/rules-file.md. Worked
walkthroughs of common rule-writing tasks:
docs/guides/custom-rules.md.
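The precedence rule (user rules before default rules) and the three actions can be modelled in a few lines. This is a sketch, not the pydepgate rules engine: the `Finding` fields, the first-match-wins behaviour, and the `description` key used by set_description are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    signal_id: str
    file_kind: str
    severity: str
    description: str = ""
    suppressed: bool = False


def _matches(rule: dict, finding: Finding) -> bool:
    """A rule matches when each condition it states equals the finding's value."""
    return (rule.get("signal_id") in (None, finding.signal_id)
            and rule.get("file_kind") in (None, finding.file_kind))


def apply_rules(finding: Finding, user_rules: list[dict],
                default_rules: list[dict]) -> Finding:
    """Apply the first matching rule, consulting user rules before defaults."""
    for rule in [*user_rules, *default_rules]:
        if not _matches(rule, finding):
            continue
        action = rule["action"]
        if action == "set_severity":
            return replace(finding, severity=rule["severity"])
        if action == "suppress":
            # Suppressed findings are kept, only flagged, so they stay visible.
            return replace(finding, suppressed=True)
        if action == "set_description":
            return replace(finding, description=rule["description"])
    return finding


defaults = [{"signal_id": "STDLIB001", "file_kind": "pth",
             "action": "set_severity", "severity": "critical"}]
user = [{"signal_id": "STDLIB001", "action": "suppress"}]
raw = Finding("STDLIB001", "pth", "medium")
print(apply_rules(raw, [], defaults))
print(apply_rules(raw, user, defaults))
```

In the second call the less specific user rule still wins over the more specific default, matching the statement that user rules take precedence regardless of specificity.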
| Section | Contents |
|---|---|
| Getting Started | First scan, reading output, using explain |
| CLI Reference | All subcommands, all flags, environment variables |
| CLI Reference: db | Store, query, and explain local scan evidence |
| CLI Reference: cvedb | Build and inspect the local CVE database |
| CLI Reference: cvescan | Match artifacts against known vulnerable versions |
| Finding Fingerprint v1 | Deterministic finding fingerprint specification |
| Signals Reference | Every signal ID with severity tables per file kind |
| Rules File | pydepgate.gate format specification |
| Exit Codes | Exit code contract and CI implications |
| Output Formats | Human, JSON, SARIF schemas |
| Guide: CI Integration | GitHub Actions, GitLab CI, pre-commit, Docker-in-CI |
| Guide: Docker Image | Container tags, digests, verification, runtime properties |
| Guide: Custom Rules | Suppressing false positives, scoping rules |
| Guide: Decode Payloads | Recursive decode, IOC sidecars, encrypted archives |
| Guide: SARIF Integration | GitHub Code Scanning ingestion |
- Zero runtime dependencies. Standard library only. This is a load-bearing design constraint, not a stylistic preference: every additional dependency is a supply-chain attack surface for a tool whose job is to defend against supply-chain attacks.
- Safe by construction. Parsers and the partial evaluator never execute, compile, or import input content. Every operation modeled by the resolver is reimplemented from scratch using only Python builtins on values the resolver itself produced.
- Self-integrity at bootstrap. Critical stdlib references are captured into locals before any untrusted code runs (relevant when the runtime engine ships in v0.4).
- Lightweight. The full test suite runs in roughly twenty seconds on the smallest available Codespaces machine, including subprocess-based CLI tests against installed packages.
- Verifiable release artifacts. Container releases are built around digest identity rather than tag trust. From 0.5.0 onward, supported platform images are signed, attested, emitted with provenance/SBOM metadata, smoke-tested by digest, built from verified package inputs, and checked for reproducibility.
The codebase is organized as a layered pipeline.
Analyzers do not see raw bytes. They walk parsed representations and
emit Signal objects. The rules engine wraps signals with severity to
produce Finding objects, applying user and default rules in priority
order. The CLI renders findings in human, JSON, or SARIF format.
The _resolver.py module is reusable infrastructure for any analyzer
that needs to know what an expression evaluates to. It returns
structured ResolutionResult objects with success/failure status,
operation counts, partial values, and resolved fragment lists.
The static engine exposes three entry points for single-file analysis.
scan_file(path) reads bytes and routes through triage by filename.
scan_bytes(content, internal_path, ...) is the per-file workhorse
that artifact enumerators (wheel, sdist, installed) call once per
in-scope file. scan_loose_file_as(path, file_kind) bypasses triage
entirely and forces a file kind, preserving the real path through to
finding contexts; this is the entry point used by
pydepgate scan --single.
git clone https://github.com/nuclear-treestump/pydepgate
cd pydepgate
pip install -e .
python -m unittest discover tests -v
The test suite has grown to over 1600 tests as the analyzer set has expanded. Tests are organized by module and include happy-path coverage, evasion batteries, false-positive batteries, robustness checks against adversarial inputs, integration tests against synthetic wheels and sdists, and CLI tests via subprocess.
To regenerate the binary .pth test fixtures after editing them:
python scripts/generate_fixtures.py
Contributors: see CONTRIBUTING.md for the issue process, sign-off requirements, and contribution scope.
This project builds tooling to defend against Python supply-chain
attacks. The test fixtures in tests/fixtures/ and the synthetic
samples used in integration tests model the structural shape of
known attacks (LiteLLM 1.82.8, Trojan Source CVE-2021-42574, others
catalogued under T1546.018) but contain only inert payloads. No
actual malicious code is present in this repository.
For regression testing against real malicious samples, use the OSSF malicious-packages, Datadog malicious-software-packages-dataset, or lxyeternal/pypi_malregistry datasets. Do so in disposable VMs or containers, and do not commit samples to this repository.
pydepgate's static analysis is honest about what it can and cannot catch. Documented gaps include:
Analysis gaps:
- Function return tracking. code = make_payload() where make_payload() internally calls compile(...) is not flagged.
- __builtins__ as a Name subscript (rather than via a function call).
- Tuple unpacking, augmented assignment, and conditional assignments in the resolver's variable tracking.
- Lambda scope precision (lambdas count as their enclosing scope).
- Aliased stdlib imports such as from subprocess import Popen as P.
Density-layer caveats:
- DENS020 (low-vowel-ratio identifiers) and DENS040 (AST depth) both produce false positives on legitimate machine-generated code (Cython output, parser tables, generated configuration). They ship at LOW severity outside startup vectors so they surface as contributing signals rather than standalone alerts.
- DENS031 (homoglyphs) can fire on legitimate non-English variable names in non-Latin codebases. The default rule keeps it at HIGH rather than CRITICAL outside startup vectors so users with intentional non-Latin naming can suppress with a single user rule.
Supply-chain security is too important to be a function of corporate goodwill. This project exists because the current state of Python supply-chain defense is not acceptable, and it will continue to exist on those terms.
This project will not be sold, transferred to a corporation, or made part of any employment or work agreement that could capture or stifle it. If a time comes when development by the current maintainer is no longer possible, the maintainer commits to finding a successor who will be held to the same conditions. If no such successor can be found, the project will be archived rather than placed under corporate control.
Built by Ikari (@0xIkari) - Python and security engineering. Available for security engineering roles; pydepgate remains independent under the terms of the Promise above.
- LinkedIn: zmillersecengineer
- Email: ikari@nuclear-treestump.com
Apache 2.0. See LICENSE.