High-speed file integrity and baseline scanner. Walks one or more roots, measures data gravity (bytes), and classifies large files as attested or unattested based on sibling sidecar presence (.f33, .sig, .asc, .sha256, .sha512, .md5, .pem). Emits checksum baselines (Hash Filename format), CSV, or JSON for use with fors33-verifier.
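The presence-only classification rule can be sketched in a few lines of Python. This is an illustration, not the scanner's code. It assumes a sidecar is a sibling named `<file><ext>` (for example `data.bin.sha256`) and that "large" means a size at or above the threshold. Both are assumptions, and `find_sidecars` and `classify` are hypothetical helpers. The sketch only checks that a sidecar exists and never opens or validates it.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

# Sidecar extensions named in this README.
SIDECAR_EXTS = (".f33", ".sig", ".asc", ".sha256", ".sha512", ".md5", ".pem")


def find_sidecars(path: Path) -> List[Path]:
    """Return existing sibling sidecars, assuming the naming <file><ext> (hypothetical)."""
    candidates = (path.with_name(path.name + ext) for ext in SIDECAR_EXTS)
    return [c for c in candidates if c.is_file()]


def classify(path: Path, threshold_bytes: int) -> Optional[str]:
    """Classify one file by sidecar presence only; sidecar contents are never read.

    Returns None for files below the (assumed inclusive) size threshold.
    """
    if path.stat().st_size < threshold_bytes:
        return None
    return "attested" if find_sidecars(path) else "unattested"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp, "telemetry.bin")
        data.write_bytes(b"x" * 2048)
        print(classify(data, threshold_bytes=1024))  # unattested
        Path(tmp, "telemetry.bin.sha256").write_text("placeholder\n")
        print(classify(data, threshold_bytes=1024))  # attested
```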
Trust model: the scanner is an O(1) discovery and liability-mapping tool that relies on sidecar presence only. It does not validate Ed25519 signatures or cryptographically verify baselines. For full cryptographic verification, use fors33-verifier.
For machine parsing, see LLM_CONTEXT.md.
pip install fors33-scanner
Scan the current directory (default root) with a 1 MB threshold:
fors33-scanner --threshold-mb 1.0
Scan multiple roots:
fors33-scanner --root /var/log --root /data/telemetry --threshold-mb 10
Emit JSON instead of human-readable output (for CI and pipelines):
fors33-scanner --root /data --json
Fail CI/CD when exposure breaches the policy threshold:
fors33-scanner --root /data --max-exposure 5.0 --json
Throttle hashing workers for shared runners:
fors33-scanner --root /data --workers 2
Stream SIEM-ready JSONL events (records + summary):
fors33-scanner --root /data --emit-jsonl -
Depth-limit traversal (0 = root only, 1 = root plus direct children):
fors33-scanner --root /data --max-depth 1
Strict audit (fail on permission or file-lock errors instead of skipping):
fors33-scanner --root /data --strict-audit
Record a TSA endpoint for tooling that reads FORS33_TSA_URL:
fors33-scanner --tsa-url https://tsa.example.com/rfc3161
Worker count: a positive --workers value wins; otherwise a positive FORS33_WORKERS is used; otherwise the count comes from default_dpk_worker_count(), which uses cpu_count and the optional FORS33_DPK_MAX_WORKERS. Zero or negative values mean auto. The count is hard-capped at 64.
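The precedence rule above can be sketched as follows. The real default_dpk_worker_count() formula is not given here, so `default_workers()` is a hypothetical stand-in. It uses `os.cpu_count()` capped by FORS33_DPK_MAX_WORKERS, and treating a non-positive cap as unset is an assumption.

```python
import os
from typing import Mapping, Optional

HARD_CAP = 64  # documented hard cap


def _positive(value) -> Optional[int]:
    """Parse an int; missing, malformed, zero, or negative values mean 'auto' (None)."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        return None
    return n if n > 0 else None


def default_workers(env: Mapping[str, str]) -> int:
    """Hypothetical stand-in for default_dpk_worker_count(): CPU count, optionally capped."""
    n = os.cpu_count() or 1
    cap = _positive(env.get("FORS33_DPK_MAX_WORKERS"))
    return min(n, cap) if cap else n


def resolve_workers(cli_workers=None, env: Optional[Mapping[str, str]] = None) -> int:
    """Precedence: positive --workers, then positive FORS33_WORKERS, then the default."""
    env = os.environ if env is None else env
    n = (
        _positive(cli_workers)
        or _positive(env.get("FORS33_WORKERS"))
        or default_workers(env)
    )
    return min(n, HARD_CAP)


if __name__ == "__main__":
    print(resolve_workers(2))  # 2 (--workers wins)
    print(resolve_workers(0))  # 0 means auto: env var or CPU-based default
```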
Large-file hashing uses memory mapping bounded by FORS33_MMAP_MIN_MB and FORS33_MMAP_MAX_MB (defaults 500 and 4000). On Linux these bounds are clamped to the cgroup/RAM ceiling. The optional FORS33_MMAP_PSI_SOME_AVG10_MAX disables mmap under memory pressure.
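A sketch of how an mmap window could gate the hashing path. It assumes files whose size falls inside [MIN, MAX] are memory-mapped and all others are streamed in fixed-size chunks. It omits the Linux cgroup/RAM clamping and the PSI check. Reading the *_MB values as mebibytes is also an assumption, and `mmap_window` and `hash_file` are hypothetical helpers.

```python
import hashlib
import mmap
import os
import tempfile
from typing import Mapping, Optional, Tuple

MIB = 1024 * 1024  # assumption: the *_MB variables are read as mebibytes


def mmap_window(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int]:
    """Return (min_bytes, max_bytes) from FORS33_MMAP_MIN_MB / FORS33_MMAP_MAX_MB."""
    env = os.environ if env is None else env
    lo = float(env.get("FORS33_MMAP_MIN_MB", "500"))
    hi = float(env.get("FORS33_MMAP_MAX_MB", "4000"))
    return int(lo * MIB), int(hi * MIB)


def hash_file(path: str, algo: str = "sha256", chunk_size: int = 1 << 20,
              env: Optional[Mapping[str, str]] = None) -> str:
    """Hash via mmap inside the window, otherwise by streaming fixed-size chunks."""
    h = hashlib.new(algo)
    size = os.path.getsize(path)
    lo, hi = mmap_window(env)
    with open(path, "rb") as f:
        if size > 0 and lo <= size <= hi:
            # Map the whole file read-only and hash it as one buffer (empty files cannot be mapped).
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                h.update(mm)
        else:
            # Streaming path: constant memory regardless of file size.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(3 * MIB))
    forced = {"FORS33_MMAP_MIN_MB": "1", "FORS33_MMAP_MAX_MB": "8"}
    # Both paths must agree: mmap (forced window) vs. streaming (default window).
    print(hash_file(tmp.name, env=forced) == hash_file(tmp.name, env={}))
    os.remove(tmp.name)
```

Either path yields the same digest. The window only trades page-cache mapping against a fixed read buffer.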
For production Docker or CI, pin a semver image tag or immutable digest instead of relying on :latest alone.
Generate a checksum baseline (sha256, sha512, or blake3, selected with --algo):
fors33-scanner --root /data --emit-checksums fors33_baseline.sha256
fors33-scanner --root /data --algo sha512 --emit-checksums fors33_baseline.sha512
Emit CSV or JSON baseline (compatible with fors33-verifier):
fors33-scanner --root /data --emit-csv fors33_baseline.csv
fors33-scanner --root /data --emit-json fors33_baseline.json
Add compliance exposure text to human output (default is strictly mathematical):
fors33-scanner --root /data --compliance-report
Exit codes:
- 0: successful scan / threshold not breached
- 1: exposure threshold breached (--max-exposure)
- 2: invocation/parameter misuse, or I/O access failure under --strict-audit
- 130: scan interrupted by the user (Ctrl+C)
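A small CI wrapper sketch turns these exit codes into a readable log line and passes the code through unchanged, so the runner still fails on a breach. `run_gate` and `describe_exit` are hypothetical helpers, and the sketch assumes fors33-scanner is on PATH.

```python
import subprocess
import sys
from typing import Sequence

# Exit codes as documented for fors33-scanner.
EXIT_CODES = {
    0: "scan succeeded; exposure threshold not breached",
    1: "exposure threshold breached (--max-exposure)",
    2: "invocation/parameter misuse, or I/O access failure under --strict-audit",
    130: "scan interrupted by the user (Ctrl+C)",
}


def describe_exit(code: int) -> str:
    """Map a scanner exit code to a human-readable reason."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")


def run_gate(args: Sequence[str]) -> int:
    """Run the scanner, log the meaning of its exit code to stderr, and return the code."""
    proc = subprocess.run(["fors33-scanner", *args], check=False)
    print(f"fors33-scanner: {describe_exit(proc.returncode)}", file=sys.stderr)
    return proc.returncode


if __name__ == "__main__":
    sys.exit(run_gate(["--root", "/data", "--max-exposure", "5.0", "--json"]))
```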
Default human output (mathematical only):
[FILE COUNT] : 14,205
[TOTAL BYTES] : 2.1 TB
[ATTESTED] : 48 files, 4.1 GB
[UNATTESTED] : 264 files, 2.1 TB
[ELAPSED] : 4.20s
- Read-only: does not modify files or sidecars.
- Scan-only: O(1) discovery; baseline generation uses streaming chunked hashing.
- Excludes common directories (.git, node_modules, venv, etc.) and respects .f33ignore, --ignore-pattern, and --exclude-dir.
- The legal notice prints to stderr on startup so data/JSON streams on stdout remain parse-safe.
- See DISCLAIMER.md for enterprise legal/regulatory boundaries.
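The exclusion rules and the --max-depth semantics can be approximated with `os.walk`, as in the sketch below. `walk_files` is hypothetical, and `DEFAULT_EXCLUDES` lists only the directories named in this README. .f33ignore parsing is omitted because its syntax is not described here. Glob matching via `fnmatch` is an assumption about how --ignore-pattern behaves. The `strict` flag loosely mirrors --strict-audit for directory read errors only.

```python
import fnmatch
import os
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, Optional

# A subset of the "common dirs" named in the README; the scanner's full list is not given.
DEFAULT_EXCLUDES = {".git", "node_modules", "venv"}


def _raise(err: OSError) -> None:
    raise err


def walk_files(root: str, max_depth: Optional[int] = None,
               exclude_dirs: Iterable[str] = (), ignore_patterns: Iterable[str] = (),
               strict: bool = False) -> Iterator[Path]:
    """Yield files under root, pruning excluded directories and honoring a depth limit.

    Depth 0 yields only files directly in root; depth 1 adds files in its direct children.
    With strict=False, unreadable directories are skipped; with strict=True, the OSError propagates.
    """
    root = os.path.abspath(root)
    excluded = DEFAULT_EXCLUDES | set(exclude_dirs)
    patterns = list(ignore_patterns)
    for dirpath, dirnames, filenames in os.walk(root, onerror=_raise if strict else None):
        depth = len(Path(dirpath).relative_to(root).parts)
        # Editing dirnames in place stops os.walk from descending into pruned dirs.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []
        for name in filenames:
            rel = Path(dirpath, name).relative_to(root).as_posix()
            if any(fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(name, p) for p in patterns):
                continue
            yield Path(dirpath, name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("a.log", "sub/b.log", "sub/deep/c.log", ".git/HEAD"):
            p = Path(tmp, rel)
            p.parent.mkdir(parents=True, exist_ok=True)
            p.write_text("x")
        print(sorted(f.name for f in walk_files(tmp, max_depth=1)))  # ['a.log', 'b.log']
```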
- --emit-jsonl PATH emits one flat JSON object per line.
- Multi-root scans include both root_index and root_path in each scan_record.
- The timestamp field represents hash completion time.
- The final line is scan_summary, with aggregate stats and scan parameters.
- If --emit-jsonl - and --json are both requested, JSONL takes precedence on stdout.
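A minimal consumer sketch for this stream relies only on the documented guarantees: one JSON object per line, scan_summary last, and root_path on each scan_record. It assumes nothing about the other field names. `read_scan_jsonl`, `records_per_root`, and the script name `consume_jsonl.py` are hypothetical.

```python
import json
import sys
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def read_scan_jsonl(lines: Iterable[str]) -> Tuple[List[dict], dict]:
    """Split a JSONL stream into (scan_records, scan_summary).

    Relies only on the documented ordering: the final line is the scan_summary.
    """
    events = [json.loads(line) for line in lines if line.strip()]
    if not events:
        raise ValueError("empty JSONL stream: no scan_summary line found")
    return events[:-1], events[-1]


def records_per_root(records: Iterable[dict]) -> Dict[str, int]:
    """Count records per root using the documented root_path field."""
    return dict(Counter(str(r.get("root_path", "<unknown>")) for r in records))


if __name__ == "__main__":
    # Usage: fors33-scanner --root /a --root /b --emit-jsonl - | python consume_jsonl.py
    records, summary = read_scan_jsonl(sys.stdin)
    print(records_per_root(records))
    print(json.dumps(summary, indent=2))
```

Splitting on position rather than on a record-type key avoids depending on an undocumented field name.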
- Docker publishing is manual, via workflow_dispatch with explicit version and push_latest inputs.
- Use v0.6.0-style version tags, and push latest only when manually approved.
Requires Python 3.9+. The optional blake3 package enables BLAKE3 hashing. Runs on Linux, macOS, and Windows.
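Because BLAKE3 support is optional, a caller that mirrors the --algo choices needs to fail clearly when the package is absent. The sketch below assumes the third-party blake3 package, whose `blake3` class exposes `update()` and `hexdigest()`. `new_hasher` is a hypothetical helper, not part of the scanner.

```python
import hashlib
from typing import Any


def new_hasher(algo: str) -> Any:
    """Return a hash object for the algorithm names accepted by --algo."""
    algo = algo.lower()
    if algo in ("sha256", "sha512"):
        return hashlib.new(algo)
    if algo == "blake3":
        try:
            from blake3 import blake3  # optional third-party dependency
        except ImportError as exc:
            raise RuntimeError("BLAKE3 requested but the optional 'blake3' package is not installed") from exc
        return blake3()
    raise ValueError(f"unsupported --algo value: {algo!r}")


if __name__ == "__main__":
    h = new_hasher("sha256")
    h.update(b"abc")
    print(h.hexdigest())  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```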
MIT License. See LICENSE.