See what your AI coding agent actually did. Local audit of Claude Code session transcripts: leaked secrets, risky shell commands, unsafe file edits, hook bypasses — plus per-project / per-model token usage. Runs entirely on your machine, nothing uploaded.
Today: Claude Code only. Cursor and Windsurf session-format adapters are on the roadmap but not yet shipped — see Roadmap. Released: v0.2.5 on npm with signed provenance via OIDC trusted publishing. Verify with
npm audit signatures. Status: detection logic and per-project / per-model token reporting are shipped; release pipeline and signed provenance are shipped; a red-team golden set and drift monitoring are still in flight. Full posture in Governance.
AI coding agents happily run curl | sudo bash, write to ~/.ssh/authorized_keys, or commit with --no-verify when things get in their way. The transcripts that record this live on your disk — and often contain .env contents, API keys fed to the model, or tool outputs that captured credentials. Nobody reads them after the fact.
agentaudit is a grep with manners for those transcripts. Run it weekly (or in a hook) and get a punch list of things worth a second look.
# From npm (recommended)
npm install -g @product-nomad/agentaudit
agentaudit --version
# Or from source
git clone https://github.com/Product-nomad/agentaudit.git
cd agentaudit
npm ci && npm run build
npm link # puts `agentaudit` on PATHRequires Node 22+. The package is published scoped as @product-nomad/agentaudit; the CLI binary is agentaudit either way.
Every release from v0.2.1 onwards is published from GitHub Actions via npm OIDC trusted publishing, with a signed provenance attestation linking the tarball to a specific commit and workflow run. To verify what you just installed:
npm audit signaturesProvenance records for each version are also visible on https://www.npmjs.com/package/@product-nomad/agentaudit.
agentaudit audit # scan ~/.claude/projects/**/*.jsonl
agentaudit audit path/to/session.jsonl # scan specific files
agentaudit audit --min high # only show high/critical
agentaudit audit --json # machine-readable output
agentaudit audit --group-by rule # group by rule instead of session
agentaudit report # per-project / per-model token rollup
agentaudit report --top 5 # top N sessions by output tokens
agentaudit report --since 7d # only sessions active in the last 7 days
agentaudit report --since 2026-04-01 # or an ISO date
agentaudit report --tag-config clients.json # group by client tag (see Billing below)
agentaudit report --json # full UsageReport as JSON
agentaudit report --csv # one row per session × model, invoice-ready
agentaudit rules # list all audit checksExit code mirrors the highest severity found: 30 (critical), 20 (high), 10 (medium), 0 otherwise. Useful in a shell hook or cron:
agentaudit audit --min high || notify-send "Claude Code session findings"| Rule | Severity | What it catches |
|---|---|---|
secrets.in-user-prompt |
critical | API keys, tokens, private keys pasted by the human into a prompt |
secrets.in-tool-result |
high | Same patterns appearing in tool output fed back to the model |
bash.risky-command |
critical–low | rm -rf /, curl | sh, dd of=/dev/sdX, world-writable chmod, firewall flush, git push --force, etc. |
fs.sensitive-path-write |
critical–medium | Write/Edit targeting ~/.ssh, ~/.aws, ~/.gnupg, .env, /etc/sudoers, shell rc files, systemd units… |
git.hook-bypass |
medium | git commit --no-verify, -c commit.gpgsign=false, SKIP=…; also flags user prompts explicitly asking for the bypass |
Patterns are deliberately conservative. False positives erode trust; a false negative is a known limit we'd rather improve with a pull request than paper over.
Everything runs locally. agentaudit reads files under ~/.claude/projects/, processes them in memory, and prints to stdout. No network, no telemetry, no config to opt out of.
Findings may contain excerpts of prompts or command strings — don't paste --json output anywhere public without reviewing it first.
- Pattern-based, not semantic. It flags
curl | sh, not "any command that downloads and executes remote code." Your adversary is distracted engineers, not a determined attacker. - Single-session scope. It doesn't correlate across sessions or detect slow exfiltration.
- Self-matching. Rules that look for flag strings will match commands whose data contains those strings (e.g.
node -e 'const re = /--no-verify/'). We suppress the common interpreter-eval case; the long tail is a known gap. - Transcript fidelity. We parse Anthropic's Claude Code JSONL format as observed in late April 2026. The schema is private API and can change.
If you're a freelancer or consultant using Claude Code across multiple client projects, tag sessions by cwd with a small JSON file:
{
"clients": [
{ "pattern": "/home/me/clients/acme", "name": "acme" },
{ "pattern": "/home/me/clients/beta", "name": "beta" },
{ "pattern": "/personal/", "name": "personal" }
]
}Save as ~/.config/agentaudit/clients.json (default) or pass --tag-config path/to/file.json. Patterns are literal substrings unless wrapped in /.../ (then parsed as a regex; first match wins).
Then:
agentaudit report --since 2026-04-01→ monthly-close token totals by client.agentaudit report --csv > april.csv→ one row per session × model, ready for a pivot table.
agentaudit reports token counts, not dollars. Per-session dollar values need current pricing; see the decision log for the reasoning and the planned pricing-file support.
- Dollar-cost estimation via a dated, sourced
pricing.json(next up). - Cursor / Windsurf session format adapters.
- Custom rules via a small plugin interface (rules are plain objects — already pluggable internally).
- HTML report output for weekly review.
- Golden red-team fixture set for Phase V metric #2.
Where the project is, what's committed to, and what would make us archive it.
| Status | What |
|---|---|
| ✅ Shipped | Threat model + scope. Streaming JSONL parser (tolerant of malformed lines, documented in src/types.ts). Five rule families: secrets-in-prompt, secrets-in-tool-result, risky-bash, sensitive-path-edit, hook-bypass — 13 bash patterns, 15 sensitive paths, 15 secret patterns. 91 unit tests. Per-project / per-model token usage with CSV export. v0.2.5 published on npm with OIDC trusted publishing and signed provenance. CI on every push. |
| 🟡 In flight | Real-session validation against the developer's own corpus (17 sessions / 925 events as of 2026-04-25). Golden red-team fixture set (planted-secret + risky-command scenarios with known-true labels). Drift monitoring (rule-coverage / FP-rate / scan-time, scheduled weekly against a frozen corpus). |
| ⏳ Not started | Cursor and Windsurf session-format adapters. Pricing-file support to convert tokens to dollar estimates. Custom rules via plugin interface. |
Set early, reviewed before each release.
- Precision on real sessions. At minimum 95% true-positive rate in the
medium+bucket on the developer's own session history. Current: 2/2 (100%, n=2). - Coverage of planted-secret golden set. Detects ≥ 80% of a planned red-team fixture set of realistic leak scenarios. Current: not yet measured (golden set to be built in Phase V).
- Scan performance. Scans 1,000 events under 1 second on a modern laptop. Current: ~800 events in ~350ms wall-clock (release build).
- Privacy by default. Local-only. No network. No telemetry. Reads only under
~/.claude/projects/by default. - Transparency. OSS (MIT). Known gaps listed in the threat model in this repo.
- Data minimisation. Output redacts secret values (
ghp_…abcd) and shows only the matching line for long commands. - Non-harm. We refuse to add: telemetry, content phone-home, surveillance of other users' sessions, or anything that would enable covert monitoring.
- Rule coverage drift — matches per 1k events, tracked weekly against the scanned session corpus.
- False-positive rate — tracked against a labelled sample; alert if it exceeds 5%.
- Schema drift — alerted if new
typevalues appear in session JSONL that we don't recognise (Anthropic's private schema can evolve). - Scan-time drift — alerted if wall-clock-per-event regresses > 2×.
Material decisions are recorded in this repo's decision log (one paragraph per decision, dated). Change history sits alongside it — current release v0.2.5.
Archive this project if any of these become true:
- Anthropic (or the Cursor / Windsurf vendors, once their adapters land) ships a first-party local-audit tool that covers the same surface (secrets, risky shell, sensitive-path writes, hook bypasses) with comparable transparency.
- The session-transcript schemas converge on a stable public format and the rule set becomes a thin wrapper around that — at which point the rules belong upstream, not in a sidecar.
- Maintenance burden of tracking schema drift across three vendors exceeds the value delivered to users (measured against the outcome metrics above).
- The threat model assumption — local-only, single-user box, agent is trusted but its data is not — stops holding for the configurations users actually run.
MIT.