overreach

Catch code that overreaches before it merges.

overreach is a fast, local CLI that scans a diff, file, or repo for capability drift: outbound network calls, subprocess spawns, sensitive-file reads, curl | sh, disabled TLS verification, and hardcoded secrets. Think ripgrep, but for what this code is allowed to touch.

It is built for AI-assisted code review. The risky part of an agent PR is often not the feature itself — it's the quiet fetch, execSync, or .env read that appeared beside it. Treat overreach as a tripwire for capability drift, not a containment boundary: it favors recall and runs on every push, but a determined adversary can evade regex (string concatenation, base64, dynamic eval/import, or moving the call into a dependency) — so it's a fast first pass, not a security guarantee.

cargo install --git https://github.com/Conalh/overreach --tag v0.2.0 --locked
git diff | overreach --diff

No signup. No daemon. No telemetry. No network at scan time. It reads your diff and exits.

flowchart LR
    A["git diff (--diff)<br/>· file · repo"] --> S["Line scanner<br/>added lines · UTF-8 · 8 MiB cap"]
    S --> D["Detectors<br/>pipe-to-shell · secrets · sensitive-fs<br/>network · subprocess · TLS-off"]
    S --> G["Coverage gaps<br/>unreadable · non-UTF-8 · oversize"]
    D --> R["Graded findings<br/>critical · high · medium · low"]
    G --> R
    R --> O["Report<br/>human · --json"]
    O --> F{"--fail-on"}
    F -->|below or clean| P["exit 0<br/>pass"]
    F -->|at or above| X["exit 1<br/>fail"]
    S -.->|unreadable entrypoint| T["exit 2<br/>couldn't scan"]

    classDef in fill:#1e293b,stroke:#334155,color:#e2e8f0
    classDef core fill:#0f172a,stroke:#1e293b,color:#e2e8f0,stroke-width:2px
    classDef out fill:#0c4a6e,stroke:#0369a1,color:#e0f2fe
    class A in
    class S,D,G,R,O core
    class F,P,X,T out
$ git diff | overreach --diff
CRITICAL  src/util.js:15  [pipe_to_shell]
          Downloads a script and pipes it straight into a shell
CRITICAL  src/util.js:16  [hardcoded_secret]
          Possible hardcoded Anthropic credential (value redacted)
    HIGH  src/util.js:13  [network_call]
          Makes an outbound network call

3 finding(s): 2 critical, 1 high, 0 medium, 0 low
FAIL (findings at/above critical)

See also: SECURITY.md for the threat model and self-guarantees · CHANGELOG.md for release history · part of the agent-gov suite.

Why

Code review optimizes for "is the feature correct?" — not "did this diff quietly gain a new capability?" As autonomous agents get write access to real repositories, the second question is the one that bites. overreach is a fast, zero-config first pass that answers it, locally, before anything merges.

  • Diff-aware. Scans only the added lines of a unified diff, so you see what a change introduces, not what was already there.
  • Secrets are reported, never echoed. A hardcoded key is flagged by provider ("Anthropic", "AWS") with the literal value redacted — overreach never prints a credential back at you.
  • CI-ready. --json output and a configurable --fail-on severity make it a one-line PR gate.
  • Dependency-light. Three crates (regex, serde, serde_json). No network access at scan time, no telemetry, nothing phones home.
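As a rough illustration of the "reported, never echoed" bullet above, the sketch below builds a finding message from the provider name alone, so the matched value has no path into the output. The prefix table and the `secret_message` function are hypothetical, and plain substring checks stand in for overreach's real regex patterns, which are not shown here.

```rust
/// Hypothetical prefix table: (credential prefix, provider name).
/// These are well-known public key prefixes, not overreach's actual patterns.
const PREFIXES: &[(&str, &str)] = &[
    ("sk-ant-", "Anthropic"),
    ("AKIA", "AWS"),
    ("ghp_", "GitHub"),
];

/// Return a finding message for a line that appears to contain a credential.
/// The message is built from the provider name only, never from the line,
/// so the secret value cannot leak into rendered output.
pub fn secret_message(line: &str) -> Option<String> {
    PREFIXES
        .iter()
        .find(|(prefix, _)| line.contains(*prefix))
        .map(|(_, provider)| {
            format!("Possible hardcoded {} credential (value redacted)", provider)
        })
}
```

A canary-style check in the spirit of the project's redaction test would plant a sentinel value, call `secret_message`, and assert that the sentinel does not appear in the returned string.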

Install

# Recommended: install the pinned release
cargo install --git https://github.com/Conalh/overreach --tag v0.2.0 --locked

# From a local checkout
cargo install --path . --locked

# From main, if you want unreleased changes
cargo install --git https://github.com/Conalh/overreach --locked

Produces a single static-ish binary; drop it anywhere on PATH. A crates.io release (cargo install overreach) is planned but not yet published — install from source or Git for now.

Usage

overreach [PATH]                  # scan a file or directory (default: .)
git diff | overreach --diff       # scan only the added lines of a diff
overreach --diff --json           # machine-readable output for CI
overreach . --fail-on high        # also fail on new network calls / subprocess spawns
| Flag | Effect |
| --- | --- |
| --diff | Read a unified diff from stdin; scan added lines only |
| --json | Emit findings + summary as JSON |
| --fail-on <level> | Exit non-zero at/above the given level: low, medium, high, or critical (default critical) |
| -h, --help | Help |
| -V, --version | Version |
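To make "scan added lines only" concrete, here is a minimal sketch of pulling added lines and their new-file line numbers out of a unified diff. It is not overreach's implementation: `AddedLine`, `parse_range`, and `added_lines` are hypothetical names, and tracking hunk lengths is just one way to tell a file header from hunk content.

```rust
/// One added line of a unified diff, positioned in the new version of the file.
#[derive(Debug, PartialEq)]
pub struct AddedLine {
    pub file: String,
    pub line_no: usize,
    pub text: String,
}

/// Parse "start,len" (or just "start", which implies len = 1).
fn parse_range(s: &str) -> Option<(usize, usize)> {
    let mut parts = s.splitn(2, ',');
    let start = parts.next()?.parse().ok()?;
    let len = match parts.next() {
        Some(n) => n.parse().ok()?,
        None => 1,
    };
    Some((start, len))
}

/// Collect every added line, using the hunk header's line counts to know
/// whether a line is hunk content or a header between hunks.
pub fn added_lines(diff: &str) -> Vec<AddedLine> {
    let mut out = Vec::new();
    let mut file = String::new();
    let (mut old_left, mut new_left, mut next_line) = (0usize, 0usize, 0usize);

    for raw in diff.lines() {
        if old_left == 0 && new_left == 0 {
            // Outside a hunk: only file headers and hunk headers matter.
            if let Some(path) = raw.strip_prefix("+++ ") {
                file = path.strip_prefix("b/").unwrap_or(path).to_string();
            } else if let Some(rest) = raw.strip_prefix("@@ ") {
                let mut tok = rest.split_whitespace();
                let old = tok.next().and_then(|t| t.strip_prefix('-')).and_then(parse_range);
                let new = tok.next().and_then(|t| t.strip_prefix('+')).and_then(parse_range);
                if let (Some((_, o)), Some((start, n))) = (old, new) {
                    old_left = o;
                    new_left = n;
                    next_line = start;
                }
            }
            continue;
        }
        // Inside a hunk: the first byte says what kind of line this is.
        match raw.as_bytes().first() {
            Some(b'+') => {
                out.push(AddedLine {
                    file: file.clone(),
                    line_no: next_line,
                    text: raw[1..].to_string(),
                });
                next_line += 1;
                new_left = new_left.saturating_sub(1);
            }
            Some(b'-') => old_left = old_left.saturating_sub(1),
            Some(b'\\') => {} // "\ No newline at end of file" marker
            _ => {
                // Context line: present in both old and new versions.
                old_left = old_left.saturating_sub(1);
                new_left = new_left.saturating_sub(1);
                next_line += 1;
            }
        }
    }
    out
}
```

Because the sketch counts lines per hunk, an added source line whose own text starts with `++` (and so appears as `+++ ...` in the diff) is still treated as content rather than mistaken for a file header.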

Exit codes — so it slots straight into a pipeline:

| Code | Meaning |
| --- | --- |
| 0 | scanned; nothing at/above --fail-on |
| 1 | findings at/above --fail-on |
| 2 | could not scan (unreadable entrypoint) or invalid invocation |

By default the gate fails only on critical findings (pipe_to_shell, hardcoded_secret, sensitive_fs_read). Adding a network call or shelling out is normal feature work, so network_call and subprocess_spawn are reported as high but do not fail the build unless you opt into --fail-on high — a gate that red-X'd every routine PR is a gate that gets switched off.
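The gate described above reduces to a comparison on an ordered severity type. The following is a minimal sketch of that rule, assuming a four-level enum; `Severity`, `Outcome`, and `exit_code` are illustrative names, not overreach's internals.

```rust
/// Severity levels in ascending order; the derived `Ord` follows this order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

/// Result of one run: the severities of all findings, or no scan at all.
pub enum Outcome<'a> {
    Scanned(&'a [Severity]),
    CouldNotScan,
}

/// Map an outcome to the documented exit codes 0, 1, and 2.
pub fn exit_code(outcome: Outcome<'_>, fail_on: Severity) -> i32 {
    match outcome {
        // Never report a pass for something that was not scanned.
        Outcome::CouldNotScan => 2,
        // Fail when any finding is at or above the threshold.
        Outcome::Scanned(found) if found.iter().any(|s| *s >= fail_on) => 1,
        Outcome::Scanned(_) => 0,
    }
}
```

With the default threshold of `Severity::Critical`, a run that finds only a `High` network call maps to 0; the same findings with `--fail-on high` map to 1.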

What it detects

| Kind | Severity | What it flags |
| --- | --- | --- |
| pipe_to_shell | critical | curl/wget … \| sh: downloading and executing a script in one breath |
| hardcoded_secret | critical | Provider-prefixed credentials (Anthropic, OpenAI, GitHub, AWS, Slack, Google, GitLab, Stripe). Value redacted. |
| sensitive_fs_read | critical | References to .ssh/, id_rsa, /etc/passwd, .aws/credentials, .env, .npmrc |
| network_call | high | fetch, axios, XMLHttpRequest, WebSocket, raw sockets; Python requests/urllib/httpx/aiohttp; Node/Go http.get/http.Get; Rust reqwest/TcpStream; JVM/.NET HttpClient/openConnection |
| subprocess_spawn | high | child_process, execSync/execFile, top-level exec(, .spawn(; Python subprocess.*/os.system; Rust process::Command; Go exec.Command; Java Runtime.exec/ProcessBuilder; C# Process.Start |
| tls_verification_disabled | medium | rejectUnauthorized: false, verify=False, InsecureSkipVerify: true, NODE_TLS_REJECT_UNAUTHORIZED=0 (the insecure value only), Python ssl._create_unverified_context |
| file_too_large_to_scan, file_not_utf8, file_unreadable, directory_unreadable, path_unreadable | low | Coverage gaps, not content findings: a path that was skipped mid-walk (over the 8 MiB cap, not UTF-8, or unreadable) is surfaced so a clean report can't hide something unscanned. An unreadable entrypoint is a hard error (exit 2), not a low finding. |
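The coverage-gap row can be read as a small decision procedure: check the size, try to read, try to decode, and report whichever step fails. The sketch below shows that shape using only the standard library. Which failure maps to which kind name is an assumption here, and `Scan`, `decode`, and `load_for_scan` are hypothetical names.

```rust
use std::fs;
use std::path::Path;

/// Size cap from the table above: larger files are reported, not scanned.
const MAX_BYTES: u64 = 8 * 1024 * 1024;

#[derive(Debug, PartialEq)]
pub enum Scan {
    /// The file was read and decoded; its text can go to the detectors.
    Text(String),
    /// The file was skipped; the string is the coverage-gap kind to report.
    Gap(&'static str),
}

/// Decode raw bytes, turning invalid UTF-8 into a coverage gap.
pub fn decode(bytes: Vec<u8>) -> Scan {
    match String::from_utf8(bytes) {
        Ok(text) => Scan::Text(text),
        Err(_) => Scan::Gap("file_not_utf8"),
    }
}

/// Load one file for scanning, surfacing every reason it might be skipped.
pub fn load_for_scan(path: &Path) -> Scan {
    // Assumed mapping: a failed stat is reported as path_unreadable.
    let meta = match fs::metadata(path) {
        Ok(m) => m,
        Err(_) => return Scan::Gap("path_unreadable"),
    };
    // Check the size before reading so an oversize file is never loaded.
    if meta.len() > MAX_BYTES {
        return Scan::Gap("file_too_large_to_scan");
    }
    match fs::read(path) {
        Ok(bytes) => decode(bytes),
        Err(_) => Scan::Gap("file_unreadable"),
    }
}
```

Every return path yields either text to scan or a named gap, so a caller has no way to drop a file silently.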

This is a fast, regex-based first pass — it favors recall over perfect precision. It is not a full taint analysis; treat findings as "look here," not "proven exploit." And because matching is literal regex, a determined adversary can evade it — string concatenation, base64, dynamic eval/import, or relocating the call into a dependency all slip past — so overreach surfaces drift and carelessness, not a motivated attacker. It deliberately does not flag method-call lookalikes such as JS regex.exec(str) or Rust tokio::spawn(...) as subprocesses, and process.env.X is not treated as a .env file read.

Matching is line-literal: overreach scans raw source lines and does not strip comments or string literals, so a comment or string that merely mentions a trigger token (a TODO about axios, a docstring naming subprocess.run) can be flagged. That's the trade-off for a zero-config first pass that favors recall — eyeball low-context findings before wiring it as a hard gate on comment-heavy code, and prefer scanning --diff (added lines only) over whole files.
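Both properties described above, the lookalike guard and line-literal matching, show up in even a tiny detector. The sketch below checks for a top-level `exec(` call using plain string operations in place of overreach's regex; `has_top_level_exec` is a hypothetical name and the exact rule is an assumption.

```rust
/// Report whether a line contains `exec(` as a bare call, rather than as a
/// method call (`regex.exec(`) or the tail of a longer name (`myexec(`).
pub fn has_top_level_exec(line: &str) -> bool {
    line.match_indices("exec(").any(|(idx, _)| {
        // Look at the character immediately before the match, if any.
        match line[..idx].chars().next_back() {
            None => true, // match at the start of the line
            Some(prev) => !(prev == '.' || prev == '_' || prev.is_alphanumeric()),
        }
    })
}
```

Under this rule `exec(cmd)` is flagged and `regex.exec(str)` is not. A comment such as `// TODO: exec(cmd) later` is flagged too, because the check sees raw text and has no notion of comments.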

Language coverage

Being regex-based, overreach is language-agnostic in that it scans any UTF-8 text file — but the detector patterns are tuned per language. Today's coverage, by detector:

Language network_call subprocess_spawn tls_verification_disabled
JS / TS
Python
Go
Rust
Java
C# / .NET

pipe_to_shell, sensitive_fs_read, and hardcoded_secret are language-independent — they match shell invocations, sensitive file paths, and provider key formats regardless of language. Coverage is additive: adding a language means adding patterns, never rewriting the engine. Gaps (Ruby, PHP, C/C++, shell beyond curl|sh, and the dashes above) are good first contributions.
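One way to picture "adding a language means adding patterns" is a detector table that is pure data, with a single scanning function that never changes. The sketch below is illustrative only: `Kind`, `Pattern`, `PATTERNS`, and `scan_line` are hypothetical, the needles are plain substrings instead of regexes, and the Ruby row is an invented example of a contribution.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    NetworkCall,
    SubprocessSpawn,
}

/// One detector entry: what to look for and what to call it.
pub struct Pattern {
    pub kind: Kind,
    pub language: &'static str,
    pub needle: &'static str,
}

/// The pattern table. Covering a new language is one more row.
pub const PATTERNS: &[Pattern] = &[
    Pattern { kind: Kind::NetworkCall, language: "Python", needle: "requests.get(" },
    Pattern { kind: Kind::NetworkCall, language: "Go", needle: "http.Get(" },
    Pattern { kind: Kind::SubprocessSpawn, language: "Python", needle: "os.system(" },
    // Hypothetical new entry for a language listed as a gap above.
    Pattern { kind: Kind::NetworkCall, language: "Ruby", needle: "Net::HTTP.get(" },
];

/// The engine: run every pattern against one line and collect the kinds hit.
pub fn scan_line(line: &str) -> Vec<Kind> {
    PATTERNS
        .iter()
        .filter(|p| line.contains(p.needle))
        .map(|p| p.kind)
        .collect()
}
```

`scan_line` does not mention any language by name, so the Ruby row takes effect without touching it.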

How it compares

overreach is deliberately narrow: a fast, zero-config first pass for capability drift in a diff. It does not try to beat a real static analyzer at depth or a dedicated secret scanner at secrets — it tries to be the thing you can drop into any PR as one binary with no rules to write. The honest landscape, assuming typical out-of-the-box or common PR-gate usage:

| Tool | Capability drift (net · subprocess · TLS) | Hardcoded secrets | Diff-aware first pass | Zero-config | Analysis depth | Footprint |
| --- | --- | --- | --- | --- | --- | --- |
| overreach | ✅ built-in | ⚠️ 8 providers, no entropy/verify | ✅ added lines only | ✅ no rules | ❌ regex, no taint/dataflow | ✅ one static binary · 3 deps · no network at scan time |
| ripgrep / grep | ⚠️ DIY patterns | ⚠️ DIY patterns | ⚠️ git diff \| rg | | | |
| Semgrep | ✅ via rules | ✅ (validated in Pro) | ✅ baseline scan | ⚠️ rules required | ✅ AST + dataflow | ⚠️ heavier · cloud for best |
| gitleaks / TruffleHog | ❌ secrets only | ✅ entropy · git history · live verify | | | | |
| CodeQL | ✅ via queries | ⚠️ secondary; not its primary use | ⚠️ PR alerts, whole-DB scan | ❌ build DB + query packs | ✅ semantic dataflow | ❌ minutes per run · heavy |

The table lists classic tooling, but in 2026 the louder competition is the cloud/ML AI-SAST wave — DryRun, Aikido, Snyk Code, Checkmarx, GitHub Advanced Security. They go far deeper than overreach, and that's the point: they are the depth tier. overreach lives at the opposite end — local, deterministic, no-AI, no-telemetry, free, and instant. It never competes on analysis depth; it competes on being the zero-friction tripwire you can run on every push with nothing sent anywhere.

Reach for something else when:

  • you need semantic dataflow or taint analysis (Semgrep, CodeQL);
  • you need full secret-scanning coverage across git history (gitleaks, TruffleHog);
  • you need live key validation;
  • you need policy-as-code rules maintained by a security team.

Reach for overreach when:

  • you want a fast, local tripwire for new network, subprocess, TLS-disable, sensitive-file, or hardcoded-secret patterns in a diff.

The capability grid above is a positioning sketch — every tool is more configurable than one table can show, and the right answer is often overreach and one of them. Speed, on the other hand, is measured:

Speed

overreach scans a 40 MiB / 3,000-file tree for all six detector families in ~0.13 s — near ripgrep-class speed while running every detector family, and far ahead of gitleaks, TruffleHog, grep, and Semgrep. Fast enough to run on every push.

The benchmark below is reproducible from bench/; it scans an identical generated corpus and reports wall-clock time. These tools do not all solve the same problem — gitleaks/TruffleHog do secrets only (with entropy and live verification); Semgrep does the AST/dataflow analysis overreach deliberately skips — so slower does not mean worse.

overreach vs ripgrep, grep, gitleaks, TruffleHog and Semgrep — scan time on an identical corpus

| Tool | 40 MiB / 3k-file scan | What it is |
| --- | --- | --- |
| ripgrep (parallel) | 68 ms | the fastest line scanner |
| overreach (parallel) | 127 ms | all six detector families |
| ripgrep -j1 | 215 ms | single-threaded, same engine |
| gitleaks | 654 ms | secrets only · entropy, git history |
| TruffleHog | 1.11 s | secrets only · live key verification |
| grep -rP | 2.82 s | naive recursive baseline |
| Semgrep | 34.8 s | AST framework running equivalent pattern checks |

"Clean" never means "didn't scan"

The worst failure mode for a security tool is a silent gap that reads as a pass. overreach is built so a clean report is trustworthy:

  • Symlink-safe walk. Mid-walk, symlinks are skipped via symlink_metadata, so a hostile checkout can't escape the scan root (link -> /etc) or loop forever (link -> ..). A user-supplied entrypoint is followed exactly once, as deliberate intent.
  • Skips are surfaced, not swallowed. A file that is over the 8 MiB cap, not valid UTF-8, or unreadable, an unreadable directory, and a path that can't be stat'd mid-walk each become a low-severity coverage-gap finding (file_too_large_to_scan, file_not_utf8, file_unreadable, directory_unreadable, path_unreadable) rather than vanishing. An attacker can't hide a key in a 9 MB blob, a binary file, or a locked-down file and get back "clean."
  • An unscannable entrypoint is a hard failure, not a pass. If the path you point overreach at can't be read at all (missing, permission denied), it exits 2 — distinct from 1 (findings at/above --fail-on) and 0 (scanned clean). A security gate must never exit 0 on something it couldn't scan.
  • Redaction is pinned by a canary test. rendered_output_never_echoes_a_credential_value plants a sentinel secret and asserts it never appears in any rendered output, so a future change can't accidentally start printing credentials. The exit-code contract, diff line-numbering, the ++-vs-header edge case, every coverage gap above, and the false-positive guards (regex.exec, tokio::spawn) are all pinned by unit and CLI integration tests.
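The symlink-safe walk from the first bullet can be sketched with `std::fs::symlink_metadata`, which inspects a directory entry without following it. This is a minimal illustration, not the project's walker: `walk` is a hypothetical function, and the gap names attached to each failure are assumptions.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `root`, never following symlinks.
/// Anything that cannot be listed or stat'd is recorded in `gaps`.
pub fn walk(root: &Path, files: &mut Vec<PathBuf>, gaps: &mut Vec<(PathBuf, &'static str)>) {
    let entries = match fs::read_dir(root) {
        Ok(e) => e,
        Err(_) => {
            gaps.push((root.to_path_buf(), "directory_unreadable"));
            return;
        }
    };
    for entry in entries {
        let path = match entry {
            Ok(e) => e.path(),
            Err(_) => {
                gaps.push((root.to_path_buf(), "directory_unreadable"));
                continue;
            }
        };
        // symlink_metadata describes the link itself, not its target.
        let kind = match fs::symlink_metadata(&path) {
            Ok(meta) => meta.file_type(),
            Err(_) => {
                gaps.push((path, "path_unreadable"));
                continue;
            }
        };
        if kind.is_symlink() {
            // Skipped: a link can point outside the root or back up the tree.
            continue;
        } else if kind.is_dir() {
            walk(&path, files, gaps);
        } else if kind.is_file() {
            files.push(path);
        }
    }
}
```

Since a symlink is never descended into, a link pointing at a parent directory cannot cause unbounded recursion, and a link pointing at `/etc` contributes no files.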

In CI

# .github/workflows/overreach.yml
name: overreach
on: pull_request

# Minimum-privilege token: this job only needs to read source.
permissions:
  contents: read

jobs:
  overreach:
    runs-on: ubuntu-latest
    steps:
      # Pin third-party actions to commit SHAs (with the tag as a comment)
      # so a compromised upstream tag can't silently change what runs in CI.
      - uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5
        with:
          fetch-depth: 0
          # Don't leave GITHUB_TOKEN in .git/config — `cargo build` runs
          # build scripts from every transitive dep.
          persist-credentials: false
      - uses: dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8 # stable
      # --locked enforces the committed Cargo.lock.
      - run: cargo build --release --locked
      - name: Scan the PR diff
        env:
          # Route the trigger value through env so it can't be interpolated
          # into the shell script body.
          BASE_REF: ${{ github.base_ref }}
        # pipefail so a failed `git diff` can't be masked by the trailing pipe
        # and let the scan report "clean" on an unscanned diff.
        run: |
          set -euo pipefail
          # The default gate fails only on critical findings (curl|sh, secrets,
          # sensitive-file reads). Add --fail-on high to also block new network
          # calls and subprocess spawns.
          git diff "origin/$BASE_REF...HEAD" | ./target/release/overreach --diff

Using the composite action

The repo also ships a composite action (action.yml) that builds overreach and scans the PR diff. It does not check out your code — you must run actions/checkout first, with fetch-depth: 0 so the base ref exists to diff against:

name: overreach
on: pull_request

permissions:
  contents: read

jobs:
  overreach:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5
        with:
          fetch-depth: 0
          persist-credentials: false
      - uses: Conalh/overreach@v0.2.0
        # Defaults to scanning the PR diff and failing on critical findings.
        # with:
        #   fail-on: high   # also block new network calls / subprocess spawns
        #   path: src/      # scan a path in full-tree mode instead of the PR diff

Where this fits

overreach is the standalone, language-agnostic cousin of CapabilityEcho from the agent-gov suite — the same idea (catch capability drift in a diff), repackaged as one fast binary with no Node and no suite to adopt. Use overreach for a quick gate anywhere; reach for the full agent-gov suite when you want cross-tool consolidation and a single PR verdict.

License

MIT © Conal Hickey
