diff --git a/README.md b/README.md index 7c4e61b..59c4a96 100644 --- a/README.md +++ b/README.md @@ -1,359 +1,318 @@ -# Leakwatch +
+ +Leakwatch — detect, verify & report leaked secrets + +**Detect, verify & report leaked secrets across code, Git history, containers, and the cloud.** +Open source (MIT) · single binary · built for CI. [![CI](https://github.com/HodeTech/Leakwatch/actions/workflows/ci.yml/badge.svg)](https://github.com/HodeTech/Leakwatch/actions/workflows/ci.yml) -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) +[![Release](https://img.shields.io/github/v/release/HodeTech/Leakwatch?sort=semver&color=e6394d)](https://github.com/HodeTech/Leakwatch/releases/latest) +[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Go Report Card](https://goreportcard.com/badge/github.com/HodeTech/leakwatch)](https://goreportcard.com/report/github.com/HodeTech/leakwatch) -[![Go Reference](https://pkg.go.dev/badge/github.com/HodeTech/leakwatch.svg)](https://pkg.go.dev/github.com/HodeTech/leakwatch) -[![GitHub Marketplace](https://img.shields.io/badge/Marketplace-Leakwatch%20Secret%20Scanner-2ea44f?logo=github)](https://github.com/marketplace/actions/leakwatch-secret-scanner) +[![GitHub Marketplace](https://img.shields.io/badge/Marketplace-Leakwatch-2ea44f?logo=github)](https://github.com/marketplace/actions/leakwatch-secret-scanner) -> Next-generation secret scanning platform — fast, accurate, open source. +[Quick Start](#quick-start) · [GitHub Action](#github-action) · [Verification](#is-it-still-live) · [Detectors](#detectors) · [Docs](https://hodetech.github.io/Leakwatch/) -**Leakwatch** is a high-performance security tool that detects, verifies, and reports leaked secrets (API keys, passwords, certificates) in codebases, Git histories, and container images. +
--- -## Why Leakwatch? +## What is Leakwatch? -| Feature | Leakwatch | TruffleHog | Gitleaks | -|---------|-----------|------------|----------| -| **License** | MIT | AGPL-3.0 | MIT [^gl-action] | -| **Secret Verification** | Yes (54 verifiers, 51 packages) | Yes | No | -| **Container Scanning** | Yes | Yes | No | -| **SARIF Output** | Yes | No [^th-sarif] | Yes | -| **Aho-Corasick Prefilter** | Yes | Yes | Yes | -| **Entropy Analysis** | Yes | Yes | Yes | -| **Custom Rules** | YAML | YAML (config) | TOML | - -[^gl-action]: The Gitleaks CLI is MIT-licensed. The official `gitleaks-action` GitHub Action, however, runs under a commercial EULA and requires a (free) license key for **organization** accounts (personal accounts are exempt). -[^th-sarif]: TruffleHog emits JSON / plain / GitHub-Actions output; it has no native SARIF formatter (SARIF requires an external converter). All three tools use Aho-Corasick keyword pre-filtering and Shannon-entropy filtering, and all three support custom rules (Leakwatch: YAML, TruffleHog: `config.yaml` `detectors:` block, Gitleaks: TOML). 
- -**What makes Leakwatch different:** -- **MIT license _with_ verification** — Among these tools, Leakwatch is the only one that is both permissively licensed (MIT, unlike TruffleHog's AGPL-3.0) and performs live secret verification (unlike Gitleaks, which is detection-only) -- **85.7% verification coverage** — 54 of 63 detectors have live API or format validation verification -- **Verification + container + SARIF in one MIT binary** — TruffleHog lacks SARIF; Gitleaks lacks verification and container scanning -- **Easy extensibility** — YAML for simple rules, Go plugin for advanced ones -- **Single binary, zero dependencies** — Runs on every platform -- **Scan summary** — Every scan prints a summary to stderr (date, source, target, files scanned, duration, findings) -- **Colored terminal output** — Severity-colored table output (red=critical/high, yellow=medium, blue=low), auto-disabled for file output +Leaked API keys, tokens, and passwords are one of the most common causes of breaches. **Leakwatch** finds them across your **codebase, full Git history, container images, and cloud storage** — and then *verifies whether each secret is still live*, so you spend time on real incidents instead of triaging noise. ---- +```console +$ leakwatch scan fs . --format table -## Quick Start +SEVERITY DETECTOR FILE REDACTED STATUS REMEDIATION +-------- -------- ---- -------- ------ ----------- +CRITICAL github-token config.env ****cdEF unverified - +CRITICAL database-connection-string config.env postgres://admin:****@db.prod.internal:5432/app unverified - +CRITICAL aws-access-key-id config.env ****MPLE unverified - + +Found 3 secrets (3 critical). +``` + +> Secret values are **redacted by default** and never written to disk or logs. See [Security](#security).
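The redacted values in the sample output (`****cdEF`, `****MPLE`) suggest a mask that keeps only the last four characters. The Go sketch below illustrates that idea; the `redact` helper and its four-character rule are assumptions inferred from the sample, not Leakwatch's actual implementation.

```go
package main

import "fmt"

// redact masks a secret for display, keeping only its last four characters.
// Hypothetical helper: the four-character rule is inferred from the sample
// output and is not taken from Leakwatch's source.
func redact(secret string) string {
	const visible = 4
	r := []rune(secret)
	if len(r) <= visible {
		// Too short to reveal any suffix safely: mask it entirely.
		return "****"
	}
	return "****" + string(r[len(r)-visible:])
}

func main() {
	fmt.Println(redact("AKIAIOSFODNN7EXAMPLE")) // ****MPLE
}
```

Working on runes rather than bytes keeps the suffix intact if a secret ever contains multi-byte characters.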
-### Installation +## Features + +- **6 scan sources** — filesystem, Git history (every commit), container images, AWS S3, Google Cloud Storage, Slack +- **63 built-in detectors** + **YAML custom rules** (no Go code needed) +- **54 verifiers (85.7% of detectors)** — 49 make a read-only API call to confirm whether a secret is *still active*, not just present; 5 more validate its format +- **5 output formats** — JSON, SARIF, CSV, terminal table, and **GitHub inline annotations** +- **Drop-in distribution** — GitHub Action (Marketplace), Docker image, Homebrew, `go install`, single static binary +- **Secret-safe** — redacted output by default; secrets are never logged or stored +- **Fast & CI-ready** — Aho-Corasick keyword pre-filter + Shannon entropy, concurrent worker pool, exit-code aware, SARIF → Code Scanning + +## Quick Start ```bash # Homebrew (macOS/Linux) brew install HodeTech/tap/leakwatch -# Go install +# Go go install github.com/HodeTech/leakwatch@latest # Docker -docker run --rm -v $(pwd):/scan ghcr.io/hodetech/leakwatch:latest scan fs /scan - -# Binary download — pick the archive for your OS/arch from the releases page: -# https://github.com/HodeTech/Leakwatch/releases (e.g. leakwatch_1.5.0_linux_amd64.tar.gz) +docker run --rm -v "$(pwd):/scan" ghcr.io/hodetech/leakwatch:latest scan fs /scan ``` -### Quick Setup +…or grab a prebuilt binary from the [releases page](https://github.com/HodeTech/Leakwatch/releases). Then: ```bash -# Generate a recommended .leakwatch.yaml in the current directory -leakwatch init +leakwatch scan fs . # scan the current directory +leakwatch scan git . # scan full Git history +leakwatch scan image nginx:latest # scan a container image +leakwatch scan fs . --format sarif -o results.sarif # SARIF for Code Scanning +leakwatch scan git . --only-verified # only secrets confirmed live (CLI verifies by default) +leakwatch init # generate a .leakwatch.yaml ``` -### Usage +
+More examples — cloud, Slack, multi-repo, remediation ```bash -# Scan current directory (default when no path given) -leakwatch scan fs - -# Scan a specific directory -leakwatch scan fs /path/to/project - -# Scan Git repository (full history) -leakwatch scan git /path/to/repo -leakwatch scan git https://github.com/org/repo.git - -# Scan container image -leakwatch scan image nginx:latest - -# Show only verified secrets -leakwatch scan git . --only-verified - -# Output in SARIF format -leakwatch scan fs . --format sarif --output results.sarif - -# Scan since last commit (for CI/CD) -leakwatch scan git . --since-commit HEAD~1 - -# Scan AWS S3 bucket leakwatch scan s3 my-bucket --prefix config/ - -# Scan Google Cloud Storage bucket leakwatch scan gcs my-bucket --prefix secrets/ - -# Scan Slack workspace leakwatch scan slack --token xoxb-... --channels general,engineering +leakwatch scan repos https://github.com/org/a.git https://github.com/org/b.git --parallel 5 +leakwatch scan git . --since-commit HEAD~1 # only new commits (great for CI) +leakwatch scan fs . --remediation # include rotation steps & doc links +``` -# Scan multiple repos in parallel -leakwatch scan repos https://github.com/org/repo1.git https://github.com/org/repo2.git --parallel 5 +
-# Include remediation guidance (rotation steps, doc links) -leakwatch scan fs . --remediation -``` +## GitHub Action ---- +Add secret scanning to any workflow in one line — published on the [GitHub Marketplace](https://github.com/marketplace/actions/leakwatch-secret-scanner): -## Supported Secret Types +```yaml +- uses: actions/checkout@v4 +- uses: HodeTech/Leakwatch@v1 + with: + scan-type: fs # fs | git | image +``` -**63 built-in secret detectors across 60 packages.** +- **`format: github`** → findings appear as **inline annotations** on the pull request. +- **`format: sarif` + `sarif-upload: true`** → findings show up as **Code Scanning alerts** (needs `permissions: security-events: write`). +- **`scan-diff: auto`** (git scans) → scans only the commits a PR/push introduces. -| Category | Detector | ID | Severity | -|----------|----------|----|----------| -| **Cloud — AWS** | Access Key ID | `aws-access-key-id` | Critical | -| **Cloud — GCP** | Service Account Key | `gcp-service-account` | Critical | -| **Cloud — Azure** | Storage Connection String | `azure-storage-key` | Critical | -| **Cloud — Azure** | Entra ID Client Secret | `azure-entra-secret` | Critical | -| **Cloud — Cloudflare** | API Token | `cloudflare-api-token` | Critical | -| **Cloud — DigitalOcean** | Personal Access Token | `digitalocean-token` | Critical | -| **Cloud — Heroku** | API Key | `heroku-api-key` | Critical | -| **Cloud — Vercel** | API Token | `vercel-token` | High | -| **AI/ML** | OpenAI API Key | `openai-api-key` | Critical | -| **AI/ML** | Anthropic API Key | `anthropic-api-key` | Critical | -| **AI/ML** | Hugging Face Token | `huggingface-token` | Critical | -| **AI/ML** | DeepSeek API Key | `deepseek-api-key` | Critical | -| **DevTools** | GitHub PAT | `github-token` | Critical | -| **DevTools** | GitHub OAuth Token | `github-oauth-token` | Critical | -| **DevTools** | GitLab PAT | `gitlab-pat` | Critical | -| **DevTools** | Bitbucket App Password | `bitbucket-app-password` | 
Critical | -| **DevTools** | NPM Token | `npm-token` | High | -| **DevTools** | PyPI Token | `pypi-api-token` | High | -| **DevTools** | RubyGems Key | `rubygems-api-key` | High | -| **DevTools** | Docker Hub PAT | `dockerhub-pat` | Critical | -| **CI/CD** | CircleCI Token | `circleci-token` | High | -| **CI/CD** | Terraform Cloud Token | `terraform-cloud-token` | Critical | -| **Communication** | Slack Bot Token | `slack-token` | Critical | -| **Communication** | Slack Webhook | `slack-webhook` | High | -| **Communication** | Discord Bot Token | `discord-bot-token` | Critical | -| **Communication** | Telegram Bot Token | `telegram-bot-token` | High | -| **Communication** | MS Teams Webhook | `teams-webhook` | High | -| **Email** | SendGrid API Key | `sendgrid-api-key` | Critical | -| **Email** | Mailgun API Key | `mailgun-api-key` | Critical | -| **Email** | Postmark Server Token | `postmark-server-token` | High | -| **Payment** | Stripe Live Key | `stripe-api-key-live` | Critical | -| **Payment** | Stripe Test Key | `stripe-api-key-test` | High | -| **Payment** | Coinbase API Key | `coinbase-api-key` | Critical | -| **Blockchain** | Infura API Key | `infura-api-key` | High | -| **Database** | Connection String (PG/MySQL/MongoDB) | `database-connection-string` | Critical | -| **Database** | Redis Connection | `redis-connection-string` | Critical | -| **Database** | Snowflake Credentials | `snowflake-credentials` | Critical | -| **Database** | RabbitMQ Connection | `rabbitmq-connection-string` | Critical | -| **Database** | Supabase Service Key | `supabase-service-key` | Critical | -| **Infrastructure** | FTP/SFTP Credentials | `ftp-credentials` | Critical | -| **Infrastructure** | LDAP Credentials | `ldap-credentials` | Critical | -| **Infrastructure** | Databricks PAT | `databricks-token` | Critical | -| **Identity** | JWT | `jwt` | High | -| **Identity** | Private Key (RSA/SSH/PGP) | `private-key` | Critical | -| **Identity** | Okta API Token | `okta-api-token` 
| Critical | -| **Identity** | Auth0 Management Token | `auth0-management-token` | Critical | -| **Identity** | HashiCorp Vault Token | `hashicorp-vault-token` | Critical | -| **Monitoring** | Datadog API Key | `datadog-api-key` | Critical | -| **Monitoring** | Grafana API Key | `grafana-api-key` | High | -| **Monitoring** | PagerDuty API Key | `pagerduty-api-key` | High | -| **Monitoring** | New Relic API Key | `newrelic-api-key` | High | -| **Monitoring** | Sentry Auth Token | `sentry-token` | High | -| **Security** | Snyk API Key | `snyk-api-key` | High | -| **Security** | Twilio API Key | `twilio-api-key` | Critical | -| **Secrets Mgmt** | Doppler Service Token | `doppler-token` | Critical | -| **Feature Flags** | LaunchDarkly SDK Key | `launchdarkly-sdk-key` | High | -| **Code Quality** | SonarCloud Token | `sonarcloud-token` | High | -| **SaaS** | Shopify Access Token | `shopify-access-token` | Critical | -| **SaaS** | Notion Token | `notion-token` | High | -| **SaaS** | Linear API Key | `linear-api-key` | High | -| **SaaS** | Figma PAT | `figma-pat` | High | -| **SaaS** | Airtable PAT | `airtable-pat` | High | -| **Generic** | Generic API Key | `generic-api-key` | Medium | -| **Custom** | YAML-defined rules | user-defined | user-defined | - -### Verification Coverage (54/63 — 85.7%) - -| Verification Type | Detectors | Description | -|-------------------|-----------|-------------| -| **Live API Verification** | `aws-access-key-id`, `github-token`, `github-oauth-token`, `gitlab-pat`, `slack-token`, `openai-api-key`, `anthropic-api-key`, `deepseek-api-key`, `huggingface-token`, `sendgrid-api-key`, `mailgun-api-key`, `postmark-server-token`, `stripe-api-key-live`, `stripe-api-key-test`, `digitalocean-token`, `cloudflare-api-token`, `heroku-api-key`, `vercel-token`, `npm-token`, `pypi-api-token`, `rubygems-api-key`, `dockerhub-pat`, `circleci-token`, `terraform-cloud-token`, `discord-bot-token`, `telegram-bot-token`, `sentry-token`, `pagerduty-api-key`, 
`newrelic-api-key`, `grafana-api-key`, `datadog-api-key`, `snyk-api-key`, `twilio-api-key`, `doppler-token`, `launchdarkly-sdk-key`, `sonarcloud-token`, `shopify-access-token`, `notion-token`, `linear-api-key`, `figma-pat`, `airtable-pat`, `okta-api-token`, `auth0-management-token`, `databricks-token`, `bitbucket-app-password`, `coinbase-api-key`, `supabase-service-key`, `infura-api-key`, `teams-webhook` | API call to provider to confirm active/inactive status | -| **Format Validation** | `azure-storage-key`, `azure-entra-secret`, `gcp-service-account`, `snowflake-credentials`, `rabbitmq-connection-string` | Structural check (decode, parse) without network call | -| **Not Verifiable** | `jwt`, `private-key`, `generic-api-key`, `database-connection-string`, `redis-connection-string`, `ftp-credentials`, `ldap-credentials`, `slack-webhook`, `hashicorp-vault-token` | No public verification API or verification would cause side effects | - -> **Can't find your secret type?** Leakwatch supports [YAML custom rules](docs/guides/custom-rules.md) — define your own detector in 5 lines of YAML without writing Go code. +Exit codes (used for CI gating): **`0`** no findings · **`1`** findings reported · **`2`** error. ---- +Full inputs and recipes: **[CI/CD Integration guide](docs/guides/ci-cd-integration.md)**. -## CI/CD Integration +## Is it still live? -### GitHub Actions +Detection is only half the job — a key that was already rotated isn't an incident. 
For most secret types, Leakwatch makes a **controlled, read-only API call** to the provider to confirm status: -```yaml -- uses: HodeTech/Leakwatch@v1 - with: - scan-type: git - no-verify: false # turn verification ON (required for only-verified) - only-verified: true # report only secrets confirmed live - sarif-upload: true -``` +| Tier | What it means | Coverage | +|------|---------------|----------| +| **Live verified** | Read-only API call confirms the key is active / inactive | 49 detectors | +| **Format checked** | Structurally validated where no safe live check exists | 5 detectors | +| **Not verifiable** | No public verification API, or checking would have side effects (e.g. JWTs, private keys, webhooks); detected and triaged manually | 9 detectors | -### Pre-commit Hook +That's **54 of 63 detectors (85.7%)** with verification. Verification is on by default for the CLI and off by default in the Action (to keep CI fast and offline); set the Action's `no-verify` input to `false` to enable it. -```yaml -# .pre-commit-config.yaml -repos: - - repo: https://github.com/HodeTech/Leakwatch - rev: v1.5.0 - hooks: - - id: leakwatch -``` +## Why Leakwatch? ---- +| | **Leakwatch** | TruffleHog | Gitleaks | +|---|---|---|---| +| License | **MIT** | AGPL-3.0 | MIT [^gl] | +| Live secret verification | **Yes (54 verifiers)** | Yes | No | +| Container image scanning | **Yes** | Yes | No | +| Cloud sources (S3 / GCS / Slack) | **Yes** | S3, GCS | No | +| SARIF output | **Yes** | No [^th] | Yes | +| Custom rules | **YAML** | YAML | TOML | +| Single static binary | **Yes** | Yes | Yes | + +**The short version:** Leakwatch is the only one of the three that is **both** permissively MIT-licensed **and** does live verification — plus container & cloud scanning and native SARIF, in one dependency-free binary. + +[^gl]: The Gitleaks CLI is MIT; the official `gitleaks-action` runs under a commercial EULA and needs a (free) license key for **organization** accounts. +[^th]: TruffleHog emits JSON / plain / GitHub-Actions output and has no native SARIF formatter.
All three tools use Aho-Corasick pre-filtering and Shannon-entropy filtering, and all three support custom rules. + +## Detectors + +**63 built-in detectors** across these categories, plus your own [YAML custom rules](docs/guides/custom-rules.md): + +| Category | Examples | +|----------|----------| +| **Cloud** | AWS, GCP, Azure, Cloudflare, DigitalOcean, Heroku, Vercel | +| **AI / ML** | OpenAI, Anthropic, Hugging Face, DeepSeek | +| **Dev & CI/CD** | GitHub, GitLab, Bitbucket, npm, PyPI, RubyGems, Docker Hub, CircleCI, Terraform Cloud | +| **Communication & Email** | Slack, Discord, Telegram, MS Teams, SendGrid, Mailgun, Postmark | +| **Payments & Blockchain** | Stripe, Coinbase, Infura | +| **Databases & Infra** | Postgres/MySQL/Mongo, Redis, Snowflake, RabbitMQ, Supabase, FTP, LDAP, Databricks | +| **Identity & Secrets** | JWT, private keys (RSA/SSH/PGP), Okta, Auth0, HashiCorp Vault, Doppler | +| **Monitoring & Security** | Datadog, Grafana, PagerDuty, New Relic, Sentry, Snyk, Twilio | +| **SaaS** | Shopify, Notion, Linear, Figma, Airtable, LaunchDarkly, SonarCloud | +| **Generic & Custom** | high-entropy generic API keys · your YAML rules | + +
+Full detector catalog (63) with IDs, severity & verification + +| Category | Detector | ID | Severity | +|----------|----------|----|----------| +| Cloud — AWS | Access Key ID | `aws-access-key-id` | Critical | +| Cloud — GCP | Service Account Key | `gcp-service-account` | Critical | +| Cloud — Azure | Storage Connection String | `azure-storage-key` | Critical | +| Cloud — Azure | Entra ID Client Secret | `azure-entra-secret` | Critical | +| Cloud — Cloudflare | API Token | `cloudflare-api-token` | Critical | +| Cloud — DigitalOcean | Personal Access Token | `digitalocean-token` | Critical | +| Cloud — Heroku | API Key | `heroku-api-key` | Critical | +| Cloud — Vercel | API Token | `vercel-token` | High | +| AI/ML | OpenAI API Key | `openai-api-key` | Critical | +| AI/ML | Anthropic API Key | `anthropic-api-key` | Critical | +| AI/ML | Hugging Face Token | `huggingface-token` | Critical | +| AI/ML | DeepSeek API Key | `deepseek-api-key` | Critical | +| DevTools | GitHub PAT | `github-token` | Critical | +| DevTools | GitHub OAuth Token | `github-oauth-token` | Critical | +| DevTools | GitLab PAT | `gitlab-pat` | Critical | +| DevTools | Bitbucket App Password | `bitbucket-app-password` | Critical | +| DevTools | NPM Token | `npm-token` | High | +| DevTools | PyPI Token | `pypi-api-token` | High | +| DevTools | RubyGems Key | `rubygems-api-key` | High | +| DevTools | Docker Hub PAT | `dockerhub-pat` | Critical | +| CI/CD | CircleCI Token | `circleci-token` | High | +| CI/CD | Terraform Cloud Token | `terraform-cloud-token` | Critical | +| Communication | Slack Bot Token | `slack-token` | Critical | +| Communication | Slack Webhook | `slack-webhook` | High | +| Communication | Discord Bot Token | `discord-bot-token` | Critical | +| Communication | Telegram Bot Token | `telegram-bot-token` | High | +| Communication | MS Teams Webhook | `teams-webhook` | High | +| Email | SendGrid API Key | `sendgrid-api-key` | Critical | +| Email | Mailgun API Key | `mailgun-api-key` 
| Critical | +| Email | Postmark Server Token | `postmark-server-token` | High | +| Payment | Stripe Live Key | `stripe-api-key-live` | Critical | +| Payment | Stripe Test Key | `stripe-api-key-test` | High | +| Payment | Coinbase API Key | `coinbase-api-key` | Critical | +| Blockchain | Infura API Key | `infura-api-key` | High | +| Database | Connection String (PG/MySQL/MongoDB) | `database-connection-string` | Critical | +| Database | Redis Connection | `redis-connection-string` | Critical | +| Database | Snowflake Credentials | `snowflake-credentials` | Critical | +| Database | RabbitMQ Connection | `rabbitmq-connection-string` | Critical | +| Database | Supabase Service Key | `supabase-service-key` | Critical | +| Infrastructure | FTP/SFTP Credentials | `ftp-credentials` | Critical | +| Infrastructure | LDAP Credentials | `ldap-credentials` | Critical | +| Infrastructure | Databricks PAT | `databricks-token` | Critical | +| Identity | JWT | `jwt` | High | +| Identity | Private Key (RSA/SSH/PGP) | `private-key` | Critical | +| Identity | Okta API Token | `okta-api-token` | Critical | +| Identity | Auth0 Management Token | `auth0-management-token` | Critical | +| Identity | HashiCorp Vault Token | `hashicorp-vault-token` | Critical | +| Monitoring | Datadog API Key | `datadog-api-key` | Critical | +| Monitoring | Grafana API Key | `grafana-api-key` | High | +| Monitoring | PagerDuty API Key | `pagerduty-api-key` | High | +| Monitoring | New Relic API Key | `newrelic-api-key` | High | +| Monitoring | Sentry Auth Token | `sentry-token` | High | +| Security | Snyk API Key | `snyk-api-key` | High | +| Security | Twilio API Key | `twilio-api-key` | Critical | +| Secrets Mgmt | Doppler Service Token | `doppler-token` | Critical | +| Feature Flags | LaunchDarkly SDK Key | `launchdarkly-sdk-key` | High | +| Code Quality | SonarCloud Token | `sonarcloud-token` | High | +| SaaS | Shopify Access Token | `shopify-access-token` | Critical | +| SaaS | Notion Token | 
`notion-token` | High | +| SaaS | Linear API Key | `linear-api-key` | High | +| SaaS | Figma PAT | `figma-pat` | High | +| SaaS | Airtable PAT | `airtable-pat` | High | +| Generic | Generic API Key | `generic-api-key` | Medium | + +
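Leakwatch's `csv` output is described as sanitized against formula injection. A widely used approach (OWASP's CSV injection guidance) is to prefix a single quote to any cell that starts with `=`, `+`, `-`, `@`, tab, or carriage return so spreadsheets treat it as text. The Go sketch below shows that technique with the standard `encoding/csv` writer; `sanitizeCell` and the column layout are hypothetical, not Leakwatch's actual sanitizer.

```go
package main

import (
	"encoding/csv"
	"os"
	"strings"
)

// sanitizeCell neutralizes spreadsheet formula injection by prefixing a
// single quote to cells that begin with a formula trigger character.
// Generic sketch of the OWASP recommendation, not Leakwatch's code.
func sanitizeCell(s string) string {
	if s != "" && strings.ContainsRune("=+-@\t\r", rune(s[0])) {
		return "'" + s
	}
	return s
}

func main() {
	w := csv.NewWriter(os.Stdout)
	// Hypothetical columns; a malicious file name is the injection vector here.
	rows := [][]string{
		{"severity", "detector", "file", "redacted"},
		{"CRITICAL", "generic-api-key", "=HYPERLINK(\"http://example.com\")", "****abcd"},
	}
	for _, row := range rows {
		clean := make([]string, len(row))
		for i, cell := range row {
			clean[i] = sanitizeCell(cell)
		}
		if err := w.Write(clean); err != nil {
			panic(err)
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		panic(err)
	}
}
```

The trade-off is that legitimate values starting with `-` (such as negative numbers) also gain a leading quote; for a findings report that is usually acceptable.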
+ +## Output formats + +`--format` selects the output; `--output`/`-o` writes to a file instead of stdout. + +| Format | Use it for | +|--------|-----------| +| `json` | Machine-readable findings (default) | +| `sarif` | GitHub Code Scanning / security tooling (v2.1.0) | +| `csv` | Spreadsheets (sanitized against formula injection) | +| `table` | Human-readable terminal output (severity-colored) | +| `github` | Inline pull request annotations in GitHub Actions | ## Configuration -Create a configuration file with recommended defaults using `leakwatch init`, or write one manually: +Generate a starter file with `leakwatch init`, or write `.leakwatch.yaml`: ```yaml -# .leakwatch.yaml scan: concurrency: 8 - max-file-size: 10485760 # 10MB - + max-file-size: 10485760 # 10 MB detection: - entropy: - enabled: true - threshold: 4.0 - + entropy: { enabled: true, threshold: 4.0 } verification: enabled: true timeout: 10s - filter: - exclude-paths: - - "vendor/**" - - "node_modules/**" - - "**/*.lock" - + exclude-paths: ["vendor/**", "node_modules/**", "**/*.lock"] output: format: json show-raw: false ``` ---- +Use `.leakwatchignore` and `# leakwatch:ignore` markers to suppress known false positives. Details: **[Configuration guide](docs/guides/configuration.md)**. + +## Security + +- Secret values are **redacted by default** (e.g. `AKIA****MPLE`) and are **never written to disk or logs**. The raw value is only emitted if you explicitly pass `--show-raw`. +- Verification uses **controlled, read-only** API calls to providers; it makes no state-changing requests. +- Found a vulnerability? Please report it privately via a [GitHub security advisory](https://github.com/HodeTech/Leakwatch/security/advisories/new). 
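The `github` output format relies on GitHub Actions workflow commands: a line such as `::error file=...,line=...,title=...::message` printed by a step becomes an inline annotation. The Go sketch below formats one finding that way, using the same escaping as GitHub's Actions toolkit (`%`, CR, and LF in messages; additionally `:` and `,` in properties). The choice of `::error`, the title, and the message text are assumptions, not Leakwatch's actual formatter.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeData escapes a workflow-command message: '%' first, then CR and LF.
func escapeData(s string) string {
	s = strings.ReplaceAll(s, "%", "%25")
	s = strings.ReplaceAll(s, "\r", "%0D")
	return strings.ReplaceAll(s, "\n", "%0A")
}

// escapeProperty additionally escapes ':' and ',', which delimit properties.
func escapeProperty(s string) string {
	s = escapeData(s)
	s = strings.ReplaceAll(s, ":", "%3A")
	return strings.ReplaceAll(s, ",", "%2C")
}

// annotation renders one finding as an ::error workflow command.
// Hypothetical helper: field choice and message wording are illustrative.
func annotation(file string, line int, detectorID, redacted string) string {
	return fmt.Sprintf("::error file=%s,line=%d,title=%s::%s",
		escapeProperty(file), line, escapeProperty(detectorID),
		escapeData("Possible leaked secret: "+redacted))
}

func main() {
	fmt.Println(annotation("config.env", 3, "aws-access-key-id", "****MPLE"))
}
```

Only the redacted value goes into the message, so the annotation itself never exposes the raw secret in the pull request UI.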
## Architecture ```mermaid flowchart LR subgraph Sources["Sources (6)"] - S1["Git (go-git)"] - S2["Filesystem (io/fs)"] - S3["Container (crane)"] + S1["Git"] + S2["Filesystem"] + S3["Container"] S4["AWS S3"] S5["GCS"] S6["Slack"] end - subgraph Engine["Detection Engine"] - E1[Aho-Corasick] - E2[Regex] - E3[Entropy] + E1["Aho-Corasick prefilter"] + E2["Regex"] + E3["Shannon entropy"] end - - subgraph Verify["Verification (54 verifiers, 51 packages)"] - V1[Live API Verification] - V2[Format Validation] + subgraph Verify["Verification (54 verifiers)"] + V1["Live API"] + V2["Format validation"] end - - Sources -->|Chunks| Engine - Engine -->|Findings| Verify - Verify --> Output["JSON / SARIF / CSV"] + Sources -->|chunks| Engine + Engine -->|findings| Verify + Verify --> Output["JSON · SARIF · CSV · Table · GitHub"] ``` -Detailed architecture: [docs/architecture/03-ARCHITECTURE.md](docs/architecture/03-ARCHITECTURE.md) - ---- +Deep dive: [Architecture](docs/architecture/03-ARCHITECTURE.md) · [ADRs](docs/decisions/README.md) ## Documentation -### Architecture & Design - -| Document | Description | -|----------|-------------| -| [Competitive Analysis](docs/architecture/01-COMPETITIVE-ANALYSIS.md) | Market analysis and positioning | -| [Technology Decisions](docs/architecture/02-TECHNOLOGY-DECISIONS.md) | Technology choices and rationale | -| [Architecture Design](docs/architecture/03-ARCHITECTURE.md) | Detailed architecture and interfaces | - -### Standards - -| Document | Description | -|----------|-------------| -| [Documentation Standards](docs/standards/00-DOCUMENTATION-STANDARDS.md) | Diagrams, formatting, and document rules | -| [Code Review Standards](docs/standards/01-CODE-REVIEW-STANDARDS.md) | Review process, checklists, finding classification | -| [Release and Distribution Standards](docs/standards/02-RELEASE-STANDARDS.md) | Version management, CI/CD, release checklist | -| [Development Standards](docs/standards/04-DEVELOPMENT-STANDARDS.md) | Code standards, 
testing, and CI/CD | - -### Decisions (ADR) - -| Document | Description | -|----------|-------------| -| [ADR Index](docs/decisions/README.md) | All architecture decisions | -| [ADR-0001](docs/decisions/ADR-0001-programming-language.md) | Programming language: Go | -| [ADR-0005](docs/decisions/ADR-0005-pattern-matching.md) | Pattern matching: Aho-Corasick hybrid | -| [ADR-0007](docs/decisions/ADR-0007-license.md) | License: MIT | - -### Guides - -| Document | Description | -|----------|-------------| -| [Getting Started](docs/guides/getting-started.md) | Installation, first scan, understanding output | -| [Configuration](docs/guides/configuration.md) | .leakwatch.yaml, environment variables, ignore files | -| [CI/CD Integration](docs/guides/ci-cd-integration.md) | GitHub Actions, GitLab CI, Jenkins, pre-commit | -| [Custom Rules](docs/guides/custom-rules.md) | YAML rule definitions, regex, entropy, keyword | -| [Container Scanning](docs/guides/container-scanning.md) | Docker/OCI image scanning, registry authentication | -| [Cloud Scanning](docs/guides/cloud-scanning.md) | AWS S3, GCS, parallel repo scanning | -| [Git Scanning](docs/guides/git-scanning.md) | Full history scanning, diff-based, remote repos | -| [Slack Scanning](docs/guides/slack-scanning.md) | Slack workspace, channel/DM/file scanning | -| [Secret Verification](docs/guides/secret-verification.md) | Verification modes, rate limiting, --only-verified | -| [Docker Usage](docs/guides/docker-usage.md) | Running Leakwatch in Docker containers | -| [VS Code Extension](docs/guides/vscode-extension.md) | IDE integration, scan-on-save, diagnostics | - -### Planning - -| Document | Description | -|----------|-------------| -| [Roadmap](docs/05-ROADMAP.md) | Phased development plan | - ---- +Full bilingual (EN/TR) manuals are at **[hodetech.github.io/Leakwatch](https://hodetech.github.io/Leakwatch/)**. 
Quick links: + +[Getting Started](docs/guides/getting-started.md) · +[Configuration](docs/guides/configuration.md) · +[CI/CD](docs/guides/ci-cd-integration.md) · +[Custom Rules](docs/guides/custom-rules.md) · +[Container Scanning](docs/guides/container-scanning.md) · +[Cloud Scanning](docs/guides/cloud-scanning.md) · +[Git Scanning](docs/guides/git-scanning.md) · +[Slack Scanning](docs/guides/slack-scanning.md) · +[Verification](docs/guides/secret-verification.md) · +[Docker](docs/guides/docker-usage.md) · +[VS Code Extension](docs/guides/vscode-extension.md) · +[Roadmap](docs/05-ROADMAP.md) ## Contributing -We welcome your contributions! Please see the [CONTRIBUTING.md](CONTRIBUTING.md) file. +Contributions are welcome — see [CONTRIBUTING.md](CONTRIBUTING.md). ```bash -# Set up the development environment git clone https://github.com/HodeTech/Leakwatch.git -cd Leakwatch -go mod download -go test ./... +cd Leakwatch && go mod download && go test ./... ``` ---- - ## License -MIT License — see the [LICENSE](LICENSE) file for details. - ---- - -## Status - -> **Phases 1–8 are complete.** Leakwatch supports filesystem, Git, container, S3, GCS, and Slack scanning (6 sources) with 63 detectors across 60 packages, 54 verifiers (85.7% coverage), multiple output formats, and CI/CD integration. - -To track the project's progress, see the [Roadmap](docs/05-ROADMAP.md) document. +[MIT](LICENSE) © HodeTech — Leakwatch is maintained by [HodeTech](https://github.com/HodeTech). 
diff --git a/cmd/stats_test.go b/cmd/stats_test.go
new file mode 100644
index 0000000..993bb2b
--- /dev/null
+++ b/cmd/stats_test.go
@@ -0,0 +1,36 @@
+package cmd
+
+import (
+	"testing"
+
+	"github.com/HodeTech/leakwatch/internal/detector"
+	"github.com/HodeTech/leakwatch/internal/meta"
+	"github.com/HodeTech/leakwatch/internal/verifier"
+	"github.com/stretchr/testify/assert"
+)
+
+// detectorsAtInit and verifiersAtInit snapshot the registries right after every
+// package blank-imported by imports.go has run its init(), before any test can
+// mutate the global registries. Capturing here makes the guard below
+// independent of test ordering.
+var (
+	detectorsAtInit []detector.Detector
+	verifiersAtInit []verifier.Verifier
+)
+
+func init() {
+	detectorsAtInit = detector.All()
+	verifiersAtInit = verifier.All()
+}
+
+// TestMetaCounts_MatchRuntime guards the published counts in internal/meta
+// against what the binary actually registers. Every detector and verifier
+// package is blank-imported by imports.go in this package, so both registries
+// are fully populated here (the detector-only test in internal/detector cannot
+// see verifiers, hence the cross-check lives here).
+func TestMetaCounts_MatchRuntime(t *testing.T) { + assert.Len(t, detectorsAtInit, meta.Detectors, + "meta.Detectors drifted from detector.All(); update internal/meta then run `go generate ./...`") + assert.Len(t, verifiersAtInit, meta.Verifiers, + "meta.Verifiers drifted from verifier.All(); update internal/meta then run `go generate ./...`") +} diff --git a/docs/05-ROADMAP.md b/docs/05-ROADMAP.md index 8834f5a..d90a2a6 100644 --- a/docs/05-ROADMAP.md +++ b/docs/05-ROADMAP.md @@ -1,9 +1,9 @@ # Leakwatch - Phased Development Roadmap -> **Document Version:** 7.0 +> **Document Version:** 7.1 > **Date:** 2026-04-09 > **Status:** Approved -> **Last Updated:** 2026-05-24 +> **Last Updated:** 2026-05-25 --- @@ -23,15 +23,25 @@ | Phase 8.2 — CLI UX Improvements | Completed | `v1.3.2` | 2026-03-25 | | Phase 8.3 — Scan Summary + Security | Completed | `v1.4.0` | 2026-04-08 | | Phase 8.4 — False Positive Reduction | Completed | `v1.5.0` | 2026-04-09 | -| Phase 9 — Detection Accuracy & FP Reduction | Planned | `v1.6.0` | — | -| Phase 10 — Detector Library Expansion | Planned | `v1.7.0` | — | -| Phase 11 — Verification Depth & Credential Impact | Planned | `v1.8.0` | — | -| Phase 12 — Source Expansion (Confluence/Jira, org-scale) | Planned | `v1.9.0` | — | -| Phase 13 — Secrets Inventory | Planned | `v1.10.0` | — | -| Phase 14 — Honeytokens | Planned | `v1.11.0` | — | +| Phase 8.5 — GitHub Marketplace Action & Distribution | Completed | `v1.6.0` | 2026-05-25 | +| Phase 9 — Detection Accuracy & FP Reduction | Planned | `v1.7.0` | — | +| Phase 10 — Detector Library Expansion | Planned | `v1.8.0` | — | +| Phase 11 — Verification Depth & Credential Impact | Planned | `v1.9.0` | — | +| Phase 12 — Source Expansion (Confluence/Jira, org-scale) | Planned | `v1.10.0` | — | +| Phase 13 — Secrets Inventory | Planned | `v1.11.0` | — | +| Phase 14 — Honeytokens | Planned | `v1.12.0` | — | > **Prioritization note (v7.0):** the planned sequence is re-ordered so the work that most strengthens 
the core promise — accurate, verified, low-noise findings — comes first. Detection accuracy and false-positive reduction (Phase 9), broader coverage of high-blast-radius credential types (Phase 10), and deeper verification with credential-impact insight (Phase 11) precede new scan sources (Phase 12) and the inventory/honeytoken platform features (Phases 13–14). Rationale is detailed in [Planned Work — Prioritization](#planned-work--prioritization). +### v1.6.0 Highlights + +- **GitHub Marketplace Action** — `uses: HodeTech/Leakwatch@v1`. Composite action that installs a prebuilt, checksum-verified binary (no Go toolchain), runs a scan, maps exit codes, writes a job summary, supports PR-diff scanning (`scan-diff`), and can upload SARIF to Code Scanning. Linux & macOS runners. +- **New `github` output format** — `--format github` emits workflow commands so findings appear as inline annotations on pull requests +- **Config keys now take effect** — `custom-rules`, `verification.*`, `filter.exclude-detectors`, and `output.severity-threshold` from `.leakwatch.yaml` are wired into the scan (previously documented but no-ops); `scan repos` honors all scan config too +- **Accurate locations & inline ignore** — findings report real line numbers; `# leakwatch:ignore[:]` markers are honored; SARIF results carry location-stable `partialFingerprints` +- **Distribution** — multi-arch GHCR image (public), Homebrew tap (`HodeTech/tap/leakwatch`), and cross-platform release archives with checksums +- **Security hardening** — credentials redacted in Git URLs and verifier transport errors; the composite action isolates inputs via env (no shell injection) and honors the leakwatch exit code + ### v1.5.0 Highlights - **False positive reduction** — improved filtering for lock files (`package-lock.json`, `yarn.lock`, etc.), test fixtures, and placeholder patterns @@ -84,7 +94,7 @@ ## Roadmap Overview -Leakwatch development proceeds in incremental phases, each building on the previous one 
and each producing a usable deliverable. Phases 1–8 (through `v1.5.0`) are complete; Phases 9–14 are the planned forward path, sequenced by leverage on the product's core promise — see [Planned Work — Prioritization](#planned-work--prioritization). +Leakwatch development proceeds in incremental phases, each building on the previous one and each producing a usable deliverable. Phases 1–8 (through `v1.6.0`) are complete; Phases 9–14 are the planned forward path, sequenced by leverage on the product's core promise — see [Planned Work — Prioritization](#planned-work--prioritization). ```mermaid gantt @@ -119,12 +129,13 @@ gantt GitHub Action & Docker :done, f5b, after f5a, 2w v1.0.0 Release :milestone, after f5b, 0d - section Completed v1.1-v1.5 + section Completed v1.1-v1.6 Remediation, Slack, Verifiers :done, f6, after f5b, 6w UX, Security, FP reduction :done, f8, after f6, 6w + Marketplace Action & distrib. :done, f85, after f8, 3w - section Planned v1.6.0+ - Detection accuracy & FP :p9, after f8, 5w + section Planned v1.7.0+ + Detection accuracy & FP :p9, after f85, 5w Detector library expansion :p10, after p9, 6w Verification depth & impact :p11, after p10, 6w Source expansion :p12, after p11, 6w @@ -411,7 +422,7 @@ The product's core promise is **accurate, verified, low-noise secret findings**. **Goal:** Make accuracy a measurable strength. Raise detector precision and recall, cut false positives across the board, and ensure every documented detection/verification behavior actually fires. This phase improves the quality of every existing scan without adding new surfaces. -**Duration:** 4-5 weeks | **Version:** `v1.6.0` | **Status:** Planned +**Duration:** 4-5 weeks | **Version:** `v1.7.0` | **Status:** Planned ### Deliverables @@ -440,7 +451,7 @@ The product's core promise is **accurate, verified, low-noise secret findings**. ### Exit Criteria -GitHub Release published with `v1.6.0` tag. +GitHub Release published with `v1.7.0` tag. 
--- @@ -448,7 +459,7 @@ GitHub Release published with `v1.6.0` tag. **Goal:** Grow coverage of frequently-leaked, high-blast-radius credential types, prioritizing secrets whose exposure causes the most damage. Every new detector with a public verification endpoint ships with its verifier. -**Duration:** 5-6 weeks | **Version:** `v1.7.0` | **Status:** Planned +**Duration:** 5-6 weeks | **Version:** `v1.8.0` | **Status:** Planned ### Deliverables @@ -471,7 +482,7 @@ GitHub Release published with `v1.6.0` tag. ### Exit Criteria -GitHub Release published with `v1.7.0` tag. +GitHub Release published with `v1.8.0` tag. --- @@ -479,7 +490,7 @@ GitHub Release published with `v1.7.0` tag. **Goal:** Deepen the verification differentiator. Harden the verification engine, verify more credential classes, and — for live secrets — tell users what the credential can actually reach so they can triage blast radius. -**Duration:** 5-6 weeks | **Version:** `v1.8.0` | **Status:** Planned +**Duration:** 5-6 weeks | **Version:** `v1.9.0` | **Status:** Planned ### Deliverables @@ -501,7 +512,7 @@ GitHub Release published with `v1.7.0` tag. ### Exit Criteria -GitHub Release published with `v1.8.0` tag. +GitHub Release published with `v1.9.0` tag. --- @@ -509,7 +520,7 @@ GitHub Release published with `v1.8.0` tag. **Goal:** Reach secrets wherever they live — collaboration platforms and org-scale code hosting — now that the detection/verification core is strong. -**Duration:** 5-6 weeks | **Version:** `v1.9.0` | **Status:** Planned +**Duration:** 5-6 weeks | **Version:** `v1.10.0` | **Status:** Planned ### Deliverables @@ -538,7 +549,7 @@ GitHub Release published with `v1.8.0` tag. ### Exit Criteria -GitHub Release published with `v1.9.0` tag. +GitHub Release published with `v1.10.0` tag. --- @@ -546,7 +557,7 @@ GitHub Release published with `v1.9.0` tag. **Goal:** Persistent SQLite-based inventory tracking secrets across scans. 
-**Duration:** 4-5 weeks | **Version:** `v1.10.0` | **Status:** Planned +**Duration:** 4-5 weeks | **Version:** `v1.11.0` | **Status:** Planned ### Deliverables @@ -573,7 +584,7 @@ GitHub Release published with `v1.9.0` tag. ### Exit Criteria -GitHub Release published with `v1.10.0` tag. +GitHub Release published with `v1.11.0` tag. --- @@ -581,7 +592,7 @@ GitHub Release published with `v1.10.0` tag. **Goal:** Generate and deploy decoy credentials that alert on unauthorized use. -**Duration:** 3-4 weeks | **Version:** `v1.11.0` | **Status:** Planned +**Duration:** 3-4 weeks | **Version:** `v1.12.0` | **Status:** Planned ### Deliverables @@ -607,7 +618,7 @@ GitHub Release published with `v1.10.0` tag. ### Exit Criteria -GitHub Release published with `v1.11.0` tag. +GitHub Release published with `v1.12.0` tag. --- @@ -753,12 +764,13 @@ Source packages (no formal standard, but visible gaps): | `v1.3.2` | Phase 8.2 | CLI UX improvements | 2026-03-25 | | `v1.4.0` | Phase 8.3 | Scan summary, `init` command, colored table, security patches | 2026-04-08 | | `v1.5.0` | Phase 8.4 | False positive reduction, ADO.NET support | 2026-04-09 | -| `v1.6.0` | Phase 9 | Detection accuracy & false-positive reduction | — | -| `v1.7.0` | Phase 10 | Detector library expansion | — | -| `v1.8.0` | Phase 11 | Verification depth & credential impact | — | -| `v1.9.0` | Phase 12 | Source expansion (Confluence/Jira, org-scale) | — | -| `v1.10.0` | Phase 13 | Secrets inventory (SQLite) | — | -| `v1.11.0` | Phase 14 | Honeytokens | — | +| `v1.6.0` | Phase 8.5 | GitHub Marketplace Action, `github` output format, config wiring | 2026-05-25 | +| `v1.7.0` | Phase 9 | Detection accuracy & false-positive reduction | — | +| `v1.8.0` | Phase 10 | Detector library expansion | — | +| `v1.9.0` | Phase 11 | Verification depth & credential impact | — | +| `v1.10.0` | Phase 12 | Source expansion (Confluence/Jira, org-scale) | — | +| `v1.11.0` | Phase 13 | Secrets inventory (SQLite) | — | +| `v1.12.0` | Phase 
14 | Honeytokens | — | | `v2.x.x` | Future | ML detection, SaaS platform, Vault | Ongoing | > **Note on v1.1.0 / v1.2.0:** Phase 6 (Remediation Guidance) and Phase 7 (Slack Scanning) were completed and merged into `main`, but no `v1.1.0` or `v1.2.0` git tags were ever created. The features shipped as part of the `v1.3.0` release. The version slots are preserved here to keep the phase-to-version mapping consistent. diff --git a/docs/assets/banner.html b/docs/assets/banner.html new file mode 100644 index 0000000..da32aec --- /dev/null +++ b/docs/assets/banner.html @@ -0,0 +1,89 @@ + + + + + + + + + + + diff --git a/docs/assets/banner.png b/docs/assets/banner.png new file mode 100644 index 0000000..60742ef Binary files /dev/null and b/docs/assets/banner.png differ diff --git a/internal/detector/registry_count_test.go b/internal/detector/registry_count_test.go index 2d7a839..ab9eb5e 100644 --- a/internal/detector/registry_count_test.go +++ b/internal/detector/registry_count_test.go @@ -14,13 +14,15 @@ package detector_test // rules at runtime (detector.RegisterIfAbsent) and is therefore not part of // the compile-time count. // -// If you add or remove a detector, update wantDetectorCount below and keep the -// blank-import block in sync with cmd/imports.go. +// If you add or remove a detector, update internal/meta.Detectors (the single +// source of truth for the published count) and keep the blank-import block in +// sync with cmd/imports.go. import ( "testing" "github.com/HodeTech/leakwatch/internal/detector" + "github.com/HodeTech/leakwatch/internal/meta" "github.com/stretchr/testify/assert" // Each blank import runs the package's init(), registering its detector(s) @@ -88,9 +90,6 @@ import ( _ "github.com/HodeTech/leakwatch/internal/detector/vercel" // register vercel detector ) -// wantDetectorCount is the expected number of compile-time registered detectors. 
-const wantDetectorCount = 63 - // registeredAtInit snapshots the registry right after every blank-imported // detector package has run its init(), but before any test can mutate the // global registry (the in-package registry_test.go calls detector.Reset()). @@ -102,8 +101,8 @@ func init() { } func TestAll_RegisteredDetectorCount_MatchesGolden(t *testing.T) { - assert.Len(t, registeredAtInit, wantDetectorCount, - "compile-time registered detector count drifted; update wantDetectorCount and cmd/imports.go together") + assert.Len(t, registeredAtInit, meta.Detectors, + "compile-time registered detector count drifted; update internal/meta.Detectors and cmd/imports.go together") // Every registered detector must have a unique, non-empty ID. ids := make(map[string]bool, len(registeredAtInit)) diff --git a/internal/meta/counts.go b/internal/meta/counts.go new file mode 100644 index 0000000..1475b8f --- /dev/null +++ b/internal/meta/counts.go @@ -0,0 +1,38 @@ +// Package meta holds the canonical, human-maintained project counts that are +// published in the README banner, the social-preview image, and the docs. +// +// These constants are the single source of truth for the published numbers: +// +// - Detectors and Verifiers are guarded at test time against the live +// registries (detector.All() / verifier.All()), so adding or removing one +// without updating the constant fails CI. See +// internal/detector/registry_count_test.go and cmd/stats_test.go. +// - Sources and OutputFormats change rarely and are golden values. They are +// not derived at runtime on purpose: the scan command also exposes a +// "repos" subcommand that is not a distinct source, and selectFormatter +// accepts fallback aliases, so neither maps cleanly to a count. +// +// When any of these change, run `go generate ./...` to refresh the generated +// stat blocks in docs/assets/banner.html and site/assets/og.svg, then +// re-render their PNGs (the re-render command is in each asset's header). 
+package meta + +//go:generate go run ./statsgen + +const ( + // Detectors is the number of compile-time registered secret detectors; + // it must equal len(detector.All()). + Detectors = 63 + + // Verifiers is the number of registered verifiers; it must equal + // len(verifier.All()). + Verifiers = 54 + + // Sources is the number of scan sources: filesystem, git, container image, + // S3, GCS, and Slack. + Sources = 6 + + // OutputFormats is the number of output formats: json, sarif, csv, table, + // and github. + OutputFormats = 5 +) diff --git a/internal/meta/statsgen/main.go b/internal/meta/statsgen/main.go new file mode 100644 index 0000000..9337e69 --- /dev/null +++ b/internal/meta/statsgen/main.go @@ -0,0 +1,129 @@ +// Command statsgen rewrites the project's marketing-asset stat blocks from the +// canonical counts in internal/meta. It is wired to `go generate ./...` via the +// directive in internal/meta/counts.go. +// +// It only edits text inside a "stats:begin" / "stats:end" marker pair, so the +// surrounding markup and any context-specific numbers elsewhere in the file +// (verification tiers, historical highlights, coverage progressions) are never +// touched. With -check it verifies the files are up to date instead of writing, +// exiting non-zero on drift; this mode backs the guard test and CI. +package main + +import ( + "flag" + "fmt" + "os" + "path/filepath" + "regexp" + "strings" + + "github.com/HodeTech/leakwatch/internal/meta" +) + +const ( + beginMarker = "stats:begin" + endMarker = "stats:end" +) + +// managedFiles are rewritten relative to the repository root. +var managedFiles = []string{ + "docs/assets/banner.html", + "site/assets/og.svg", +} + +// replacement pairs a noun-anchored pattern with its canonical replacement. +// Anchoring on the trailing noun keeps unrelated numbers (and the marker text +// itself) untouched. 
+type replacement struct { + re *regexp.Regexp + with string +} + +func replacements() []replacement { + return []replacement{ + {regexp.MustCompile(`\d+ detectors`), fmt.Sprintf("%d detectors", meta.Detectors)}, + {regexp.MustCompile(`\d+ live verifiers`), fmt.Sprintf("%d live verifiers", meta.Verifiers)}, + {regexp.MustCompile(`\d+ sources`), fmt.Sprintf("%d sources", meta.Sources)}, + {regexp.MustCompile(`\d+ output formats`), fmt.Sprintf("%d output formats", meta.OutputFormats)}, + } +} + +func main() { + check := flag.Bool("check", false, "verify files are up to date instead of writing") + flag.Parse() + + root, err := repoRoot() + if err != nil { + fail(err) + } + + var stale []string + for _, rel := range managedFiles { + path := filepath.Join(root, rel) + orig, err := os.ReadFile(path) + if err != nil { + fail(fmt.Errorf("read %s: %w", rel, err)) + } + updated, err := rewrite(string(orig)) + if err != nil { + fail(fmt.Errorf("%s: %w", rel, err)) + } + if updated == string(orig) { + continue + } + if *check { + stale = append(stale, rel) + continue + } + if err := os.WriteFile(path, []byte(updated), 0o644); err != nil { + fail(fmt.Errorf("write %s: %w", rel, err)) + } + fmt.Printf("updated %s\n", rel) + } + + if len(stale) > 0 { + fail(fmt.Errorf("stat blocks out of date in: %s\nrun `go generate ./...` and re-render the PNGs", + strings.Join(stale, ", "))) + } +} + +// rewrite applies the canonical counts inside the stats marker region and +// returns the updated content. It errors when the markers are missing so an +// unmarked (silently unmanaged) asset is caught rather than ignored. 
+func rewrite(content string) (string, error) { + begin := strings.Index(content, beginMarker) + end := strings.Index(content, endMarker) + if begin == -1 || end == -1 || end < begin { + return "", fmt.Errorf("missing %q/%q markers", beginMarker, endMarker) + } + region := content[begin:end] + for _, r := range replacements() { + region = r.re.ReplaceAllString(region, r.with) + } + return content[:begin] + region + content[end:], nil +} + +// repoRoot walks up from the working directory to the module root (the first +// directory containing go.mod), so the command works both under `go generate` +// (run from internal/meta) and under `go test` (run from the package dir). +func repoRoot() (string, error) { + dir, err := os.Getwd() + if err != nil { + return "", err + } + for { + if _, err := os.Stat(filepath.Join(dir, "go.mod")); err == nil { + return dir, nil + } + parent := filepath.Dir(dir) + if parent == dir { + return "", fmt.Errorf("go.mod not found searching upward from %s", dir) + } + dir = parent + } +} + +func fail(err error) { + fmt.Fprintln(os.Stderr, "statsgen:", err) + os.Exit(1) +} diff --git a/internal/meta/statsgen/main_test.go b/internal/meta/statsgen/main_test.go new file mode 100644 index 0000000..9e3129e --- /dev/null +++ b/internal/meta/statsgen/main_test.go @@ -0,0 +1,31 @@ +package main + +import ( + "os" + "path/filepath" + "testing" +) + +// TestManagedAssetsUpToDate fails when a marketing asset's stat block no longer +// matches internal/meta — i.e. a count was bumped but `go generate ./...` was +// not run (or the PNG was re-rendered from a stale source). It runs as part of +// `go test ./...`, so CI catches the drift without a dedicated workflow step. 
+func TestManagedAssetsUpToDate(t *testing.T) { + root, err := repoRoot() + if err != nil { + t.Fatalf("locate repo root: %v", err) + } + for _, rel := range managedFiles { + orig, err := os.ReadFile(filepath.Join(root, rel)) + if err != nil { + t.Fatalf("read %s: %v", rel, err) + } + updated, err := rewrite(string(orig)) + if err != nil { + t.Fatalf("%s: %v", rel, err) + } + if updated != string(orig) { + t.Errorf("%s stat block is stale; run `go generate ./...` and re-render its PNG", rel) + } + } +} diff --git a/site/assets/og.svg b/site/assets/og.svg index b930efd..42b55a2 100644 --- a/site/assets/og.svg +++ b/site/assets/og.svg @@ -19,11 +19,13 @@ Some secrets shouldn't be █████████. Leakwatch finds the ones that are — detect · verify · report. + 63 detectors 54 live verifiers 6 sources SARIF · JSON · CSV + github.com/HodeTech/Leakwatch · MIT