
feat(detector): detect GitHub stateless (JWT-format) ghs_ installation tokens#15

Open
cemililik wants to merge 3 commits into
mainfrom
feat/github-stateless-ghs-token

Conversation

@cemililik
Collaborator

@cemililik cemililik commented May 25, 2026

Summary

Detects GitHub's new stateless (JWT-format) ghs_ installation tokens (rolled out from April 2026 to the Actions GITHUB_TOKEN and App installation tokens). The new format is ghs_APPID_<jwt> — a ghs_-prefixed JWT of ~520 chars containing exactly two dots; segments are base64url (A-Za-z0-9, _, -).

The old pattern gh[orus]_[A-Za-z0-9_]{36,} had no '.' in its body class, so a new token was either truncated at the first dot (with its JWT body falling through to the generic jwt detector: one secret, two wrong findings) or missed entirely when a base64url '-' appeared before the body reached 36 chars (flagged only as a generic JWT). Both failure modes were reproduced before the change and confirmed fixed after it.

What changed

  • github-oauth-token detector (github_oauth.go) — ordered alternation: a stateless branch ghs_[A-Za-z0-9_-]{8,}(?:\.[A-Za-z0-9_-]{8,}){2} (listed first so it wins over the opaque branch at a ghs_ start) plus the unchanged opaque branch gh[orus]_[A-Za-z0-9_]{36,}. The whole token is captured as a single finding. gho_/ghu_/ghr_/legacy ghs_ are unchanged and never start eating dots; ghp_ stays with the separate github-token detector. Kept as one raw-string literal so the tools/site-build AST extractor can still read it.
  • jwt detector (jwt.go) — suppresses a JWT that is the body of a ghs_ token (walks back over the contiguous token run and checks it contains ghs_; RE2 has no lookbehind), so the secret is reported exactly once by the more specific detector. Sound because the GitHub stateless floors ({8,}) are ≤ the JWT floors ({10,}), so whenever this fires the GitHub detector also matches the whole token — suppression can only drop a duplicate, never a secret.
  • Docs — README detector table, en/tr detector catalogs, CHANGELOG; site bundle regenerated (site/js/manuals/{en,tr}.js, site/js/detectors.js).

Design decisions

  • ghu_ left opaque on purpose — GitHub has signalled a later ghu_ format change but has not published it; speculating risks false matches. The code documents exactly where to extend (gh[su]_…).
  • Verification unaffected & not misleading — a bogus token → /user 401 → inactive; a live installation token → 403 → verify-error (not a false active/revoked). Sending the whole token is strictly more correct than the old truncated fragment.
  • Redaction still reveals only the trailing four characters, safe for a ~520-char token.

Review follow-ups (in this PR)

After a multi-angle review, three findings were actioned (the rest accepted with rationale: inherent benign over-capture, a rare FP class already tighter than GitHub's own regex, and the documented ghu_ extension point):

Verification

gofumpt -l . (empty output) · go vet ./... · go build ./... · go test -race ./... (no failures/races) · golangci-lint run ./... --config .golangci.yml → 0 issues · github (detector), jwt (detector), and github (verifier) at 100% coverage · tools/site-build regeneration leaves the tree clean.

Notes

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added detection support for GitHub stateless installation tokens (JWT-formatted ghs_ tokens).
  • Bug Fixes

    • Eliminated duplicate findings when both GitHub OAuth and JWT detectors matched the same stateless token.
  • Documentation

    • Updated detector catalog and README to reflect expanded GitHub token type coverage.


cemililik and others added 3 commits May 25, 2026 13:59
… tokens

From April 2026 GitHub issues installation tokens (including the Actions
GITHUB_TOKEN) in a new ghs_APPID_<jwt> format: a ghs_-prefixed JWT of ~520
chars containing exactly two dots. The previous github-oauth-token pattern
`gh[orus]_[A-Za-z0-9_]{36,}` had no dot in its body class, so it truncated
such a token at the first dot — and missed it entirely when a base64url '-'
appeared before the body reached 36 chars — while the JWT body fell through
to the generic jwt detector. One secret was reported as two wrong findings
(or one wrong one).

Changes:
- github-oauth-token now matches an ordered alternation: a stateless branch
  `ghs_[A-Za-z0-9_-]{8,}(?:\.[A-Za-z0-9_-]{8,}){2}` (listed first so it wins
  over the opaque branch) plus the unchanged opaque branch
  `gh[orus]_[A-Za-z0-9_]{36,}`. The whole token is captured as one finding.
  Opaque gho_/ghu_/ghr_/legacy ghs_ are unchanged and never start eating
  dots; ghp_ stays with the separate github-token detector.
- jwt detector suppresses a JWT that is the body of a ghs_ token (walks back
  over the preceding token run and checks for a "ghs_" prefix; RE2 has no
  lookbehind), so the secret is reported exactly once by the more specific
  detector.

Decisions:
- ghu_ (user-to-server) is intentionally left opaque: GitHub has signalled a
  later format change but has not published it; speculating risks false
  matches. A note in the code says where to extend when documented.
- Verification is unaffected: a bogus token -> 401 -> inactive; a live
  installation token -> 403 -> verify-error (not a false active/revoked).
  Redaction still reveals only the trailing four characters.

Tests are table-driven and cover full capture of both failure modes
(long-header truncation, dash-early miss), opaque regressions, no
over-capture into trailing context, the exactly-one-detector invariant, and
jwt suppression. github and jwt detector packages remain at 100% coverage.

Docs: README detector table, en/tr detector catalogs, and CHANGELOG updated;
site bundle regenerated (site/js/manuals/{en,tr}.js, site/js/detectors.js).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
isGitHubStatelessBody required the contiguous token run before a JWT to BEGIN
with "ghs_". When a base64url char is glued directly in front (e.g.
"xghs_APPID_eyJ...eyJ...sig" with no delimiter), the run was "xghs_APPID_" so
the JWT was not recognised as a ghs_ body and was reported again — while the
unanchored github-oauth-token pattern still matched "ghs_..." mid-string,
double-reporting the same secret.

Use bytes.Contains instead of HasPrefix. Wherever "ghs_" appears in the run, the
run has no dots, so it is glued onto the JWT and forms a "ghs_...eyJ.eyJ.sig"
shape the github detector captures in full (its segment floors {8,} are <= this
detector's {10,}); suppressing can therefore only drop a duplicate, never a
secret. Realistic delimiters (=, ", space, newline, :, /) are not token bytes,
so this only tightens a contrived edge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GH-2: tools/site-build extracts each detector's regex from the AST and only
emits a detector with a single regexp.MustCompile(`literal`); a concatenated /
const / fmt.Sprintf pattern silently vanishes from the web playground while the
registry count test still passes. Add TestDetectorsJS_CoversEveryRegisteredDetector
(in the golden-count test that already blank-imports every detector) to pin the
bundle to detector.All() minus the documented skips (generic). Verified it fails
with an actionable message when an entry is dropped. This converts the previous
"keep it one raw-string literal" code comment into an enforced invariant.

GH-5: add an explicit GitHub /user 403 verifier case. A live stateless
installation token authenticates as an app installation, so /user answers 403,
which is neither active (200) nor inactive (401) and maps to verify-error —
documenting that it is never mislabelled active or "invalid or revoked".

Both are test-only; no production behavior changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
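The idea behind TestDetectorsJS_CoversEveryRegisteredDetector can be sketched as a set difference in both directions:

- missing: registered IDs absent from the bundle;
- stale: bundle IDs that are no longer registered.

This is not the repository's test. diffIDs is hypothetical, and the `id: "..."` entry shape matched by bundleIDPattern is an assumption about site/js/detectors.js.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// bundleIDPattern assumes bundle entries look like id: "github-oauth-token";
// the real site/js/detectors.js layout may differ.
var bundleIDPattern = regexp.MustCompile(`id:\s*"([^"]+)"`)

// diffIDs (hypothetical) compares registered detector IDs, minus documented
// skips, against the IDs found in the generated bundle.
func diffIDs(registered []string, bundle string, skips map[string]bool) (missing, stale []string) {
	inBundle := map[string]bool{}
	for _, m := range bundleIDPattern.FindAllStringSubmatch(bundle, -1) {
		inBundle[m[1]] = true
	}
	want := map[string]bool{}
	for _, id := range registered {
		if skips[id] {
			continue
		}
		want[id] = true
		if !inBundle[id] {
			missing = append(missing, id) // silently dropped from the playground
		}
	}
	for id := range inBundle {
		if !want[id] {
			stale = append(stale, id) // bundle entry with no registered detector
		}
	}
	sort.Strings(missing)
	sort.Strings(stale)
	return missing, stale
}

func main() {
	registered := []string{"generic", "github-oauth-token", "jwt"}
	bundle := `[{id: "jwt", re: /x/}, {id: "old-detector", re: /y/}]`
	missing, stale := diffIDs(registered, bundle, map[string]bool{"generic": true})
	fmt.Println("missing:", missing, "stale:", stale) // missing: [github-oauth-token] stale: [old-detector]
}
```

Reporting both lists by name is what makes a failure actionable. It points at the exact detector whose pattern stopped being a single raw-string literal.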

@sourcery-ai sourcery-ai Bot left a comment


Sorry @cemililik, your pull request is larger than the review limit of 500000 diff characters.

@coderabbitai

coderabbitai Bot commented May 25, 2026

📝 Walkthrough

Extended GitHub token detection to support stateless ghs_ installation tokens in JWT format. Updated OAuth detector regex, suppressed duplicate JWT detector matches for embedded bodies, synchronized web playground detector, added registry validation, and updated all documentation.

Changes

GitHub Stateless Token Detection

Layer / File(s) Summary
GitHub OAuth Detector: Stateless Token Pattern & Tests
internal/detector/github/github_oauth.go, internal/detector/github/github_oauth_test.go
Extended oauthTokenPattern regex to match stateless JWT-shaped ghs_ tokens and legacy opaque formats (gho_, ghu_, ghr_, ghs_). Updated detector description. Added tests for full-token capture, redaction, boundary handling (no over-capture across dots), legacy backward compatibility, and regression check ensuring no detector overlap on stateless tokens.
JWT Detector: Suppress Embedded Stateless Token Bodies
internal/detector/jwt/jwt.go, internal/detector/jwt/jwt_test.go
Reworked JWT scanner to use byte-range iteration and detect ghs_ prefix in preceding context, suppressing JWT matches that form the body of stateless tokens. Added helper functions isGitHubStatelessBody and isTokenByte. Prevents duplicate findings while preserving standalone JWT detection with comprehensive test coverage across embedding scenarios.
Verifier: HTTP 403 Status Mapping for Stateless Tokens
internal/verifier/github/github_oauth_verifier_test.go
Added test documenting that stateless installation tokens receive HTTP 403 on GitHub /user endpoint and that Verify correctly returns StatusVerifyError with status code in message.
Web Playground Detector: Frontend Pattern Update
site/js/detectors.js
Updated JavaScript detector regex to include stateless ghs_ token format (dot-separated base64url segments) alongside opaque token patterns, keeping frontend detector synchronized with backend.
Detector Registry Cross-Check: Prevent Silent Detector Drops
internal/detector/registry_count_test.go
Added TestDetectorsJS_CoversEveryRegisteredDetector with helpers to validate that generated site/js/detectors.js bundle includes all registered detectors (except documented skips) and contains no stale IDs. Prevents silent detector removal due to build or bundling issues.
Documentation & Changelog Updates
CHANGELOG.md, README.md, docs/user-manuals/en/detectors/detector-catalog.md, docs/user-manuals/tr/detectors/detector-catalog.md
Updated all user-facing documentation to describe stateless ghs_ JWT-format token support, documenting the fix for duplicate JWT detection and the new registry validation test, and expanding detector catalog entries across English and Turkish user manuals.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops through GitHub tokens with delight,
Stateless ghs_ twins now shine so bright,
No more double-reports from JWT's keen eye,
Registry guards ensure none slip by! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately and specifically describes the main change: adding detection for GitHub stateless JWT-format ghs_ installation tokens, which is the central focus across all modified detector code and tests.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enhances the github-oauth-token detector to support GitHub's new stateless (JWT-format) ghs_ installation tokens. The changes include updating the regex pattern to capture these tokens in full and modifying the jwt detector to suppress duplicate findings when a JWT is identified as part of a GitHub stateless token. Additionally, the PR includes comprehensive tests for these changes, updates documentation, and introduces a new test to ensure the web playground bundle remains synchronized with the detector registry. I have no feedback to provide as there were no review comments to assess.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/verifier/github/github_oauth_verifier_test.go (1)

100-124: ⚡ Quick win

Use a table-driven case for this new verifier scenario.

This adds another single-case function in a _test.go file; please express it as a table entry (or fold it into a shared status-mapping table) to stay compliant and keep scenarios easier to extend.

As per coding guidelines, "Use table-driven tests in test files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/verifier/github/github_oauth_verifier_test.go` around lines 100 -
124, Convert the single-case test TestOAuthVerify_Forbidden_ReturnsVerifyError
into a table-driven test by creating a test table (slice of structs) with a
descriptive name field and entries for the forbidden scenario (and any existing
scenarios) and iterate with t.Run for each case; inside each case construct the
httptest server (as currently done), instantiate OAuthVerifier (apiURL,
httpClient), build the detector.RawFinding (using oauthDetectorID and the same
Raw/Redacted values), call v.Verify(ctx, raw) and assert expected result.Status
and result.Message for that case; update references to
TestOAuthVerify_Forbidden_ReturnsVerifyError to the new table-driven test
function name and ensure each case is self-contained so tests remain isolated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/detector/github/github_oauth_test.go`:
- Around line 100-225: The PR fails the repository-wide detector test-coverage
gate because several detector packages are below 95% (internal/detector, gcp,
generic, custom, privatekey, snowflake, stripe, testutil); add unit tests that
exercise each package's exported detector Scan implementations and helper
functions to raise coverage: create table-driven tests (mirroring patterns in
internal/detector/github/github_oauth_test.go) that call detectors'
Scan(context.Background(), []byte(...)) for expected matches and non-matches,
assert finding counts and fields (use symbols like OAuthDetector.Scan,
Token.Scan, and any package-specific detector types such as GCPDetector or
StripeDetector), and add tests for utility code in testutil to cover edge cases
and error paths; aim to hit the uncovered branches (negative cases, boundary
inputs, and redaction logic) so each listed package reaches ≥95% coverage and
the gate passes.

In `@internal/detector/jwt/jwt.go`:
- Around line 66-99: The detector packages overall miss the 95% coverage gate;
add focused unit tests to exercise uncovered branches in the detector packages
(e.g., custom, gcp, generic, heroku, privatekey, snowflake, stripe and testutil)
so overall detector coverage rises above 95%. Specifically, create table-driven
tests that exercise positive and negative detection paths and edge cases
(including token boundary and separator behavior similar to
jwt.isGitHubStatelessBody and jwt.isTokenByte), add mocks or sample inputs for
cloud/provider-specific detectors (GCP, Heroku, Stripe, Snowflake, private keys,
custom patterns), and include tests for testutil helpers so they are executed;
run `go test ./internal/detector/... -coverprofile` to verify coverage and
iterate until the detector package coverage meets the 95% threshold.

In `@internal/detector/registry_count_test.go`:
- Around line 199-202: The raw error returned from os.Getwd should be wrapped
with context before returning; update the failing branch that calls os.Getwd
(the block returning "", err) to return a wrapped error using
fmt.Errorf("getting working directory: %w", err) (and add a fmt import if
missing) so the function (the helper that calls os.Getwd in
registry_count_test.go) preserves call-site context per the repo error-wrapping
rule.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c1fc6cb-7da4-4eac-b9ec-e52c8406ee15

📥 Commits

Reviewing files that changed from the base of the PR and between cbe8c4d and 8a4fbeb.

📒 Files selected for processing (13)
  • CHANGELOG.md
  • README.md
  • docs/user-manuals/en/detectors/detector-catalog.md
  • docs/user-manuals/tr/detectors/detector-catalog.md
  • internal/detector/github/github_oauth.go
  • internal/detector/github/github_oauth_test.go
  • internal/detector/jwt/jwt.go
  • internal/detector/jwt/jwt_test.go
  • internal/detector/registry_count_test.go
  • internal/verifier/github/github_oauth_verifier_test.go
  • site/js/detectors.js
  • site/js/manuals/en.js
  • site/js/manuals/tr.js

Comment on lines +100 to +225
// fakeStatelessToken builds an obviously-fake GitHub stateless installation
// token of the ghs_APPID_<jwt> form (header.payload.signature). It is assembled
// from parts at runtime so the source file never contains a contiguous,
// real-looking token literal that secret push-protection could flag.
func fakeStatelessToken(headerTail string) string {
	const appID = "12345678"
	header := "eyJ" + headerTail
	payload := "eyJ" + strings.Repeat("Gh1Ij2Kl", 30)
	signature := strings.Repeat("Mn3Op4Qr", 12)
	return "ghs_" + appID + "_" + header + "." + payload + "." + signature
}

// TestOAuthDetector_Scan_StatelessToken_CapturesWholeToken proves the new
// ghs_APPID_<jwt> stateless installation tokens are captured in full by a single
// github-oauth-token finding (the pre-2026 behaviour truncated them at the first
// dot or missed them entirely when a base64url '-' appeared early).
func TestOAuthDetector_Scan_StatelessToken_CapturesWholeToken(t *testing.T) {
	tests := []struct {
		name  string
		token string
	}{
		{
			name:  "long alphanumeric header segment",
			token: fakeStatelessToken(strings.Repeat("Ab9Cd0Ef", 5)),
		},
		{
			name:  "base64url dash early in header",
			token: fakeStatelessToken("Ab-Cd0Ef9Gh"),
		},
		{
			name:  "base64url underscore in header",
			token: fakeStatelessToken("Ab_Cd0Ef9Gh_Ij"),
		},
		{
			name:  "short app id",
			token: "ghs_42_" + "eyJ" + strings.Repeat("Ab9Cd0Ef", 4) + "." + "eyJ" + strings.Repeat("Gh1Ij2Kl", 20) + "." + strings.Repeat("Mn3Op4Qr", 10),
		},
	}

	d := &OAuthDetector{}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			findings := d.Scan(context.Background(), []byte(tt.token))
			require.Len(t, findings, 1, "stateless token must yield exactly one finding")

			f := findings[0]
			assert.Equal(t, "github-oauth-token", f.DetectorID)
			// The whole token is captured, not just the header segment.
			assert.Equal(t, tt.token, string(f.Raw), "must capture the entire token")
			assert.Greater(t, len(f.Raw), 100, "stateless tokens are long")

			// Redaction stays safe for a long token: only the last four
			// characters are ever revealed.
			assert.Equal(t, "****"+tt.token[len(tt.token)-4:], f.Redacted)
			assert.Len(t, f.Redacted, len("****")+4)
			assert.NotContains(t, f.Redacted, tt.token[:len(tt.token)-4],
				"redaction must not expose the token body")
		})
	}
}

// TestOAuthDetector_Scan_NoOverCapture guards the greedy branches against eating
// surrounding context: opaque tokens must not start consuming dots, and a
// stateless token must stop at its third (signature) segment.
func TestOAuthDetector_Scan_NoOverCapture(t *testing.T) {
	suffix40 := strings.Repeat("Abc1D678", 5)
	stateless := fakeStatelessToken(strings.Repeat("Ab9Cd0Ef", 5))

	tests := []struct {
		name  string
		input string
		want  string // expected single captured match
	}{
		{
			name:  "opaque gho_ followed by dotted domain",
			input: "gho_" + suffix40 + ".example.com",
			want:  "gho_" + suffix40,
		},
		{
			name:  "stateless token at end of a sentence",
			input: "leaked token: " + stateless + ". Please rotate it.",
			want:  stateless,
		},
		{
			name:  "stateless token followed by a fourth dotted segment",
			input: stateless + "." + strings.Repeat("Qq11Ww22", 8),
			want:  stateless,
		},
	}

	d := &OAuthDetector{}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			findings := d.Scan(context.Background(), []byte(tt.input))
			require.Len(t, findings, 1)
			assert.Equal(t, tt.want, string(findings[0].Raw))
		})
	}
}

// TestOAuthDetector_Scan_LegacyOpaqueUnchanged confirms the legacy opaque shapes
// (including legacy opaque ghs_) are still captured whole and unchanged.
func TestOAuthDetector_Scan_LegacyOpaqueUnchanged(t *testing.T) {
	suffix40 := strings.Repeat("Abc1D678", 5)
	for _, prefix := range []string{"gho_", "ghu_", "ghr_", "ghs_"} {
		t.Run(prefix, func(t *testing.T) {
			token := prefix + suffix40
			findings := (&OAuthDetector{}).Scan(context.Background(), []byte(token))
			require.Len(t, findings, 1)
			assert.Equal(t, token, string(findings[0].Raw))
		})
	}
}

// TestGitHubDetectors_StatelessNoPrefixOverlap ensures a stateless ghs_ token is
// still claimed by exactly one of the two GitHub detectors (never the ghp_
// personal-access-token detector).
func TestGitHubDetectors_StatelessNoPrefixOverlap(t *testing.T) {
	token := []byte(fakeStatelessToken(strings.Repeat("Ab9Cd0Ef", 5)))

	tokenFindings := (&Token{}).Scan(context.Background(), token)
	oauthFindings := (&OAuthDetector{}).Scan(context.Background(), token)

	assert.Empty(t, tokenFindings, "ghp_ detector must not claim a ghs_ token")
	require.Len(t, oauthFindings, 1, "oauth detector must claim the ghs_ token")
}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check test coverage for the github detector package
cd internal/detector/github
go test -cover -coverprofile=coverage.out
go tool cover -func=coverage.out | grep total

Repository: HodeTech/Leakwatch

Length of output: 474


🏁 Script executed:

#!/bin/bash
set -euo pipefail

pkgs=$(go list ./internal/detector/...)
echo "Detector packages:"
echo "$pkgs"

fail=0
for pkg in $pkgs; do
  echo "== $pkg =="
  tmpfile="/tmp/cover_${pkg//\//_}.out"
  go test -cover -coverprofile="$tmpfile" "$pkg" >/tmp/cover_stdout.txt 2>/tmp/cover_stderr.txt || {
    echo "go test failed for $pkg"
    cat /tmp/cover_stderr.txt
    fail=1
    continue
  }
  total=$(go tool cover -func="$tmpfile" | awk '/total:/{print $3}')
  echo "total: $total"
  # total is like 100.0%
  pct=${total%%%}
  if awk "BEGIN{exit !($pct < 95)}"; then
    echo "Coverage below 95% for $pkg: $total"
    fail=1
  fi
done

if [ "$fail" -ne 0 ]; then
  echo "One or more detector packages are below 95% coverage."
  exit 1
fi

echo "All detector packages meet 95%+ coverage."

Repository: HodeTech/Leakwatch

Length of output: 9032


Test coverage gate for detector packages fails (95% not met overall)
internal/detector/github is at 100%, but multiple detector packages are below the 95% minimum (e.g., internal/detector 81.2%, gcp 88.7%, generic 83.3%, custom 92.9%, privatekey 93.8%, snowflake 92.3%, stripe 92.3%, and testutil 0.0%), so the detector coverage requirement is not satisfied.


Comment on lines +66 to +99
// isGitHubStatelessBody reports whether the JWT beginning at start is the body
// of a GitHub stateless installation token (ghs_APPID_<jwt>). RE2 has no
// lookbehind, so it walks back over the contiguous token run (base64url plus the
// ghs_/app-ID separators) immediately preceding the match and checks whether
// that run contains the literal "ghs_".
//
// Contains rather than HasPrefix: the run may carry leading base64url bytes with
// no delimiter (e.g. "xghs_APPID_"). Wherever "ghs_" appears in the run, the run
// has no dots (dots are not token bytes) so it is glued straight onto this JWT,
// forming a "ghs_...eyJ.eyJ.sig" shape that the github-oauth-token detector
// captures in full — its per-segment floors ({8,}) are at or below this
// detector's ({10,}). Suppressing here therefore only removes a duplicate of a
// secret the github detector already reports; it can never drop one. (This
// assumes the github-oauth-token detector is active, which it is by default.)
func isGitHubStatelessBody(data []byte, start int) bool {
	i := start
	for i > 0 && isTokenByte(data[i-1]) {
		i--
	}
	return bytes.Contains(data[i:start], ghsPrefix)
}

// isTokenByte reports whether b is part of a contiguous token run: a base64url
// character or one of the separators ('_', '-') that appear in a ghs_ token.
func isTokenByte(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	case b == '_', b == '-':
		return true
	default:
		return false
	}
}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check test coverage for the jwt detector package
cd internal/detector/jwt
go test -cover -coverprofile=coverage.out
go tool cover -func=coverage.out | grep total

Repository: HodeTech/Leakwatch

Length of output: 267


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# List all detector packages
packages=$(go list ./internal/detector/... | tr '\n' ' ')
echo "Detector packages:"
echo "$packages"

for pkg in $(go list ./internal/detector/...); do
  echo "== $pkg =="
  # Create a temp coverprofile per package
  tmpfile=$(mktemp)
  # Run tests with coverage
  go test -cover -coverprofile="$tmpfile" "$pkg" >/dev/null
  # Extract total statement coverage
  total=$(go tool cover -func="$tmpfile" | awk '/total:/ {print $3}')
  echo "statement_coverage: $total"
  rm -f "$tmpfile"
done

Repository: HodeTech/Leakwatch

Length of output: 9124


Coverage gate: detector packages are below the 95% minimum

  • github.com/HodeTech/leakwatch/internal/detector overall statement coverage is 81.2%
  • Multiple detector packages are under 95% (e.g., custom 92.9%, gcp 88.7%, generic 83.3%, heroku 92.9%, privatekey 93.8%, snowflake 92.3%, stripe 92.3%, testutil 0.0%)
  • internal/detector/jwt itself is 100%, but the stated requirement is not satisfied across detector packages.

Comment on lines +199 to +202
	dir, err := os.Getwd()
	if err != nil {
		return "", err
	}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Wrap the returned os.Getwd error with context.

Line 201 returns a raw error, which drops call-site context and violates the repo’s error-wrapping rule.

Suggested patch
 func repoRoot() (string, error) {
 	dir, err := os.Getwd()
 	if err != nil {
-		return "", err
+		return "", fmt.Errorf("get working directory: %w", err)
 	}

As per coding guidelines, "Wrap every error with fmt.Errorf("context: %w", err) before returning".

