2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -31,10 +31,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
- **`cmd/imports.go`** — each blank import now carries an inline `// register <plugin>` comment plus a file-level explanation of the ADR-0004 plugin-registration pattern (SonarCloud `godre:S8184`).

### Fixed
- **GitHub stateless (JWT-format) `ghs_` installation tokens are now detected in full** — from April 2026 GitHub began issuing installation tokens (including the Actions `GITHUB_TOKEN`) in a new `ghs_APPID_<jwt>` format: a `ghs_`-prefixed JWT of ~520 characters containing two dots. The previous `gh[orus]_[A-Za-z0-9_]{36,}` pattern had no `.` in its body class, so it truncated such a token at the first dot — and missed it entirely when a base64url `-` appeared before the body reached 36 characters — while the JWT body fell through to the generic `jwt` detector. One secret was reported as two wrong findings (or one wrong one). The `github-oauth-token` detector now captures the whole stateless token as a single `ghs_<base64url>.<base64url>.<base64url>` match while leaving the opaque `gho_`/`ghu_`/`ghr_`/legacy `ghs_` matches and `ghp_` (separate `github-token` detector) unchanged. The `jwt` detector now suppresses a JWT that is the body of a `ghs_` token so the secret is reported exactly once. Redaction still reveals only the trailing four characters; verification is unaffected (a live installation token is reported as a verify error rather than a false "active"/"revoked"). `ghu_` (user-to-server) is intentionally left opaque until GitHub publishes its new format. (60 detector packages / 63 detectors unchanged.)
- **`dbconn` placeholder case-sensitivity bug** — `Password=TODO` and `Password=FIXME` (uppercase) were previously **not** skipped as placeholders even though the placeholder list contained the entries. The lookup lowercased the password but compared against uppercase slice entries, so the two values silently fell through and were reported as findings. The placeholder slice is now lowercased so case-insensitive matching actually works as documented. **User-visible behavior change:** `Password=TODO` no longer produces a finding.

### Tests
- **`detector/dbconn` coverage** raised from 51.5% to 97.0% (CLAUDE.md standard is 95%). New table-driven tests cover ADO.NET parsing, the placeholder list (case-insensitive), `redactADONet`, and the `url.Parse` error path of `redactPassword`.
- **Generated `detectors.js` is guarded against silent detector drops** — a new test pins the web-playground bundle (`site/js/detectors.js`) to the live detector registry, so a detector whose regex is not a single AST-extractable ``regexp.MustCompile(`literal`)`` (e.g. concatenated fragments) fails CI instead of silently vanishing from the playground. Also adds an explicit GitHub `/user` 403 verifier case documenting that a live stateless installation token maps to verify-error, never a false active/revoked.

### Security
- **`action/action.yml` shell injection (SonarCloud `githubactions:S7630`, 8 BLOCKER findings)** — every `${{ inputs.* }}` interpolation moved into an `env:` block; bash args switched from a whitespace-joined string to an array. CWE-94 closed.
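The `dbconn` placeholder entry in the changelog above describes a lookup that lowercased the candidate password but compared it against uppercase list entries. A minimal sketch of that failure mode and the fix follows. The list contents and the `isPlaceholder` helper are hypothetical stand-ins, since the real `dbconn` code is not part of this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical placeholder lists (assumption: the real dbconn slice is not shown here).
// The buggy list keeps uppercase entries; the fixed list stores them lowercased.
var buggyPlaceholders = []string{"changeme", "TODO", "FIXME"}
var fixedPlaceholders = []string{"changeme", "todo", "fixme"}

// isPlaceholder lowercases the candidate and compares it exactly against the list.
// With uppercase list entries the comparison can never succeed for those entries.
func isPlaceholder(password string, list []string) bool {
	lower := strings.ToLower(password)
	for _, p := range list {
		if lower == p {
			return true
		}
	}
	return false
}

func main() {
	// Against the buggy list, "TODO" is not recognised and would be reported as a finding.
	fmt.Println(isPlaceholder("TODO", buggyPlaceholders))
	// Against the lowercased list, any casing of the placeholder is skipped.
	fmt.Println(isPlaceholder("TODO", fixedPlaceholders))
}
```

Lowercasing the list once keeps the per-lookup cost at a single `strings.ToLower` call; `strings.EqualFold` on each entry would be an alternative with the same observable behavior.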
2 changes: 1 addition & 1 deletion README.md
@@ -166,7 +166,7 @@ That's **54 of 63 detectors (85.7%)** with verification. Verification is on by d
| AI/ML | Hugging Face Token | `huggingface-token` | Critical |
| AI/ML | DeepSeek API Key | `deepseek-api-key` | Critical |
| DevTools | GitHub PAT | `github-token` | Critical |
| DevTools | GitHub OAuth Token | `github-oauth-token` | Critical |
| DevTools | GitHub OAuth & Installation Token (incl. stateless `ghs_`) | `github-oauth-token` | Critical |
| DevTools | GitLab PAT | `gitlab-pat` | Critical |
| DevTools | Bitbucket App Password | `bitbucket-app-password` | Critical |
| DevTools | NPM Token | `npm-token` | High |
2 changes: 1 addition & 1 deletion docs/user-manuals/en/detectors/detector-catalog.md
@@ -56,7 +56,7 @@ This page lists every built-in detector. For verification coverage details see [
| ID | Detects | Severity |
|----|---------|----------|
| `github-token` | GitHub Personal Access Token | Critical |
| `github-oauth-token` | GitHub OAuth2 Token | Critical |
| `github-oauth-token` | GitHub OAuth2 & installation token — `gho_`/`ghu_`/`ghr_`/`ghs_`, including new stateless (JWT-format) `ghs_` installation tokens | Critical |
| `gitlab-pat` | GitLab Personal Access Token | Critical |
| `bitbucket-app-password` | Bitbucket App Password | Critical |
| `circleci-token` | CircleCI Personal API Token | High |
2 changes: 1 addition & 1 deletion docs/user-manuals/tr/detectors/detector-catalog.md
@@ -56,7 +56,7 @@ Bu sayfa her yerleşik dedektörü listeler. Doğrulama kapsamı ayrıntıları
| ID | Tespit eder | Şiddet |
|----|------------|--------|
| `github-token` | GitHub Kişisel Erişim Token'ı | Critical |
| `github-oauth-token` | GitHub OAuth2 Token'ı | Critical |
| `github-oauth-token` | GitHub OAuth2 ve kurulum (installation) token'ı — `gho_`/`ghu_`/`ghr_`/`ghs_`, yeni durumsuz (JWT biçimli) `ghs_` kurulum token'ları dâhil | Critical |
| `gitlab-pat` | GitLab Kişisel Erişim Token'ı | Critical |
| `bitbucket-app-password` | Bitbucket Uygulama Parolası | Critical |
| `circleci-token` | CircleCI Kişisel API Token'ı | High |
36 changes: 33 additions & 3 deletions internal/detector/github/github_oauth.go
@@ -8,16 +8,46 @@ import (
"github.com/HodeTech/leakwatch/pkg/finding"
)

var oauthTokenPattern = regexp.MustCompile(`gh[orus]_[A-Za-z0-9_]{36,}`)
// oauthTokenPattern matches GitHub server-/user-to-server tokens. There are two
// shapes to cover, so the pattern is an ordered alternation:
//
// 1. ghs_ STATELESS installation tokens (rolled out from April 2026). These are
// ghs_APPID_<jwt> — a ghs_ prefix, the app ID, an underscore, then a JWT
// (header.payload.signature). They are ~520 chars and contain exactly two
// dots, so the legacy `[A-Za-z0-9_]` body class truncated them at the first
// dot (or missed them when a base64url '-' appeared early). This branch
// captures the whole token: a base64url run followed by exactly two
// dot-separated base64url runs. It is listed FIRST so a stateless token is
// never claimed by the shorter opaque branch below.
// 2. ghs_/gho_/ghu_/ghr_ OPAQUE tokens (legacy ghs_ and the still-opaque
// gho_/ghu_/ghr_): a fixed prefix followed by >=36 of [A-Za-z0-9_].
//
// ghp_ Personal Access Tokens are deliberately excluded and handled by the
// github-token detector (see github_token.go), so any single token is reported
// by exactly one detector.
//
// Note on ghu_ (user-to-server): GitHub has signalled that ghu_ tokens will also
// move to the stateless JWT format later, but the format/timeline are not yet
// published. ghu_ is intentionally left opaque here; when GitHub documents the
// new ghu_ shape, add it to the stateless branch (gh[su]_...) rather than
// guessing at an unspecified format now.
//
// Branch 1 is `ghs_[A-Za-z0-9_-]{8,}(?:\.[A-Za-z0-9_-]{8,}){2}` (stateless
// ghs_APPID_<jwt>); branch 2 is `gh[orus]_[A-Za-z0-9_]{36,}` (opaque
// gho_/ghu_/ghr_/legacy ghs_). The pattern is one raw-string literal — not
// concatenated fragments — so the tools/site-build detector extractor, which
// reads the MustCompile argument from the AST, can still pick it up.
var oauthTokenPattern = regexp.MustCompile(`ghs_[A-Za-z0-9_-]{8,}(?:\.[A-Za-z0-9_-]{8,}){2}|gh[orus]_[A-Za-z0-9_]{36,}`)

// OAuthDetector detects GitHub OAuth2 Tokens.
// OAuthDetector detects GitHub server-/user-to-server tokens
// (gho_/ghu_/ghr_/ghs_), including new stateless ghs_ installation tokens.
type OAuthDetector struct{}

// ID returns the unique identifier of the GitHub OAuth2 token detector.
func (d *OAuthDetector) ID() string { return "github-oauth-token" }

// Description returns a human-readable description of the GitHub OAuth2 token detector.
func (d *OAuthDetector) Description() string { return "GitHub OAuth2 Token" }
func (d *OAuthDetector) Description() string { return "GitHub OAuth2 & Installation Token" }

// Keywords returns the Aho-Corasick pre-filter keywords for GitHub OAuth2 token detection.
func (d *OAuthDetector) Keywords() []string { return []string{"gho_", "ghu_", "ghr_", "ghs_"} }
129 changes: 128 additions & 1 deletion internal/detector/github/github_oauth_test.go
@@ -13,7 +13,7 @@ import (
func TestOAuthDetector_Metadata_ReturnsExpectedValues(t *testing.T) {
d := &OAuthDetector{}
assert.Equal(t, "github-oauth-token", d.ID())
assert.Equal(t, "GitHub OAuth2 Token", d.Description())
assert.Equal(t, "GitHub OAuth2 & Installation Token", d.Description())
assert.Equal(t, finding.SeverityCritical, d.Severity())
assert.NotEmpty(t, d.Keywords())
}
@@ -97,6 +97,133 @@ func TestOAuthDetector_Scan_MatchAndReject(t *testing.T) {
}
}

// fakeStatelessToken builds an obviously-fake GitHub stateless installation
// token of the ghs_APPID_<jwt> form (header.payload.signature). It is assembled
// from parts at runtime so the source file never contains a contiguous,
// real-looking token literal that secret push-protection could flag.
func fakeStatelessToken(headerTail string) string {
	const appID = "12345678"
	header := "eyJ" + headerTail
	payload := "eyJ" + strings.Repeat("Gh1Ij2Kl", 30)
	signature := strings.Repeat("Mn3Op4Qr", 12)
	return "ghs_" + appID + "_" + header + "." + payload + "." + signature
}

// TestOAuthDetector_Scan_StatelessToken_CapturesWholeToken proves the new
// ghs_APPID_<jwt> stateless installation tokens are captured in full by a single
// github-oauth-token finding (the pre-2026 behaviour truncated them at the first
// dot or missed them entirely when a base64url '-' appeared early).
func TestOAuthDetector_Scan_StatelessToken_CapturesWholeToken(t *testing.T) {
tests := []struct {
name string
token string
}{
{
name: "long alphanumeric header segment",
token: fakeStatelessToken(strings.Repeat("Ab9Cd0Ef", 5)),
},
{
name: "base64url dash early in header",
token: fakeStatelessToken("Ab-Cd0Ef9Gh"),
},
{
name: "base64url underscore in header",
token: fakeStatelessToken("Ab_Cd0Ef9Gh_Ij"),
},
{
name: "short app id",
token: "ghs_42_" + "eyJ" + strings.Repeat("Ab9Cd0Ef", 4) + "." + "eyJ" + strings.Repeat("Gh1Ij2Kl", 20) + "." + strings.Repeat("Mn3Op4Qr", 10),
},
}

d := &OAuthDetector{}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
findings := d.Scan(context.Background(), []byte(tt.token))
require.Len(t, findings, 1, "stateless token must yield exactly one finding")

f := findings[0]
assert.Equal(t, "github-oauth-token", f.DetectorID)
// The whole token is captured, not just the header segment.
assert.Equal(t, tt.token, string(f.Raw), "must capture the entire token")
assert.Greater(t, len(f.Raw), 100, "stateless tokens are long")

// Redaction stays safe for a long token: only the last four
// characters are ever revealed.
assert.Equal(t, "****"+tt.token[len(tt.token)-4:], f.Redacted)
assert.Len(t, f.Redacted, len("****")+4)
assert.NotContains(t, f.Redacted, tt.token[:len(tt.token)-4],
"redaction must not expose the token body")
})
}
}

// TestOAuthDetector_Scan_NoOverCapture guards the greedy branches against eating
// surrounding context: opaque tokens must not start consuming dots, and a
// stateless token must stop at its third (signature) segment.
func TestOAuthDetector_Scan_NoOverCapture(t *testing.T) {
suffix40 := strings.Repeat("Abc1D678", 5)
stateless := fakeStatelessToken(strings.Repeat("Ab9Cd0Ef", 5))

tests := []struct {
name string
input string
want string // expected single captured match
}{
{
name: "opaque gho_ followed by dotted domain",
input: "gho_" + suffix40 + ".example.com",
want: "gho_" + suffix40,
},
{
name: "stateless token at end of a sentence",
input: "leaked token: " + stateless + ". Please rotate it.",
want: stateless,
},
{
name: "stateless token followed by a fourth dotted segment",
input: stateless + "." + strings.Repeat("Qq11Ww22", 8),
want: stateless,
},
}

d := &OAuthDetector{}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
findings := d.Scan(context.Background(), []byte(tt.input))
require.Len(t, findings, 1)
assert.Equal(t, tt.want, string(findings[0].Raw))
})
}
}

// TestOAuthDetector_Scan_LegacyOpaqueUnchanged confirms the legacy opaque shapes
// (including legacy opaque ghs_) are still captured whole and unchanged.
func TestOAuthDetector_Scan_LegacyOpaqueUnchanged(t *testing.T) {
	suffix40 := strings.Repeat("Abc1D678", 5)
	for _, prefix := range []string{"gho_", "ghu_", "ghr_", "ghs_"} {
		t.Run(prefix, func(t *testing.T) {
			token := prefix + suffix40
			findings := (&OAuthDetector{}).Scan(context.Background(), []byte(token))
			require.Len(t, findings, 1)
			assert.Equal(t, token, string(findings[0].Raw))
		})
	}
}

// TestGitHubDetectors_StatelessNoPrefixOverlap ensures a stateless ghs_ token is
// still claimed by exactly one of the two GitHub detectors (never the ghp_
// personal-access-token detector).
func TestGitHubDetectors_StatelessNoPrefixOverlap(t *testing.T) {
	token := []byte(fakeStatelessToken(strings.Repeat("Ab9Cd0Ef", 5)))

	tokenFindings := (&Token{}).Scan(context.Background(), token)
	oauthFindings := (&OAuthDetector{}).Scan(context.Background(), token)

	assert.Empty(t, tokenFindings, "ghp_ detector must not claim a ghs_ token")
	require.Len(t, oauthFindings, 1, "oauth detector must claim the ghs_ token")
}
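The redaction assertions in the tests above expect a fixed `****` mask followed by the last four characters of the token, regardless of token length. A minimal sketch of a helper with that behavior is shown below. `redactLast4` is a hypothetical stand-in, not the project's redaction function, and its handling of very short inputs is an assumption.

```go
package main

import "fmt"

// redactLast4 is a hypothetical stand-in for the project's redaction helper.
// It always emits a four-asterisk mask and reveals only the trailing four characters.
// Assumption: secrets of four characters or fewer are masked entirely.
func redactLast4(secret string) string {
	const visible = 4
	if len(secret) <= visible {
		return "****"
	}
	return "****" + secret[len(secret)-visible:]
}

func main() {
	// The output length is constant, so a ~520-character token leaks no length information.
	fmt.Println(redactLast4("ghs_12345678_not-a-real-token-abcd"))
}
```

A fixed-width mask keeps the redacted form at eight characters, which is what the `assert.Len(t, f.Redacted, len("****")+4)` check in the test relies on.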
Comment on lines +100 to +225
⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check test coverage for the github detector package
cd internal/detector/github
go test -cover -coverprofile=coverage.out
go tool cover -func=coverage.out | grep total

Repository: HodeTech/Leakwatch

Length of output: 474


🏁 Script executed:

#!/bin/bash
set -euo pipefail

pkgs=$(go list ./internal/detector/...)
echo "Detector packages:"
echo "$pkgs"

fail=0
for pkg in $pkgs; do
  echo "== $pkg =="
  tmpfile="/tmp/cover_${pkg//\//_}.out"
  go test -cover -coverprofile="$tmpfile" "$pkg" >/tmp/cover_stdout.txt 2>/tmp/cover_stderr.txt || {
    echo "go test failed for $pkg"
    cat /tmp/cover_stderr.txt
    fail=1
    continue
  }
  total=$(go tool cover -func="$tmpfile" | awk '/total:/{print $3}')
  echo "total: $total"
  # total is like 100.0%
  pct=${total%%%}
  if awk "BEGIN{exit !($pct < 95)}"; then
    echo "Coverage below 95% for $pkg: $total"
    fail=1
  fi
done

if [ "$fail" -ne 0 ]; then
  echo "One or more detector packages are below 95% coverage."
  exit 1
fi

echo "All detector packages meet 95%+ coverage."

Repository: HodeTech/Leakwatch

Length of output: 9032


Test coverage gate for detector packages fails (95% not met overall)
`internal/detector/github` is at 100%, but multiple detector packages are below the 95% minimum (e.g., `internal/detector` 81.2%, `gcp` 88.7%, `generic` 83.3%, `custom` 92.9%, `privatekey` 93.8%, `snowflake` 92.3%, `stripe` 92.3%, and `testutil` 0.0%), so the detector coverage requirement is not satisfied.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/detector/github/github_oauth_test.go` around lines 100 - 225, The PR
fails the repository-wide detector test-coverage gate because several detector
packages are below 95% (internal/detector, gcp, generic, custom, privatekey,
snowflake, stripe, testutil); add unit tests that exercise each package's
exported detector Scan implementations and helper functions to raise coverage:
create table-driven tests (mirroring patterns in
internal/detector/github/github_oauth_test.go) that call detectors'
Scan(context.Background(), []byte(...)) for expected matches and non-matches,
assert finding counts and fields (use symbols like OAuthDetector.Scan,
Token.Scan, and any package-specific detector types such as GCPDetector or
StripeDetector), and add tests for utility code in testutil to cover edge cases
and error paths; aim to hit the uncovered branches (negative cases, boundary
inputs, and redaction logic) so each listed package reaches ≥95% coverage and
the gate passes.


// TestGitHubDetectors_NoPrefixOverlap_ReportedByExactlyOne is a regression test
// for the token/oauth prefix overlap (DETA-M-02): every GitHub token prefix must
// be claimed by exactly one of the two detectors, never both.
60 changes: 56 additions & 4 deletions internal/detector/jwt/jwt.go
@@ -2,6 +2,7 @@
package jwt

import (
"bytes"
"context"
"regexp"

@@ -11,6 +12,12 @@ import (

var jwtPattern = regexp.MustCompile(`eyJ[A-Za-z0-9_-]{10,}\.eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}`)

// ghsPrefix marks a GitHub stateless installation token (ghs_APPID_<jwt>). The
// embedded JWT also matches jwtPattern, but it is already reported in full by
// the github-oauth-token detector, so this detector suppresses it to avoid
// splitting one secret into two findings (see isGitHubStatelessBody).
var ghsPrefix = []byte("ghs_")

// JWT detects JSON Web Tokens.
type JWT struct{}

@@ -28,13 +35,20 @@ func (d *JWT) Severity() finding.Severity { return finding.SeverityHigh }

// Scan scans the given data for JSON Web Token patterns.
func (d *JWT) Scan(_ context.Context, data []byte) []detector.RawFinding {
matches := jwtPattern.FindAll(data, -1)
if len(matches) == 0 {
locs := jwtPattern.FindAllIndex(data, -1)
if len(locs) == 0 {
return nil
}

findings := make([]detector.RawFinding, 0, len(matches))
for _, match := range matches {
findings := make([]detector.RawFinding, 0, len(locs))
for _, loc := range locs {
start, end := loc[0], loc[1]
// Skip JWTs that are the body of a GitHub stateless installation token
// (ghs_APPID_<jwt>); those are reported in full by github-oauth-token.
if isGitHubStatelessBody(data, start) {
continue
}
match := data[start:end]
// Reveal only the trailing characters to avoid exposing the JWT
// header, payload, or signature.
findings = append(findings, detector.RawFinding{
@@ -43,9 +57,47 @@ func (d *JWT) Scan(_ context.Context, data []byte) []detector.RawFinding {
Redacted: detector.RedactBytes(match),
})
}
if len(findings) == 0 {
return nil
}
return findings
}

// isGitHubStatelessBody reports whether the JWT beginning at start is the body
// of a GitHub stateless installation token (ghs_APPID_<jwt>). RE2 has no
// lookbehind, so it walks back over the contiguous token run (base64url plus the
// ghs_/app-ID separators) immediately preceding the match and checks whether
// that run contains the literal "ghs_".
//
// Contains rather than HasPrefix: the run may carry leading base64url bytes with
// no delimiter (e.g. "xghs_APPID_"). Wherever "ghs_" appears in the run, the run
// has no dots (dots are not token bytes) so it is glued straight onto this JWT,
// forming a "ghs_...eyJ.eyJ.sig" shape that the github-oauth-token detector
// captures in full — its per-segment floors ({8,}) are at or below this
// detector's ({10,}). Suppressing here therefore only removes a duplicate of a
// secret the github detector already reports; it can never drop one. (This
// assumes the github-oauth-token detector is active, which it is by default.)
func isGitHubStatelessBody(data []byte, start int) bool {
	i := start
	for i > 0 && isTokenByte(data[i-1]) {
		i--
	}
	return bytes.Contains(data[i:start], ghsPrefix)
}

// isTokenByte reports whether b is part of a contiguous token run: a base64url
// character or one of the separators ('_', '-') that appear in a ghs_ token.
func isTokenByte(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	case b == '_', b == '-':
		return true
	default:
		return false
	}
}
Comment on lines +66 to +99
⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check test coverage for the jwt detector package
cd internal/detector/jwt
go test -cover -coverprofile=coverage.out
go tool cover -func=coverage.out | grep total

Repository: HodeTech/Leakwatch

Length of output: 267


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# List all detector packages
packages=$(go list ./internal/detector/... | tr '\n' ' ')
echo "Detector packages:"
echo "$packages"

for pkg in $(go list ./internal/detector/...); do
  echo "== $pkg =="
  # Create a temp coverprofile per package
  tmpfile=$(mktemp)
  # Run tests with coverage
  go test -cover -coverprofile="$tmpfile" "$pkg" >/dev/null
  # Extract total statement coverage
  total=$(go tool cover -func="$tmpfile" | awk '/total:/ {print $3}')
  echo "statement_coverage: $total"
  rm -f "$tmpfile"
done

Repository: HodeTech/Leakwatch

Length of output: 9124


Coverage gate: detector packages are below the 95% minimum

  • github.com/HodeTech/leakwatch/internal/detector overall statement coverage is 81.2%
  • Multiple detector packages are under 95% (e.g., `custom` 92.9%, `gcp` 88.7%, `generic` 83.3%, `heroku` 92.9%, `privatekey` 93.8%, `snowflake` 92.3%, `stripe` 92.3%, `testutil` 0.0%)
  • internal/detector/jwt itself is 100%, but the stated requirement is not satisfied across detector packages.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/detector/jwt/jwt.go` around lines 66 - 99, The detector packages
overall miss the 95% coverage gate; add focused unit tests to exercise uncovered
branches in the detector packages (e.g., custom, gcp, generic, heroku,
privatekey, snowflake, stripe and testutil) so overall detector coverage rises
above 95%. Specifically, create table-driven tests that exercise positive and
negative detection paths and edge cases (including token boundary and separator
behavior similar to jwt.isGitHubStatelessBody and jwt.isTokenByte), add mocks or
sample inputs for cloud/provider-specific detectors (GCP, Heroku, Stripe,
Snowflake, private keys, custom patterns), and include tests for testutil
helpers so they are executed; run `go test ./internal/detector/...
-coverprofile` to verify coverage and iterate until the detector package
coverage meets the 95% threshold.


func init() {
	detector.Register(&JWT{})
}
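The suppression logic in `jwt.go` emulates a lookbehind, which RE2 does not support, by taking match offsets and walking left over the preceding token run. The standalone sketch below mirrors that idea outside the project. The JWT regex is copied from this diff, while `precededByGhs`, `countStandaloneJWTs`, and `sampleJWT` are illustrative names that do not exist in the repository.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"strings"
)

// jwtLike is the JWT pattern shown in the diff.
var jwtLike = regexp.MustCompile(`eyJ[A-Za-z0-9_-]{10,}\.eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}`)

// isTokenByte reports whether b is a base64url character (letters, digits, '_' or '-').
func isTokenByte(b byte) bool {
	switch {
	case b >= 'a' && b <= 'z', b >= 'A' && b <= 'Z', b >= '0' && b <= '9':
		return true
	case b == '_', b == '-':
		return true
	default:
		return false
	}
}

// precededByGhs walks left from start over contiguous token bytes and reports
// whether that run contains the literal "ghs_" (a manual lookbehind).
func precededByGhs(data []byte, start int) bool {
	i := start
	for i > 0 && isTokenByte(data[i-1]) {
		i--
	}
	return bytes.Contains(data[i:start], []byte("ghs_"))
}

// countStandaloneJWTs counts JWT matches that are not glued onto a ghs_ run.
func countStandaloneJWTs(input string) int {
	data := []byte(input)
	n := 0
	for _, loc := range jwtLike.FindAllIndex(data, -1) {
		if !precededByGhs(data, loc[0]) {
			n++
		}
	}
	return n
}

// sampleJWT builds an obviously fake three-segment JWT from repeated fragments.
func sampleJWT() string {
	return "eyJ" + strings.Repeat("Ab9Cd0Ef", 5) + "." +
		"eyJ" + strings.Repeat("Gh1Ij2Kl", 5) + "." +
		strings.Repeat("Mn3Op4Qr", 5)
}

func main() {
	fmt.Println(countStandaloneJWTs("ghs_12345678_" + sampleJWT())) // body of a ghs_ token: suppressed
	fmt.Println(countStandaloneJWTs("Bearer " + sampleJWT()))       // space breaks the run: counted
}
```

Using `FindAllIndex` rather than `FindAll` is what makes the walk-back possible, because the byte offsets give access to the context to the left of each match.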
62 changes: 62 additions & 0 deletions internal/detector/jwt/jwt_test.go
@@ -69,6 +69,68 @@ func TestJWT_Scan_MatchesValidTokens(t *testing.T) {
}
}

// TestJWT_Scan_SuppressesGitHubStatelessTokenBody verifies that the JWT body of
// a GitHub stateless installation token (ghs_APPID_<jwt>) is NOT reported by the
// jwt detector: that whole token is already reported by github-oauth-token, so
// emitting the embedded JWT too would split one secret into two findings.
func TestJWT_Scan_SuppressesGitHubStatelessTokenBody(t *testing.T) {
// Built from parts so no contiguous real-looking token literal is committed.
header := "eyJ" + strings.Repeat("Ab9Cd0Ef", 5)
payload := "eyJ" + strings.Repeat("Gh1Ij2Kl", 30)
signature := strings.Repeat("Mn3Op4Qr", 12)
jwtBody := header + "." + payload + "." + signature
statelessToken := "ghs_12345678_" + jwtBody

tests := []struct {
name string
input string
expected int
}{
{
name: "stateless ghs_ token body is suppressed",
input: statelessToken,
expected: 0,
},
{
name: "stateless token embedded in config is suppressed",
input: "GITHUB_TOKEN=" + statelessToken + "\n",
expected: 0,
},
{
// A base64url char glued directly before "ghs_" (no delimiter) must
// still be recognised as a ghs_ body: the github-oauth-token detector
// matches "ghs_..." mid-string, so reporting the JWT too would double
// the same secret.
name: "stateless token glued to a preceding token char is suppressed",
input: "x" + statelessToken,
expected: 0,
},
{
name: "standalone JWT is still reported",
input: jwtBody,
expected: 1,
},
{
name: "JWT preceded by a non-ghs token run is still reported",
input: "Bearer " + jwtBody,
expected: 1,
},
{
name: "stateless token plus an unrelated standalone JWT",
input: statelessToken + " and also " + jwtBody,
expected: 1,
},
}

d := &JWT{}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
findings := d.Scan(context.Background(), []byte(tt.input))
assert.Len(t, findings, tt.expected)
})
}
}

func TestJWT_Scan_RejectsInvalidInput(t *testing.T) {
tests := []struct {
name string