Skip to content

ci: add banned-files check to block secrets, binaries, and OS artifacts#674

Closed
WuKongAI-CMU wants to merge 1 commit intoNVIDIA:mainfrom
WuKongAI-CMU:ci/banned-files-check
Closed

ci: add banned-files check to block secrets, binaries, and OS artifacts#674
WuKongAI-CMU wants to merge 1 commit intoNVIDIA:mainfrom
WuKongAI-CMU:ci/banned-files-check

Conversation

@WuKongAI-CMU
Copy link

@WuKongAI-CMU WuKongAI-CMU commented Mar 22, 2026

Summary

Adds a CI workflow that automatically rejects PRs containing files that should never be committed:

  • Credential files: .env, .env.*, *.pem, *.key, *.p12, credentials.json, service-account*.json, .npmrc, .netrc, SSH private keys
  • OS metadata: .DS_Store, Thumbs.db, desktop.ini
  • Bytecode: *.pyc

The check inspects only files changed in the PR (git diff --name-only), not the full repo tree. Paths under test/fixtures/ and testdata/ are exempt to avoid blocking legitimate test data.

Also adds missing patterns (service-account*.json, Thumbs.db, desktop.ini) to .gitignore.

Test plan

  • Tested locally with simulated file lists — correctly detects .env.local, .DS_Store, *.pem, credentials.json
  • Correctly skips exempt paths (test/fixtures/test.pem)
  • Passes clean files through without false positives
  • Deny list is maintained in a single array, easy to extend

Fixes #550

Summary by CodeRabbit

  • Chores
    • Added automated validation to prevent sensitive credential files from being committed to the repository.
    • Updated configuration to exclude OS-specific temporary artifacts from version control.

Add a CI workflow that scans PR diffs for files that should never be
committed: credential files (.env, *.pem, *.key, credentials.json),
OS metadata (.DS_Store, Thumbs.db), and bytecode (*.pyc).

The check runs only on changed files (not the full tree) and exempts
test/fixtures/ and testdata/ paths to avoid false positives.

Also adds missing patterns (service-account*.json, Thumbs.db,
desktop.ini) to .gitignore for belt-and-suspenders coverage.

Fixes NVIDIA#550

Signed-off-by: peteryuqin <peter.yuqin@gmail.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 22, 2026

📝 Walkthrough

Walkthrough

This PR introduces a GitHub Actions workflow that prevents commits of sensitive files (credentials, keys, certificates), binaries, and OS artifacts. It scans files changed in pull requests against a deny list and fails if matches are found. Additionally, .gitignore is updated to cover additional patterns.

Changes

Cohort / File(s) Summary
CI Workflow
.github/workflows/banned-files.yaml
New GitHub Actions workflow that detects and blocks banned files in PRs. Computes changed files using git, matches against deny list patterns (.env*, *.pem, *.key, *.p12, *.pfx, credentials.json, service-account*.json, .DS_Store, Thumbs.db, desktop.ini, .envrc, *.pyc, __pycache__/, node_modules/, dist/), exempts test/fixtures/ and testdata/, and reports violations.
Configuration
.gitignore
Added three patterns: service-account*.json, Thumbs.db, desktop.ini to prevent accidental commits of cloud credentials and Windows Explorer metadata.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Hop, hop, hooray! Say the bunnies with glee,
No secrets or binaries shall sneak past this spree,
.env files banned, and credentials locked tight,
The banned-files check keeps the repo just right! 🔐✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the main change: adding a CI workflow to block banned files including secrets, binaries, and OS artifacts.
Linked Issues check ✅ Passed The pull request successfully addresses all coding requirements from issue #550: implements the banned-files CI workflow with correct trigger events, checks only changed files via git diff, blocks all specified patterns with human-readable error messages, maintains the deny list in a single location, exempts test paths, and updates .gitignore for missing patterns.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #550 requirements: the new workflow implements the banned-files check and the .gitignore update covers missing patterns mentioned in the issue specification.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/banned-files.yaml:
- Around line 27-48: Add explicit directory patterns 'node_modules/', 'dist/',
and '__pycache__/' to the BANNED_PATTERNS array and change the matching logic
that currently tests only the basename to test the full file path; specifically,
update the BANNED_PATTERNS declaration (variable name BANNED_PATTERNS) to
include those three entries and replace any basename-based check with a
path-based comparison (e.g., use the full filepath variable in a shell pattern
match or grep against "$file" rather than basename("$file")) so patterns like
node_modules/package/index.js are caught.
- Line 54: Update the git diff command assigned to CHANGED: include 'M' in the
--diff-filter (so use ACRM or AC R M combined) to ensure modified files are
scanned, and replace the HEAD reference with the explicit PR head SHA (${{
github.event.pull_request.head.sha }}) so the command uses the PR's head commit;
update the line with CHANGED=$(git diff --name-only --diff-filter=... "${{
github.event.pull_request.base.sha }}" "${{ github.event.pull_request.head.sha
}}") accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8bfefea1-b225-4237-ad1e-4d763671a205

📥 Commits

Reviewing files that changed from the base of the PR and between d37a09f and 6400747.

📒 Files selected for processing (2)
  • .github/workflows/banned-files.yaml
  • .gitignore

Comment on lines +27 to +48
BANNED_PATTERNS=(
'.env' 'May contain API keys or secrets'
'.env.*' 'May contain API keys or secrets'
'.envrc' 'direnv secrets file'
'*.pem' 'Private key or certificate'
'*.key' 'Private key'
'*.p12' 'PKCS#12 keystore'
'*.pfx' 'PKCS#12 keystore'
'*.jks' 'Java keystore'
'*.keystore' 'Keystore file'
'credentials.json' 'Cloud provider credentials'
'service-account*.json' 'Cloud provider service account'
'.npmrc' 'May contain registry tokens'
'.netrc' 'May contain login credentials'
'*_rsa' 'SSH private key'
'*_ed25519' 'SSH private key'
'*_ecdsa' 'SSH private key'
'.DS_Store' 'macOS Finder metadata'
'Thumbs.db' 'Windows Explorer metadata'
'desktop.ini' 'Windows Explorer metadata'
'*.pyc' 'Python bytecode'
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify deny-list coverage and matching strategy
rg -n --type=yaml 'node_modules|dist|__pycache__|basename=\$\(basename "\$file"\)|\[\[ "\$basename" == \$pattern \]\]' .github/workflows/banned-files.yaml

Repository: NVIDIA/NemoClaw

Length of output: 160


🏁 Script executed:

cat -n .github/workflows/banned-files.yaml

Repository: NVIDIA/NemoClaw

Length of output: 4429


Directory build artifacts should be explicitly blocked in the deny list with full-path matching.

The BANNED_PATTERNS list is missing node_modules/, dist/, and __pycache__/ entries. Additionally, line 79 only checks the basename, which prevents path-based patterns from working. A directory artifact like node_modules/package/index.js would not be caught since the basename check only sees index.js.

Add the missing directory patterns and update the matching logic to check the full file path:

Suggested fix
           BANNED_PATTERNS=(
             '*.pyc'                   'Python bytecode'
+            'node_modules/*'          'Dependency vendor directory'
+            'dist/*'                  'Build artifact directory'
+            '__pycache__/*'           'Python cache directory'
           )
             basename=$(basename "$file")
-            if [[ "$basename" == $pattern ]]; then
+            if [[ "$file" == $pattern || "$basename" == $pattern ]]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/banned-files.yaml around lines 27 - 48, Add explicit
directory patterns 'node_modules/', 'dist/', and '__pycache__/' to the
BANNED_PATTERNS array and change the matching logic that currently tests only
the basename to test the full file path; specifically, update the
BANNED_PATTERNS declaration (variable name BANNED_PATTERNS) to include those
three entries and replace any basename-based check with a path-based comparison
(e.g., use the full filepath variable in a shell pattern match or grep against
"$file" rather than basename("$file")) so patterns like
node_modules/package/index.js are caught.

EXEMPT_DIRS="^(test/fixtures/|testdata/)"

# Get files changed in this PR
CHANGED=$(git diff --name-only --diff-filter=ACR "${{ github.event.pull_request.base.sha }}" HEAD)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current diff-filter usage in workflow
rg -n --type=yaml 'git diff --name-only --diff-filter=' .github/workflows/banned-files.yaml

Repository: NVIDIA/NemoClaw

Length of output: 170


🏁 Script executed:

cat -n .github/workflows/banned-files.yaml | head -80

Repository: NVIDIA/NemoClaw

Length of output: 3596


🏁 Script executed:

cat -n .github/workflows/banned-files.yaml | tail -30

Repository: NVIDIA/NemoClaw

Length of output: 1364


🏁 Script executed:

# Check if there are any banned files already tracked in the repository
git ls-files | grep -E '\.(env|pem|key|p12|pfx|jks|keystore)$|credentials\.json|service-account.*\.json|\.npmrc|\.netrc|_rsa|_ed25519|_ecdsa|\.DS_Store|Thumbs\.db|desktop\.ini|\.pyc' | head -20

Repository: NVIDIA/NemoClaw

Length of output: 41


🏁 Script executed:

# Check the git diff behavior difference between HEAD and head.sha in the context
# Let's also verify how the checkout action handles the ref
git rev-parse HEAD
git log --oneline -1

Repository: NVIDIA/NemoClaw

Length of output: 179


🏁 Script executed:

# Let's verify what the diff-filter options mean and check the actual behavior
# in a PR context. Let's also see if there are any comments or docs about this choice
git log -p --follow -S 'diff-filter=ACR' -- .github/workflows/banned-files.yaml | head -100

Repository: NVIDIA/NemoClaw

Length of output: 41


🏁 Script executed:

# Let's check if there are any other workflow files or documentation that explain the design decision
find .github -type f -name "*.yaml" -o -name "*.yml" | xargs grep -l "diff-filter" 2>/dev/null

Repository: NVIDIA/NemoClaw

Length of output: 94


Include modified files in the scan set.

Line 54 uses --diff-filter=ACR, which excludes M (Modified). If a banned file were already tracked in the repository (either accidentally or from before this check existed), modifications to it would bypass this check. Include M in the filter to catch all changes, including modifications to banned files.

Additionally, replace HEAD with ${{ github.event.pull_request.head.sha }} for more explicit and reliable SHA references in the PR context.

Suggested fix
-          CHANGED=$(git diff --name-only --diff-filter=ACR "${{ github.event.pull_request.base.sha }}" HEAD)
+          CHANGED=$(git diff --name-only --diff-filter=ACMR "${{ github.event.pull_request.base.sha }}" "${{ github.event.pull_request.head.sha }}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CHANGED=$(git diff --name-only --diff-filter=ACR "${{ github.event.pull_request.base.sha }}" HEAD)
CHANGED=$(git diff --name-only --diff-filter=ACMR "${{ github.event.pull_request.base.sha }}" "${{ github.event.pull_request.head.sha }}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/banned-files.yaml at line 54, Update the git diff command
assigned to CHANGED: include 'M' in the --diff-filter (so use ACRM or AC R M
combined) to ensure modified files are scanned, and replace the HEAD reference
with the explicit PR head SHA (${{
github.event.pull_request.head.sha }}) so the command uses the PR's head commit;
update the line with CHANGED=$(git diff --name-only --diff-filter=... "${{
github.event.pull_request.base.sha }}" "${{ github.event.pull_request.head.sha
}}") accordingly.

@cv
Copy link
Contributor

cv commented Mar 23, 2026

Thanks for putting this together — the banned-files idea is solid and the pattern list was really helpful!

We ended up implementing this as a pre-commit hook in #673 instead of a GHA workflow. The advantage is that it catches banned files before they leave the developer's machine, rather than after they've already been pushed to GitHub. The hook uses git ls-files --ignored --exclude-standard --cached to reject any force-added files that .gitignore would normally block, so .gitignore stays the single source of truth — no duplicate pattern list to maintain.

We also picked up the .gitignore additions from this PR (service-account*.json, Thumbs.db, desktop.ini) and added a few more (key.json, token.json, secrets.json/yaml, .envrc, .direnv/, .pypirc, *.tfvars).

Going to close this in favor of the pre-commit approach in #673, but thank you for flagging the gap!

@cv cv closed this Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ci: add banned-files check to block secrets, binaries, and OS artifacts

2 participants