[chore][ci] memory-slug leak guard + clean 6 P0 references #274
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,137 @@ | ||
| #!/usr/bin/env python3 | ||
| """Block internal memory-slug references from leaking into public OSS files. | ||
|
|
||
| Why this guard exists | ||
| ===================== | ||
| Some agents that work on this repo use a private file-based memory system | ||
| indexed by `[[short-kebab-name]]` slugs (e.g. `[[feedback_no_test_on_prod]]`). | ||
| Those slugs are agent-internal: they only resolve inside the agent's | ||
| private memory store and have no meaning to outside contributors. Each | ||
| leak makes the public OSS look like it has dangling links and reveals | ||
| internal process slang. release.yml has burned us once on this class | ||
| already (Vincent caught a "Co-Authored-By: Claude" leak twice; the slug | ||
| shape is the same failure mode — internal artefact escaping to OSS via | ||
| auto-generated content). | ||
|
|
||
| What this checks | ||
| ================ | ||
| Scans the source tree for the pattern `[[<type>_<slug>]]` where `<type>` | ||
| is one of the four agent-memory categories: `feedback`, `project`, | ||
| `reference`, `user`. The matcher requires the leading double-bracket so | ||
| ordinary markdown reflinks `[label][ref]` don't false-positive. | ||
|
|
||
| Skipped paths | ||
| ============= | ||
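| - `node_modules/`, `dist/`, `build/`, `out/`, `coverage/`, `.next/`, | ||
| `.turbo/`, `.cache/`: generated/vendored | ||
| - `.git/`: VCS metadata | ||
| - `.claude/`, `~/.claude/`, `memory/`: these ARE the memory store | ||
| - This script itself + its workflow (talk about the pattern by design) | ||
| - Legacy documentation trees listed in `ALLOWLIST_PATH_PREFIXES` (pending audit) | ||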
|
|
||
| Exit code: 0 if clean, 1 if any leak found (prints findings), 2 if the scan root does not exist. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| import os | ||
| import re | ||
| import sys | ||
| from pathlib import Path | ||
|
|
||
| # `\[\[(feedback|project|reference|user)_[a-z0-9_-]+(?:\.md)?\]\]` | ||
| # - leading `\[\[` to require the memory-link shape (not arbitrary `[x]`) | ||
| # - one of the four category prefixes (the agent memory types) | ||
| # - underscore separator + slug body (lowercase kebab/snake/digit chars) | ||
| # - optional `.md` suffix | ||
| # - closing `\]\]` | ||
| SLUG_RE = re.compile(r"\[\[(feedback|project|reference|user)_[a-z0-9_-]+(?:\.md)?\]\]") | ||
|
|
||
| EXTENSIONS = {".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs", ".md", ".yml", ".yaml"} | ||
|
|
||
| # Walk-relative dir names to prune entirely. | ||
| SKIP_DIRS = { | ||
| "node_modules", | ||
| "dist", | ||
| "build", | ||
| "out", | ||
| "coverage", | ||
| ".git", | ||
| ".next", | ||
| ".turbo", | ||
| ".cache", | ||
| # Memory stores by convention. | ||
| "memory", | ||
| ".claude", | ||
| } | ||
|
|
||
| # Files where the pattern is allowed (this guard talks about it by design). | ||
| SELF_ALLOWLIST = { | ||
| ".github/scripts/check-no-memory-slugs.py", | ||
| ".github/workflows/no-memory-slugs.yml", | ||
| } | ||
|
|
||
| # Path-prefix allowlist for the initial rollout. The pre-existing 57 | ||
| # legacy references in these trees are historical design context — many | ||
| # RFCs intentionally cite the slug as the source of a decision and the | ||
| # SOP doc lists agent-memory categories by name. Cleaning them up is | ||
| # tracked as a backlog item separate from this guard, so we narrow the | ||
| # initial enforcement to production code + user-facing docs and let the | ||
| # documentation trees be audited offline at the owners' pace. | ||
| # | ||
| # REMOVE entries from this list as their backing tree is audited. | ||
| ALLOWLIST_PATH_PREFIXES = ( | ||
| "docs/sop/", | ||
| "docs/rfcs/", | ||
| "docs/research/", | ||
| "docs/troubleshooting/", | ||
| "docs/tests/", | ||
|
Comment on lines +81 to +85
Because these entire documentation trees are skipped, any new
||
| ) | ||
|
|
||
|
|
||
| def scan(root: Path) -> list[tuple[str, int, str]]: | ||
| findings: list[tuple[str, int, str]] = [] | ||
| for dirpath, dirnames, filenames in os.walk(root): | ||
| # Mutate dirnames in-place so os.walk doesn't descend into pruned dirs. | ||
| dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS] | ||
| for fname in filenames: | ||
| path = Path(dirpath) / fname | ||
| if path.suffix.lower() not in EXTENSIONS: | ||
| continue | ||
| rel = path.relative_to(root).as_posix() | ||
| if rel in SELF_ALLOWLIST: | ||
| continue | ||
| if any(rel.startswith(prefix) for prefix in ALLOWLIST_PATH_PREFIXES): | ||
| continue | ||
| try: | ||
| # `errors='replace'` so a stray binary masquerading as a text | ||
| # extension does not abort the whole scan. | ||
| text = path.read_text(encoding="utf-8", errors="replace") | ||
| except OSError: | ||
| continue | ||
| for lineno, line in enumerate(text.splitlines(), start=1): | ||
| for match in SLUG_RE.finditer(line): | ||
| findings.append((rel, lineno, match.group(0))) | ||
| return findings | ||
|
|
||
|
|
||
| def main() -> int: | ||
| root = Path(sys.argv[1] if len(sys.argv) > 1 else ".").resolve() | ||
| if not root.exists(): | ||
| print(f"error: scan root does not exist: {root}", file=sys.stderr) | ||
| return 2 | ||
| findings = scan(root) | ||
| if not findings: | ||
| print("OK: no internal memory-slug references found") | ||
| return 0 | ||
| print( | ||
| f"FAIL: found {len(findings)} internal memory-slug reference(s). " | ||
| "These are private agent-memory pointers and must not leak into " | ||
| "public OSS files — rewrite the comment to convey the intent " | ||
| "directly without the [[slug]] form.", | ||
| file=sys.stderr, | ||
| ) | ||
| for rel, lineno, snippet in findings: | ||
| print(f" {rel}:{lineno}: {snippet}", file=sys.stderr) | ||
| return 1 | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| sys.exit(main()) | ||
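
To make the matcher's behaviour concrete, here is a minimal sketch that exercises the same pattern as `SLUG_RE` against a few strings. The helper name `find_slugs` and every sample string except `[[feedback_no_test_on_prod]]` are hypothetical and exist only for illustration.

```python
from __future__ import annotations

import re

# Same pattern as SLUG_RE in the script above.
SLUG_RE = re.compile(r"\[\[(feedback|project|reference|user)_[a-z0-9_-]+(?:\.md)?\]\]")


def find_slugs(text: str) -> list[str]:
    """Return every memory-slug reference found in `text` (hypothetical helper)."""
    return [m.group(0) for m in SLUG_RE.finditer(text)]


if __name__ == "__main__":
    samples = [
        "see [[feedback_no_test_on_prod]] for context",  # flagged
        "see [[project_release-notes.md]]",              # flagged, optional .md suffix
        "ordinary reflink [label][ref]",                 # ignored, no leading [[
        "wiki link [[getting-started]]",                 # ignored, no category prefix
        "[[Feedback_Upper]]",                            # ignored, pattern is lowercase-only
    ]
    for sample in samples:
        print(f"{sample!r} -> {find_slugs(sample)}")
```

Note that the pattern is case-sensitive, so a slug containing uppercase characters would pass the guard unnoticed.
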
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| # Block internal memory-slug references from leaking into public OSS files. | ||
| # | ||
| # Runs the Python grep guard in .github/scripts/check-no-memory-slugs.py | ||
| # on every PR + main push. Fails the build on any unexpected | ||
| # `[[feedback_*]]` / `[[project_*]]` / `[[reference_*]]` / `[[user_*]]` | ||
| # reference in source / docs (with a tree-prefix allowlist for the | ||
| # legacy documentation areas tracked under a separate backlog issue — | ||
| # see the script for the current allowlist). | ||
| # | ||
| # Python (not in-yml bash sed loop) per the team's CI-guard pattern: | ||
| # multi-pattern scans on large repos can run pathologically slow under | ||
| # bash on Windows runners; a small Python script is portable and easy | ||
| # to extend. | ||
|
|
||
| name: lint (no internal memory-slug leak) | ||
|
|
||
| on: | ||
| pull_request: | ||
| paths: | ||
| - '**/*.ts' | ||
| - '**/*.tsx' | ||
| - '**/*.js' | ||
| - '**/*.jsx' | ||
| - '**/*.mjs' | ||
| - '**/*.cjs' | ||
| - '**/*.md' | ||
| - '**/*.yml' | ||
| - '**/*.yaml' | ||
| - '.github/scripts/check-no-memory-slugs.py' | ||
| - '.github/workflows/no-memory-slugs.yml' | ||
| push: | ||
| branches: [main] | ||
| paths: | ||
| - '**/*.ts' | ||
| - '**/*.tsx' | ||
| - '**/*.js' | ||
| - '**/*.jsx' | ||
| - '**/*.mjs' | ||
| - '**/*.cjs' | ||
| - '**/*.md' | ||
| - '**/*.yml' | ||
| - '**/*.yaml' | ||
| - '.github/scripts/check-no-memory-slugs.py' | ||
| - '.github/workflows/no-memory-slugs.yml' | ||
|
|
||
| concurrency: | ||
| group: lint-slugs-${{ github.workflow }}-${{ github.ref }} | ||
| cancel-in-progress: ${{ github.ref != 'refs/heads/main' }} | ||
|
|
||
| jobs: | ||
| no-memory-slugs: | ||
| name: no internal [[feedback_*]] slug references | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 2 | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
|
|
||
| - name: Run check-no-memory-slugs.py | ||
| run: python3 .github/scripts/check-no-memory-slugs.py . |
When a leak lands in other tracked public text files, this extension allowlist skips it entirely; I checked the repo and there are public `docs-site` Vue components and shell installer scripts, while the workflow path filter also omits those suffixes, so a `.vue`/`.sh`-only PR would not even run the guard. That leaves internal `[[feedback_*]]`-style slugs able to pass CI outside the listed TS/JS/Markdown/YAML files, despite the job being intended to block leaks in public OSS files.
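
One possible way to address this, sketched under assumptions: widen the suffix set to include the `.vue` and `.sh` suffixes the comment names. This is not the PR author's fix. `scan_files` and `demo` are hypothetical names, the sample file names and the `project_docs_site` / `user_ignored` slugs are invented, and the `SKIP_DIRS` pruning and allowlists from the real script are omitted for brevity.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

SLUG_RE = re.compile(r"\[\[(feedback|project|reference|user)_[a-z0-9_-]+(?:\.md)?\]\]")

# The script's EXTENSIONS plus the two suffixes named in the review comment.
EXTENSIONS = {
    ".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs", ".md", ".yml", ".yaml",
    ".vue", ".sh",  # assumption: enough to cover the docs-site and installer files
}


def scan_files(root: Path) -> list[tuple[str, int, str]]:
    """Scan files under `root` whose suffix is in EXTENSIONS (no pruning, no allowlist)."""
    findings: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        rel = path.relative_to(root).as_posix()
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in SLUG_RE.finditer(line):
                findings.append((rel, lineno, match.group(0)))
    return findings


def demo() -> list[tuple[str, int, str]]:
    """Build a throwaway tree with leaks in a .vue and a .sh file and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "hero.vue").write_text("<!-- [[project_docs_site]] -->\n", encoding="utf-8")
        (root / "install.sh").write_text("# see [[feedback_no_test_on_prod]]\n", encoding="utf-8")
        (root / "logo.png").write_text("[[user_ignored]]\n", encoding="utf-8")  # suffix not scanned
        return scan_files(root)


if __name__ == "__main__":
    for rel, lineno, snippet in demo():
        print(f"{rel}:{lineno}: {snippet}")
```

The workflow's `paths:` filters would need matching `'**/*.vue'` and `'**/*.sh'` entries as well; otherwise, as the comment points out, a PR touching only those files would never trigger the job.
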