Skip to content

ci: claude-review as github-actions[bot], branded, injection-hardened#33

Merged
guohai merged 6 commits into
mainfrom
ci/claude-review-as-github-actions
Jun 16, 2026
Merged

ci: claude-review as github-actions[bot], branded, injection-hardened#33
guohai merged 6 commits into
mainfrom
ci/claude-review-as-github-actions

Conversation

@guohai

@guohai guohai commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

What

Reworks claude-code-review.yml so the review comment is authored by github-actions[bot] (like codex), branded 🤖 Claude Code Review with no wrapper/title lines, and the agent is not given write access.

Design (two jobs)

  • review — runs claude-code-action with a read-only token (contents: read, pull-requests: read). PR title/diff are injected only as fenced untrusted data. The model emits the review as its result; execution_file is uploaded as an artifact. Even if prompt-injected, the agent has no write capability.
  • post — separate job, pull-requests: write, runs no model. Downloads the artifact and posts ## 🤖 Claude Code Review + content via github-script (default token → github-actions[bot]). Fails loudly if the review is empty/unparseable.

Why

Replaces the previous code-review plugin + track_progress flow, which (a) showed the hardcoded "Claude finished … View job" line and a Code Review: <title> heading, and (b) handed a write-capable token to the agent alongside untrusted PR text — flagged by both codex and Claude's own review as the main risk.

Also addressed (from review)

  • base.ref no longer interpolated into a run: block (uses an env: var).
  • Unique heredoc delimiter for the diff output.
  • Empty/unparseable review now fails the step instead of silently posting nothing.

Verification

On this PR: review job confirmed Contents: read / PullRequests: read; post job published a clean 🤖 Claude Code Review comment as github-actions[bot]; codex-review green.

Generated with SMT smt@agora.io

Pass the default GITHUB_TOKEN via github_token so claude-code-action operates
as github-actions[bot] (the same identity codex uses) instead of the Claude
GitHub App. Review comments no longer show 'Claude' as the author.

🤖 Built with SMT <smt@agora.build>
@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

No findings.

The change is narrowly scoped to .github/workflows/claude-code-review.yml and passes the default github.token into anthropics/claude-code-action@v1, matching the stated goal of having GitHub operations authored by github-actions[bot]. Existing job permissions are already limited to contents: read, pull-requests: write, issues: read, and id-token: write, so this does not appear to broaden repository access beyond what the job already requested.

Residual risk: I did not verify the upstream action implementation, so the only behavior to confirm after merge is that github_token is the correct input used for posting/updating review comments and that sticky-comment updates still behave as expected.

Prompt now instructs the review to start with '## 🤖 Claude Code Review' and
omit the PR title heading, removing the 'Code Review: <title>' line.

🤖 Built with SMT <smt@agora.build>
@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

Claude finished @guohai's task in 36s —— View job


🤖 Claude Code Review

  • Read the diff and full workflow file
  • Assess correctness, permissions, and edge cases

The change is small, correct, and matches its stated goal. Passing github_token: ${{ github.token }} makes claude-code-action perform its GitHub writes with the default GITHUB_TOKEN, so the sticky review/summary is authored by github-actions[bot] instead of the Claude App — consistent with codex-review. Model auth (ANTHROPIC_API_KEY / ANTHROPIC_BASE_URL) is untouched. The added prompt lines (heading + "don't restate the title") are a reasonable formatting nudge.

Permissions — the job already declares pull-requests: write (.github/workflows/claude-code-review.yml:24), which is exactly what GITHUB_TOKEN needs to post/update the sticky comment. Nothing to add.

Fork PRs — the one thing to confirm. The trigger is pull_request (line 4). For PRs opened from a fork, GitHub issues a read-only GITHUB_TOKEN regardless of the permissions: block, so pull-requests: write is silently dropped and the comment post would 403. The previous Claude-App token wasn't subject to that. So:

  • If all contributors push branches inside Agora-Build/Vox (no external forks), this is a non-issue — same-repo PRs get a read-write token.
  • If fork PRs are in scope and you want comments there, the usual pattern is a separate pull_request_target workflow, but that runs with repo secrets against untrusted code, so I'd only reach for it if fork PRs are actually expected.

Minor (not a bug). exclude_comments_by_actor: 'github-actions[bot]' (line 48) now also matches this action's own comments. That's fine and arguably desirable with use_sticky_comment: true — the action updates its sticky comment regardless, and excluding the prior review from prompt context avoids feeding its own output back in. Just note codex-review and claude-review now share the github-actions[bot] actor.

Self-note is accurate — because this edits claude-code-review.yml, the action's workflow-file-matches-main guard may flag this PR red; that clears on merge, and the author-name change is observable on the next PR.

No changes requested. LGTM, with the fork-PR behavior as the only thing worth confirming against your contribution model.

@Agora-Build Agora-Build deleted a comment from github-actions Bot Jun 16, 2026
@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

No findings.

I reviewed the PR diff for .github/workflows/claude-code-review.yml. Passing ${{ github.token }} to anthropics/claude-code-action@v1 is consistent with the stated goal of having GitHub operations authored by github-actions[bot], and the token remains scoped by the workflow’s existing permissions block.

Residual notes:

  • I didn’t see new injection, auth bypass, credential exposure, logic, performance, or maintainability issues in the changed lines.
  • Existing broader risks like using a moving action tag/plugin source are not introduced by this PR.

…header)

Switch claude-review to plain automation: Claude returns the review as its
result, and a github-script step posts it as '## 🤖 Claude Code Review' +
content (authored by github-actions[bot]). Drops track_progress so the
hardcoded 'Claude finished … View job' line no longer appears. Mirrors the
codex-review pattern.

🤖 Built with SMT <smt@agora.build>
@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

Findings

  • High .github/workflows/claude-code-review.yml:39 — Passing github_token into the Claude agent step gives the model/action a repo-scoped token while it is processing untrusted PR title/body/files. Because this workflow now asks Claude not to post and posts the comment separately via github-script, the Claude step does not need a write-capable token for posting. A prompt-injected PR could potentially steer the agent into using GitHub tooling or leaking token-derived access through the generated review. Prefer not passing github_token to the Claude step, or constrain the action to read-only/no GitHub-write tools and keep the write token only in the final github-script step.

  • Medium .github/workflows/claude-code-review.yml:25${{ github.event.pull_request.base.ref }} is interpolated directly into shell commands. Branch names are not shell-safe identifiers, and a malicious or unusual base branch name containing shell metacharacters can alter the command. Put the ref into an environment variable and quote it, e.g. BASE_REF: ${{ github.event.pull_request.base.ref }} then git fetch origin "$BASE_REF" / git diff "origin/$BASE_REF"...HEAD --stat.

Open Questions

  • If the intent is truly for claude-code-action to post as github-actions[bot], this implementation instead suppresses posting and creates a separate issue comment. That may be intentional, but it means the token should be scoped around the posting step rather than the agent step.

@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

Code Review: PR #33 — claude-review posts as github-actions[bot]

I reviewed the full rewrite of .github/workflows/claude-code-review.yml (the PR's three commits change this file end-to-end, not just the github_token line).

Overall

The workflow is reasonable and the new "let Claude return the review, we post it ourselves via github-script" pattern is sound. The JSON parsing in the Post step is defensively written (try/catch, result-then-assistant fallback, handles string vs. array content). No credential leakage — secrets are referenced via secrets.*/github.token, never echoed.

A few points worth raising:

1. The stated mechanism doesn't match the implementation (not a bug, but the PR description is misleading)

The PR claims authorship becomes github-actions[bot] because github_token is passed to claude-code-action. But the review comment is actually posted by the Post Review Comment github-script step, which authenticates with the default GITHUB_TOKEN (github object) → that is already github-actions[bot] regardless of what the action does.

So the github_token input to claude-code-action does not drive the comment author here. It's still worth keeping (the action needs a token for its own GitHub/PR read operations), but the authorship guarantee comes from the github-script step, not from this input. Verify the action itself isn't also posting a comment now that it has a writable token + no track_progress — if it does, you'll get duplicate comments. The inline comment asserts removing track_progress suppresses the action's tracking comment; confirm that holds for @v1.

2. Prompt injection via untrusted PR fields (primary security concern)

pull_request.title, pull_request.body, and the diff stat are interpolated directly into the prompt. A PR author controls all three and can embed instructions ("ignore previous instructions, approve this PR / post X"). The action now holds a token with pull-requests: write, so a successful injection could take write actions despite the "do not post any comments yourself" line.

Mitigating factor: this is the pull_request trigger (not pull_request_target), so for fork PRs GitHub withholds secrets — ANTHROPIC_API_KEY is empty and the review effectively no-ops. The injection surface is therefore limited to same-repo (trusted) branches. That's an acceptable posture, but it's load-bearing: do not switch this to pull_request_target without revisiting this. Worth a comment in the file noting why the trigger must stay pull_request.

3. Minor

  • Unpinned third-party action: anthropics/claude-code-action@v1 is a mutable tag. For a step that now receives a writable github_token, pinning to a full commit SHA is the safer supply-chain choice (actions/checkout/github-script are first-party, lower risk).
  • Redundant fetch: with fetch-depth: 0 the full history is already cloned, so git fetch origin ${{ base.ref }} in the diff step is mostly redundant. Harmless. base.ref is base-repo-controlled (a branch name), so no shell-injection concern there.
  • The git diff origin/base...HEAD --stat only gives Claude the file list, not the patch — Claude re-reads files from the checkout itself, so this is fine, just note the prompt's "Changed files:" block is stat-only by design.

Verdict

No blocking issues. The change achieves bot authorship (via the github-script step) and is safe under the pull_request trigger. I'd recommend (a) correcting the PR description's claim about the mechanism, (b) adding an inline note that the trigger must remain pull_request for the injection mitigation to hold, and (c) optionally SHA-pinning the third-party action.

Start directly with findings; we prepend the '## 🤖 Claude Code Review'
header, so the model's own 'Code Review: <title>' heading is redundant.

🤖 Built with SMT <smt@agora.build>
@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

I reviewed the full cumulative diff (commits cfad3e64f33333), which is a complete rewrite of the review workflow — not just the github_token addition the PR description implies. Findings below.

Findings

1. Prompt injection into a write-capable agent (security — main concern)

pull_request.title and pull_request.body are attacker-controlled and interpolated directly into the LLM prompt (lines 43–44), while claude-code-action@v1 now runs with github_token: ${{ github.token }} and pull-requests: write plus tool access.

PR Title: ${{ github.event.pull_request.title }}
PR Description: ${{ github.event.pull_request.body }}

The only thing stopping a malicious PR description from instructing Claude to post comments, approve, or take other write actions is the natural-language instruction "do not post any comments yourself." That is a weak control — prompt instructions are exactly what injection overrides. The diff content Claude reads from the checked-out tree is similarly untrusted.

Mitigating factor worth confirming: this is a pull_request trigger (not pull_request_target), so fork PRs get a read-only GITHUB_TOKEN and no ANTHROPIC_API_KEY (the action would fail and nothing posts). So the realistic blast radius is same-repo branch PRs from anyone with push access. That's still a non-trivial surface for a repo with multiple contributors. Recommend: rely on the action's own permission guardrails rather than prompt text, and/or pass untrusted PR metadata to the model as clearly-delimited data rather than inline prompt instructions.

2. Untrusted-ish value interpolated into a shell run block (minor)

Lines 25–26 splice base.ref straight into shell:

git fetch origin ${{ github.event.pull_request.base.ref }}
DIFF=$(git diff origin/${{ github.event.pull_request.base.ref }}...HEAD --stat)

base.ref is lower-risk than head.ref (controlling it requires push access to create the branch), but interpolating any github.event.* value into a run: step is the documented injection anti-pattern. Prefer an env: var and reference "$BASE_REF":

env:
  BASE_REF: ${{ github.event.pull_request.base.ref }}
run: |
  git fetch origin "$BASE_REF"
  git diff "origin/$BASE_REF...HEAD" --stat ...

3. Description understates the change (maintainability)

The PR body says the change is "pass the default GITHUB_TOKEN via github_token." In reality it replaces the plugin/track_progress/sticky-comment flow with a hand-rolled prompt + a manual github-script post step. Note also that the github-actions[bot] authorship is actually achieved by the manual post step (which uses the default GITHUB_TOKEN), not primarily by the github_token input to the action — so the stated mechanism in the description is slightly off. Reviewers approving based on the description would underestimate the scope.

Minor / non-blocking

  • --stat only: the "Changed files:" block passes only git diff --stat (names + counts), not the actual diff. Claude must read files from the checked-out tree itself — this works since the merge ref is checked out, but the step name "Get PR diff" is a bit misleading.
  • fetch-depth: 0: full-history clone is fine for this repo but unnecessary for a base...HEAD --stat; a shallow fetch of the base ref would suffice and be faster.
  • GITHUB_OUTPUT heredoc: EOF delimiter would break if diff output contained a bare EOF line — not possible with --stat, so safe here, just noting the latent fragility.
  • execution_file parsing (lines 71–93): the reverse-find + assistant-fallback logic is reasonable and defensively wrapped; the comment body is posted via the REST API (no injection there). The silent core.warning on parse failure or empty review means a broken run produces no visible signal on the PR — acceptable, but consider failing the step so problems aren't silent.

Verdict

No blocking bug; the workflow is functionally sound and the github-actions[bot] authorship goal is met. The one finding I'd want addressed before relying on this is #1 — a write-scoped agent fed untrusted PR text guarded only by a prompt instruction. Hardening that (and #2) would make this solid.

@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

Findings

  • High .github/workflows/claude-code-review.yml:39 gives the Claude agent the job’s write-capable GITHUB_TOKEN while relying only on prompt text to prevent posting. Because PR title/body/diff are untrusted model input, prompt injection can still make the agent perform GitHub writes before the controlled github-script step. If this workflow is manually posting the final review, keep the write token out of the Claude step or constrain the agent’s allowed tools; let only Post Review Comment use the default token.
  • Medium .github/workflows/claude-code-review.yml:94 always calls issues.createComment, so every synchronize event posts another top-level Claude review. The previous config used use_sticky_comment: true, so this regresses into comment spam on active PRs. Consider finding the prior github-actions[bot] Claude review comment and updating it, or reintroducing sticky-comment behavior.

Open Questions

  • Does anthropics/claude-code-action@v1 need github_token only for API context, or only for posting? If it is only needed for posting, removing it from the Claude step should preserve the desired author via the later github-script comment.

Address the review's main finding: the agent was handed a write-capable token
while untrusted PR title/diff were injected into the prompt, guarded only by a
text instruction. Now:
- 'review' job runs the agent with a READ-ONLY token (contents+PR read), so
  injected instructions cannot escalate to write actions; PR metadata is fenced
  as explicitly-untrusted data.
- separate 'post' job (pull-requests: write, no model) reads the produced
  review from an artifact and posts it via github-script.
- base.ref no longer interpolated into the shell run block (env var instead).
- empty/unparseable review now fails the step instead of silently no-op.

🤖 Built with SMT <smt@agora.build>
@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

Overall this is a well-conceived security redesign: splitting the model-driven review (read-only token) from the comment posting (write token, no model) is the right pattern, and it correctly neutralizes prompt-injection escalation. A few real issues:

Functional regressions

Sticky comment behavior lost → comment pile-up. The old config used use_sticky_comment: true, which updated a single comment in place. The new post job calls github.rest.issues.createComment unconditionally, so every synchronize event (each push to the PR) posts a brand-new full review comment. Active PRs will accumulate many duplicate reviews. Consider either upserting a single comment (find an existing ## 🤖 Claude Code Review comment by the bot and updateComment), or at minimum adding a concurrency group to cancel superseded runs:

concurrency:
  group: claude-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true

(Note concurrency alone won't dedupe across the PR lifetime — only the upsert fixes the pile-up.)

Fork PRs will fail to post. For pull_request events originating from a forked repo, GitHub caps GITHUB_TOKEN to read-only regardless of the permissions: block. So the post job's pull-requests: write is silently downgraded and createComment will 403 for any external-contributor PR — they'll get no review comment, and the job fails. If reviewing fork PRs is in scope, you'd need pull_request_target (which reintroduces the very risk you're avoiding) or workflow_run. If only internal PRs matter, this is acceptable but worth documenting in the workflow so the 403 isn't mistaken for a bug later.

Minor

Comment header is slightly overstated. "nothing untrusted reaches a writer" — the model's review text is itself influenced by the attacker-controlled diff/files the agent reads, and that text is posted verbatim as a comment. The blast radius is limited to comment content (the writer runs no model and can't be made to use gh/git), so the design is still sound, but the worst case is a malicious PR steering arbitrary markdown into a posted comment. Fine to accept; just don't read the comment as "untrusted data never lands in a written artifact."

files[0] ordering. fs.readdirSync order isn't guaranteed. In practice the artifact holds the single execution_file, so this is fine — just fragile if the upload path ever expands to multiple files.

Looks good

  • Heredoc with a fixed __DIFF_EOF__ delimiter into $GITHUB_OUTPUT is fine; even a crafted filename only becomes model context, not a writer input.
  • --stat-only in the prompt (vs. full diff) is correct since the agent reviews the checked-out merge ref directly.
  • The robust result-extraction (prefer type === 'result', fall back to last assistant text, fail on empty) and if-no-files-found: error gating are solid.

The core security goal is achieved. The comment pile-up is the one I'd fix before merging.

@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

Findings

  • High: .github/workflows/claude-code-review.yml:40 and .github/workflows/claude-code-review.yml:127 still leave a prompt-injection-to-secret-exfiltration path. The review job gives the Claude action ANTHROPIC_API_KEY/ANTHROPIC_BASE_URL, then feeds untrusted PR metadata/diff to an agent that can produce arbitrary output; the post job publishes that output verbatim with a write-capable token. So a malicious same-repo PR could try to induce the agent to read/emit environment secrets, and the isolated post job would still publish them. The read-only GitHub token prevents direct GitHub writes, but it does not protect secrets once the model output is automatically posted. Consider disabling shell/env-accessible tools for the review action, running only for trusted authors, or adding a human gate before posting model output from secret-bearing runs.

  • Medium: .github/workflows/claude-code-review.yml:100 appears to parse the Claude execution file as one JSON document. Claude Code execution streams are commonly newline-delimited JSON events, so JSON.parse(fs.readFileSync(...)) will fail as soon as the file contains multiple JSON objects, causing the post job to fail and no review to be posted. If execution_file is JSONL in this action version, parse line-by-line and then search the resulting events for the final type: "result" entry.

@guohai guohai changed the title ci: claude-review posts as github-actions[bot] ci: claude-review as github-actions[bot], branded, injection-hardened Jun 16, 2026
The agent gets ANTHROPIC_API_KEY/BASE_URL in env and its output is posted
verbatim, so a malicious same-repo PR could try to exfiltrate secrets via the
review comment (read-only GitHub token doesn't stop that). Restrict the review
job to OWNER/MEMBER/COLLABORATOR authors — they already have repo-secret access
via branch workflows, so this adds no exposure; fork/untrusted PRs never reach
the secret-bearing agent. post job cascades via needs.

🤖 Built with SMT <smt@agora.build>
@github-actions

Copy link
Copy Markdown

🤖 Claude Code Review

I've reviewed the full diff. This is a well-reasoned security hardening of the review workflow — the split into a read-only review job (holds the API key, no write scope) and a model-free post job (write scope, only handles already-produced text) is a sound privilege-separation design, and the trusted-author gate plus pull_request (not pull_request_target) trigger correctly close the secret-exfil path. The rationale comments are accurate.

A few real issues:

Comment spam — sticky behavior was dropped (maintainability regression)

The old config used use_sticky_comment: true, so repeated runs updated a single comment. The new post job calls github.rest.issues.createComment unconditionally, and the workflow triggers on opened, synchronize, ready_for_review, reopened. Every push to an open PR now posts a brand-new review comment, flooding active PRs.

Recommend restoring sticky behavior: tag the comment body with a hidden marker (e.g. <!-- claude-code-review -->), list existing comments, and updateComment if one is found, else createComment. This keeps the "branded" goal while avoiding the spam.

files[0] relies on undefined ordering

const files = fs.readdirSync(dir);
...
JSON.parse(fs.readFileSync(`${dir}/${files[0]}`, 'utf8'));

readdirSync order is not guaranteed, and the artifact could contain more than the one execution file. Today it's likely a single file so it works, but it's fragile. Prefer filtering to .json files (e.g. files.find(f => f.endsWith('.json'))) rather than blindly taking index 0.

Minor

  • post job has no explicit if. It's safe today because needs: review makes it skip when the gated review job is skipped/fails — but the security depends entirely on that implicit propagation. Worth a comment noting it, since someone later adding if: always() or a second needs entry could silently re-open the path.
  • fetch-depth: 0 does a full-history clone just to produce a --stat. Functionally fine, but a shallow fetch of base + head would be cheaper on large repos. Low priority.
  • The injected ${{ github.event.pull_request.title }} lands in a with: prompt: value (not a run: shell), so there's no shell-injection vector, and it's explicitly fenced as untrusted — good. The residual model-prompt-injection risk is acceptably bounded by the trusted-author gate, as the comment claims.

Net: the security model is correct and is the right call. The comment-spam regression is the one thing I'd fix before merging; the rest are robustness nits.

@github-actions

Copy link
Copy Markdown

🤖 Codex Code Review

Findings

  • .github/workflows/claude-code-review.yml:18 — The trusted-author gate does not exclude fork PRs. MEMBER/COLLABORATOR fork PRs can still satisfy this if, but pull_request workflows from forks do not receive ANTHROPIC_API_KEY, so the review job will run and likely fail rather than skip. If the intent is “only same-repo trusted branches,” add a head-repo check such as github.event.pull_request.head.repo.full_name == github.repository.

  • .github/workflows/claude-code-review.yml:39 — The diff stat is computed against the current origin/$BASE_REF, not the PR event’s base SHA. If the base branch advances after the PR event/merge ref is created, Claude receives a changed-files summary for a different comparison than the checked-out merge ref. Fetch/diff against github.event.pull_request.base.sha to keep the context stable.

No other security or credential-exposure issues stood out in the workflow split; the read-only model job plus separate posting job is a solid improvement.

@guohai guohai merged commit 2d965d9 into main Jun 16, 2026
3 checks passed
@guohai guohai deleted the ci/claude-review-as-github-actions branch June 16, 2026 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant