Skip to content

fix: cap adversarial review prompt size to stay under Codex API 1MB input limit#313

Open
cardene777 wants to merge 1 commit into
openai:mainfrom
cardene777:feature/buildAdversarialReviewPrompt-1mb-cap
Open

fix: cap adversarial review prompt size to stay under Codex API 1MB input limit#313
cardene777 wants to merge 1 commit into
openai:mainfrom
cardene777:feature/buildAdversarialReviewPrompt-1mb-cap

Conversation

@cardene777
Copy link
Copy Markdown

Summary

/codex:adversarial-review --wait (and therefore Skill('parallel-review', ...) flows that wrap it) fails on heavy PRs with:

Input exceeds the maximum length of 1048576 characters.

even after PR #179 capped the embedded diff at 256KB. PR #179 fixed the Node spawnSync ENOBUFS path, but the Codex API thread input is hard-limited to 1,048,576 characters, and template + 256KB diff + collection guidance + focus text can still cross that line in practice — especially when the diff contains multi-byte UTF-8.

This PR adds a small second-tier budget inside buildAdversarialReviewPrompt so the prompt as a whole always fits the API window.

Repro

Reported downstream as cardene777/claude-config#1467 and triggered on AlphaTrader heavy-tier PR +540/-31. Observed exit:

[codex] Starting Codex task thread.
[codex] Thread ready (...).
Input exceeds the maximum length of 1048576 characters.

focus_text short-circuit (200 → 100 chars) did not change the outcome, which is what pointed at the assembled prompt rather than the focus string.

Change

Inside plugins/codex/scripts/codex-companion.mjs:

  • Add two budgets:
    • MAX_ADVERSARIAL_PROMPT_BYTES = 850 * 1024
    • MAX_ADVERSARIAL_PROMPT_CHARS = 900 * 1024
      Both checked together so emoji-heavy diffs are clamped by chars and long ASCII diffs are clamped by bytes. Both stay safely under the 1,048,576-char API cap so the system-prompt / thread metadata that the API stitches on every turn has headroom.
  • Refactor buildAdversarialReviewPrompt into a three-step fallback chain:
    1. render the full prompt, return it if it already fits;
    2. otherwise binary-search the largest prefix of REVIEW_INPUT that fits, append a [truncated: REVIEW_INPUT was trimmed by N bytes ...] notice, and re-render;
    3. if no useful prefix survives, drop the inline diff entirely and switch REVIEW_COLLECTION_GUIDANCE to the existing lightweight self-collect language so the reviewer is told to use read-only git commands.
  • USER_FOCUS is never dropped — it is small and high-signal.
  • Export buildAdversarialReviewPrompt, MAX_ADVERSARIAL_PROMPT_BYTES, and MAX_ADVERSARIAL_PROMPT_CHARS so they are unit-testable.
  • Guard the bottom-of-file main() invocation behind isDirectInvocation() so importing the module from tests no longer runs the CLI entry point. This is the minimal side-effect cleanup that the export change made necessary.

Tests

New tests/adversarial-prompt-cap.test.mjs (7 cases, all green locally):

  • small input passthrough — full content retained
  • ~900KB ASCII content — prompt truncated under both budgets
  • truncated prompt — contains truncation marker or self-collect guidance
  • 5MB unfittable input — falls back to self-collect mode
  • multi-byte UTF-8 (250k poop emojis) — both byte and char budgets respected
  • truncation preserves USER_FOCUS
  • sanity: byte/char budgets stay under the 1,048,576-char API cap

Full npm test result on this branch: 89 passed / 4 failed. The 4 failures are pre-existing and reproduce on a clean main checkout (resolveStateDir uses a temp-backed per-workspace directory, result returns the stored output for the latest finished job by default, and two status runtime tests that depend on a live Codex job). They are unrelated to this change.

Compatibility

  • Default behavior unchanged for prompts that already fit (still returns the full interpolated template).
  • No new public CLI flags.
  • No change to the prompts/adversarial-review.md template wording. Only the chosen REVIEW_INPUT and REVIEW_COLLECTION_GUIDANCE values can change when a fallback fires.

Non-goals

  • Re-tuning DEFAULT_INLINE_DIFF_MAX_BYTES (still 256KB; this PR layers an upper-bound budget on top, the two are orthogonal).
  • Reworking the generator-verifier / codex-review paths — they were never affected.
  • Token-level budget accounting — bytes + chars is enough to keep us under the documented API limit and matches how the API error message itself is phrased.

Smoke (pending downstream)

End-to-end smoke on the original heavy-tier repro PR requires a Claude Code restart so the updated plugin gets loaded; that will be run in a follow-up session and reported on cardene777/claude-config#1467. The unit test coverage above exercises every fallback branch directly without needing a live API.

Refs: cardene777/claude-config#1467

…nput limit

The `adversarial-review --wait` command fails on heavy PRs with
`Input exceeds the maximum length of 1048576 characters.` even after PR
openai#179 capped the embedded diff to 256KB. The Codex API thread input is
hard-limited to 1,048,576 characters, and the template plus 256KB diff
plus collection guidance plus focus text can still cross that line in
practice, especially when the diff contains multi-byte UTF-8.

Add `MAX_ADVERSARIAL_PROMPT_BYTES = 850 * 1024` and
`MAX_ADVERSARIAL_PROMPT_CHARS = 900 * 1024` budgets and a small fallback
chain inside `buildAdversarialReviewPrompt`:

1. render the full prompt and return it if it already fits both budgets;
2. otherwise binary-search the largest prefix of `REVIEW_INPUT` that fits,
   append a truncation notice, and re-render;
3. if no useful prefix survives, drop the inline diff entirely and switch
   the collection guidance to the existing lightweight self-collect path.

Both budgets are checked together, so a prompt of mostly multi-byte
content (e.g. emoji-heavy diffs) is constrained by chars while a long
ASCII diff is constrained by bytes. `USER_FOCUS` is never dropped, since
it is small and high-signal.

The helper is now exported alongside the budget constants so it can be
unit-tested directly. The bottom-of-file `main()` call is guarded behind
`isDirectInvocation()` so importing the module from tests no longer
runs the CLI entry point.

New tests in `tests/adversarial-prompt-cap.test.mjs` cover: small input
passthrough, ~900KB ASCII content truncation, truncation marker presence,
unfittable (5MB) input falling back to self-collect mode, multi-byte
content respecting both budgets, `USER_FOCUS` preservation, and a sanity
check that the byte/char budgets stay under the 1,048,576-char API cap.

Refs: cardene777/claude-config#1467
@cardene777 cardene777 requested a review from a team May 11, 2026 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant