Skip to content

Add Codex plugin support for planman#1

Open
RusDyn wants to merge 1 commit into
mainfrom
codex-cross-agent-planman
Open

Add Codex plugin support for planman#1
RusDyn wants to merge 1 commit into
mainfrom
codex-cross-agent-planman

Conversation

@RusDyn

@RusDyn RusDyn commented Jun 2, 2026

Copy link
Copy Markdown
Owner

Summary

  • add Codex plugin metadata, marketplace entry, packaged runtime files, and Stop hook adapter
  • route plan evaluation across agents with configurable Codex/Claude evaluators
  • update docs, commands, config loading, and tests for cross-agent support
  • add pytest config so plain pytest only collects project tests

Tests

  • pytest -q

Summary by CodeRabbit

  • New Features

    • Extended plan evaluation to work across both Claude Code and Codex environments with automatic evaluator detection
    • Added support for project-root configuration file (.planman.jsonc) with environment variable overrides
    • Introduced initialization command for guided configuration setup
  • Documentation

    • Updated guides to cover workflows for both Claude Code and Codex
    • Added installation and troubleshooting documentation for cross-platform evaluation
    • Clarified configuration options and command syntax for each environment
  • Tests

    • Added comprehensive test coverage for cross-environment evaluation functionality

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR extends Planman with cross-agent plan evaluation: Codex sessions evaluate Claude plans and vice versa. It adds a Codex plugin with Stop hooks, implements a unified evaluator engine that invokes either Codex or Claude CLI, manages session state across rounds, updates all existing hooks to support the new dispatch model, and documents the dual-agent setup with comprehensive test coverage.

Changes

Dual-Agent Evaluation Architecture

Layer / File(s) Summary
Codex Plugin Configuration & Marketplace
.codex-plugin/plugin.json, hooks.json, plugins/planman/schemas/evaluation.json, .agents/plugins/marketplace.json, plugins/planman/.codex-plugin/plugin.json, plugins/planman/hooks.json
Register Codex plugin with display name, capabilities, default prompts; define Stop hook to invoke plan evaluation; add evaluation JSON schema with score/breakdown/feedback structure; register marketplace entry pointing to packaged plugin.
Configuration System: Evaluator Selection & File Paths
scripts/config.py, plugins/planman/scripts/config.py
Add evaluator, codex_bin, claude_bin config fields; support .planman.jsonc (primary) and .claude/planman.jsonc (legacy) file discovery; extend PLANMAN_* env overrides; normalize and validate evaluator mode (auto/codex/claude).
Evaluator Engine: Codex & Claude Dispatch
scripts/evaluator.py, plugins/planman/scripts/evaluator.py
Implement provider detection (detect_host, resolve_evaluator), dual-provider plan evaluation (evaluate_plan_codex, evaluate_plan_claude), shared prompt building with provider-specific source-verification hints, schema-based JSON output validation, subprocess environment injection, and CLI installation checks with caching.
Session State & Evaluation Utilities
scripts/hook_utils.py, plugins/planman/scripts/hook_utils.py, plugins/planman/scripts/state.py, plugins/planman/scripts/path_utils.py
Add session state persistence (plan hash, rounds, feedback history); implement plan file discovery with session markers and TTL; build evaluation feedback/system messages with score trends and first-round vs subsequent-round messaging; implement multi-round evaluation control flow with max-round auto-approval and stress-test early rejection.
Codex Stop Hook: Plan Detection & Evaluation
scripts/codex_stop_hook.py, plugins/planman/scripts/codex_stop_hook.py
Detect plan-like text via nested JSON traversal and regex scoring; fail open (allow) when no plan found, evaluator not installed, or _PLANMAN_EVALUATOR bypass set; route evaluation results to JSON block/allow decision with optional feedback.
Existing Hooks: Evaluator Support & Host Dispatch
scripts/hook_utils.py, scripts/pre_ask_hook.py, scripts/pre_exec_gate_hook.py, scripts/pre_exit_plan_hook.py, scripts/post_tool_hook.py, scripts/post_exit_plan_hook.py
Add _PLANMAN_EVALUATOR early-exit guard to all hooks; replace Codex-specific checks with unified check_evaluator_installed(); forward host=claude parameter to run_evaluation(); update hook imports.
Documentation & User Commands
README.md, commands/help.md, commands/init.md, commands/status.md, commands/clear.md, .gitignore
Document dual Claude/Codex setup; introduce .planman.jsonc as primary config file; clarify Claude Code slash-command namespace vs Codex Stop-hook behavior; update command examples and init template; expand .gitignore for benchmark/log/test artifacts.
Test Coverage: Config, Evaluator, Hooks, Plugin Packaging
tests/test_config.py, tests/test_evaluator.py, tests/test_codex_stop_hook.py, tests/test_codex_plugin_package.py, pytest.ini
Add config tests for evaluator/bin field loading, env overrides, and file precedence; add evaluator routing and host-detection tests; add Claude CLI tests; add Codex stop-hook plan extraction and decision output tests; add plugin packaging verification tests; configure pytest discovery.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Whiskers twitch with glee
A plan walks into a room (twice!),
Codex and Claude review the view,
Cross-agent harmony, states persist through,
Evaluation rounds grow wise and true! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Add Codex plugin support for planman' clearly summarizes the primary change: introducing Codex plugin support to the planman project for cross-agent plan evaluation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex-cross-agent-planman

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
scripts/hook_utils.py (1)

441-444: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent exception handling for save_state calls.

Same issue as in the plugin copy - lines 443, 465, 476, 497 catch only OSError while other save_state calls catch (OSError, ValueError). Update all to catch both exceptions for consistency.

Also applies to: 463-466, 474-477, 495-498

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/hook_utils.py` around lines 441 - 444, The save_state calls currently
catch only OSError in the try/except blocks (e.g., the block invoking
save_state(state) and logging via log(..., config, cwd)); update these exception
handlers to catch both OSError and ValueError (except (OSError, ValueError)) for
consistency with other save_state usages, and ensure the same change is applied
to the other save_state locations mentioned (the blocks around the calls that
log failures with log(..., config, cwd) at the other occurrences).
scripts/evaluator.py (1)

354-355: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Score validation rejects 0 but breakdown sum can be 0.

Same issue as the plugin copy - if allowing score: 0 in the schema, update this validation to match.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/evaluator.py` around lines 354 - 355, The validation for the variable
score currently rejects 0; update the check so scores in the allowed schema
range (0–10) pass: modify the conditional that references score (the block
returning "invalid score: {score} (must be integer 1-10)") to allow 0 as a valid
integer and update the error message to reflect "integer 0-10"; ensure the same
variable/condition in scripts/evaluator.py that performs isinstance(score, int)
and range checks is changed accordingly.
commands/status.md (1)

18-50: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add error handling for missing plugin.json.

The one-liner assumes that either .claude-plugin/plugin.json or .codex-plugin/plugin.json exists. If neither file is present (e.g., when run from a non-plugin directory), line 25's open(plugin_json) will raise a FileNotFoundError.

Consider documenting the assumption that this command must be run from a plugin root, or wrap the file access in a try/except block.

🛡️ Proposed fix to add error handling
   python3 -c "
   import json, sys, os
   root = os.environ.get('CLAUDE_PLUGIN_ROOT') or os.environ.get('CODEX_PLUGIN_ROOT') or os.getcwd()
   sys.path.insert(0, os.path.join(root, 'scripts'))
   plugin_json = os.path.join(root, '.claude-plugin', 'plugin.json')
   if not os.path.isfile(plugin_json):
       plugin_json = os.path.join(root, '.codex-plugin', 'plugin.json')
+  if not os.path.isfile(plugin_json):
+      print('Error: plugin.json not found. Run this command from the plugin root.', file=sys.stderr)
+      sys.exit(1)
   with open(plugin_json) as f:
       ver = json.load(f)['version']
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@commands/status.md` around lines 18 - 50, The one-liner assumes plugin_json
exists and will raise FileNotFoundError at open(plugin_json); modify the script
that builds plugin_json (the plugin_json variable and the open(plugin_json)
call) to check existence and handle the missing-file case: after probing
.claude-plugin and .codex-plugin, if neither exists, print a clear message (or
raise a user-friendly error) and exit non-zero, or wrap open(plugin_json) in
try/except to catch FileNotFoundError and handle it (log the error and exit);
keep references to plugin_json and the open(plugin_json) operation so the guard
or try/except is added immediately before reading the file.
🧹 Nitpick comments (3)
plugins/planman/scripts/evaluator.py (2)

187-187: 💤 Low value

Remove extraneous f prefix from string literal.

This f-string has no placeholders.

Proposed fix
-        return None, f"codex CLI not found. Install: npm install -g `@openai/codex`"
+        return None, "codex CLI not found. Install: npm install -g `@openai/codex`"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/planman/scripts/evaluator.py` at line 187, The return statement that
currently reads a prefixed f-string (return None, f"codex CLI not found.
Install: npm install -g `@openai/codex`") should remove the unnecessary f prefix
because there are no interpolations; update the return in the evaluator.py
function where that check occurs to use a plain string literal ("codex CLI not
found. Install: npm install -g `@openai/codex`") so the value is returned
correctly without the extraneous f.

331-331: 💤 Low value

Remove extraneous f prefix from string literal.

Proposed fix
-        return None, f"codex returned malformed output. Set PLANMAN_VERBOSE=true for details."
+        return None, "codex returned malformed output. Set PLANMAN_VERBOSE=true for details."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/planman/scripts/evaluator.py` at line 331, The return statement
currently uses an unnecessary f-string prefix: change the tuple return from
return None, f"codex returned malformed output. Set PLANMAN_VERBOSE=true for
details." to use a plain string (remove the leading f) so it becomes return
None, "codex returned malformed output. Set PLANMAN_VERBOSE=true for details.";
locate the return in plugins/planman/scripts/evaluator.py (the malformed output
handling branch) and update that return expression accordingly.
scripts/evaluator.py (1)

187-187: 💤 Low value

Remove extraneous f prefixes from string literals (lines 187, 331).

Same issue as in the plugin copy - these f-strings have no placeholders.

Also applies to: 331-331

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/evaluator.py` at line 187, Remove the unnecessary f-string prefixes
on plain string literals in scripts/evaluator.py: replace the return that
currently uses f"codex CLI not found. Install: npm install -g `@openai/codex`"
with a normal string literal, and likewise remove the f prefix from the other
plain string at the second occurrence (the one flagged around line 331). Search
for the exact f"codex CLI not found..." text and any other f-strings in
evaluator.py that contain no {} placeholders and change them to regular string
literals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@commands/help.md`:
- Line 41: Update the help text row for the `threshold` option
(`PLANMAN_THRESHOLD`) to reflect the actual valid range `0-10` (not `1-10`) and
note that `0` means "pass all" so it matches README/config semantics; locate the
table row containing `threshold | PLANMAN_THRESHOLD | 7 | Minimum score (1-10)
to pass` and change the range text and/or description accordingly.

In `@plugins/planman/schemas/evaluation.json`:
- Around line 4-8: The schema for the evaluation object has a mismatch: the
"score" property is declared with "minimum": 1 while parse_codex_output enforces
score == sum(breakdown) and each breakdown entry allows 0, so the sum (and
therefore score) can be 0; update the "score" schema to allow 0 by changing its
minimum to 0 (and optionally adjust its description to reflect that 0 is
permitted), and ensure parse_codex_output validation remains consistent with the
"score" property and the breakdown fields.

In `@plugins/planman/scripts/evaluator.py`:
- Around line 354-355: The score validation currently rejects 0 (if block
checking "if not isinstance(score, int) or score < 1 or score > 10") but the
schema may allow score == 0; update the validation in evaluator.py (the score
check around the evaluator function that processes `score`/returns "invalid
score" message) to accept integers in the range 0–10 instead of 1–10 by allowing
score >= 0, and preserve the existing error message text/format to reflect the
new valid range.

In `@plugins/planman/scripts/hook_utils.py`:
- Around line 463-466: The exception handlers around save_state calls currently
only catch OSError; update each of those except blocks that call save_state (the
ones using save_state(state) and calling log(..., config, cwd)) to catch both
OSError and ValueError instead (use except (OSError, ValueError) as e) and pass
the exception into log as before to keep behavior consistent with the earlier
save_state handlers (see other occurrences of save_state and log in this module
for reference).
- Around line 441-444: The catch around save_state currently only handles
OSError but must also catch ValueError (because save_state uses json.dump(...,
allow_nan=False) which can raise ValueError); update the try/except that wraps
save_state (the one calling save_state(state) near the end of the file) to
except (OSError, ValueError) and log the error the same way as the other
save_state callers (consistent with the handlers at the save_state calls around
the save_state usages at the other locations).

In `@README.md`:
- Around line 233-235: The fenced code block containing the command "/plugin
uninstall planman@planman" in README.md lacks a language tag; add a language
identifier (e.g., "bash") immediately after the opening triple backticks so the
block becomes fenced as a bash snippet to satisfy markdownlint MD040 and improve
highlighting for the command.
- Line 65: Update the README sentence so the customization step is
host-specific: mention that users of hosts with the /planman:init command (e.g.,
Claude Code) can run /planman:init, while Codex users should instead configure
via .planman.jsonc or with environment variables prefixed PLANMAN_*; ensure both
options are presented clearly and the wording replaces the current single
recommendation to run /planman:init.

In `@scripts/codex_stop_hook.py`:
- Around line 23-25: The current _PLAN_RE is too permissive (it includes
patterns like ^\*\*summary\*\* and a generic numbered-list ^\s*1\.) and is
matching ordinary markdown; tighten it by restricting matches to explicit plan
headings or nearby keywords: remove or require a colon for the bold summary
token (e.g., match /^\*\*summary:\*\*/i), and narrow the numbered-list pattern
so it only triggers when the word "plan" (or "proposed plan"/"implementation
plan") appears in the same heading/preceding text (e.g., require "plan" within
the same line or via a lookbehind/preceding-group before ^\s*1\.). Update the
_PLAN_RE definition accordingly and apply the same tightened regex to the other
places in this file where the plan detector is used (the other usages of
_PLAN_RE in this module).

In `@tests/test_codex_plugin_package.py`:
- Around line 19-21: Add a bounds check before indexing into
marketplace["plugins"] to avoid IndexError: assert that "plugins" exists and is
a non-empty list (e.g., use self.assertIn("plugins", marketplace) and
self.assertGreater(len(marketplace["plugins"]), 0) or
self.assertTrue(marketplace["plugins"])) before referencing
marketplace["plugins"][0]; then proceed to validate plugin["name"] and
plugin["source"]["path"] as currently done.

---

Outside diff comments:
In `@commands/status.md`:
- Around line 18-50: The one-liner assumes plugin_json exists and will raise
FileNotFoundError at open(plugin_json); modify the script that builds
plugin_json (the plugin_json variable and the open(plugin_json) call) to check
existence and handle the missing-file case: after probing .claude-plugin and
.codex-plugin, if neither exists, print a clear message (or raise a
user-friendly error) and exit non-zero, or wrap open(plugin_json) in try/except
to catch FileNotFoundError and handle it (log the error and exit); keep
references to plugin_json and the open(plugin_json) operation so the guard or
try/except is added immediately before reading the file.

In `@scripts/evaluator.py`:
- Around line 354-355: The validation for the variable score currently rejects
0; update the check so scores in the allowed schema range (0–10) pass: modify
the conditional that references score (the block returning "invalid score:
{score} (must be integer 1-10)") to allow 0 as a valid integer and update the
error message to reflect "integer 0-10"; ensure the same variable/condition in
scripts/evaluator.py that performs isinstance(score, int) and range checks is
changed accordingly.

In `@scripts/hook_utils.py`:
- Around line 441-444: The save_state calls currently catch only OSError in the
try/except blocks (e.g., the block invoking save_state(state) and logging via
log(..., config, cwd)); update these exception handlers to catch both OSError
and ValueError (except (OSError, ValueError)) for consistency with other
save_state usages, and ensure the same change is applied to the other save_state
locations mentioned (the blocks around the calls that log failures with log(...,
config, cwd) at the other occurrences).

---

Nitpick comments:
In `@plugins/planman/scripts/evaluator.py`:
- Line 187: The return statement that currently reads a prefixed f-string
(return None, f"codex CLI not found. Install: npm install -g `@openai/codex`")
should remove the unnecessary f prefix because there are no interpolations;
update the return in the evaluator.py function where that check occurs to use a
plain string literal ("codex CLI not found. Install: npm install -g
`@openai/codex`") so the value is returned correctly without the extraneous f.
- Line 331: The return statement currently uses an unnecessary f-string prefix:
change the tuple return from return None, f"codex returned malformed output. Set
PLANMAN_VERBOSE=true for details." to use a plain string (remove the leading f)
so it becomes return None, "codex returned malformed output. Set
PLANMAN_VERBOSE=true for details."; locate the return in
plugins/planman/scripts/evaluator.py (the malformed output handling branch) and
update that return expression accordingly.

In `@scripts/evaluator.py`:
- Line 187: Remove the unnecessary f-string prefixes on plain string literals in
scripts/evaluator.py: replace the return that currently uses f"codex CLI not
found. Install: npm install -g `@openai/codex`" with a normal string literal, and
likewise remove the f prefix from the other plain string at the second
occurrence (the one flagged around line 331). Search for the exact f"codex CLI
not found..." text and any other f-strings in evaluator.py that contain no {}
placeholders and change them to regular string literals.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b1b8548f-a6d8-468f-b5a6-fd9b6d301db6

📥 Commits

Reviewing files that changed from the base of the PR and between 6eb230b and 7534e1f.

📒 Files selected for processing (34)
  • .agents/plugins/marketplace.json
  • .claude-plugin/marketplace.json
  • .claude-plugin/plugin.json
  • .codex-plugin/plugin.json
  • .gitignore
  • README.md
  • commands/clear.md
  • commands/help.md
  • commands/init.md
  • commands/status.md
  • hooks.json
  • plugins/planman/.codex-plugin/plugin.json
  • plugins/planman/hooks.json
  • plugins/planman/schemas/evaluation.json
  • plugins/planman/scripts/codex_stop_hook.py
  • plugins/planman/scripts/config.py
  • plugins/planman/scripts/evaluator.py
  • plugins/planman/scripts/hook_utils.py
  • plugins/planman/scripts/path_utils.py
  • plugins/planman/scripts/state.py
  • pytest.ini
  • scripts/codex_stop_hook.py
  • scripts/config.py
  • scripts/evaluator.py
  • scripts/hook_utils.py
  • scripts/post_exit_plan_hook.py
  • scripts/post_tool_hook.py
  • scripts/pre_ask_hook.py
  • scripts/pre_exec_gate_hook.py
  • scripts/pre_exit_plan_hook.py
  • tests/test_codex_plugin_package.py
  • tests/test_codex_stop_hook.py
  • tests/test_config.py
  • tests/test_evaluator.py

Comment thread commands/help.md

| Setting | Env Var | Default | Description |
|---------|---------|---------|-------------|
| `threshold` | `PLANMAN_THRESHOLD` | `7` | Minimum score (1-10) to pass |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align the threshold range with the README/config semantics.

This says 1-10, but the README documents 0 as valid and meaning "pass all". Please make the help text match the actual behavior so users don't get conflicting setup guidance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@commands/help.md` at line 41, Update the help text row for the `threshold`
option (`PLANMAN_THRESHOLD`) to reflect the actual valid range `0-10` (not
`1-10`) and note that `0` means "pass all" so it matches README/config
semantics; locate the table row containing `threshold | PLANMAN_THRESHOLD | 7 |
Minimum score (1-10) to pass` and change the range text and/or description
accordingly.

Comment on lines +4 to +8
"score": {
"type": "integer",
"minimum": 1,
"maximum": 10,
"description": "Overall plan quality score (1-10). Sum of breakdown categories."

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Schema mismatch: score minimum of 1 conflicts with breakdown sum validation.

The schema defines score with minimum: 1, but the description says it's the "Sum of breakdown categories." Since each breakdown field has minimum: 0, the sum can be 0. The parse_codex_output function validates that score == breakdown sum, creating an impossible state when all breakdown values are 0.

Either allow score: 0 or document that at least one breakdown category must be non-zero.

Proposed fix: Allow score of 0
     "score": {
       "type": "integer",
-      "minimum": 1,
+      "minimum": 0,
       "maximum": 10,
       "description": "Overall plan quality score (1-10). Sum of breakdown categories."
     },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"score": {
"type": "integer",
"minimum": 1,
"maximum": 10,
"description": "Overall plan quality score (1-10). Sum of breakdown categories."
"score": {
"type": "integer",
"minimum": 0,
"maximum": 10,
"description": "Overall plan quality score (1-10). Sum of breakdown categories."
},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/planman/schemas/evaluation.json` around lines 4 - 8, The schema for
the evaluation object has a mismatch: the "score" property is declared with
"minimum": 1 while parse_codex_output enforces score == sum(breakdown) and each
breakdown entry allows 0, so the sum (and therefore score) can be 0; update the
"score" schema to allow 0 by changing its minimum to 0 (and optionally adjust
its description to reflect that 0 is permitted), and ensure parse_codex_output
validation remains consistent with the "score" property and the breakdown
fields.

Comment on lines +354 to +355
if not isinstance(score, int) or score < 1 or score > 10:
return None, f"invalid score: {score} (must be integer 1-10)"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Score validation rejects 0 but breakdown sum can be 0.

If the schema is updated to allow score: 0, this validation also needs updating to allow score >= 0.

Proposed fix (if schema allows 0)
     score = data.get("score")
-    if not isinstance(score, int) or score < 1 or score > 10:
-        return None, f"invalid score: {score} (must be integer 1-10)"
+    if not isinstance(score, int) or score < 0 or score > 10:
+        return None, f"invalid score: {score} (must be integer 0-10)"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not isinstance(score, int) or score < 1 or score > 10:
return None, f"invalid score: {score} (must be integer 1-10)"
if not isinstance(score, int) or score < 0 or score > 10:
return None, f"invalid score: {score} (must be integer 0-10)"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/planman/scripts/evaluator.py` around lines 354 - 355, The score
validation currently rejects 0 (if block checking "if not isinstance(score, int)
or score < 1 or score > 10") but the schema may allow score == 0; update the
validation in evaluator.py (the score check around the evaluator function that
processes `score`/returns "invalid score" message) to accept integers in the
range 0–10 instead of 1–10 by allowing score >= 0, and preserve the existing
error message text/format to reflect the new valid range.

Comment on lines +441 to +444
try:
save_state(state)
except OSError as e:
log(f"failed to save state: {e}", config, cwd)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent exception handling for save_state.

Line 443 catches only OSError, but other save_state calls (e.g., lines 353, 376, 403) catch (OSError, ValueError). Since save_state uses json.dump(..., allow_nan=False) which can raise ValueError, this should be consistent.

Proposed fix
         try:
             save_state(state)
-        except OSError as e:
+        except (OSError, ValueError) as e:
             log(f"failed to save state: {e}", config, cwd)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
save_state(state)
except OSError as e:
log(f"failed to save state: {e}", config, cwd)
try:
save_state(state)
except (OSError, ValueError) as e:
log(f"failed to save state: {e}", config, cwd)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/planman/scripts/hook_utils.py` around lines 441 - 444, The catch
around save_state currently only handles OSError but must also catch ValueError
(because save_state uses json.dump(..., allow_nan=False) which can raise
ValueError); update the try/except that wraps save_state (the one calling
save_state(state) near the end of the file) to except (OSError, ValueError) and
log the error the same way as the other save_state callers (consistent with the
handlers at the save_state calls around the save_state usages at the other
locations).

Comment on lines +463 to +466
try:
save_state(state)
except OSError as e:
log(f"failed to save state: {e}", config, cwd)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Same inconsistent exception handling pattern for other save_state calls.

Lines 463, 475, and 497 also catch only OSError. Apply the same fix to catch (OSError, ValueError) for consistency with lines 353, 376, 403.

Also applies to: 474-477, 495-498

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/planman/scripts/hook_utils.py` around lines 463 - 466, The exception
handlers around save_state calls currently only catch OSError; update each of
those except blocks that call save_state (the ones using save_state(state) and
calling log(..., config, cwd)) to catch both OSError and ValueError instead (use
except (OSError, ValueError) as e) and pass the exception into log as before to
keep behavior consistent with the earlier save_state handlers (see other
occurrences of save_state and log in this module for reference).

Comment thread README.md
5. Restart the host agent

That's it. The next time Claude exits plan mode, planman evaluates the plan and blocks with feedback if the score is below threshold (default 7/10). Run `/planman:init` to customize settings.
That's it. The next time the host agent presents a detected plan, planman evaluates it and blocks with feedback if the score is below threshold (default 7/10). Run `/planman:init` to customize settings.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the quick-start customization step host-specific.

After presenting both Claude Code and Codex install paths, this line sends everyone to /planman:init, but Codex users do not have that command. Point Codex users here to .planman.jsonc or PLANMAN_* instead.

Suggested wording
-That's it. The next time the host agent presents a detected plan, planman evaluates it and blocks with feedback if the score is below threshold (default 7/10). Run `/planman:init` to customize settings.
+That's it. The next time the host agent presents a detected plan, planman evaluates it and blocks with feedback if the score is below threshold (default 7/10). In Claude Code, run `/planman:init` to customize settings. In Codex, create `.planman.jsonc` or set `PLANMAN_*` environment variables.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
That's it. The next time the host agent presents a detected plan, planman evaluates it and blocks with feedback if the score is below threshold (default 7/10). Run `/planman:init` to customize settings.
That's it. The next time the host agent presents a detected plan, planman evaluates it and blocks with feedback if the score is below threshold (default 7/10). In Claude Code, run `/planman:init` to customize settings. In Codex, create `.planman.jsonc` or set `PLANMAN_*` environment variables.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 65, Update the README sentence so the customization step
is host-specific: mention that users of hosts with the /planman:init command
(e.g., Claude Code) can run /planman:init, while Codex users should instead
configure via .planman.jsonc or with environment variables prefixed PLANMAN_*;
ensure both options are presented clearly and the wording replaces the current
single recommendation to run /planman:init.

Comment thread README.md
Comment on lines 233 to 235
```
/plugin uninstall planman@planman
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to this fenced block.

This trips markdownlint (MD040) as written; bash would be enough here.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 233-233: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 233 - 235, The fenced code block containing the
command "/plugin uninstall planman@planman" in README.md lacks a language tag;
add a language identifier (e.g., "bash") immediately after the opening triple
backticks so the block becomes fenced as a bash snippet to satisfy markdownlint
MD040 and improve highlighting for the command.

Comment on lines +23 to +25
_PLAN_RE = re.compile(
r"(?ims)(<proposed_plan>|implementation plan|^# .{0,80}plan\b|^\*\*summary\*\*|^\s*1\.\s+.+\n\s*2\.)"
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten the Codex plan detector before it starts gating normal replies.

^\*\*summary\*\* and the generic numbered-list pattern are broad enough to treat many ordinary markdown answers as plans. Because this hook runs on every Stop, that can trigger unexpected evaluations and blocks on non-plan turns.

Suggested narrowing
-_PLAN_RE = re.compile(
-    r"(?ims)(<proposed_plan>|implementation plan|^# .{0,80}plan\b|^\*\*summary\*\*|^\s*1\.\s+.+\n\s*2\.)"
-)
+_PLAN_RE = re.compile(
+    r"(?ims)(<proposed_plan>|implementation plan|^# .{0,80}plan\b)"
+)
+_NUMBERED_STEPS_RE = re.compile(r"(?m)^\s*1\.\s+.+\n\s*2\.\s+")

 ...
-        if _PLAN_RE.search(stripped):
+        has_plan_marker = bool(_PLAN_RE.search(stripped))
+        if has_plan_marker:
             score += 4
-        if score >= 4:
+        if has_plan_marker and _NUMBERED_STEPS_RE.search(stripped):
+            score += 1
+        if has_plan_marker and score >= 4:
             candidates.append((score, order, stripped))

Also applies to: 52-58

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/codex_stop_hook.py` around lines 23 - 25, The current _PLAN_RE is too
permissive (it includes patterns like ^\*\*summary\*\* and a generic
numbered-list ^\s*1\.) and is matching ordinary markdown; tighten it by
restricting matches to explicit plan headings or nearby keywords: remove or
require a colon for the bold summary token (e.g., match /^\*\*summary:\*\*/i),
and narrow the numbered-list pattern so it only triggers when the word "plan"
(or "proposed plan"/"implementation plan") appears in the same heading/preceding
text (e.g., require "plan" within the same line or via a
lookbehind/preceding-group before ^\s*1\.). Update the _PLAN_RE definition
accordingly and apply the same tightened regex to the other places in this file
where the plan detector is used (the other usages of _PLAN_RE in this module).

Comment on lines +19 to +21
plugin = marketplace["plugins"][0]
self.assertEqual(plugin["name"], "planman")
self.assertEqual(plugin["source"]["path"], "./plugins/planman")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add bounds check before accessing plugins[0].

If the marketplace contains no plugins, this test will raise IndexError instead of a more informative assertion failure.

🛡️ Proposed fix
     plugin = marketplace["plugins"][0]
+    self.assertGreater(len(marketplace["plugins"]), 0, "marketplace.json contains no plugins")
+    plugin = marketplace["plugins"][0]
     self.assertEqual(plugin["name"], "planman")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
plugin = marketplace["plugins"][0]
self.assertEqual(plugin["name"], "planman")
self.assertEqual(plugin["source"]["path"], "./plugins/planman")
self.assertGreater(len(marketplace["plugins"]), 0, "marketplace.json contains no plugins")
plugin = marketplace["plugins"][0]
self.assertEqual(plugin["name"], "planman")
self.assertEqual(plugin["source"]["path"], "./plugins/planman")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_codex_plugin_package.py` around lines 19 - 21, Add a bounds check
before indexing into marketplace["plugins"] to avoid IndexError: assert that
"plugins" exists and is a non-empty list (e.g., use self.assertIn("plugins",
marketplace) and self.assertGreater(len(marketplace["plugins"]), 0) or
self.assertTrue(marketplace["plugins"])) before referencing
marketplace["plugins"][0]; then proceed to validate plugin["name"] and
plugin["source"]["path"] as currently done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant