OpenCode++ is evolving from "generate files that help an agent read a repo" into a Code Agent Enhancement Layer / Agent Reliability Layer. It does not compete with Codex, Claude Code, Cursor, OpenCode, or MiMoCode. Those tools own code execution. OpenCode++ provides a bounded reliability loop around them: context, boundaries, evidence, impact, regression protection, hallucination checks, gate evaluation, and repair/finalize decision reports.
The roadmap is organized around the harness lifecycle:
```
Before execution -> During execution -> After execution -> Loop improvement
```

Make existing coding agents safer, more verifiable, and less regression-prone in complex repositories.
The long-term product shape:
```
User task
  -> OpenCode++ Context / Boundary / Regression preparation
  -> choose executor: Codex / Claude Code / Cursor / OpenCode / MiMoCode
  -> code agent edits code
  -> OpenCode++ collects diff / trace / test evidence
  -> Guard modules evaluate the run
  -> Loop Guard reports finalize / repair / repack / block / human review
```

Goal: make OpenCode++ installable and testable by external users without cloning the repository.
This is the nearest milestone. The project already has enough MVP surface to demonstrate the OpenCode sidecar, batch executor, guard reports, benchmark harness, and generated context package. The next unlock is distribution:
```bash
npm i -g opencode-plusplus opencode-ai
cd your-repo
opencode-plusplus
```

Release gate:

```bash
npm run check
npm run lint
npm run format:check
npm run docs:cli:check
npm test
npm run benchmark
npm run benchmark:agent
npm run build
npm run pack:dry-run
```
CI already runs the same baseline. The `prepublishOnly` script also runs the publish gate, so the npm package cannot be published without regenerated CLI docs, passing tests, benchmark smoke checks, a build, and a dry-run package inspection.
Status: published on npm; patch releases keep the public install path and package metadata aligned.
Goal: make agents guess less before editing.
- Repository scanner.
- Static file index.
- Symbol and dependency extraction.
- File and module dependency graph.
- Importance ranking.
- Minimal `AGENTS.md` generation.
- Manual/generated/composed `AGENTS.md` architecture.
- Task plan and task pack.
- Related tests detection.
- Token savings and actual output token reports.
- Readiness score with dimensions and hard caps.
Status: implemented foundation.
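The readiness score combines per-dimension scores with hard caps. The sketch below shows one way such a score could work, assuming a weighted average that any applicable cap then clamps; the `Dimension` and `HardCap` shapes, the weights, and the sample cap are hypothetical, not the scanner's actual formula.

```typescript
// Hypothetical readiness dimension; score is in [0, 100].
interface Dimension {
  name: string;
  score: number;
  weight: number;
}

// Hypothetical hard cap: when `applies` is true, the total cannot exceed `maxScore`.
interface HardCap {
  reason: string;
  applies: boolean;
  maxScore: number;
}

interface Readiness {
  score: number;
  appliedCaps: string[];
}

function readinessScore(dimensions: Dimension[], caps: HardCap[]): Readiness {
  const totalWeight = dimensions.reduce((sum, d) => sum + d.weight, 0);
  // Weighted average of the dimension scores (0 if no weights are configured).
  let score =
    totalWeight === 0
      ? 0
      : dimensions.reduce((sum, d) => sum + d.score * d.weight, 0) / totalWeight;
  const appliedCaps: string[] = [];
  // Caps clamp the total no matter how strong the other dimensions are.
  for (const cap of caps) {
    if (cap.applies && score > cap.maxScore) {
      score = cap.maxScore;
      appliedCaps.push(cap.reason);
    }
  }
  return { score: Math.round(score), appliedCaps };
}
```

For example, dimensions scoring 90 (weight 2) and 80 (weight 1) average to about 86.7, but an applicable cap of 60 pulls the reported score down to 60.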
Goal: make edits bounded, reviewable, and verifiable.
- Contracts for architecture, module boundaries, commands, tests, and safety.
- `opencode-plusplus validate-contracts`.
- `opencode-plusplus policy --fail-on forbidden|required|risk`.
- Execution trace with manual / command / CI evidence.
- `opencode-plusplus trace run` for command-captured evidence.
- Exit code and command evidence recording.
- Test selection for files and diffs.
- Change impact report with direct and transitive dependents.
- `opencode-plusplus verify --diff`.
- Freshness / drift / manifest checks.
Status: implemented foundation.
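The change impact report separates direct from transitive dependents. A minimal sketch, assuming the dependency graph is available as a hypothetical `ImportGraph` map from each file to the files it imports: reverse the edges, then run a breadth-first search from the changed files, so depth 1 is a direct dependent and anything deeper is transitive.

```typescript
// Hypothetical input: each file maps to the files it imports.
type ImportGraph = Record<string, string[]>;

interface ImpactReport {
  direct: string[];
  transitive: string[];
}

function changeImpact(graph: ImportGraph, changed: string[]): ImpactReport {
  // Reverse the edges: for each file, collect the files that import it.
  const dependents = new Map<string, string[]>();
  for (const file of Object.keys(graph)) {
    for (const dep of graph[file]) {
      const list = dependents.get(dep) ?? [];
      list.push(file);
      dependents.set(dep, list);
    }
  }
  // Breadth-first search from the changed files records each file's shortest distance.
  const depth = new Map<string, number>();
  const queue: string[] = [];
  for (const file of changed) {
    depth.set(file, 0);
    queue.push(file);
  }
  while (queue.length > 0) {
    const current = queue.shift() as string;
    const next = (depth.get(current) as number) + 1;
    for (const dependent of dependents.get(current) ?? []) {
      if (!depth.has(dependent)) {
        depth.set(dependent, next);
        queue.push(dependent);
      }
    }
  }
  // Depth 0 is the changed set itself and is not reported as impact.
  const direct: string[] = [];
  const transitive: string[] = [];
  depth.forEach((d, file) => {
    if (d === 1) direct.push(file);
    else if (d > 1) transitive.push(file);
  });
  return { direct: direct.sort(), transitive: transitive.sort() };
}
```

Because BFS keeps the shortest distance, a file that imports a changed file directly is reported as direct even when a longer path to it also exists.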
Goal: stop trusting the agent's "done" claim and make the next action explicit.
- Runtime state persisted under `.agent-context/runs/<task-id>/state.json`.
- Loop decisions with priority, confidence, blocking state, and signals.
- Trace-aware loop controller.
- Stale evidence detection after later edits.
- Repair planner that can request missing tests, contract repair, context refresh, or wider impact analysis.
- Finalize gate through policy and loop reports.
Status: implemented foundation; `orchestrate` now runs multiple bounded iterations, while richer evidence-driven repair planning remains ongoing.
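Stale evidence detection means that a passing command recorded before a later edit no longer proves anything about the edited file. A sketch under assumed shapes: the `Evidence` and `Edit` records, the `coversFiles` field, and the integer sequence numbers are hypothetical, not the persisted `state.json` schema.

```typescript
// Hypothetical evidence record; `recordedAt` is a sequence number within the run.
interface Evidence {
  command: string;
  exitCode: number;
  recordedAt: number;
  coversFiles: string[];
}

// Hypothetical edit record using the same sequence numbering.
interface Edit {
  file: string;
  at: number;
}

// Evidence is stale when any file it covers was edited after it was recorded.
function staleEvidence(evidence: Evidence[], edits: Edit[]): Evidence[] {
  return evidence.filter((ev) =>
    edits.some((edit) => edit.at > ev.recordedAt && ev.coversFiles.indexOf(edit.file) !== -1),
  );
}

// Commands that must be re-run before the loop can finalize, without duplicates.
function commandsToRerun(evidence: Evidence[], edits: Edit[]): string[] {
  const commands: string[] = [];
  for (const ev of staleEvidence(evidence, edits)) {
    if (commands.indexOf(ev.command) === -1) commands.push(ev.command);
  }
  return commands;
}
```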
Goal: make OpenCode++ work as an external control plane for multiple code agents.
- `AgentExecutor` interface for external coding agents:

```typescript
export interface AgentExecutor {
  name: "opencode" | "mimocode" | "codex" | "claude-code" | "cursor" | "mock";
  run(input: {
    repo: string;
    task: string;
    prompt: string;
    agent?: string;
    outputDir: string;
    env?: Record<string, string>;
  }): Promise<{
    exitCode: number;
    eventsPath?: string;
    finalText?: string;
    changedFiles: string[];
    diffPath: string;
  }>;
}
```

- `opencode-plusplus agent run "<task>" . --executor opencode`
- `opencode-plusplus agent run "<task>" . --executor mimocode`
- Mock executor for CI and deterministic tests.
- Generic `--executor-command` adapter for Codex, Claude Code, Cursor, OpenCode, MiMoCode, and other scriptable code agents.
- One-shot flow through `opencode-plusplus agent run`: pack -> run agent -> collect diff -> policy/tests/impact/verify.
- Multi-loop harness flow through `opencode-plusplus orchestrate`: pack -> run agent -> evaluate -> repair/repack/finalize/block.
Status: mock executor, generic command adapter, and OpenCode stdout/transcript/fallback event normalizer implemented; MiMoCode, Codex, and Claude native event normalizers planned.
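A deterministic mock executor is what makes CI runs of the harness repeatable. The sketch below restates the `AgentExecutor` interface with its input and result types pulled out as `ExecutorInput` and `ExecutorResult` (hypothetical names), then implements a hypothetical `MockExecutor` that returns scripted changed files without touching the filesystem; the `diff.patch` file name is an assumption.

```typescript
interface ExecutorInput {
  repo: string;
  task: string;
  prompt: string;
  agent?: string;
  outputDir: string;
  env?: Record<string, string>;
}

interface ExecutorResult {
  exitCode: number;
  eventsPath?: string;
  finalText?: string;
  changedFiles: string[];
  diffPath: string;
}

interface AgentExecutor {
  name: "opencode" | "mimocode" | "codex" | "claude-code" | "cursor" | "mock";
  run(input: ExecutorInput): Promise<ExecutorResult>;
}

// Same input, same result: no process is spawned and no file is written.
class MockExecutor implements AgentExecutor {
  readonly name = "mock" as const;
  private readonly scriptedChanges: string[];

  constructor(scriptedChanges: string[]) {
    this.scriptedChanges = scriptedChanges;
  }

  async run(input: ExecutorInput): Promise<ExecutorResult> {
    return {
      exitCode: 0,
      finalText: `mock completed: ${input.task}`,
      // Sorted copy so callers see a stable order regardless of construction order.
      changedFiles: [...this.scriptedChanges].sort(),
      // Hypothetical artifact name; a real executor would write the patch here.
      diffPath: `${input.outputDir}/diff.patch`,
    };
  }
}
```

A real executor would fill the same result shape from the agent's process, events, and `git diff`; keeping the mock pure keeps guard and loop tests independent of any installed agent.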
Goal: make repository evidence the source of truth for APIs, commands, config, and conventions.
Implemented MVP checks:
- Missing file references.
- Missing symbols or exports.
- Nonexistent package scripts or test commands.
- Nonexistent config keys and environment variables.
- Missing dependencies.
Implemented outputs:
- `.agent-context/hallucination/<task-id>.json`
- `.agent-context/runs/<task-id>/hallucination.md`
- policy findings for missing commands, missing symbols, missing local import files, missing dependencies, missing config keys, and missing file references.
- evidence references and repair suggestions.
- "verify existence first" prompts
Planned expansion:
- APIs or paths that contradict local conventions.
- Framework-specific route/config checks.
- Agent-specific transcript parsers beyond the current OpenCode foundation.
Status: deterministic Hallucination Guard MVP implemented; semantic convention checks remain planned.
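The nonexistent-script check reduces to comparing referenced commands against `package.json`. A minimal sketch, assuming only the `npm run <script>` and `npm test` forms are scanned; `findMissingScripts` and its result shape are hypothetical, and a real guard would also need to cover other package managers and test runners.

```typescript
interface PackageJson {
  scripts?: Record<string, string>;
}

interface MissingScript {
  command: string;
  script: string;
}

// Flags `npm run <script>` and `npm test` references whose script is not defined.
function findMissingScripts(text: string, pkg: PackageJson): MissingScript[] {
  const scripts = pkg.scripts ?? {};
  const missing: MissingScript[] = [];
  // Script names may contain `:`, `-`, and inner dots; a trailing sentence period is excluded.
  const pattern = /\bnpm (?:run ([\w:-]+(?:\.[\w:-]+)*)|(test)\b)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const script = match[1] ?? match[2];
    // hasOwnProperty avoids treating inherited names such as `toString` as defined scripts.
    if (!Object.prototype.hasOwnProperty.call(scripts, script)) {
      missing.push({ command: match[0], script });
    }
  }
  return missing;
}
```

Each finding keeps the exact matched command, which is what a policy finding or repair suggestion would need to quote back to the agent.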
Goal: prevent agents from reintroducing old bugs.
Implemented MVP inputs:
- `.agent-context/regression/known-issues.json`
- `.agent-context/regression/fix-history.json`
- `.agent-context/regression/fragile-modules.json`
- `.agent-context/regression/anti-regression-tests.json`
- task text, changed files, affected modules, and trace evidence
Implemented MVP outputs:
- anti-regression notes in task packs
- required regression tests
- historical risk findings
- `.agent-context/runs/<task-id>/regression.md`
- `.agent-context/regression/<task-id>.json`
- a policy `required` failure when matched regression memory lacks evidence for its required regression tests
Planned expansion:
- import issue / PR notes automatically
- richer pattern matching over historical failures
- repair prompts when old bug patterns reappear
Status: structured Regression Guard MVP implemented; richer history ingestion remains planned.
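Matching regression memory against a run can be sketched as an intersection with the changed files followed by an evidence check. The `KnownIssue` shape below is a hypothetical reading of `known-issues.json`, matching is by exact file path only, and `testsWithEvidence` stands in for whatever the trace records as executed tests.

```typescript
// Hypothetical shape of one entry in .agent-context/regression/known-issues.json.
interface KnownIssue {
  id: string;
  summary: string;
  files: string[];
  requiredTests: string[];
}

interface RegressionFinding {
  issueId: string;
  severity: "required" | "info";
  missingTests: string[];
}

function checkRegressions(
  issues: KnownIssue[],
  changedFiles: string[],
  testsWithEvidence: string[],
): RegressionFinding[] {
  const changed = new Set(changedFiles);
  const tested = new Set(testsWithEvidence);
  const findings: RegressionFinding[] = [];
  for (const issue of issues) {
    // Only issues that touch a changed file are relevant to this run.
    if (!issue.files.some((f) => changed.has(f))) continue;
    const missingTests = issue.requiredTests.filter((t) => !tested.has(t));
    findings.push({
      issueId: issue.id,
      // A matched issue without evidence for its regression tests fails the gate.
      severity: missingTests.length > 0 ? "required" : "info",
      missingTests,
    });
  }
  return findings;
}
```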
Goal: let coding agents call OpenCode++ as a native reliability backend.
- MCP tools for build, plan, pack, retrieve, tests, impact, verify, evaluate, repair, finalize.
- Verifiable MCP runtime fields: `nextAction`, `blocking`, `requiredCommands`, `mustInspect`, `allowedEditGlobs`, `avoidEditGlobs`, and `missingEvidence`.
- Client usage guides for Codex, OpenCode, Claude Code, and Cursor under `docs/integrations/`.
- OpenCode / MiMoCode / MiMoCodex MCP usage guide and native event validation.
- Agent-led mode documentation: the code agent calls OpenCode++ tools, and the docs state the limitation that gates are only advisory unless the host agent follows them.
- Harness-led mode documentation: OpenCode++ invokes the executor, evaluates bounded gates, and writes decision reports.
- Codex and Claude Code adapters.
- Cursor integration guide.
- Unified retriever adapters for static, ripgrep, LightRAG, embedding, and hybrid retrieval.
Status: MCP stdio server, core tools, structured runtime gate fields, and Codex/OpenCode/Claude Code/Cursor integration guides are implemented as a foundation; Agent Native Runtime tools remain experimental, and per-client end-to-end validation plus native event normalization remain planned.
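`allowedEditGlobs` and `avoidEditGlobs` let a host agent check its own edits before asking to finalize. A sketch of such a client-side check, assuming the hypothetical `RuntimeGate` subset below; `globToRegExp` is a deliberately minimal converter (`**` spans directories, `*` stays within one path segment, other glob syntax such as `?` or braces is treated literally), so a real client should use a proper glob library.

```typescript
// Hypothetical subset of the MCP runtime gate fields.
interface RuntimeGate {
  nextAction: string;
  blocking: boolean;
  allowedEditGlobs: string[];
  avoidEditGlobs: string[];
  missingEvidence: string[];
}

// Minimal glob -> RegExp conversion for `**` and `*` only.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      if (glob[i + 2] === "/") {
        source += "(?:.*/)?"; // `**/` matches zero or more directories
        i += 2;
      } else {
        source += ".*"; // a trailing `**` matches anything
        i += 1;
      }
    } else if (ch === "*") {
      source += "[^/]*"; // `*` never crosses a `/`
    } else {
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${source}$`);
}

interface EditViolation {
  file: string;
  reason: "avoided" | "not-allowed";
}

function editViolations(gate: RuntimeGate, changedFiles: string[]): EditViolation[] {
  const avoid = gate.avoidEditGlobs.map(globToRegExp);
  const allow = gate.allowedEditGlobs.map(globToRegExp);
  const violations: EditViolation[] = [];
  for (const file of changedFiles) {
    // Avoid-globs win over allow-globs.
    if (avoid.some((re) => re.test(file))) {
      violations.push({ file, reason: "avoided" });
    } else if (allow.length > 0 && !allow.some((re) => re.test(file))) {
      // An empty allow-list is treated as "no restriction" in this sketch.
      violations.push({ file, reason: "not-allowed" });
    }
  }
  return violations;
}
```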
Goal: make OpenCode++ a bounded runtime controller while the code agent remains a replaceable executor.
- `opencode-plusplus opencode doctor .`
- `opencode-plusplus opencode run "<task>" . --max-loops 3 --checkpoint git-worktree --fail-on required`
- `opencode-plusplus oc run "<task>" .`
- `opencode-plusplus orchestrate "<task>" . --executor mimocode --executor-command "mimocode run {prompt}" --max-loops 3 --checkpoint git-worktree --fail-on required`
- Flow: user task -> plan/pack -> choose executor -> execute -> collect diff/trace/test evidence -> guard gates -> decision report.
- Decision reports: `finalize`, `repair`, `repack`, `block`, `rollback`, `require human review`.
- Multi-iteration loop runner with per-iteration artifacts under `.agent-context/runs/<task-id>/iterations/<nnn>/`.
- Native OpenCode event parsing for `opencode run --format json`, transcript files, and stdout/stderr fallback.
- Native MiMoCode / Codex / Claude event parsing.
- Git worktree sandbox integration through `--checkpoint git-worktree`; executors run in an isolated worktree, patches are exported back to the host run directory, and destructive rollback is intentionally not automatic.
Status: multi-loop orchestrator implemented with mock executor, generic command adapter, OpenCode event normalizer, per-iteration artifacts, decision gates, checkpoint patch output, and git-worktree executor sandbox; MiMoCode, Codex, and Claude event normalizers remain planned.
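The bounded loop can be sketched as a router from guard signals to a decision, plus a loop that stops on any terminal decision or when the `--max-loops` budget is spent. The `Evaluation` signals, the rule order in `route`, and escalating to `require human review` when the budget runs out are all assumptions for illustration, not the shipped decision router; these rules never emit `rollback`.

```typescript
type Decision = "finalize" | "repair" | "repack" | "block" | "rollback" | "require human review";

// Hypothetical signals a guard evaluation might produce for one iteration.
interface Evaluation {
  forbiddenFilesChanged: number;
  testsFailed: number;
  testsMissing: number;
  contextInsufficient: boolean;
}

// Hypothetical routing rules, checked from most to least severe.
function route(e: Evaluation): Decision {
  if (e.forbiddenFilesChanged > 0) return "block";
  if (e.contextInsufficient) return "repack";
  if (e.testsFailed > 0 || e.testsMissing > 0) return "repair";
  return "finalize";
}

// Runs at most `maxLoops` iterations; repair/repack continue, anything else stops.
function runLoop(
  maxLoops: number,
  iterate: (iteration: number) => Evaluation,
): { decision: Decision; iterations: number } {
  for (let i = 1; i <= maxLoops; i++) {
    const decision = route(iterate(i));
    if (decision !== "repair" && decision !== "repack") {
      return { decision, iterations: i };
    }
  }
  // Budget exhausted without a terminal decision: escalate instead of looping again.
  return { decision: "require human review", iterations: maxLoops };
}
```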
Goal: prove the reliability layer improves coding-agent behavior.
Compare:
- no context
- `AGENTS.md` only
- context pack
- loop-enabled harness
- harness + Guard modules
Measure:
- wrong file edits
- test failures
- steps per task
- token usage
- stale evidence reuse
- hallucinated APIs / commands
- regression reintroduction
- repair loops
- human-review blocks
First targets:
- OpenCode
- MiMoCode / MiMoCodex
- Codex CLI
- Claude Code
- Cursor
Current MVP:
- `opencode-plusplus benchmark-agent benchmarks --executor mock --dry-run`
- Real executor command hook through `--executor opencode|mimocode|codex|claude-code|cursor`.
- Same task, same executor, same fixture, four modes: `no-context`, `agents-md`, `context-pack`, `loop-enabled-harness`.
- Phase 6 task set: 10 tasks covering 3 bugfix, 2 feature, 2 refactor, 1 hallucinated-command trigger, 1 protected-path trigger, and 1 regression trigger.
- Per-run metrics: `wrong_files_changed`, `forbidden_files_changed`, `tests_missing`, `tests_failed`, `hallucinated_commands`, `iterations_to_finish`, `final_decision_accuracy`, and `human_review_needed`.
- Mode summary table: wrong files, forbidden files, missing tests, failed tests, hallucinated commands, iterations, decision accuracy, human review rate, and final gate strength.
- `benchmarks/agent-runs/*.json` supports manual or automated OpenCode/Codex/Claude/Cursor/MiMoCode run records for repeatable behavior comparison.
OpenCode example:
```bash
opencode-plusplus benchmark-agent benchmarks \
  --executor opencode \
  --executor-command "opencode run --format json --dir {repo} \"Follow the attached OpenCode++ task prompt.\" --file {prompt}" \
  --max-loops 3 \
  --fail-on required
```

Status: deterministic benchmark harness implemented; Phase 6 10-task value benchmark implemented with mock/generic executor support and manual four-mode records; repeated OpenCode/MiMoCode/Codex/Claude real-agent data collection remains ongoing.
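The mode summary table aggregates per-run metrics by mode. A sketch, assuming run records carry the metric names listed above, that `final_decision_accuracy` is 0 or 1 per run, and that wrong files and failed tests are reported as totals while the remaining columns are per-run means; the `ModeSummary` shape is hypothetical and only a subset of the metrics is shown.

```typescript
type Mode = "no-context" | "agents-md" | "context-pack" | "loop-enabled-harness";

// Subset of the per-run metrics; `final_decision_accuracy` is assumed to be 0 or 1.
interface RunMetrics {
  mode: Mode;
  wrong_files_changed: number;
  tests_failed: number;
  iterations_to_finish: number;
  final_decision_accuracy: number;
  human_review_needed: boolean;
}

// Hypothetical summary row.
interface ModeSummary {
  mode: Mode;
  runs: number;
  wrongFiles: number; // total across runs
  failedTests: number; // total across runs
  meanIterations: number;
  decisionAccuracy: number; // fraction of runs with a correct final decision
  humanReviewRate: number; // fraction of runs needing human review
}

function summarize(runs: RunMetrics[]): ModeSummary[] {
  // Group runs by mode, preserving first-seen mode order.
  const byMode = new Map<Mode, RunMetrics[]>();
  for (const run of runs) {
    const group = byMode.get(run.mode) ?? [];
    group.push(run);
    byMode.set(run.mode, group);
  }
  const summaries: ModeSummary[] = [];
  byMode.forEach((group, mode) => {
    const n = group.length;
    const total = (pick: (r: RunMetrics) => number) => group.reduce((sum, r) => sum + pick(r), 0);
    summaries.push({
      mode,
      runs: n,
      wrongFiles: total((r) => r.wrong_files_changed),
      failedTests: total((r) => r.tests_failed),
      meanIterations: total((r) => r.iterations_to_finish) / n,
      decisionAccuracy: total((r) => r.final_decision_accuracy) / n,
      humanReviewRate: total((r) => (r.human_review_needed ? 1 : 0)) / n,
    });
  });
  return summaries;
}
```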
Goal: keep the harness maintainable as real executors, guards, and normalizers grow.
Target module boundaries:
```
src/
  guards/
    boundary/
    evidence/
    impact/
    hallucination/
    regression/
    loop/
  runtime/
    state-machine.ts
    orchestrator.ts
    decision-router.ts
    checkpoint.ts
    iteration-store.ts
  executors/
    index.ts
    mock.ts
    generic-command.ts
    opencode.ts
    codex.ts
    claude-code.ts
  normalizers/
    opencode.ts
    codex.ts
    claude.ts
    generic.ts
  integrations/
    mcp/
    codegraph/
    lightrag/
  outputs/
    markdown/
    json/
```

Rationale:

- `outputs/` should eventually focus on rendering.
- `runtime/` should own loop orchestration and decisions.
- `guards/` should own checks and findings.
- `executors/` should own external code-agent invocation.
- `normalizers/` should own event and transcript parsing.
Status: planned refactor. Current implementation keeps compatibility while new behavior lands.
- Keep TypeScript/JavaScript on the TypeScript Compiler API for project-aware semantics.
- Strengthen Python with Tree-sitter plus the stdlib `ast` fallback.
- Add Go through `tree-sitter-go` plus `go.mod` metadata.
- Add Rust through `tree-sitter-rust` plus `Cargo.toml` metadata.
- Add Java through `tree-sitter-java` plus Maven/Gradle metadata.
- Add C/C++ through `tree-sitter-cpp` plus `compile_commands.json`.
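Per-language parser selection can be pictured as a lookup from file extension to the strategies listed above. The `ParserStrategy` shape, the extension set, and the `tree-sitter-python`, `pom.xml`, and `build.gradle` names are assumptions filling in the Python and Maven/Gradle entries; the table only records which parser and metadata source to use and does not parse anything.

```typescript
// Hypothetical strategy record mirroring the language roadmap.
interface ParserStrategy {
  language: string;
  parser: string;
  fallback?: string;
  projectMetadata?: string;
}

const TS_JS: ParserStrategy = { language: "TypeScript/JavaScript", parser: "TypeScript Compiler API" };
const C_CPP: ParserStrategy = { language: "C/C++", parser: "tree-sitter-cpp", projectMetadata: "compile_commands.json" };

// Extension -> strategy; the extension list is illustrative, not exhaustive.
const STRATEGIES: Record<string, ParserStrategy> = {
  ".ts": TS_JS,
  ".tsx": TS_JS,
  ".js": TS_JS,
  ".jsx": TS_JS,
  ".py": { language: "Python", parser: "tree-sitter-python", fallback: "ast" },
  ".go": { language: "Go", parser: "tree-sitter-go", projectMetadata: "go.mod" },
  ".rs": { language: "Rust", parser: "tree-sitter-rust", projectMetadata: "Cargo.toml" },
  ".java": { language: "Java", parser: "tree-sitter-java", projectMetadata: "pom.xml or build.gradle" },
  ".c": C_CPP,
  ".h": C_CPP,
  ".cc": C_CPP,
  ".cpp": C_CPP,
  ".hpp": C_CPP,
};

function strategyFor(path: string): ParserStrategy | undefined {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as `.env`
  return STRATEGIES[base.slice(dot).toLowerCase()];
}
```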
- Repository scanner.
- Static file index.
- Symbol and dependency extraction.
- File and module dependency graph.
- Importance ranking.
- `AGENTS.md` generation.
- Manual/generated/composed `AGENTS.md` architecture.
- Readiness score.
- Token savings.
- RAG export and retrieval protocol.
- Task context, impact, test selection, and benchmark foundations.
- Incremental cache for repeated builds and MCP/editor sessions.
- Harness-led `orchestrate` command.
- `agent run` executor wrapper.
- Mock executor and generic executor command adapter.
- Multi-loop orchestrator iterations with prompt, executor events, diff, trace, policy, verify, loop, and decision artifacts.