OpenCode++ benchmarks compare context and harness modes.

    opencode-plusplus benchmark benchmarks --top-k 8

Measures retrieval quality such as relevant files and required tests.

    opencode-plusplus benchmark-agent benchmarks --executor mock --dry-run

Modes:

    no-context
    agents-md
    context-pack
    loop-enabled-harness

Metrics:

    wrong_files_changed
    forbidden_files_changed
    tests_missing
    tests_failed
    hallucinated_commands
    iterations_to_finish
    final_decision_accuracy
    human_review_needed

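
To compare modes, it can help to average each metric per mode. The sketch below is illustrative only and is not OpenCode++'s actual report format. The `RunMetrics` class, the `summarize` helper, and the metric types (counts, a 0-1 accuracy, a boolean review flag) are all assumptions. Only the field names and the mode names come from the lists above.

```python
from collections import defaultdict
from dataclasses import dataclass, fields
from statistics import mean


@dataclass
class RunMetrics:
    """One benchmark run (hypothetical record; field types are assumed)."""
    mode: str  # e.g. "no-context", "agents-md", "context-pack", "loop-enabled-harness"
    wrong_files_changed: int = 0
    forbidden_files_changed: int = 0
    tests_missing: int = 0
    tests_failed: int = 0
    hallucinated_commands: int = 0
    iterations_to_finish: int = 1
    final_decision_accuracy: float = 0.0
    human_review_needed: bool = False


def summarize(runs):
    """Return {mode: {metric: average}} across all runs of each mode."""
    by_mode = defaultdict(list)
    for run in runs:
        by_mode[run.mode].append(run)

    metric_names = [f.name for f in fields(RunMetrics) if f.name != "mode"]
    return {
        mode: {
            # Booleans become 0.0/1.0, so their average reads as a rate.
            name: mean(float(getattr(run, name)) for run in group)
            for name in metric_names
        }
        for mode, group in by_mode.items()
    }
```

With records like these, a lower average for `wrong_files_changed` or `tests_failed` in one mode than in another would suggest that mode gave the agent better guidance, subject to however the real tool defines each metric.
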
Use the generic executor hook for real-agent comparisons:
opencode-plusplus benchmark-agent benchmarks \
--executor opencode \
--executor-command "opencode run --format json --dir {repo} --file {prompt} \"Follow the attached OpenCode++ task prompt.\"" \
--max-loops 3 \
--fail-on required
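
The `--executor-command` value above contains `{repo}` and `{prompt}` placeholders. How OpenCode++ fills them is not documented here, so the following is only a sketch of one plausible approach: substitute shell-quoted paths, then split the result into an argument list. The function name `expand_executor_command` is hypothetical.

```python
import shlex


def expand_executor_command(template: str, repo: str, prompt: str) -> list[str]:
    """Fill {repo} and {prompt} in a command template and return an argv list.

    Assumption: the template contains no other curly braces, since
    str.format would try to interpret them as placeholders.
    """
    # Quote each value so a path containing spaces stays a single argument.
    filled = template.format(repo=shlex.quote(repo), prompt=shlex.quote(prompt))
    # Split with shell-like rules; quoted strings in the template survive intact.
    return shlex.split(filled)


template = (
    "opencode run --format json --dir {repo} --file {prompt} "
    '"Follow the attached OpenCode++ task prompt."'
)
argv = expand_executor_command(template, "/tmp/my repo", "/tmp/prompt.md")
```

Passing the resulting list to a process launcher without a shell avoids a second round of quoting. The paths shown are placeholders chosen for illustration.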