Objective
Align the eval engine contract, examples, and compatibility text so supported and unsupported eval runtimes are described consistently.
Priority
P0 — Must Fix for v1.0
Details
The spec currently defines claude-code and codex as supported eval engines, with copilot and cursor reserved for future use. However, some eval execution-flow examples still show a Copilot headless command. This leaves the supported-engine contract internally inconsistent.
Acceptance Criteria
- Eval examples use only supported engines or clearly mark future placeholders as non-normative
- Unsupported-engine behavior is defined consistently
- The package format and compatibility sections agree on eval engine support
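One way to keep the examples, package format, and compatibility text in agreement is to have them all refer to a single definition of engine status. The following is a minimal sketch under stated assumptions, not the spec's implementation: the names `SUPPORTED_ENGINES`, `RESERVED_ENGINES`, `classify_engine`, and `require_supported_engine` are hypothetical, and rejecting reserved or unknown engines with an error is an assumed behavior that the spec would still need to define.

```python
from enum import Enum

# Engine names as listed in the Details section above.
SUPPORTED_ENGINES = frozenset({"claude-code", "codex"})
RESERVED_ENGINES = frozenset({"copilot", "cursor"})  # reserved for future use


class EngineStatus(Enum):
    SUPPORTED = "supported"
    RESERVED = "reserved"
    UNKNOWN = "unknown"


def classify_engine(name: str) -> EngineStatus:
    """Classify an eval engine name against the two sets above."""
    if name in SUPPORTED_ENGINES:
        return EngineStatus.SUPPORTED
    if name in RESERVED_ENGINES:
        return EngineStatus.RESERVED
    return EngineStatus.UNKNOWN


def require_supported_engine(name: str) -> str:
    """Return the name if supported; otherwise raise ValueError.

    Assumption: failing fast is the desired unsupported-engine behavior.
    The spec may instead choose to skip or warn.
    """
    status = classify_engine(name)
    if status is EngineStatus.SUPPORTED:
        return name
    if status is EngineStatus.RESERVED:
        raise ValueError(f"eval engine '{name}' is reserved for future use")
    raise ValueError(f"unknown eval engine '{name}'")
```

A check like this could also be run over the eval examples in the spec, so that any example naming a reserved engine is either flagged or explicitly marked as non-normative.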
Notes
Source: work/eval-design-discussion.md