Objective
Specify how eval tooling records requested versus actual runtime and model metadata.
Priority
P0 — Must Fix for v1.0
Details
The eval report format already distinguishes config values from observed runtime metadata, but the spec does not define the source-of-truth order between API responses, runtime JSON output, CLI metadata, and requested config/env values. This makes report provenance underspecified.
Acceptance Criteria
- The spec defines the precedence order for runtime/model provenance
- Requested and observed fields are defined consistently in eval report prose
- Fallback behavior is defined for runtimes without model introspection support
Notes
Source: work/eval-design-discussion.md
Objective
Specify how eval tooling records requested versus actual runtime and model metadata.
Priority
P0 — Must Fix for v1.0
Details
The eval report format already distinguishes config values from observed runtime metadata, but the spec does not define the source-of-truth order between API responses, runtime JSON output, CLI metadata, and requested config/env values. This makes report provenance underspecified.
Acceptance Criteria
Notes
Source: work/eval-design-discussion.md