test: strengthen eval slice — realistic cases or explicit scaffold framing #94

## Problem

The eval slice is the thinnest part of the harness:

- [eval/golden_qa.json](eval/golden_qa.json) contains a single toy case (echo-hello).
- The nightly eval workflow is disabled by default.
- [docs/EVAL_HARNESS.md](docs/EVAL_HARNESS.md) documents the protocol but never answers the question "why would I add a case?"

A reader expecting an LLM-eval story finds the infrastructure in place but nothing that shows it is meant to be used.

## Proposed solution

Pick **one** of the following:

**Option A — Add real cases.** Add 3–4 realistic golden cases (e.g. a Q/A pair, a code-generation prompt with a deterministic expected substring, a tool-use trace check). Wire the nightly workflow to run them by default. Update EVAL_HARNESS.md with a "how to add a case" section.
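A minimal sketch of what the three case kinds named in Option A could look like, with one checker per kind. Everything here is an assumption made for illustration: the field names (`id`, `kind`, `prompt`, `expected`), the `check_case` helper, and the example prompts are hypothetical, and the actual schema of `eval/golden_qa.json` may differ.

```python
# Hypothetical golden cases; the real eval/golden_qa.json schema may differ.
GOLDEN_CASES = [
    # Q/A pair: the answer must match exactly (after trimming whitespace).
    {"id": "qa-capital", "kind": "exact",
     "prompt": "What is the capital of France? Answer in one word.",
     "expected": "Paris"},
    # Code generation: only a deterministic substring is checked.
    {"id": "codegen-add", "kind": "substring",
     "prompt": "Write a Python function add(a, b) that returns a + b.",
     "expected": "def add("},
    # Tool use: the ordered list of tool names must match the trace.
    {"id": "tool-lookup", "kind": "tool_trace",
     "prompt": "Look up the release notes and summarize them.",
     "expected": ["search", "summarize"]},
]


def check_case(case, output="", tool_calls=()):
    """Return True if the model output (or tool trace) satisfies the case."""
    kind = case["kind"]
    if kind == "exact":
        return output.strip() == case["expected"]
    if kind == "substring":
        return case["expected"] in output
    if kind == "tool_trace":
        return list(tool_calls) == case["expected"]
    raise ValueError(f"unknown case kind: {kind!r}")
```

Substring and trace checks keep the cases deterministic, so a nightly run can pass or fail without a judge model.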

**Option B — Reframe as scaffold.** Add a paragraph at the top of EVAL_HARNESS.md explicitly framing the eval slice as "scaffold for your project's eval cases, not a benchmark." Document the contract for adding cases. Keep the toy case as a smoke test and explain why the nightly is opt-in.
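If Option B is chosen, the "contract for adding cases" could be stated as a small validator that the docs point to. The sketch below is illustrative only: the required fields, the `validate_case` helper, and the contents of the `echo-hello` smoke case are assumptions, not the harness's actual format.

```python
# Hypothetical contract: each case is a dict with these string fields.
REQUIRED_FIELDS = {"id": str, "prompt": str, "expected": str}


def validate_case(case):
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in case:
            errors.append(f"missing field: {field}")
        elif not isinstance(case[field], expected_type):
            errors.append(f"{field} must be {expected_type.__name__}")
    return errors


# Assumed shape of the existing toy case, kept as a smoke test.
smoke_case = {"id": "echo-hello", "prompt": "Say hello", "expected": "hello"}
```

A project adopting the scaffold would run `validate_case` over its own cases before enabling the nightly workflow.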

## Acceptance criteria

- [ ] One of Option A or Option B chosen and implemented
- [ ] EVAL_HARNESS.md tells a reader either *how to add cases* or *why this is a scaffold*
- [ ] Nightly workflow status (enabled / opt-in) is consistent with the chosen framing

## Priority rationale

The eval slice is currently the weakest part of the harness story; either option closes the gap.
