docs: add deterministic multi-family admissibility benchmark documentation #137
Code Review
This pull request introduces documentation for the Deterministic Multi-Family Admissibility Benchmark, outlining its purpose, pipeline, and regression protections. The review identified an inconsistency in the regression protection logic where the stated requirement for distinct behavior between 'mild' and 'moderate' levels conflicted with the use of a non-strict inequality operator in the documentation. A correction was suggested to enforce strict inequality across all degradation levels.
- baseline and severe behavior is explicitly checked
- mild and moderate behavior must be distinct
- degradation must be progressive:
  - `baseline > mild >= moderate > severe`
The progression formula should use a strict inequality (>) between mild and moderate to be consistent with the requirement stated on line 86 that these behaviors must be distinct. Using >= allows for identical scores, which contradicts the stated goal of ensuring distinct behavior across these levels.
Suggested change:
-   - `baseline > mild >= moderate > severe`
+   - `baseline > mild > moderate > severe`
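For illustration, the strict ordering the review asks for could be checked along these lines. This is a minimal sketch, not the repository's actual regression test: the function name `is_strictly_progressive`, the `LEVELS` tuple, and the score values are hypothetical, and it assumes each degradation level yields a single comparable numeric score.

```python
from typing import Mapping

# Hypothetical names for the four standard degradation levels,
# ordered from least to most degraded.
LEVELS = ("baseline", "mild", "moderate", "severe")


def is_strictly_progressive(scores: Mapping[str, float]) -> bool:
    """Return True only if baseline > mild > moderate > severe.

    Every adjacent pair is compared with a strict '>', so a tie
    between mild and moderate is rejected.
    """
    ordered = [scores[level] for level in LEVELS]
    return all(higher > lower for higher, lower in zip(ordered, ordered[1:]))


# Illustrative scores only; these are not taken from the benchmark.
distinct = {"baseline": 1.0, "mild": 0.8, "moderate": 0.5, "severe": 0.1}
tied = {"baseline": 1.0, "mild": 0.6, "moderate": 0.6, "severe": 0.1}

assert is_strictly_progressive(distinct)
assert not is_strictly_progressive(tied)  # passes under '>=', fails under '>'
```

The `tied` case is the one the documented `>=` would have let through, even though the preceding bullet requires mild and moderate behavior to be distinct.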
Motivation
Description
`docs/benchmarks/multi_family_admissibility_benchmark.md` that includes a Mermaid pipeline diagram, the current fixture families, the four standard degradation levels, determinism guarantees, regeneration commands, validation commands, regression protections, and non-goals; no implementation, fixture, artifact, CI, or package changes were made.
`docs/benchmarks/multi_family_admissibility_benchmark.md`.
Testing
`npm run check`, which executes layout, typecheck, validation, build, and the full Python test suite, and the command completed successfully.
`pytest` where all tests passed (213 passed), and the repository checks are green.