docs: add deterministic multi-family admissibility benchmark documentation #137

Merged
ProfRandom92 merged 1 commit into main from codex/add-documentation-for-multi-family-benchmark on May 19, 2026

@ProfRandom92
Owner

Motivation

  • Provide a focused, contributor-facing guide to the deterministic multi-family admissibility benchmark.
  • The guide covers the benchmark's purpose, pipeline, invariants and determinism guarantees, regeneration and validation commands, and regression protections.

Description

  • Added a new benchmark doc at docs/benchmarks/multi_family_admissibility_benchmark.md. It includes a Mermaid pipeline diagram, the current fixture families, the four standard degradation levels, determinism guarantees, regeneration commands, validation commands, regression protections, and non-goals.
  • No implementation, fixture, artifact, CI, or package changes were made.
  • Changed files: docs/benchmarks/multi_family_admissibility_benchmark.md.

Testing

  • Ran npm run check, which executes layout, typecheck, validation, build, and the full Python test suite; the command completed successfully.
  • The run included pytest, where all 213 tests passed; the repository checks are green.
  • Risks: Low. This is a docs-only change with no code or artifact modifications.
  • Next: optionally cross-link this benchmark doc from the broader benchmark index docs in a follow-up docs-only PR.

Codex Task

@ProfRandom92 ProfRandom92 merged commit 9ea36c3 into main May 19, 2026
4 checks passed
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces documentation for the Deterministic Multi-Family Admissibility Benchmark, outlining its purpose, pipeline, and regression protections. The review identified an inconsistency in the regression protection logic where the stated requirement for distinct behavior between 'mild' and 'moderate' levels conflicted with the use of a non-strict inequality operator in the documentation. A correction was suggested to enforce strict inequality across all degradation levels.

- baseline and severe behavior is explicitly checked
- mild and moderate behavior must be distinct
- degradation must be progressive:
- `baseline > mild >= moderate > severe`

medium

The progression formula should use a strict inequality (>) between mild and moderate to be consistent with the requirement stated on line 86 that these behaviors must be distinct. Using >= allows for identical scores, which contradicts the stated goal of ensuring distinct behavior across these levels.

Suggested change
- `baseline > mild >= moderate > severe`
+ `baseline > mild > moderate > severe`
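
The difference between the two formulas can be illustrated with a small check. This is a minimal sketch, not the repository's actual regression test: the function names, the score dictionaries, and the numeric values are all hypothetical, and it assumes each degradation level is summarized by a single comparable score.

```python
from typing import Mapping

# The four degradation levels named in the progression formula.
LEVELS = ("baseline", "mild", "moderate", "severe")


def is_strictly_progressive(scores: Mapping[str, float]) -> bool:
    """Check baseline > mild > moderate > severe (the suggested formula)."""
    ordered = [scores[level] for level in LEVELS]
    return all(a > b for a, b in zip(ordered, ordered[1:]))


def is_progressive_with_tie_allowed(scores: Mapping[str, float]) -> bool:
    """Check baseline > mild >= moderate > severe (the originally documented formula)."""
    return (
        scores["baseline"]
        > scores["mild"]
        >= scores["moderate"]
        > scores["severe"]
    )


# Hypothetical scores: mild and moderate are tied, so they are not distinct.
tied = {"baseline": 0.9, "mild": 0.6, "moderate": 0.6, "severe": 0.2}
# Hypothetical scores: every level is strictly worse than the previous one.
distinct = {"baseline": 0.9, "mild": 0.7, "moderate": 0.5, "severe": 0.2}

print(is_progressive_with_tie_allowed(tied))  # tie slips through the >= form
print(is_strictly_progressive(tied))          # tie is rejected by the strict form
print(is_strictly_progressive(distinct))      # distinct levels pass
```

With the tied scores, the `>=` form accepts a result in which mild and moderate behave identically, while the strict form rejects it, which is the inconsistency the comment points out.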
