
Add AIME2025, GPQA, HealthBench evaluation_test suites; unify row-limiting via pytest flag; clean up examples #44

Merged
benjibc merged 3 commits into main from implement_aime_gpqa_health on Aug 10, 2025

fixed per comments

613d8d1