Think about the following users when writing the README:
- someone trying to verify the claims in our paper
- someone trying to run the benchmark on their model/agent
- someone trying to add more tasks/evals/etc. to the benchmark