Add MedHALT evaluation environment #65

Open
geetua wants to merge 2 commits into MedARC-AI:main from geetua:add-medhalt-environment

@geetua geetua commented Oct 30, 2025

Contributor

Adds support for the MedHALT dataset for evaluating medical LLMs on multiple-choice questions.
Summary

New environment: medhalt
Two configs: reasoning_FCT and reasoning_nota
~18,866 examples per config
Supports answer shuffling
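
For readers unfamiliar with answer shuffling, here is a minimal sketch of the idea: permute the options and remap the index of the correct answer so that accuracy is not tied to a fixed answer position. The function name `shuffle_options` and its signature are hypothetical and written for illustration; this is not the code in this PR, nor the `medarc_verifiers` API.

```python
import random


def shuffle_options(options, answer_idx, seed=None):
    """Shuffle answer options; return (shuffled_options, new_answer_idx).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # local RNG so a given seed is reproducible
    order = list(range(len(options)))
    rng.shuffle(order)  # order[k] is the original index now at position k
    shuffled = [options[i] for i in order]
    # The correct answer moved to wherever its original index landed.
    return shuffled, order.index(answer_idx)


if __name__ == "__main__":
    opts = ["Aspirin", "Ibuprofen", "Paracetamol", "None of the above"]
    shuffled, new_idx = shuffle_options(opts, answer_idx=2, seed=0)
    print(shuffled, "-> correct option:", shuffled[new_idx])
```

Using a per-example seed keeps the shuffled order stable across runs, which makes accuracy numbers easier to compare.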

Testing
Tested on qwen2.5:3b (1000 examples):

reasoning_FCT: 50.7% accuracy
reasoning_nota: 33.1% accuracy

See environments/medhalt/README.md for full documentation and usage.

CLAassistant commented Oct 30, 2025


CLA assistant check
All committers have signed the CLA.

@warner-benjamin

Collaborator

Looks like a good start. We need to update the environment to use the medarc_verifiers helpers introduced in #63: randomize_multiple_choice (for randomization) and multiple_choice_accuracy.

Do the question format and the lack of a system prompt match the original authors' code? If there's no system prompt, then we should add the boxed/XML system prompts, with thinking and non-thinking options and matching parsers, so it's easier to grab the correct model output.
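
To illustrate why a boxed prompt makes the model output easier to grab, here is a rough sketch of a parser that drops any `<think>...</think>` block and returns the last `\boxed{...}` answer. The names `strip_thinking` and `extract_boxed_answer` are hypothetical, and the `<think>` tag format is an assumption; the actual prompts and parsers used in this repository may differ.

```python
import re

# Assumed formats: reasoning wrapped in <think>...</think>, final answer in \boxed{...}.
THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def strip_thinking(text):
    """Remove reasoning blocks so answers mentioned while thinking are ignored."""
    return THINK_RE.sub("", text)


def extract_boxed_answer(text):
    """Return the content of the last \\boxed{...} outside thinking, or None."""
    matches = BOXED_RE.findall(strip_thinking(text))
    return matches[-1].strip() if matches else None


if __name__ == "__main__":
    completion = "<think>Could be \\boxed{A}...</think>The answer is \\boxed{C}."
    print(extract_boxed_answer(completion))
```

Taking the last match is a design choice: models sometimes restate or revise an answer, and the final boxed value is usually the intended one.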

geetua commented Nov 1, 2025

Contributor Author

Thanks for the feedback! Good flag on using the original MedHALT prompts and adding boxed/XML variants.

I have a few clarification questions about data filtering and scope - I'll DM you on Discord to avoid cluttering the PR. Will implement once I understand your preferences.

Working on the medarc_verifiers integration now!
