
Add Mistral provider support #5

Open

meaningfool wants to merge 1 commit into kwindla:main from meaningfool:meaningfool/mistral-code

Conversation


@meaningfool meaningfool commented Apr 9, 2026

Summary

  • add direct Mistral service support using Mistral's OpenAI-compatible API
  • document the new service alias and required env vars in the README
  • fix --judge-model so the selected Claude judge model is actually used
  • make the comprehensive eval helper usable for unstable providers by supporting sequential retries and skipping incomplete runs
  • fix the run-directory race by returning the created run dir directly from the CLI run path

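The sequential-retry-and-skip behavior described above could look roughly like this (a minimal sketch, not the PR's actual code; `run_once`, the attempt count, and the delay are all hypothetical names):

```python
import time


def run_with_retries(run_once, max_attempts=3, delay_secs=5):
    """Retry a single eval run sequentially for an unstable provider.

    `run_once` is a hypothetical callable that returns a result on
    success and raises on provider errors (timeouts, 5xx, etc.).
    Returns None when all attempts fail, so the caller can skip the
    incomplete run instead of aborting the whole eval.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except Exception:
            if attempt == max_attempts:
                # Give up: mark this run as incomplete so aggregation skips it.
                return None
            time.sleep(delay_secs)
```

Running attempts sequentially (rather than in parallel) keeps a flaky provider from being hammered with concurrent requests while it is already struggling.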
Mistral Config Used For Benchmarking

  • model: mistral-small-2603
  • endpoint: https://api.mistral.ai/v1
  • interface: OpenAI-compatible /v1/chat/completions
  • benchmark used provider defaults for sampling and tool behavior: no explicit temperature, top_p, seed, tool_choice, or parallel_tool_calls overrides
  • later stable runs used MTE_TEXT_IDLE_TIMEOUT_SECS=240 in the harness to avoid killing slow turns; this is a runner setting, not a model setting
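For reference, a request against the endpoint above can be assembled like this (a stdlib-only sketch; the `MISTRAL_API_KEY` variable name is an assumption for illustration, not necessarily the env var the README documents):

```python
import json
import os
import urllib.request


def build_chat_request(model, messages):
    """Build a POST to Mistral's OpenAI-compatible /v1/chat/completions.

    No sampling overrides are set, matching the benchmark's use of
    provider defaults (no temperature, top_p, seed, tool_choice, or
    parallel_tool_calls).
    """
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.mistral.ai/v1/chat/completions",
        data=body,
        headers={
            # MISTRAL_API_KEY is an assumed name, used here for illustration.
            "Authorization": f"Bearer {os.environ.get('MISTRAL_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request (e.g. via `urllib.request.urlopen`) is omitted here; the point is that only `model` and `messages` appear in the payload, leaving all sampling behavior to the provider's defaults.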

Validation

  • uv run python -m py_compile scripts/run_comprehensive_eval.py src/multi_turn_eval/cli.py src/multi_turn_eval/judging/claude_judge.py src/multi_turn_eval/pipelines/base.py
  • smoke and full Mistral benchmark runs completed successfully with Opus judging

