
Add Mistral provider support #5

Open

meaningfool wants to merge 1 commit into kwindla:main from meaningfool:meaningfool/mistral-code

Conversation


@meaningfool meaningfool commented Apr 9, 2026

Summary

  • add direct Mistral service support using Mistral's OpenAI-compatible API
  • document the new service alias and required env vars in the README
  • fix --judge-model so the selected Claude judge model is actually used
  • make the comprehensive eval helper usable for unstable providers by supporting sequential retries and skipping incomplete runs
  • fix the run-directory race by returning the created run dir directly from the CLI run path

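The sequential-retry-and-skip behavior described above could look roughly like this (a minimal sketch, not the PR's actual code; `run_once`, the attempt count, and the delay are all hypothetical names):

```python
import time


def run_with_retries(run_once, max_attempts=3, delay_secs=5):
    """Retry a single eval run sequentially for an unstable provider.

    `run_once` is a hypothetical callable that returns a result on
    success and raises on provider errors (timeouts, 5xx, etc.).
    Returns None when all attempts fail, so the caller can skip the
    incomplete run instead of aborting the whole eval.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except Exception:
            if attempt == max_attempts:
                # Give up: mark this run as incomplete so aggregation skips it.
                return None
            time.sleep(delay_secs)
```

Running attempts sequentially (rather than in parallel) keeps a flaky provider from being hammered with concurrent requests while it is already struggling.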
Mistral Config Used For Benchmarking

  • model: mistral-small-2603
  • endpoint: https://api.mistral.ai/v1
  • interface: OpenAI-compatible /v1/chat/completions
  • benchmark used provider defaults for sampling and tool behavior: no explicit temperature, top_p, seed, tool_choice, or parallel_tool_calls overrides
  • later stable runs used MTE_TEXT_IDLE_TIMEOUT_SECS=240 in the harness to avoid killing slow turns; this is a runner setting, not a model setting
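For reference, a request against the endpoint above can be assembled like this (a stdlib-only sketch; the `MISTRAL_API_KEY` variable name is an assumption for illustration, not necessarily the env var the README documents):

```python
import json
import os
import urllib.request


def build_chat_request(model, messages):
    """Build a POST to Mistral's OpenAI-compatible /v1/chat/completions.

    No sampling overrides are set, matching the benchmark's use of
    provider defaults (no temperature, top_p, seed, tool_choice, or
    parallel_tool_calls).
    """
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.mistral.ai/v1/chat/completions",
        data=body,
        headers={
            # MISTRAL_API_KEY is an assumed name, used here for illustration.
            "Authorization": f"Bearer {os.environ.get('MISTRAL_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request (e.g. via `urllib.request.urlopen`) is omitted here; the point is that only `model` and `messages` appear in the payload, leaving all sampling behavior to the provider's defaults.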

Validation

  • uv run python -m py_compile scripts/run_comprehensive_eval.py src/multi_turn_eval/cli.py src/multi_turn_eval/judging/claude_judge.py src/multi_turn_eval/pipelines/base.py
  • smoke and full Mistral benchmark runs completed successfully with Opus judging

