# mt-bench

Here are 2 public repositories matching this topic...

Liquid4All / mt_bench

Modified mt_bench with API and HF scripts for LFMs.

benchmark evaluation liquid-ai llm mt-bench

Updated Jul 9, 2025
Python

JosephHu04 / dual-judge

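⚖️ Dual-Judge: Making AI test results truly convincing | Dual-LLM cross-validation to eliminate single-model bias | A general-purpose evaluation framework independent of any specific agent | Making AI Evaluation Trustworthy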

python hotel quality-assurance quality-metrics evaluation-framework hotel-service fastapi ai-agent ai-testing ollama llm-evaluation deepseek react-agent langgraph llm-as-judge mt-bench llm-judge inter-rater-reliability

Updated Jun 18, 2026
Python
