chinese-benchmark

Here is 1 public repository matching this topic...

wchao6891 / ChineseStressBench

A Chinese-language benchmark of high-pressure, complex tasks. Its main purpose is to test whether a model will cause problems in real-world work.

benchmark decision-making stress-testing ai-safety reasoning ai-evaluation llm-evaluation llm-arena llm-benchmark chinese-benchmark

Updated May 9, 2026
HTML

