To streamline the use of the MME Benchmark, we have consolidated all the necessary repositories and configured them for one-click deployment. Follow the steps below to get started; we use the LVLM LLaVA as an example.
Begin by creating the required directory structure inside your LLaVA workspace:
```bash
mkdir -p LLaVA/playground/data/eval/MME
```

This keeps all evaluation data well organized. We strongly recommend placing the MME folder inside `eval` for better clarity and separation.
Now, clone the benchmark repository:
```bash
git clone https://github.com/DAILtech/Evaluation-benchmark-MME/
```
This repository contains everything needed for running the MME Benchmark in one go.
Move the files out of the Evaluation-benchmark-MME folder into eval/MME, then create a folder named answers inside eval/MME (i.e., eval/MME/answers).
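For example, if you cloned the repository directly inside eval/MME, moving its contents up one level might look like this (the clone location is an assumption; adjust the paths to your setup):

```bash
# Assumes the repository was cloned into LLaVA/playground/data/eval/MME/
cd LLaVA/playground/data/eval/MME
mv Evaluation-benchmark-MME/* .     # move the repository contents into eval/MME
rm -rf Evaluation-benchmark-MME     # remove the now-redundant clone directory
```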
Then create the answers folder (run this from the directory that contains LLaVA, the same place where you ran the first mkdir command):

```bash
mkdir -p LLaVA/playground/data/eval/MME/answers
```
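After these steps, eval/MME should roughly contain the files that the evaluation script below refers to (the exact contents depend on the repository and benchmark release, so treat this as a sketch):

```
LLaVA/playground/data/eval/MME/
├── MME_Benchmark_release_version/   # benchmark images
├── llava_mme_test.jsonl             # question file
├── convert_answer_to_mme.py         # answer-format converter
├── eval_tool/                       # contains calculation.py
└── answers/                         # model outputs will be written here
```

With this layout in place, run the evaluation pipeline: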
```bash
#!/bin/bash
# Run this script from the LLaVA root directory.

# Step 1: generate model answers for the MME questions.
python -m llava.eval.model_vqa_loader \
    --model-path /root/autodl-tmp/LLaVA/llava-v1.5-7b \
    --question-file ./playground/data/eval/MME/llava_mme_test.jsonl \
    --image-folder ./playground/data/eval/MME/MME_Benchmark_release_version \
    --answers-file ./playground/data/eval/MME/answers/llava-v1.5-7b.jsonl \
    --temperature 0 \
    --conv-mode vicuna_v1

# Step 2: convert the answers into the format expected by the MME evaluation tool.
cd ./playground/data/eval/MME
python convert_answer_to_mme.py --experiment llava-v1.5-7b

# Step 3: compute the final MME scores.
cd eval_tool
python calculation.py --results_dir answers/llava-v1.5-7b
```
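You can save the steps above as a shell script and launch it from the LLaVA root; the filename mme.sh below is only an illustrative choice:

```bash
cd /root/autodl-tmp/LLaVA   # the LLaVA root assumed by the paths above
bash mme.sh                 # hypothetical name for the script shown above
```

The last step prints the MME scores for the evaluated model.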