```bash
pip install -r requirements.txt
```

For local VLM models, install VLMEvalKit:

```bash
git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit && pip install -e .
```

To evaluate an existing predictions file:

```bash
python evaluate_ghost.py --pred-path predictions.json
```

Local VLM:
```bash
python run_predictions.py \
  --data-path dataset/ghost_full_merged.json \
  --image-dir dataset/images/ \
  --model-name llava_v1.5_7b \
  --model-type vlm \
  --output-path predictions/llava_predictions.json
```

API Model:
```bash
python run_predictions.py \
  --data-path dataset/ghost_full_merged.json \
  --image-dir dataset/images/ \
  --model-name gpt-4o \
  --model-type api \
  --output-path predictions/gpt4o_predictions.json \
  --api-key YOUR_API_KEY
```

Checkpoint/Resume:
- Predictions are automatically saved every 10 questions.
- If a run is interrupted, rerun the same command to resume.
- Use `--no-resume` to start from scratch.
- Use `--checkpoint-every N` to change the checkpoint frequency.
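The checkpoint/resume behaviour described above can be sketched as follows. This is a hypothetical illustration, not the code in `run_predictions.py`: the function name `run_with_checkpoints`, the `predict` callback, and storing results as a flat `{question_id: prediction}` dict (instead of the list-of-records format that `run_predictions.py` writes) are all assumptions made to keep the example short.

```python
import json
import os
from typing import Callable, Dict, List


def run_with_checkpoints(
    question_ids: List[str],
    predict: Callable[[str], str],
    output_path: str,
    checkpoint_every: int = 10,
    resume: bool = True,
) -> Dict[str, str]:
    """Hypothetical sketch: answer every question, saving partial results so a rerun can resume."""
    results: Dict[str, str] = {}
    # Resume: reload answers written by a previous (possibly interrupted) run.
    if resume and os.path.exists(output_path):
        with open(output_path) as f:
            results = json.load(f)

    def save() -> None:
        # Write to a temporary file, then atomically swap it in,
        # so an interruption never leaves a half-written JSON file behind.
        tmp_path = output_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(results, f, indent=2)
        os.replace(tmp_path, output_path)

    new_since_save = 0
    for qid in question_ids:
        if qid in results:
            continue  # already answered in an earlier run
        results[qid] = predict(qid)
        new_since_save += 1
        if new_since_save >= checkpoint_every:
            save()
            new_since_save = 0
    save()  # final save covers the last partial batch
    return results
```

Passing `resume=False` mirrors `--no-resume` (existing answers are ignored and overwritten), and `checkpoint_every` mirrors `--checkpoint-every N`. The write-then-`os.replace` pattern is one common way to make checkpoints crash-safe.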
JSON format with question keys of the form `{image_id}_{object_id}_{question_type}_{pos/neg}`:

```json
{
  "2406158_obj3_1pos": "A wheels is present in the image.",
  "2406158_obj3_attr1_1pos": "The color of the wheels present in the image is white.",
  "2406158_obj3_rel1_1pos": "The spatial relation between the wheels and man is that the wheels is to the left of the man."
}
```

Question Types:
- Object: `1pos`, `1neg`, `2neg`, ...
- Attribute: `attr1_1pos`, `attr1_1neg`, ...
- Relation: `rel1_1pos`, `rel1_2neg`, ...
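As an illustration of the key scheme above, the sketch below splits a question key into its parts. The exact key grammar is inferred from the examples (an image ID without underscores, an `obj<k>` object tag, an optional `attr<i>_`/`rel<i>_` prefix, then `<n>pos` or `<n>neg`), so treat the regular expression and the `QuestionKey`/`parse_question_key` names as hypothetical rather than part of this repository.

```python
import re
from typing import NamedTuple, Optional

# Assumed key shape, inferred from the examples above:
#   <image_id>_obj<k>_[attr<i>_ | rel<i>_]<n><pos|neg>
KEY_RE = re.compile(
    r"^(?P<image_id>[^_]+)_(?P<obj>obj\d+)_"
    r"(?:(?P<kind>attr|rel)(?P<kind_index>\d+)_)?"
    r"(?P<index>\d+)(?P<polarity>pos|neg)$"
)

_KIND_NAMES = {"attr": "attribute", "rel": "relation", None: "object"}


class QuestionKey(NamedTuple):
    image_id: str
    object_id: str              # e.g. "2406158_obj3", like the object_id field in predictions
    question_type: str          # "object", "attribute" or "relation"
    type_index: Optional[int]   # the i in attr<i>/rel<i>; None for object questions
    index: int                  # the n in <n>pos / <n>neg
    positive: bool              # True for "pos", False for "neg"


def parse_question_key(key: str) -> QuestionKey:
    m = KEY_RE.match(key)
    if m is None:
        raise ValueError(f"Unrecognized question key: {key!r}")
    return QuestionKey(
        image_id=m["image_id"],
        object_id=f"{m['image_id']}_{m['obj']}",
        question_type=_KIND_NAMES[m["kind"]],
        type_index=int(m["kind_index"]) if m["kind_index"] else None,
        index=int(m["index"]),
        positive=m["polarity"] == "pos",
    )
```

For example, `parse_question_key("2406158_obj3_attr1_1pos")` yields an attribute question about object `2406158_obj3` with `type_index=1`, `index=1`, and `positive=True`.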
Categories:
- Objects GCS: Consistency on object presence questions
- Attributes GCS: Consistency on object attribute questions
- Relations GCS: Consistency on spatial relation questions
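The three GCS categories line up with the question-type prefixes listed earlier. How `ghost_consistency_score.py` turns individual answers into a consistency score is not described in this README, so the sketch below stops at a hypothetical first step: classifying each question key by category and grouping together the keys asked about the same object, the unit over which a per-object consistency measure would presumably be computed. The helper names and the assumption that image IDs contain no underscores are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def category_of(question_key: str) -> str:
    # Assumes keys look like <image_id>_obj<k>_<suffix>, as in the examples above.
    suffix = question_key.split("_", 2)[2]
    if suffix.startswith("attr"):
        return "Attributes"
    if suffix.startswith("rel"):
        return "Relations"
    return "Objects"


def group_by_object_and_category(keys: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Bucket question keys by (object_id, category), preserving input order."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for key in keys:
        # object_id is "<image_id>_obj<k>", matching the object_id field in predictions.
        object_id = "_".join(key.split("_", 2)[:2])
        groups[(object_id, category_of(key))].append(key)
    return dict(groups)
```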
Predictions are saved as JSON:
```json
[
  {
    "question_id": "2406158_obj3_1pos",
    "object_id": "2406158_obj3",
    "image": "2406158.jpg",
    "text": "A wheels is present in the image.",
    "label": "yes",
    "model_name": "llava_v1.5_7b",
    "prediction": "true"
  }
]
```

```text
ghost-evaluation/
├── dataset/
│   ├── ghost_full_merged.json
│   └── images/
├── ghost_consistency_score.py
├── utils.py
├── evaluate_ghost.py
├── run_predictions.py
├── requirements.txt
├── .gitignore
└── README.md
```
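Before scoring a file with `evaluate_ghost.py`, it can help to sanity-check it. The sketch below loads prediction records in the JSON format described earlier and reports plain per-category accuracy; this is **not** the GCS metric. It assumes, based only on the example record, that `label` uses `yes`/`no` and `prediction` uses `true`/`false` (or `yes`/`no`); the normalisation rules and all function names are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical normalisation: the example record pairs label "yes" with prediction "true".
_TRUE = {"yes", "true"}
_FALSE = {"no", "false"}


def to_bool(answer: str) -> Optional[bool]:
    a = answer.strip().lower()
    if a in _TRUE:
        return True
    if a in _FALSE:
        return False
    return None  # unparseable answer; counted as incorrect below


def category_of(question_id: str) -> str:
    # Assumes question IDs look like <image_id>_obj<k>_<suffix>.
    suffix = question_id.split("_", 2)[2]
    if suffix.startswith("attr"):
        return "Attributes"
    if suffix.startswith("rel"):
        return "Relations"
    return "Objects"


def per_category_accuracy(records: List[dict]) -> Dict[str, float]:
    """Percentage of records per category whose prediction matches the label."""
    correct: Counter = Counter()
    total: Counter = Counter()
    for r in records:
        cat = category_of(r["question_id"])
        total[cat] += 1
        pred = to_bool(r["prediction"])
        if pred is not None and pred == to_bool(r["label"]):
            correct[cat] += 1
    return {cat: 100.0 * correct[cat] / total[cat] for cat in total}


def load_predictions(path: str) -> List[dict]:
    with open(path) as f:
        return json.load(f)
```

A large share of unparseable predictions usually points to a prompt or answer-parsing problem worth fixing before computing consistency scores.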
```python
from evaluate_ghost import evaluate

results = evaluate('predictions.json')
print(f"Objects GCS: {results['objects_gcs']:.2f}%")
print(f"Attributes GCS: {results['attributes_gcs']:.2f}%")
print(f"Relations GCS: {results['relations_gcs']:.2f}%")
```

To use API models, implement the `get_api_prediction()` function in `run_predictions.py`:
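```python
from typing import Optional

def get_api_prediction(model_name: str, image_path: str, prompt: str, api_key: Optional[str] = None) -> str:
    if model_name == 'gpt-4o':
        # Implement OpenAI API call: send the image and prompt, return the text answer
        raise NotImplementedError("OpenAI API call not implemented yet")
    elif model_name == 'gemini-pro':
        # Implement Google Gemini API call: send the image and prompt, return the text answer
        raise NotImplementedError("Gemini API call not implemented yet")
    # Add more models as needed
    raise ValueError(f"Unsupported API model: {model_name}")
```

```bibtex
@article{ghost2024,
  title={GHOST: Getting to the Bottom of Hallucinations with a Multi-round Consistency Benchmark},
  author={[Authors]},
  journal={[Journal/Conference]},
  year={2024}
}
```

[license]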
