Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models"
[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. MLLM-CompBench covers diverse visual domains, including animals, fashion, sports, and scenes.
This repo contains detailed notes on LangSmith concepts including traces, runs, observability, and integrations with LangChain, RAG, and LangGraph.