The paper says two metrics are evaluated:
- the correctness of the natural language answer, evaluated by a GPT-as-judge answer accuracy score
- the accuracy of the grounding coordinates, measured by grounding IoU
Can you show how and where to retrieve these two metrics?
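
For reference, this is how I currently understand the grounding IoU: the standard intersection-over-union between a predicted and a ground-truth box. It is only a minimal sketch under my own assumptions, not the paper's evaluation code. The function name `box_iou` is hypothetical, and I am assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates, which may differ from the format the paper uses.

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Assumption: each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes give a union of zero; report IoU as 0.
    return inter / union if union > 0 else 0.0
```

If the official evaluation computes it differently (for example normalized coordinates, or a thresholded accuracy instead of the mean IoU), a pointer to that script would be very helpful.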