How can I get the metrics from the evaluation step?

The paper says that two metrics are evaluated:

1. the correctness of the natural-language answer, evaluated with a GPT-as-judge answer accuracy score;
2. the accuracy of the grounding coordinates, measured by grounding IoU.
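For context, this is how I understand grounding IoU to be computed. It is a minimal sketch that assumes each box is an axis-aligned `(x1, y1, x2, y2)` tuple in the same coordinate space for prediction and ground truth. That format may not match what the repo actually uses, and `box_iou` / `mean_iou` are hypothetical helpers I wrote for illustration, not functions from this codebase:

```python
from typing import List, Sequence

# Assumed box format (hypothetical): (x1, y1, x2, y2) with x1 < x2 and y1 < y2
Box = Sequence[float]


def box_iou(pred: Box, gt: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the intersection rectangle
    ix1 = max(pred[0], gt[0])
    iy1 = max(pred[1], gt[1])
    ix2 = min(pred[2], gt[2])
    iy2 = min(pred[3], gt[3])
    # Clamp width/height to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: List[Box], gts: List[Box]) -> float:
    """Average IoU over paired predicted and ground-truth boxes."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same, non-zero number of predicted and ground-truth boxes")
    return sum(box_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

If the evaluation reports a thresholded accuracy (e.g. IoU >= 0.5 counted as correct) instead of a mean IoU, that would be a different number, which is part of why I'm asking where the official one comes from.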

Could you show how and where to retrieve these two metrics?