Hi, thank you for sharing your excellent work and code.
I am currently trying to reproduce the experimental results reported in your paper. I have a question regarding the evaluation metrics in the results table.
Could you please clarify whether the reported values are averages over multiple runs or the best results from a single run or checkpoint?
If they are averages, could you also let me know how many runs were used and whether different random seeds were applied?
Thank you very much for your time and help!