Hi, thanks for sharing the awesome work!
I noticed that the dataset files have a "position" column for each video, which seems to be a list of indices. Could you share some details about this data? What do these indices represent, and how were they collected? If they contain localization information, they would be super helpful for the long video understanding research community.
Thanks! Any help would be highly appreciated!