Why can one frame belong to multiple actions?

Hi~ 
In the Charades and CharadesEgo datasets, each video contains several actions. In the code, you divide each video into several clips according to the start and end time of each action, but I have observed that a single frame may belong to multiple action labels because the action intervals overlap.
In this case, can the model still be trained normally with the loss function?
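
To make the situation concrete, here is a minimal sketch of what I mean. The annotation tuples, label names, and the `labels_at` helper are all hypothetical and made up for illustration; they are not taken from the dataset or from your code.

```python
# Hypothetical annotations for one video: (action_label, start_sec, end_sec).
# Labels and times are invented for illustration only.
annotations = [
    ("action_a", 0.0, 10.0),
    ("action_b", 5.0, 15.0),
    ("action_c", 20.0, 25.0),
]


def labels_at(t, annotations):
    """Return every action label whose [start, end] interval contains time t."""
    return [label for label, start, end in annotations if start <= t <= end]


# A frame at t = 7.0 s lies inside both the first and the second interval,
# so it ends up in two clips with two different action labels.
print(labels_at(7.0, annotations))
```

So the same frame would appear in the clip for the first action and in the clip for the second one, each time with a different target.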