if visual_only:
    picked_image_embeds = hidden_states[0, abs_topk, :]
else:
    picked_image_embeds = inputs_embeds_step[0, abs_topk, :]

inputs_embeds_step[0, abs_topk, :] is the image placeholder embedding, not the actual image embedding.
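To make the distinction concrete, here is a minimal sketch, assuming a LLaVA-style pipeline where the image positions in input_ids hold one repeated placeholder token and the vision features are scattered into those positions afterwards. The sizes, IMAGE_TOKEN_ID, embed_tokens, image_features and the merge step are all hypothetical stand-ins, not the code from the snippet above; only the [0, abs_topk, :] indexing mirrors it.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes and token id, chosen only for illustration.
VOCAB_SIZE, HIDDEN = 16, 4
IMAGE_TOKEN_ID = 9  # assumed id of the image placeholder token

# Stand-in for the language model's token embedding table.
embed_tokens = torch.nn.Embedding(VOCAB_SIZE, HIDDEN)

# One sequence: text token, three image placeholders, text token.
input_ids = torch.tensor([[1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2]])
image_mask = input_ids == IMAGE_TOKEN_ID  # shape (1, 5)

# Stand-in for the vision tower + projector output: one row per placeholder.
image_features = torch.randn(int(image_mask.sum()), HIDDEN)

# Absolute positions picked inside the image span (first and third placeholder).
abs_topk = torch.tensor([1, 3])

with torch.no_grad():
    # Before merging: every image position is just the placeholder token's row.
    inputs_embeds = embed_tokens(input_ids)  # shape (1, 5, HIDDEN)
    picked_before = inputs_embeds[0, abs_topk, :]
    placeholder_rows = inputs_embeds[image_mask]  # shape (3, HIDDEN)
    rows_identical = bool((placeholder_rows == placeholder_rows[0]).all())

    # After merging: the image features overwrite the placeholder rows.
    merged = inputs_embeds.clone()
    merged[image_mask] = image_features
    picked_after = merged[0, abs_topk, :]

print("placeholder rows identical:", rows_identical)
print("picked rows before merge:\n", picked_before)
print("picked rows after merge:\n", picked_after)
```

Under these assumptions, the rows picked before the merge are copies of a single vector and carry no image content, so anything computed from them cannot distinguish one patch from another. Whether inputs_embeds_step in the original code was captured before or after such a merge is worth checking directly, for example by testing whether the picked rows are all equal.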