The released checkpoint and the results in the paper do not seem reliable #130

I ran several experiments with the checkpoints released with the paper and observed the following issues:

I. Poor Performance

- The overall performance of the model is very weak, as shown below:

<img width="847" height="102" alt="Image" src="https://github.com/user-attachments/assets/f69c8a6b-0d0f-4323-a8c5-e0c4a74f6bf8" />

II. Checkpoint Appears to Ignore Visual Input

- My experiments suggest that visual information has little to no impact on the model’s predictions.

- The model seems to rely primarily on the question or textual features instead of the image.

- I applied adversarial attacks using the downstream loss, but they had little effect: the output did not change.

- Furthermore, when I provided random noise images and asked the model questions, it produced the same responses as it did with real medical images.

<img width="1383" height="447" alt="Image" src="https://github.com/user-attachments/assets/09b888ff-f0c7-4515-8f1b-24d80b0dad3f" />
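
For anyone who wants to reproduce the noise-image check, here is a minimal sketch of the comparison, not the exact script I ran. It assumes a hypothetical `answer_fn(image, question)` wrapper around the checkpoint's inference call; the nested-list `Image` type, `noise_like`, `image_sensitivity`, and the two toy models are illustrative names only and are not part of the repository. It reports the fraction of samples whose answer changes when the image is replaced by uniform noise of the same size.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an image tensor: H x W grid of values in [0, 1].
Image = List[List[float]]


def noise_like(image: Image, rng: random.Random) -> Image:
    """Return uniform random noise with the same height and width as `image`."""
    return [[rng.random() for _ in row] for row in image]


def image_sensitivity(
    answer_fn: Callable[[Image, str], str],
    samples: Sequence[Tuple[Image, str]],
    seed: int = 0,
) -> float:
    """Fraction of samples whose answer changes when the image becomes noise."""
    rng = random.Random(seed)
    changed = 0
    for image, question in samples:
        original = answer_fn(image, question)
        ablated = answer_fn(noise_like(image, rng), question)
        # Compare normalized strings so casing/whitespace does not count as a change.
        if original.strip().lower() != ablated.strip().lower():
            changed += 1
    return changed / len(samples) if samples else 0.0


# Toy models (assumptions, for illustration only).
def text_only_model(image: Image, question: str) -> str:
    # Ignores the image entirely and answers from the question text.
    return "yes" if "is there" in question.lower() else "no"


def image_aware_model(image: Image, question: str) -> str:
    # Answers from the mean pixel intensity, so the image matters.
    mean = sum(map(sum, image)) / (len(image) * len(image[0]))
    return "yes" if mean > 0.9 else "no"


if __name__ == "__main__":
    bright = [[1.0] * 4 for _ in range(4)]
    samples = [(bright, "Is there a fracture?")]
    print("text-only:", image_sensitivity(text_only_model, samples))
    print("image-aware:", image_sensitivity(image_aware_model, samples))
```

A score near zero on real data would match what I describe above: the answers are driven by the question rather than the image. Swapping `answer_fn` for the real inference call (and `noise_like` for noise in the model's actual input format) should be enough to run this against the released checkpoint.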
