This is the implementation of “Beyond Static Knowledge: Dynamic Context-Aware Cross-Modal Contrastive Learning for Medical Visual Question Answering”, published in IEEE Transactions on Medical Imaging (IEEE TMI).
Medical Visual Question Answering (Med-VQA) aims to analyze medical images and accurately respond to natural language queries, thereby optimizing clinical workflows and improving diagnostic and therapeutic outcomes. Although medical images contain rich visual information, the corresponding textual queries frequently lack sufficient descriptive content. This information imbalance, together with modality differences, leads to significant semantic bias. Furthermore, while existing approaches integrate external medical knowledge to enhance model performance, they primarily rely on static knowledge that does not adapt dynamically to specific input samples, leading to redundant information and noise interference. To address these challenges, we propose a Contextual Knowledge-Aware Dynamic Perception for Cross-Modal Reasoning and Alignment (CKRA) model. To mitigate knowledge redundancy, CKRA employs a dynamic perception mechanism that leverages semantic cues from the query to selectively filter medical knowledge relevant to the current sample’s context. To alleviate cross-modal semantic bias, CKRA bridges the gap between visual and linguistic features through knowledge-image contrastive learning, optimizing knowledge feature representations and directing the model’s attention to key image regions. Further, we design a dual-stream guided attention network that facilitates cross-modal interaction and alignment across multiple dimensions. Experimental results show that the proposed CKRA model outperforms state-of-the-art methods on the SLAKE and VQA-RAD datasets. In addition, ablation studies validate the effectiveness of each module, while Grad-CAM maps further demonstrate the feasibility of CKRA for medical visual question answering tasks. The source code and weights of the model are available at https://github.com/cloneiq/CKRA-MedVQA.
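The knowledge-image contrastive objective described above can be sketched as a symmetric InfoNCE-style loss between knowledge and image embeddings. The snippet below is an illustrative NumPy sketch, not the authors' implementation; the feature dimensions, temperature value, and function names are assumptions:

```python
import numpy as np

def knowledge_image_contrastive_loss(knowledge_feats, image_feats, temperature=0.07):
    """Illustrative symmetric InfoNCE-style loss. Rows of the two matrices are
    paired (knowledge, image) samples; matched pairs sit on the diagonal of the
    similarity matrix. Sketch only -- not the CKRA training code."""
    # L2-normalize both modalities so dot products become cosine similarities
    k = knowledge_feats / np.linalg.norm(knowledge_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    logits = k @ v.T / temperature          # (B, B) similarity matrix
    diag = np.arange(len(k))                # matched pairs on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)          # numerical stability
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_prob[diag, diag].mean()

    # average of knowledge->image and image->knowledge directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Pulling matched knowledge-image pairs together while pushing mismatched pairs apart is what lets the knowledge features steer attention toward the relevant image regions.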
Run the following command to install the required packages:
conda env create -f environment.yaml # method 1
pip install -r requirements.txt # method 2
├── checkpoints
├── data
│   ├── vqa_medvqa_2019_test.arrow
│   ├── ......
├── download
│   ├── checkpoints
│   ├── biobert_v1.1
│   ├── pretrained
│   │   ├── m3ae.ckpt
│   ├── roberta-base
├── m3ae
├── prepro
├── run_scripts
Please follow the instructions here and use only the SLAKE and VQA-RAD datasets.
Download the m3ae pretrained weights and put them in download/pretrained.
Download roberta-base and put it in download/roberta-base.
Download BioBERT and put it in download/biobert_v1.1.
Download the checkpoints we trained and put them in download/checkpoints.
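After downloading, you can sanity-check that everything landed where the run scripts expect it. This is an optional helper we wrote for convenience (not part of the repository); the paths mirror the directory tree above:

```python
from pathlib import Path

# Paths the run scripts expect, mirroring the directory tree above
REQUIRED = [
    "download/pretrained/m3ae.ckpt",
    "download/roberta-base",
    "download/biobert_v1.1",
    "download/checkpoints",
]

def missing_downloads(root="."):
    """Return the required download paths that are not present under root."""
    root = Path(root)
    return [p for p in REQUIRED if not (root / p).exists()]

if __name__ == "__main__":
    missing = missing_downloads()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All required downloads are in place.")
```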
# cd to the root of this repository
bash run_scripts/ckra_train.sh
# cd to the root of this repository
bash run_scripts/ckra_test.sh
If this repository is useful for your research, please cite:
@article{Yang2025CKRA-MedVQA,
  title={Beyond Static Knowledge: Dynamic Context-Aware Cross-Modal Contrastive Learning for Medical Visual Question Answering},
  author={Yang, Rui and Liu, Lijun and Feng, Xupeng and Peng, Wei and Yang, Xiaobing},
  journal={IEEE Transactions on Medical Imaging},
  year={2025},
  publisher={IEEE}
}
@inproceedings{chen2022m3ae,
  title={Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training},
  author={Chen, Zhihong and Du, Yuhao and Hu, Jinpeng and Liu, Yang and Li, Guanbin and Wan, Xiang and Chang, Tsung-Hui},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  year={2022},
  organization={Springer}
}