Chest X-Ray Multi-Modal Intelligence System

DSAI 413 — Assignment 2. A dual-mode system over the MIMIC-CXR dataset:

Report Generation — chest X-ray image → radiology report
QA Mode — natural-language question → RAG-grounded answer over reports

Mandatory models: google/medgemma-1.5-4b-it and ColPali (vidore/colpali-v1.3), each compared against a lighter baseline (OpenCLIP ViT-B/32 for report retrieval; sentence-transformers MiniLM for text RAG).

Overview

The codebase is developed locally and executed on Google Colab (T4, 16 GB VRAM), because a 4 GB laptop GPU cannot host MedGemma-1.5-4B and ColPali together. Heavy steps (indexing, evaluation) run either as modules, for example python -m src.eval, or from notebooks/run_on_colab.ipynb.

Design priorities (in order):

Simplicity over sophistication — zero-shot, no fine-tuning.
Small subsets — 400 image/report pairs (300 indexed, 100 test).
Lazy loading — models load on first call.
One config file — config.yaml owns every knob.
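
The lazy-loading priority above can be illustrated with a cached factory. This is a minimal sketch, not the repo's actual loader: get_model, _load_weights, and load_calls are hypothetical names, and the "model" is a plain dict standing in for real weights.

```python
from functools import lru_cache

# Counts how many times the stand-in loader actually runs.
load_calls = {"count": 0}


def _load_weights(model_id: str) -> dict:
    """Stand-in for an expensive load (hypothetical; no real weights involved)."""
    load_calls["count"] += 1
    return {"model_id": model_id, "ready": True}


@lru_cache(maxsize=None)
def get_model(model_id: str) -> dict:
    """Load a model on the first request and reuse the cached object afterwards."""
    return _load_weights(model_id)


first = get_model("google/medgemma-1.5-4b-it")
second = get_model("google/medgemma-1.5-4b-it")
print(first is second, load_calls["count"])
```

Importing a module written this way costs nothing; the expensive load happens only when a mode first asks for its model.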

Setup

# Local (dev only — no inference)
git clone <this-repo>.git chest-xray-system
cd chest-xray-system
pip install -r requirements.txt

# Required: Hugging Face token (for MedGemma + ColPali weights)
cp .env.example .env
# then edit .env and set HF_TOKEN=hf_...

Note. Local install is for editing only. All inference runs on Colab — see the Colab Quickstart below.

Colab Quickstart

Open notebooks/run_on_colab.ipynb in Google Colab on a T4 runtime. The notebook will:

git clone this repo
pip install -r requirements.txt
Prompt for HF_TOKEN via getpass
Run python data/download.py to pull the MIMIC-CXR subset (400 pairs)
Launch python app/app.py with Gradio share=True → a public URL

On a fresh T4, going from an empty runtime to a working share link should take under 15 minutes.

Data

Source. simhadrisadaram/mimic-cxr-dataset via kagglehub.
Subset. 400 (image, report) pairs sampled with seed=42, split 300 index / 100 test.
Manifest. data/sample/manifest.csv with columns id, image_path, report.
QA dataset. data/sample/qa_dataset.json — 3 clinical QA pairs per report, generated by MedGemma (yes/no, open-ended, location/laterality).

All data lives under data/sample/ (gitignored) and is regenerated on Colab.
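
A reproducible 300 / 100 split with seed=42 could look like the sketch below. It is an illustration under assumptions, not the code in data/download.py: sample_and_split and write_manifest are hypothetical helpers, and the ids and paths in the usage lines are made up.

```python
import csv
import io
import random


def sample_and_split(ids, n_total=400, n_index=300, seed=42):
    """Draw n_total ids reproducibly, then split them into index and test sets."""
    rng = random.Random(seed)                   # local RNG, global state untouched
    chosen = rng.sample(sorted(ids), n_total)   # sort so input order does not matter
    return chosen[:n_index], chosen[n_index:]


def write_manifest(rows, handle):
    """Write rows using the manifest columns: id, image_path, report."""
    writer = csv.DictWriter(handle, fieldnames=["id", "image_path", "report"])
    writer.writeheader()
    writer.writerows(rows)


# Usage with made-up ids; real ids would come from the downloaded dataset.
all_ids = [f"study_{i:05d}" for i in range(1000)]
index_ids, test_ids = sample_and_split(all_ids)
print(len(index_ids), len(test_ids))

buffer = io.StringIO()
write_manifest(
    [{"id": index_ids[0], "image_path": "images/example.jpg", "report": "No acute findings."}],
    buffer,
)
```

Sorting the ids before sampling keeps the subset stable even if the source listing comes back in a different order between runs.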

Running Each Mode

Report Generation

# Smoke test: 10 test images, MedGemma vs CLIP retrieval baseline
python -m src.report_mode

QA (RAG)

# Build the ColPali and text indexes (one-time, persisted to data/sample/)
python -m src.qa_mode --build-index

# Smoke test
python -m src.qa_mode
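
For readers wondering how a single module can serve both commands, here is one plausible way to wire the --build-index flag with argparse. It is a sketch only: parse_args and main are hypothetical, and the returned strings are placeholders for the real indexing and smoke-test routines.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --build-index switches to index construction."""
    parser = argparse.ArgumentParser(prog="python -m src.qa_mode")
    parser.add_argument(
        "--build-index",
        action="store_true",
        help="build the retrieval indexes instead of running the smoke test",
    )
    return parser.parse_args(argv)


def main(argv=None) -> str:
    args = parse_args(argv)
    if args.build_index:
        return "build-index"   # placeholder for the one-time indexing step
    return "smoke-test"        # placeholder for the default smoke test


print(main(["--build-index"]))
print(main([]))
```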

Full Evaluation

python -m src.eval
# Writes results/comparison.json + results/comparison.md

Gradio Demo

python app/app.py
# Opens local UI; on Colab, share=True from config.yaml prints a public URL.

Results

Report Generation (n=30)

Model	ROUGE-L F1	BERTScore F1	Latency mean (s)
MedGemma 1.5-4B	0.347	0.890	10.90
OpenCLIP retrieval	0.283	0.887	0.33
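
For intuition about the ROUGE-L F1 column, the metric is based on the longest common subsequence (LCS) between the reference and the generated report. The sketch below is a self-contained illustration that assumes lowercase whitespace tokenization; the evaluation script may rely on a library with different tokenization or stemming, so its scores need not match this function exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 (harmonic mean of LCS precision and recall)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)   # share of candidate tokens on the LCS
    recall = lcs / len(ref)       # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "no acute cardiopulmonary process",
    "no acute process identified",
)
print(round(score, 3))
```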

QA RAG (n=15)

Retriever	Recall@3	Judge accuracy	Correct	Partial	Wrong	Unparseable + error	Latency mean (s)
ColPali v1.3	0.000	0.400	6	3	5	1	70.17
MiniLM-L6 text	0.133	0.467	7	1	7	0	11.01
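
Recall@3 can be read as the share of questions whose source report shows up among the top three retrieved items. The sketch below assumes exactly one gold report per question, which is an assumption about the evaluation setup rather than something stated here; recall_at_k is a hypothetical helper.

```python
def recall_at_k(ranked_ids, gold_ids, k=3):
    """Fraction of questions whose gold report id appears in the top-k results.

    ranked_ids: one ranked list of report ids per question.
    gold_ids:   the single gold report id for each question (assumed).
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("ranked_ids and gold_ids must have the same length")
    if not gold_ids:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


# Toy example: 15 questions, the gold report is retrieved for only two of them.
gold = [f"r{i}" for i in range(15)]
ranked = [[f"r{i}", "x", "y"] if i < 2 else ["x", "y", "z"] for i in range(15)]
print(round(recall_at_k(ranked, gold, k=3), 3))
```

Under that one-gold-per-question assumption, a Recall@3 of 0.133 at n=15 would correspond to 2 of 15 questions.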

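Headlines: MedGemma beats CLIP retrieval on ROUGE-L for report generation by about 23% relative (0.347 vs 0.283), at roughly 33× the mean latency. MiniLM-L6 beats ColPali v1.3 on Recall@3 (0.133 vs 0.000), on strict-correct judge accuracy (7 vs 6 of 15), and on mean latency (about 6.4× faster). With n=15 the accuracy gap is a single question. See report/REPORT.md for methodology, qualitative observations, and limitations.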
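
The headline figures follow directly from the two tables. The short check below recomputes them from the reported numbers, on the reading that judge accuracy counts only strict "correct" verdicts out of n=15; relative_gain is a hypothetical helper name.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of new over baseline, as a fraction."""
    return (new - baseline) / baseline


# Report generation: ROUGE-L F1, MedGemma vs OpenCLIP retrieval.
rouge_gain = relative_gain(0.347, 0.283)

# QA RAG: mean latency, ColPali vs MiniLM.
latency_ratio = 70.17 / 11.01

# QA RAG: strict-correct judge accuracy out of n=15.
colpali_acc = 6 / 15
minilm_acc = 7 / 15

print(f"ROUGE-L relative gain: {rouge_gain:.1%}")
print(f"Latency ratio: {latency_ratio:.1f}x")
print(f"Judge accuracy: ColPali {colpali_acc:.3f}, MiniLM {minilm_acc:.3f}")
```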

Limitations

To be filled in during Phase 7. Expected highlights:

Zero-shot only; no domain fine-tuning.
The 400-pair subset is not representative of the full MIMIC-CXR distribution.
LLM-as-judge introduces evaluation bias (same model family as the system under test).
Rendered-report ColPali pages are synthetic, not native PDFs.

Project Layout

chest-xray-system/
├── README.md
├── requirements.txt
├── config.yaml
├── .env.example
├── data/
│   ├── download.py
│   ├── build_qa_dataset.py
│   └── sample/                 # populated at runtime (gitignored)
├── src/
│   ├── models/
│   │   ├── medgemma.py
│   │   ├── colpali_retriever.py
│   │   ├── clip_retriever.py
│   │   └── text_retriever.py
│   ├── report_mode.py
│   ├── qa_mode.py
│   └── eval.py
├── app/
│   └── app.py
├── notebooks/
│   └── run_on_colab.ipynb
├── results/
└── report/
    └── REPORT.md
