ustafaa/assignment-2


Chest X-Ray Multi-Modal Intelligence System

DSAI 413 — Assignment 2. A dual-mode system over the MIMIC-CXR dataset:

  1. Report Generation — chest X-ray image → radiology report
  2. QA Mode — natural-language question → RAG-grounded answer over reports

Mandatory models: google/medgemma-1.5-4b-it and ColPali (vidore/colpali-v1.3), each compared against a lighter baseline (OpenCLIP ViT-B/32 for report retrieval; sentence-transformers MiniLM for text RAG).
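QA mode follows the usual retrieve-then-generate pattern: score every indexed report against the question, keep the top k, and place them in the prompt ahead of the question before it goes to the generator. A minimal sketch, assuming a generic `score(question, report)` function supplied by whichever retriever is active (ColPali or MiniLM). `retrieve_top_k`, `build_grounded_prompt` and the word-overlap scorer are hypothetical illustrations, not this repository's API, and the prompt wording is illustrative only.

```python
from typing import Callable, Sequence


def retrieve_top_k(
    question: str,
    reports: Sequence[str],
    score: Callable[[str, str], float],
    k: int = 3,
) -> list[int]:
    """Return the indices of the k reports that score highest for the question."""
    ranked = sorted(
        range(len(reports)),
        key=lambda i: score(question, reports[i]),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(question: str, reports: Sequence[str], hits: Sequence[int]) -> str:
    """Put the retrieved reports in front of the question as context."""
    context = "\n\n".join(f"[Report {i}]\n{reports[i]}" for i in hits)
    return (
        "Answer the question using only the reports below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def word_overlap(question: str, report: str) -> float:
    """Toy scorer (hypothetical): number of shared lowercase words."""
    return float(len(set(question.lower().split()) & set(report.lower().split())))
```

Swapping ColPali for MiniLM only changes the `score` function; the prompt assembly stays the same, which is what makes the two retrievers directly comparable.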


Overview

The codebase is developed locally and executed on Google Colab (T4, 16 GB VRAM), because a 4 GB laptop GPU cannot host MedGemma-1.5-4B and ColPali. Heavy scripts (indexing, evaluation) can be run as modules, e.g. python -m src.eval, or from notebooks/run_on_colab.ipynb.
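A back-of-the-envelope check of the GPU constraint: weight memory alone is roughly parameter count × bytes per parameter. The sketch assumes about 4B parameters for MedGemma-1.5-4B and about 3B for ColPali v1.3 (which builds on a ~3B PaliGemma backbone), both held in 16-bit precision (2 bytes per parameter). These counts are approximations, and activations, KV cache and CUDA overhead come on top.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Approximate parameter counts (assumptions, not read from the checkpoints).
MODELS = {"medgemma-1.5-4b-it": 4e9, "colpali-v1.3": 3e9}

total = sum(weight_memory_gb(n) for n in MODELS.values())
print(f"weights alone: ~{total:.0f} GB")  # ~14 GB: far over 4 GB, just under a 16 GB T4
```

About 14 GB of weights leaves little headroom even on a 16 GB T4.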

Design priorities (in order):

  1. Simplicity over sophistication — zero-shot, no fine-tuning.
  2. Small subsets — 400 image/report pairs (300 indexed, 100 test).
  3. Lazy loading — models load on first call.
  4. One config file: config.yaml owns every knob.
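Priority 3 (lazy loading) can be illustrated with `functools.cached_property`: the expensive loader runs on first access and the result is reused afterwards. A minimal sketch; `LazyModel` and the stub loader are hypothetical and do not mirror the classes in src/models/.

```python
from functools import cached_property
from typing import Any, Callable


class LazyModel:
    """Defers an expensive model load until the model is first used."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self.load_count = 0  # how many times the loader actually ran

    @cached_property
    def model(self) -> Any:
        # Runs once per instance; cached_property stores the result afterwards.
        self.load_count += 1
        return self._loader()


# Stub loader standing in for a real from_pretrained(...) call.
wrapper = LazyModel(lambda: {"name": "stub-model"})
```

Constructing the wrapper is cheap, so both a model and its baseline can be wired up at startup while only the one actually used pays the load time and GPU memory.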

Setup

# Local (dev only — no inference)
git clone <this-repo>.git chest-xray-system
cd chest-xray-system
pip install -r requirements.txt

# Required: Hugging Face token (for MedGemma + ColPali weights)
cp .env.example .env
# then edit .env and set HF_TOKEN=hf_...
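The token in .env has to reach the Hugging Face downloads; huggingface_hub picks up an `HF_TOKEN` environment variable. Projects often use python-dotenv for the .env step. The dependency-free sketch below shows the same idea, assuming a plain KEY=VALUE file; `load_env_file` and `require_hf_token` are hypothetical helpers, not functions from this repository.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ (existing vars win)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def require_hf_token() -> str:
    """Return HF_TOKEN, failing early with a clear message if it is missing."""
    load_env_file()
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; copy .env.example to .env and fill it in.")
    return token
```

Failing at startup is friendlier than a 401 halfway through a multi-gigabyte download.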

Note. Local install is for editing only. All inference runs on Colab — see the Colab Quickstart below.


Colab Quickstart

Open notebooks/run_on_colab.ipynb in Google Colab on a T4 runtime. The notebook will:

  1. git clone this repo
  2. pip install -r requirements.txt
  3. Prompt for HF_TOKEN via getpass
  4. Run python data/download.py to pull the MIMIC-CXR subset (400 pairs)
  5. Launch python app/app.py with Gradio share=True → a public URL
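The five notebook steps map onto a handful of shell commands. A sketch of equivalent Colab-side Python, assuming the repository URL is passed in (the README leaves it as <this-repo>) and that the clone directory is chest-xray-system as in Setup. `colab_commands` and `run_quickstart` are hypothetical names, not cells copied from the notebook.

```python
import os
import subprocess
from getpass import getpass


def colab_commands(repo_url: str, workdir: str = "chest-xray-system") -> list[list[str]]:
    """Commands for steps 1, 2, 4 and 5 (step 3 is the interactive token prompt)."""
    return [
        ["git", "clone", repo_url, workdir],
        ["pip", "install", "-r", f"{workdir}/requirements.txt"],
        ["python", "data/download.py"],
        ["python", "app/app.py"],
    ]


def run_quickstart(repo_url: str, workdir: str = "chest-xray-system") -> None:
    clone, install, download, app = colab_commands(repo_url, workdir)
    subprocess.run(clone, check=True)    # step 1
    subprocess.run(install, check=True)  # step 2
    # Step 3: getpass keeps the token out of the notebook output;
    # child processes inherit it through the environment.
    os.environ["HF_TOKEN"] = getpass("HF_TOKEN: ")
    subprocess.run(download, check=True, cwd=workdir)  # step 4
    subprocess.run(app, check=True, cwd=workdir)       # step 5: blocks while the app serves
```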

On a fresh T4 runtime, going from a clean start to a working share link should take under 15 minutes.


Data

  • Source. simhadrisadaram/mimic-cxr-dataset via kagglehub.
  • Subset. 400 (image, report) pairs sampled with seed=42, split 300 index / 100 test.
  • Manifest. data/sample/manifest.csv with columns id, image_path, report.
  • QA dataset. data/sample/qa_dataset.json — 3 clinical QA pairs per report, generated by MedGemma (yes/no, open-ended, location/laterality).

All data lives under data/sample/ (gitignored) and is regenerated on Colab.
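The subset construction above (400 pairs, seed=42, 300 index / 100 test, manifest columns id, image_path, report) can be sketched with the standard library. The sketch assumes the candidate pairs are already loaded as dicts with those three keys. data/download.py may shuffle differently (for example via pandas), so the ids selected here need not match the repository's; both function names are hypothetical.

```python
import csv
import random
from typing import Sequence


def sample_and_split(
    pairs: Sequence[dict],
    n_total: int = 400,
    n_index: int = 300,
    seed: int = 42,
) -> tuple[list[dict], list[dict]]:
    """Draw n_total pairs reproducibly, then split them into index and test sets."""
    rng = random.Random(seed)  # local generator: leaves the global random state alone
    subset = rng.sample(list(pairs), n_total)
    return subset[:n_index], subset[n_index:]


def write_manifest(rows: Sequence[dict], path: str) -> None:
    """Write rows to CSV using the manifest's three columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "image_path", "report"])
        writer.writeheader()
        writer.writerows(rows)
```

Because the split is a slice of one seeded sample, the index and test sets are disjoint by construction and identical on every rerun.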


Running Each Mode

Report Generation

# Smoke test: 10 test images, MedGemma vs CLIP retrieval baseline
python -m src.report_mode
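The OpenCLIP baseline generates no text. It returns the report attached to the most similar indexed image. A minimal sketch of that nearest-neighbour step, assuming image embeddings have already been computed (for instance with OpenCLIP ViT-B/32) and are given as plain lists of floats; `retrieve_report` is a hypothetical name.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def retrieve_report(
    query_emb: Sequence[float],
    index_embs: Sequence[Sequence[float]],
    index_reports: Sequence[str],
) -> str:
    """Return the report of the indexed image closest to the query image."""
    best = max(range(len(index_embs)), key=lambda i: cosine(query_emb, index_embs[i]))
    return index_reports[best]
```

After the one-off index build, each query costs one image embedding plus a similarity search, which is consistent with the baseline's much lower latency in the results.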

QA (RAG)

# Build the ColPali and text indexes (one-time, persisted to data/sample/)
python -m src.qa_mode --build-index

# Smoke test
python -m src.qa_mode
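ColPali scores a page differently from a single-vector retriever. It keeps one embedding per query token and one per image patch, and uses ColBERT-style late interaction (MaxSim): for each query token, take its best dot product over the page's patches, then sum over tokens. A pure-Python sketch of that scoring rule, assuming the multi-vector embeddings were already produced by the model (in practice this runs batched on GPU, e.g. through the colpali-engine library). `maxsim_score` and `rank_pages` are hypothetical names.

```python
from typing import Sequence

Vector = Sequence[float]


def maxsim_score(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching patch."""
    total = 0.0
    for q in query_tokens:
        total += max(sum(a * b for a, b in zip(q, p)) for p in page_patches)
    return total


def rank_pages(
    query_tokens: Sequence[Vector],
    pages: Sequence[Sequence[Vector]],
    k: int = 3,
) -> list[int]:
    """Indices of the k pages with the highest MaxSim score."""
    scores = [maxsim_score(query_tokens, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)[:k]
```

Every query token is compared with every patch of every page, so cost grows with pages × patches × tokens. That is one plausible contributor to ColPali's higher latency compared with a single cosine per report.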

Full Evaluation

python -m src.eval
# Writes results/comparison.json + results/comparison.md
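The evaluation writes the same numbers in two forms, results/comparison.json and results/comparison.md. A sketch of that final step, assuming the metrics are held as a dict of per-model summaries; the dict layout, metric keys and both function names are hypothetical, and the example values come from the report-generation table in Results.

```python
import json
from pathlib import Path


def to_markdown(results: dict[str, dict[str, float]]) -> str:
    """Render {model: {metric: value}} as a Markdown table."""
    metrics = list(next(iter(results.values())).keys())
    lines = [
        "| Model | " + " | ".join(metrics) + " |",
        "|---" * (len(metrics) + 1) + "|",
    ]
    for model, row in results.items():
        lines.append(f"| {model} | " + " | ".join(f"{row[m]:.3f}" for m in metrics) + " |")
    return "\n".join(lines) + "\n"


def write_comparison(results: dict[str, dict[str, float]], out_dir: str = "results") -> None:
    """Write the machine-readable JSON and the human-readable Markdown side by side."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "comparison.json").write_text(json.dumps(results, indent=2))
    (out / "comparison.md").write_text(to_markdown(results))


example = {
    "MedGemma 1.5-4B": {"rouge_l_f1": 0.347, "bertscore_f1": 0.890},
    "OpenCLIP retrieval": {"rouge_l_f1": 0.283, "bertscore_f1": 0.887},
}
```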

Gradio Demo

python app/app.py
# Opens local UI; on Colab, share=True from config.yaml prints a public URL.

Results

Report Generation (n=30)

| Model | ROUGE-L F1 | BERTScore F1 | Mean latency (s) |
|---|---|---|---|
| MedGemma 1.5-4B | 0.347 | 0.890 | 10.90 |
| OpenCLIP retrieval | 0.283 | 0.887 | 0.33 |
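ROUGE-L compares a generated report with the reference through their longest common subsequence (LCS) of tokens: precision = LCS / candidate length, recall = LCS / reference length, and F1 is their harmonic mean. The from-scratch sketch below uses naive lowercase whitespace tokenisation. The evaluation script may instead use a library such as the rouge-score package, whose tokenisation and stemming differ, so this is for intuition and will not reproduce the table exactly.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Classic O(len(a) * len(b)) dynamic programme, keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over lowercase whitespace tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, "no pleural effusion" against "no effusion or pneumothorax" shares the subsequence "no effusion" (length 2), giving precision 2/3, recall 1/2 and F1 = 4/7 ≈ 0.571.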

QA RAG (n=15)

| Retriever | Recall@3 | Judge accuracy | Correct | Partial | Wrong | Unparseable + error | Mean latency (s) |
|---|---|---|---|---|---|---|---|
| ColPali v1.3 | 0.000 | 0.400 | 6 | 3 | 5 | 1 | 70.17 |
| MiniLM-L6 text | 0.133 | 0.467 | 7 | 1 | 7 | 0 | 11.01 |
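Both QA metrics can be recomputed from per-question outputs. Judge accuracy counts only strict "correct" verdicts, which matches the table: 6/15 = 0.400 for ColPali and 7/15 ≈ 0.467 for MiniLM. Recall@3 is sketched under the assumption that a hit means the report a question was generated from appears among the top three retrieved ids. The function names and the exact verdict strings are hypothetical.

```python
from collections import Counter
from typing import Sequence


def recall_at_k(retrieved: Sequence[Sequence[str]], gold: Sequence[str], k: int = 3) -> float:
    """Fraction of questions whose gold report id is among the top-k retrieved ids."""
    hits = sum(g in r[:k] for r, g in zip(retrieved, gold))
    return hits / len(gold)


def judge_accuracy(verdicts: Sequence[str]) -> float:
    """Strict accuracy: only 'correct' counts; 'partial' and failures score zero."""
    return Counter(verdicts)["correct"] / len(verdicts)
```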

Headlines: for report generation, MedGemma beats the CLIP retrieval baseline on ROUGE-L (0.347 vs 0.283, about +23% relative). For QA, MiniLM-L6 outperforms ColPali v1.3 on Recall@3 (0.133 vs 0.000), edges it on strict-correct judge accuracy (0.467 vs 0.400, i.e. 7 vs 6 of 15 questions), and is 6.4× faster. See report/REPORT.md for methodology, qualitative observations, and limitations.
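The headline ratios follow directly from the two tables:

```latex
\[
\frac{0.347}{0.283} - 1 \approx 0.226 \quad (\approx +23\%\ \text{relative ROUGE-L}),
\qquad
\frac{70.17\ \text{s}}{11.01\ \text{s}} \approx 6.37 \quad (\approx 6.4\times\ \text{faster}).
\]
```

In absolute terms the ROUGE-L gap is 0.064, BERTScore barely separates the two report generators (0.890 vs 0.887), and MedGemma's gain costs roughly 33× the latency of retrieval (10.90 s vs 0.33 s).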


Limitations

To be completed in Phase 7. Expected highlights include:

  • Zero-shot only; no domain fine-tuning.
  • 400-pair subset is not representative of full MIMIC-CXR distribution.
  • LLM-as-judge introduces evaluation bias (same model family as the system under test).
  • Rendered-report ColPali pages are synthetic, not native PDFs.

Project Layout

chest-xray-system/
├── README.md
├── requirements.txt
├── config.yaml
├── .env.example
├── data/
│   ├── download.py
│   ├── build_qa_dataset.py
│   └── sample/                 # populated at runtime (gitignored)
├── src/
│   ├── models/
│   │   ├── medgemma.py
│   │   ├── colpali_retriever.py
│   │   ├── clip_retriever.py
│   │   └── text_retriever.py
│   ├── report_mode.py
│   ├── qa_mode.py
│   └── eval.py
├── app/
│   └── app.py
├── notebooks/
│   └── run_on_colab.ipynb
├── results/
└── report/
    └── REPORT.md

About

DSAI 413 Assignment 2: Chest X-Ray Multi-Modal System (MedGemma + ColPali RAG)
