Multi-label chest X-ray classification using PyTorch + HuggingFace Transformers + OpenCV preprocessing
Applied to the NIH ChestX-ray14 dataset (112,120 frontal-view images, 15 label classes)
ThoraVis is a production-style medical imaging pipeline that:
- Loads and explores the NIH ChestX-ray14 dataset via HuggingFace Datasets
- Applies clinical-grade OpenCV preprocessing (CLAHE enhancement, lung-field normalization, adaptive thresholding)
- Fine-tunes a ViT-B/16 (Vision Transformer) backbone from HuggingFace for multi-label classification
- Implements a custom PyTorch training loop with AUC-ROC tracking per pathology
- Exports Grad-CAM heatmaps (OpenCV overlay) to visualize model attention on X-ray findings
- Reaches AUC-ROC ≥ 0.80 on Cardiomegaly, Effusion, and Atelectasis in demo runs (full-dataset results pending)
This project was built to concretely demonstrate:
| Skill | Demonstrated Via |
|---|---|
| OpenCV | CLAHE, bilateral filtering, Sobel edge maps, Grad-CAM overlays |
| PyTorch | Custom Dataset, DataLoader, training loop, loss functions |
| HuggingFace | datasets for data streaming, transformers ViT model backbone |
| ML Engineering | AUC-ROC metrics, class imbalance handling, checkpoint management |
thoravis/
├── README.md
├── requirements.txt
├── notebooks/
│ └── 01_thoravis_full_pipeline.ipynb ← Main Jupyter showcase notebook
├── src/
│ ├── dataset.py ← HuggingFace + PyTorch Dataset wrapper
│ ├── preprocessing.py ← OpenCV clinical preprocessing pipeline
│ ├── model.py ← ViT fine-tuning with custom classification head
│ ├── train.py ← Training loop, AUC tracking, checkpointing
│ ├── evaluate.py ← Per-pathology AUC-ROC, confusion matrices
│ └── gradcam.py ← Grad-CAM heatmap generation with OpenCV overlay
├── models/
│ └── .gitkeep
├── results/
│ └── .gitkeep
└── assets/
└── pipeline_diagram.png
| Property | Value |
|---|---|
| Source | NIH Clinical Center |
| HuggingFace Hub | alkzar90/NIH-Chest-X-ray-dataset |
| Images | 112,120 frontal-view PNGs (1024×1024) |
| Patients | 30,805 unique |
| Labels | 15 label classes: 14 diseases + No Finding (multi-label) |
| License | No restrictions (NIH attribution required) |
15 Label Classes (14 diseases + No Finding):
No Finding · Atelectasis · Cardiomegaly · Effusion · Infiltration · Mass · Nodule · Pneumonia · Pneumothorax · Consolidation · Edema · Emphysema · Fibrosis · Pleural Thickening · Hernia
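As a rough illustration of what "multi-label" means for the training targets, the sketch below turns a findings string into a 15-dimensional multi-hot vector using the label order listed above. The helper name `encode_findings` is hypothetical, and the pipe-separated input is an assumption based on the original NIH CSV; the HuggingFace version of the dataset may already expose labels as integer lists, and its label spellings may differ.

```python
# Label order copied from the list above; index 2 is Cardiomegaly.
LABELS = [
    "No Finding", "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
    "Mass", "Nodule", "Pneumonia", "Pneumothorax", "Consolidation",
    "Edema", "Emphysema", "Fibrosis", "Pleural Thickening", "Hernia",
]
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}


def encode_findings(findings: str, sep: str = "|") -> list[float]:
    """Turn a separator-delimited findings string into a multi-hot vector.

    Hypothetical helper: the separator and label spellings are assumptions.
    """
    target = [0.0] * len(LABELS)
    for name in findings.split(sep):
        name = name.strip()
        if name:
            target[LABEL_TO_INDEX[name]] = 1.0
    return target


vec = encode_findings("Cardiomegaly|Effusion")
print(vec)
```

An image with several findings simply has several entries set to 1.0, which is why the model ends in independent sigmoids rather than a softmax.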
Raw X-ray PNG (1024×1024)
│
▼
[OpenCV Preprocessing]
├─ Resize to 224×224
├─ CLAHE contrast enhancement (clipLimit=3.0, tileGrid=8×8)
├─ Bilateral noise filter (preserve edges)
└─ Normalize to ImageNet stats
│
▼
[HuggingFace ViT-B/16 Backbone]
└─ google/vit-base-patch16-224-in21k (pretrained)
│
▼
[Custom PyTorch Classification Head]
└─ Linear(768 → 256) → GELU → Dropout(0.3) → Linear(256 → 15)
│
▼
[Multi-label BCEWithLogitsLoss]
└─ Weighted by inverse class frequency
│
▼
[Output: 15-dim sigmoid probabilities]
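The classification head in the diagram can be sketched as a small PyTorch module. This is a minimal illustration, not the code in `src/model.py`: the class name `ClassificationHead` is hypothetical, and feeding it the backbone's 768-dimensional [CLS] embedding is an assumption (the project may pool differently). A random tensor stands in for the ViT output so the example runs without downloading weights.

```python
import torch
from torch import nn


class ClassificationHead(nn.Module):
    """Linear(768 -> 256) -> GELU -> Dropout(0.3) -> Linear(256 -> 15)."""

    def __init__(self, hidden_size: int = 768, num_labels: int = 15, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(256, num_labels),
        )

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        # Returns raw logits; BCEWithLogitsLoss applies the sigmoid internally.
        return self.net(cls_embedding)


head = ClassificationHead().eval()
# Stand-in for the backbone's pooled embedding (assumed: batch of 4, hidden size 768).
cls_embedding = torch.randn(4, 768)
with torch.no_grad():
    logits = head(cls_embedding)
    probs = torch.sigmoid(logits)  # independent per-label probabilities
print(probs.shape)
```

Keeping the sigmoid out of the module means the same head can be used for training (logits into the loss) and inference (sigmoid applied afterwards).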
git clone https://github.com/LSaiko/thoravis.git
cd thoravis
pip install -r requirements.txt
jupyter notebook notebooks/01_thoravis_full_pipeline.ipynb
This single notebook walks through the complete pipeline end-to-end, including:
- Dataset loading and exploration
- OpenCV preprocessing visualization
- Model training (configurable epochs / subset size)
- Evaluation with per-class AUC-ROC
- Grad-CAM heatmap generation
# Quick demo run on 5,000 images
python src/train.py --subset 5000 --epochs 5 --batch_size 32
# Full dataset run
python src/train.py --epochs 20 --batch_size 64 --lr 2e-5
Full dataset results pending; the figures below are reproducible demo benchmarks.
| Pathology | AUC-ROC |
|---|---|
| Cardiomegaly | 0.87 |
| Effusion | 0.83 |
| Atelectasis | 0.81 |
| Pneumothorax | 0.79 |
| Consolidation | 0.77 |
| Macro Average | 0.78 |
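For readers unfamiliar with how per-pathology and macro AUC-ROC figures like these are typically computed, here is a minimal sketch using scikit-learn's `roc_auc_score`. The function name `per_class_auc` is hypothetical and the arrays are toy data, not model output; the project's `src/evaluate.py` may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def per_class_auc(y_true: np.ndarray, y_score: np.ndarray, names: list[str]) -> dict[str, float]:
    """AUC-ROC per label column; labels with only one class present are skipped."""
    aucs = {}
    for j, name in enumerate(names):
        column = y_true[:, j]
        if column.min() == column.max():
            continue  # AUC is undefined without both positives and negatives
        aucs[name] = float(roc_auc_score(column, y_score[:, j]))
    return aucs


# Toy data, not real model output.
names = ["Cardiomegaly", "Effusion"]
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_score = np.array([[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.4]])

aucs = per_class_auc(y_true, y_score, names)
macro_auc = float(np.mean(list(aucs.values())))  # unweighted mean over classes
print(aucs, macro_auc)
```

The macro average weights every pathology equally, so rare classes count as much as common ones.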
Gradient-weighted Class Activation Maps highlight which pixels drove each prediction, overlaid on the original X-ray using OpenCV's applyColorMap.
from src.gradcam import GradCAMVisualizer
viz = GradCAMVisualizer(model, target_layer="vit.encoder.layer[-1]")
heatmap = viz.generate(image_tensor, target_class=2) # Cardiomegaly
viz.save_overlay(original_image, heatmap, "results/gradcam_cardiomegaly.png")
Why ViT over CNN?
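Self-attention lets every image patch attend to every other patch from the first layer, so a Vision Transformer captures global context across the entire X-ray without relying on a deep stack of local convolutions. This is crucial for pathologies like Cardiomegaly or Edema that span large anatomical regions.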
Why CLAHE preprocessing?
X-rays have extreme dynamic range. CLAHE (Contrast Limited Adaptive Histogram Equalization) normalizes local contrast without oversaturating bright bone structures — a standard step in clinical CAD systems.
Handling class imbalance:
No Finding accounts for more than half of the images (~54%), so plain accuracy would reward a model that rarely predicts any pathology. We apply inverse-frequency weighting in BCEWithLogitsLoss and use macro AUC-ROC (not accuracy) as the primary metric.
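One common way to realize inverse-frequency weighting is through the `pos_weight` argument of `BCEWithLogitsLoss`, set per class to the ratio of negatives to positives. The sketch below assumes that reading; the helper name `inverse_frequency_pos_weight` and the `max_weight` clamp are hypothetical additions, and the targets are toy values rather than dataset statistics.

```python
import torch
from torch import nn


def inverse_frequency_pos_weight(targets: torch.Tensor, max_weight: float = 50.0) -> torch.Tensor:
    """Per-class negatives/positives ratio, clamped so rare classes do not dominate."""
    positives = targets.sum(dim=0)
    negatives = targets.shape[0] - positives
    ratio = negatives / positives.clamp(min=1.0)  # avoid division by zero
    return ratio.clamp(max=max_weight)


# Toy multi-hot targets: 4 samples, 3 classes (not real dataset statistics).
targets = torch.tensor([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])
pos_weight = inverse_frequency_pos_weight(targets)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.zeros_like(targets)  # stand-in for model output
loss = criterion(logits, targets)
print(pos_weight, loss.item())
```

In practice the weights would be computed once from the training split and reused for every batch.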
torch>=2.1.0
torchvision>=0.16.0
transformers>=4.38.0
datasets>=2.18.0
opencv-python>=4.9.0
numpy>=1.26.0
scikit-learn>=1.4.0
matplotlib>=3.8.0
tqdm>=4.66.0
accelerate>=0.27.0
Pillow>=10.2.0
jupyter>=1.0.0
ipywidgets>=8.1.0
@inproceedings{Wang_2017,
doi = {10.1109/cvpr.2017.369},
year = 2017,
publisher = {IEEE},
author = {Xiaosong Wang and Yifan Peng and Le Lu and Zhiyong Lu
and Mohammadhadi Bagheri and Ronald M. Summers},
title = {ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks},
booktitle = {2017 IEEE Conference on Computer Vision and Pattern Recognition}
}
MIT License; see LICENSE.
Dataset attribution: NIH Clinical Center (required per dataset terms).
Built to showcase applied medical imaging skills: OpenCV preprocessing · PyTorch training pipelines · HuggingFace model fine-tuning