bikram73/Left_Ventricular_Hypertrophy_Detection

Left Ventricular Hypertrophy (LVH) Detection System

Project video: https://drive.google.com/file/d/1WzpJViGlcOEixZPE9SmLdYWEdGVfxgYb/view?usp=drive_link

πŸ₯ Project Overview

A comprehensive multimodal machine learning system for detecting Left Ventricular Hypertrophy (LVH) using ECG signals, MRI scans, CT images, and clinical parameters. Production-ready system with 9 advanced ML algorithms and excellent performance across all modalities.

✅ System Highlights:

  • 9 ML Algorithms: Random Forest, XGBoost, LightGBM, GradientBoosting, SVM, MLP, AdaBoost, Logistic Regression, Stacking Ensemble
  • 36 Trained Models: 9 algorithms × 4 modalities with optimized hyperparameters
  • Advanced Techniques: SMOTE for class balancing, Feature Selection, 5-fold Cross-validation
  • Complete Training: All models trained successfully with optimal thresholds
  • Comprehensive Visualizations: Performance plots, confusion matrices, ROC curves for each modality

📊 Current Performance:

  • Clinical Models: ✅ Excellent (89.13% accuracy, 0.94 ROC-AUC) - GradientBoosting
  • ECG Models: ✅ Excellent (82.00% accuracy, 0.90 ROC-AUC) - XGBoost
  • MRI Models: ✅ Good (81.43% accuracy, 0.85 ROC-AUC) - SVM
  • CT Models: ✅ Good (78.80% accuracy, 0.85 ROC-AUC) - Stacking Ensemble

📊 System Overview at a Glance

Component | Details
Total Models | 36 (9 algorithms × 4 modalities)
Algorithms | GradientBoosting, XGBoost, LightGBM, SVM, Random Forest, MLP, AdaBoost, Logistic Regression, Stacking Ensemble
Best Clinical | GradientBoosting (89.13%, ROC-AUC 0.94)
Best ECG | XGBoost (82.00%, ROC-AUC 0.90)
Best MRI | SVM (81.43%, ROC-AUC 0.85)
Best CT | Stacking Ensemble (78.80%, ROC-AUC 0.85)
Training Techniques | SMOTE, Feature Selection, Cross-validation, Hyperparameter Tuning
Web Interface | Flask with interactive UI, tabbed navigation, visualizations
API | RESTful endpoints for predictions and health checks
Status | ✅ Production Ready

πŸ“ Project Structure

lvh-detection/
β”‚
β”œβ”€β”€ README.md                    # This comprehensive guide
β”œβ”€β”€ COMPLETE-PROJECT-GUIDE.md    # Detailed documentation
β”œβ”€β”€ requirements.txt             # Python dependencies
β”œβ”€β”€ config.py                   # Configuration settings
β”œβ”€β”€ run.py                      # Main entry point - START HERE
β”œβ”€β”€ app.py                      # Flask application (2000+ lines)
β”œβ”€β”€ download_data.py            # Data download script
β”œβ”€β”€ train_models.py             # Model training script
β”œβ”€β”€ process_data.py             # Data preprocessing
β”‚
β”œβ”€β”€ πŸ“Š Analytics & Dashboard    # NEW: Analytics System
β”‚   β”œβ”€β”€ dashboard_service.py    # Analytics backend service
β”‚   β”œβ”€β”€ metrics_collector.py    # Background metrics collection
β”‚   β”œβ”€β”€ dashboard_metrics.db    # System metrics database
β”‚   β”œβ”€β”€ predictions_history.db  # Predictions tracking database
β”‚   └── fix_analytics_display.py # Analytics display fixes
β”‚
β”œβ”€β”€ templates/                  # HTML templates
β”‚   β”œβ”€β”€ index.html             # Home page
β”‚   β”œβ”€β”€ upload.html            # Upload interface
β”‚   β”œβ”€β”€ results.html           # Results display
β”‚   β”œβ”€β”€ analytics.html         # πŸ“Š NEW: Analytics dashboard
β”‚   β”œβ”€β”€ api.html               # API documentation
β”‚   └── document.html          # System documentation
β”‚
β”œβ”€β”€ static/                    # Static files
β”‚   β”œβ”€β”€ css/
β”‚   β”‚   β”œβ”€β”€ style.css          # Main styles
β”‚   β”‚   └── dashboard.css      # πŸ“Š NEW: Dashboard styles
β”‚   └── js/
β”‚       β”œβ”€β”€ main.js            # Main JavaScript
β”‚       └── dashboard.js       # πŸ“Š NEW: Dashboard interactions
β”‚
β”œβ”€β”€ data/                      # Data directory (created automatically)
β”‚   β”œβ”€β”€ raw/                   # Raw datasets
β”‚   └── processed/             # Processed data
β”‚
β”œβ”€β”€ patient_ecg_data/          # Sample patient data
β”‚   β”œβ”€β”€ patient1.csv to patient4.csv  # Individual patient ECG files
β”‚   β”œβ”€β”€ all_patients_combined.csv     # Combined dataset
β”‚   β”œβ”€β”€ patients_summary.csv          # Patient overview
β”‚   └── demo_usage.py                 # Usage examples
β”‚
└── models/                    # Trained models (36 total: 9 algorithms Γ— 4 modalities)
    β”œβ”€β”€ clinical/              # Clinical models (9 algorithms)
    β”‚   β”œβ”€β”€ GradientBoosting_Optimized.pkl  ⭐ Best: 89.13%
    β”‚   β”œβ”€β”€ RandomForest_Optimized.pkl
    β”‚   β”œβ”€β”€ XGBoost_Optimized.pkl
    β”‚   β”œβ”€β”€ LightGBM_Optimized.pkl
    β”‚   β”œβ”€β”€ SVM_Optimized.pkl
    β”‚   β”œβ”€β”€ MLP_Optimized.pkl
    β”‚   β”œβ”€β”€ AdaBoost_Optimized.pkl
    β”‚   β”œβ”€β”€ LogisticRegression_Optimized.pkl
    β”‚   β”œβ”€β”€ scaler.pkl
    β”‚   β”œβ”€β”€ confusion_matrices.png
    β”‚   β”œβ”€β”€ model_comparison.png
    β”‚   └── roc_curves.png
    β”œβ”€β”€ ecg/                   # ECG models (9 algorithms)
    β”‚   β”œβ”€β”€ XGBoost_Optimized.pkl  ⭐ Best: 82.00%
    β”‚   └── ... (same structure)
    β”œβ”€β”€ mri/                   # MRI models (9 algorithms)
    β”‚   β”œβ”€β”€ SVM_Optimized.pkl  ⭐ Best: 81.43%
    β”‚   └── ... (same structure)
    β”œβ”€β”€ ct/                    # CT models (9 algorithms)
    β”‚   └── ... (same structure)
    β”œβ”€β”€ scalers/               # Feature scalers
    β”‚   └── feature_scalers.pkl
    β”œβ”€β”€ best_lvh_model.pkl     # Best overall model
    β”œβ”€β”€ all_models.pkl         # All trained algorithms
    β”œβ”€β”€ all_optimized_models.pkl
    β”œβ”€β”€ ecg_clinical_stacking.pkl
    β”œβ”€β”€ model_thresholds.json  # Optimal thresholds for each model
    β”œβ”€β”€ training_report.txt    # Comprehensive results
    β”œβ”€β”€ ultimate_training_report.txt  # Ultimate results
    └── improved_training_report.txt  # Improved results

🚀 How to Run the Project

Step 1: Setup Environment

  1. Create project directory:
mkdir lvh-detection
cd lvh-detection
  2. Create virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Step 2: Configure Kaggle API (for datasets)

  1. Create Kaggle account and get API token from kaggle.com/settings
  2. Place kaggle.json in:
    • Linux/Mac: ~/.kaggle/kaggle.json
    • Windows: C:\Users\<username>\.kaggle\kaggle.json
  3. Set permissions (Linux/Mac): chmod 600 ~/.kaggle/kaggle.json

Step 3: Download and Setup Data

python download_data.py

This will automatically:

  • Download all 4 datasets from Kaggle
  • Organize them in proper directory structure
  • Verify data integrity

Step 4: Process Data

python process_data.py

This preprocesses the raw data for model training with improved balanced labeling.

Step 5: Train Models

python train_models.py

This trains all 36 ML models (9 algorithms × 4 modalities) with:

  • SMOTE: Synthetic Minority Over-sampling for class balancing
  • Feature Selection: SelectKBest for optimal feature selection
  • Cross-Validation: 5-fold stratified cross-validation
  • Hyperparameter Tuning: Optimized parameters for each algorithm
  • Optimal Thresholds: Custom thresholds for each model
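The feature selection and cross-validation steps listed above can be sketched with scikit-learn alone. This is a minimal illustration, not the contents of train_models.py: the synthetic data, the `f_classif` scoring function, `k=10`, and the classifier settings are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, mildly imbalanced stand-in for one modality's feature table
# (assumption: this is NOT the project's data).
X, y = make_classification(
    n_samples=300, n_features=19, n_informative=6,
    weights=[0.7, 0.3], random_state=42,
)

# Scaling and feature selection sit inside the pipeline so that every
# cross-validation fold fits them on its own training split only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("model", GradientBoostingClassifier(random_state=42)),
])

# 5-fold stratified cross-validation, scored by ROC-AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

print(f"ROC-AUC per fold: {scores.round(3)}")
print(f"Mean ROC-AUC: {scores.mean():.3f}")
```

SMOTE is left out here because it lives in the separate imbalanced-learn package. Its own `Pipeline` class accepts a SMOTE step and applies resampling only while fitting, which keeps synthetic samples out of the validation folds.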

Training Time: ~15-20 minutes for complete pipeline

Alternative Training Scripts:

python train_models_improved.py      # Improved version with optimization
python train_models_corrected.py     # Corrected version
python train_ecg_quick.py            # Quick ECG-only training

Step 6: Run the Web Application

python run.py

Access the application at: http://localhost:5000

Step 7: Test with Sample Patients

# Test with sample ECG patients
cd patient_ecg_data
python demo_usage.py

# Or test individual patients
python -c "
import pandas as pd
patient1 = pd.read_csv('patient1.csv')
print('Patient 1:', 'HIGH LVH' if patient1['lvh_label'].iloc[0] == 1 else 'LOW LVH')
print('Sokolow-Lyon:', patient1['sokolow_lyon'].iloc[0], 'mm')
"

# Test single file prediction
cd ..
python predict_single.py input_data/ecg/patient_001_ecg_features.csv ecg

# Batch predictions
python batch_predict_all.py

🎯 Key Features

Core Capabilities:

  • Multimodal Analysis: ECG, MRI, CT, Clinical data support
  • 9 Advanced ML Algorithms: Including GradientBoosting and Stacking Ensemble
  • Smart Validation: Clinical data optional when files uploaded
  • Web Interface: Professional Flask application with interactive UI
  • Real-time Predictions: Instant LVH detection with confidence scores
  • RESTful API: JSON API endpoints for integration
  • Comprehensive Visualizations: Interactive charts, radar plots, performance metrics
  • 📊 Analytics Dashboard: Real-time system analytics and prediction tracking
  • Performance Monitoring: System metrics, usage analytics, and trend analysis

Advanced Features:

  • ✅ Optimized Training Pipeline: SMOTE, Feature Selection, Cross-validation
  • 🏥 Sample Patient Data: Realistic ECG patients for testing
  • 📊 Complete Visualizations: Confusion matrices, ROC curves, feature importance
  • Medical Accuracy: Clinically validated ECG features (Sokolow-Lyon, Cornell)
  • ⚖️ Balanced Datasets: Proper class distributions for all modalities
  • 📈 Comprehensive Reporting: Detailed training results with optimal thresholds
  • 🎨 Enhanced UI: Tabbed navigation, risk factor analysis, downloadable reports
  • 📊 Analytics System: Real-time dashboard with prediction history and system metrics
  • Database Integration: SQLite databases for metrics and prediction tracking

Datasets Used

  1. ECG: ECG Heartbeat Categorization Dataset + Generated Patient Data

    • 19 ECG features including Sokolow-Lyon and Cornell voltage
    • R-peak detection, QRS duration, heart rate variability
  2. MRI: Sunnybrook Cardiac MRI Dataset

    • Texture features (GLCM), shape descriptors
    • Cardiac chamber segmentation, wall thickness measurements
  3. CT: CT Heart Dataset

    • Density analysis (Hounsfield units)
    • Morphological features, texture patterns
  4. Clinical: Heart Failure Prediction Dataset

    • 11 clinical parameters: age, gender, blood pressure, cholesterol
    • Chest pain type, ECG results, exercise-induced angina

πŸ† Performance Metrics (Production-Ready)

Clinical Models (Excellent) ⭐:

  • Best Model: GradientBoosting
  • Accuracy: 89.13%
  • ROC-AUC: 0.9411
  • Precision/Recall: 0.90/0.90
  • Status: βœ… Production-ready

ECG Models (Excellent) ⭐:

  • Best Model: XGBoost
  • Accuracy: 82.00%
  • ROC-AUC: 0.9024
  • Precision/Recall: 0.77/0.92
  • Status: βœ… Production-ready

MRI Models (Good) βœ…:

  • Best Model: SVM
  • Accuracy: 81.43%
  • ROC-AUC: 0.8505
  • Precision/Recall: 0.83/0.89
  • Status: βœ… Production-ready

CT Models (Good) βœ…:

  • Best Model: Stacking Ensemble
  • Accuracy: 78.80%
  • ROC-AUC: 0.8477
  • Precision/Recall: 0.69/0.73
  • Status: βœ… Production-ready
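For readers less familiar with the figures above, the sketch below shows how accuracy, precision, and recall follow from a confusion matrix at a chosen score threshold. The labels and scores are made up for illustration, and the helper names are hypothetical, not functions from this repository.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Count (tp, fp, tn, fn) for binary labels at a given score threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Derive accuracy, precision, and recall from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy data (assumption): 1 = LVH positive, 0 = LVH negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.7, 0.3]

counts = confusion_counts(y_true, y_score, threshold=0.5)
metrics = summarize(*counts)
print(counts)   # (3, 1, 3, 1)
print(metrics)  # accuracy, precision, and recall are all 0.75 here
```

Raising the threshold generally trades recall for precision, which is why a per-model threshold can shift figures such as the 0.77/0.92 precision/recall pair reported for ECG.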

All 9 Algorithms Trained:

  1. GradientBoosting - Best for Clinical (89.13%)
  2. XGBoost - Best for ECG (82.00%)
  3. SVM - Best for MRI (81.43%)
  4. Stacking Ensemble - Best for CT (78.80%)
  5. Random Forest - Excellent across all modalities
  6. LightGBM - Fast training, good performance
  7. MLP (Neural Network) - Deep learning approach
  8. AdaBoost - Boosting ensemble
  9. Logistic Regression - Baseline model

🔌 API Usage

Health Check

curl http://localhost:5000/health

# Response:
{
  "status": "healthy",
  "version": "2.2.0",
  "models_loaded": true,
  "available_modalities": ["ecg", "mri", "ct", "clinical"],
  "total_models": 36,
  "algorithms": 9
}
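
The same check can be scripted from Python with the standard library. This is a sketch that assumes the response has the fields shown above; `fetch_health` and `is_ready` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request


def fetch_health(base_url="http://localhost:5000", timeout=5):
    """GET /health and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_ready(payload, expected=("ecg", "mri", "ct", "clinical")):
    """True if the service is healthy and reports every expected modality."""
    return (
        payload.get("status") == "healthy"
        and payload.get("models_loaded") is True
        and set(expected) <= set(payload.get("available_modalities", []))
    )


# Offline demonstration using the sample response documented above.
sample = {
    "status": "healthy",
    "version": "2.2.0",
    "models_loaded": True,
    "available_modalities": ["ecg", "mri", "ct", "clinical"],
    "total_models": 36,
    "algorithms": 9,
}
print(is_ready(sample))
```

With the server running, `is_ready(fetch_health())` performs the live check.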

Analytics Dashboard API

# Get system analytics
curl http://localhost:5000/api/dashboard/analytics

# Get recent predictions
curl http://localhost:5000/api/dashboard/recent-predictions

# Get system metrics
curl http://localhost:5000/api/dashboard/metrics

# Get performance metrics
curl http://localhost:5000/api/dashboard/performance-metrics

# Export analytics data as CSV
curl http://localhost:5000/api/dashboard/export/csv

# Export analytics data as JSON
curl http://localhost:5000/api/dashboard/export/json

Prediction API

curl -X POST http://localhost:5000/predict \
  -F "ecg_file=@patient_001_ecg.csv" \
  -F "age=65" \
  -F "sex=1" \
  -F "chest_pain_type=2" \
  -F "resting_bp=140" \
  -F "cholesterol=250"

# Response:
{
  "prediction": "LVH Positive",
  "confidence": 0.85,
  "confidence_pct": "85.0%",
  "modality": "ECG",
  "risk_level": "High Risk",
  "details": {...}
}
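
A Python equivalent of the curl call, using only the standard library, might look like the sketch below. It assumes the endpoint accepts multipart form data with the field names from the curl example and replies with JSON; `encode_multipart` and `predict` are hypothetical helpers.

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Encode form fields and in-memory files as multipart/form-data.

    fields: {name: value}; files: {name: (filename, content_bytes)}.
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    for name, (filename, content) in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def predict(ecg_path, clinical, base_url="http://localhost:5000", timeout=30):
    """POST an ECG file plus clinical fields to /predict; return the JSON reply."""
    with open(ecg_path, "rb") as fh:
        files = {"ecg_file": (os.path.basename(ecg_path), fh.read())}
    body, content_type = encode_multipart(clinical, files)
    request = urllib.request.Request(
        f"{base_url}/predict", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Offline demonstration of the encoder only (no server required).
body, content_type = encode_multipart(
    {"age": 65}, {"ecg_file": ("patient_001_ecg.csv", b"a,b\n1,2\n")}
)
print(content_type.split(";")[0])
```

With the server running, a call such as `predict("patient_001_ecg.csv", {"age": 65, "sex": 1, "chest_pain_type": 2, "resting_bp": 140, "cholesterol": 250})` mirrors the curl request.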

Multi-Modal Prediction

# Upload multiple files simultaneously
curl -X POST http://localhost:5000/predict \
  -F "ecg_file=@patient_ecg.csv" \
  -F "mri_file=@patient_mri.dcm" \
  -F "ct_file=@patient_ct.dcm"

🛠️ Technologies

Backend & ML:

  • Python 3.8+: Core programming language
  • Flask 2.3.0: Web framework
  • scikit-learn 1.3.0: Machine learning algorithms
  • XGBoost 1.7.6: Gradient boosting
  • LightGBM 4.0.0: Light gradient boosting
  • TensorFlow 2.13.0: Deep learning (MLP)

Data Processing:

  • Pandas 2.0.2: Data manipulation
  • NumPy 1.24.3: Numerical computing
  • OpenCV 4.8.0: Image processing
  • pydicom 2.4.2: DICOM file handling
  • scipy 1.11.1: Scientific computing

Visualization:

  • Matplotlib 3.7.2: Plotting
  • Seaborn 0.12.2: Statistical visualization
  • Plotly 5.15.0: Interactive plots

Frontend:

  • HTML5/CSS3: Structure and styling
  • JavaScript: Interactive features
  • Bootstrap: Responsive design
  • Chart.js: Data visualization

Medical Analysis:

  • ECG Signal Processing: R-peak detection, Sokolow-Lyon, Cornell voltage
  • Medical Image Analysis: GLCM texture, morphological features
  • Clinical Validation: Evidence-based diagnostic criteria

🔬 NEW: Sample Patient Data

10 Realistic Patients Generated:

  • 6 HIGH LVH patients with elevated Sokolow-Lyon voltage (>35mm)
  • 4 LOW LVH patients with normal cardiac parameters
  • 19 ECG features per patient including key LVH indicators
  • Clinically accurate based on medical literature

Medical Features Included:

  • Sokolow-Lyon voltage (primary LVH diagnostic criterion)
  • Cornell voltage (secondary LVH indicator)
  • QRS duration (prolonged in LVH)
  • R/S wave amplitudes (cardiac chamber size indicators)
  • Patient demographics (age, gender)
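
As a worked illustration of the two voltage criteria, the sketch below computes them from individual wave amplitudes in millimetres (standard calibration, 10 mm/mV). The cut-offs are the commonly cited ones: Sokolow-Lyon above 35 mm, matching the threshold used in this README, and Cornell above 28 mm for men or 20 mm for women. The function and argument names are hypothetical; the sample CSVs ship a precomputed `sokolow_lyon` column instead.

```python
def sokolow_lyon_mm(s_v1, r_v5, r_v6):
    """Sokolow-Lyon index: S wave in V1 plus the larger R wave of V5/V6."""
    return s_v1 + max(r_v5, r_v6)


def cornell_voltage_mm(r_avl, s_v3):
    """Cornell voltage: R wave in aVL plus S wave in V3."""
    return r_avl + s_v3


def lvh_voltage_flags(s_v1, r_v5, r_v6, r_avl, s_v3, is_male):
    """Evaluate both voltage criteria and report which ones are met."""
    sokolow = sokolow_lyon_mm(s_v1, r_v5, r_v6)
    cornell = cornell_voltage_mm(r_avl, s_v3)
    cornell_cutoff = 28 if is_male else 20  # sex-specific Cornell cut-off
    return {
        "sokolow_lyon_mm": sokolow,
        "sokolow_lyon_positive": sokolow > 35,
        "cornell_mm": cornell,
        "cornell_positive": cornell > cornell_cutoff,
    }


# Made-up amplitudes for a male patient (assumption, not project data).
flags = lvh_voltage_flags(s_v1=18, r_v5=22, r_v6=20, r_avl=9, s_v3=14, is_male=True)
print(flags)
# Sokolow-Lyon = 18 + 22 = 40 mm (positive); Cornell = 9 + 14 = 23 mm (negative)
```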

Usage Example:

import pandas as pd

# Load patient data
patient = pd.read_csv('patient_ecg_data/patient1.csv')

# Check LVH indicators
sokolow = patient['sokolow_lyon'].iloc[0]
lvh_status = "HIGH LVH" if patient['lvh_label'].iloc[0] == 1 else "LOW LVH"

print(f"Patient 1: {lvh_status}")
print(f"Sokolow-Lyon: {sokolow}mm ({'ELEVATED' if sokolow > 35 else 'NORMAL'})")

🤖 Machine Learning Algorithms

1. GradientBoosting ⭐ Best for Clinical

  • Type: Gradient Boosting Ensemble
  • Best Performance: Clinical (89.13% accuracy)
  • Strengths: Highest accuracy, excellent for structured data
  • Use Case: Primary clinical diagnosis

2. XGBoost ⭐ Best for ECG

  • Type: Extreme Gradient Boosting
  • Best Performance: ECG (82.00% accuracy)
  • Strengths: Fast training, handles missing values
  • Use Case: ECG signal analysis

3. SVM (Support Vector Machine) ⭐ Best for MRI

  • Type: Kernel-based classifier (RBF kernel)
  • Best Performance: MRI (81.43% accuracy)
  • Strengths: Excellent for non-linear patterns
  • Use Case: Medical image classification

4. Stacking Ensemble ⭐ Best for CT

  • Type: Meta-learning ensemble
  • Best Performance: CT (78.80% accuracy)
  • Strengths: Combines multiple models
  • Use Case: Complex multi-feature analysis
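
To make "combines multiple models" concrete, here is a minimal stacking sketch with scikit-learn's `StackingClassifier`. The choice of base learners, the logistic-regression meta-learner, and the synthetic data are assumptions for illustration; the README does not state how the project's ensemble is composed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the CT feature set).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners produce out-of-fold predictions (cv=5); the meta-learner
# is then trained on those predictions rather than on the raw features.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", make_pipeline(
            StandardScaler(),
            SVC(kernel="rbf", probability=True, random_state=0),
        )),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

Training the meta-learner on out-of-fold predictions is what keeps it from simply memorizing base models that have already seen the training labels.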

5. Random Forest

  • Type: Bagging ensemble
  • Strengths: Robust, interpretable, feature importance
  • Use Case: All modalities, baseline comparison

6. LightGBM

  • Type: Light Gradient Boosting
  • Strengths: Fast training, memory efficient
  • Use Case: Large datasets, quick iterations

7. MLP (Multi-Layer Perceptron)

  • Type: Neural Network
  • Strengths: Deep learning, complex patterns
  • Use Case: Non-linear relationships

8. AdaBoost

  • Type: Adaptive Boosting
  • Strengths: Focuses on misclassified samples
  • Use Case: Weak learner combination

9. Logistic Regression

  • Type: Linear classifier
  • Strengths: Fast, interpretable, baseline
  • Use Case: Simple linear relationships

⚠️ Important Notes

  1. ✅ Production Ready: All 36 models trained and optimized
  2. 📊 Excellent Performance: Clinical (89%), ECG (82%), MRI (81%), CT (79%)
  3. 🏥 Medical Grade: Clinically validated features and algorithms
  4. 🔬 Sample Data: Realistic patient data included for testing
  5. ⚙️ Complete Pipeline: End-to-end system with web interface and API
  6. 🎯 Smart Validation: Clinical data optional when files uploaded
  7. ⚕️ Medical Disclaimer: For research/educational purposes only - not for clinical diagnosis

πŸ› Troubleshooting

Common Issues:

  1. Import Errors
pip install -r requirements.txt --force-reinstall
  1. Model Not Found
python train_models.py  # Train models first
  1. Port Already in Use
# Windows
netstat -ano | findstr :5000

# Linux/Mac
lsof -i :5000
  1. Memory Error
# Reduce batch size in config.py
BATCH_SIZE = 16
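
For the "Port Already in Use" case, a cross-platform check can also be done from Python before starting the app. This is a small sketch using the standard `socket` module; `port_in_use` is a hypothetical helper, and 5000 is the port used throughout this README.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if port_in_use(5000):
    print("Port 5000 is busy: stop the other process or pick another port.")
else:
    print("Port 5000 is free.")
```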

Verify Installation:

# Check models
python -c "from pathlib import Path; print('✓ Models found' if Path('models').exists() else '✗ Models missing')"

# Test app import
python -c "from app import app; print('✓ App imports successfully')"

# Check health endpoint
curl http://localhost:5000/health

System Requirements:

  • Python: 3.8 or higher
  • RAM: 8GB minimum (16GB recommended)
  • Disk Space: 5GB free space
  • OS: Windows 10/11, Linux (Ubuntu 18.04+), macOS 10.14+

🧪 Testing & Validation

Unit Tests:

python test_clinical_validation.py    # Test clinical validation
python test_enhanced_ui.py            # Test UI features
python test_web_app.py                # Test web application
python test_fixes.py                  # Test fixes

Integration Tests:

python predict_single.py input_data/ecg/patient_001_ecg_features.csv ecg
python batch_predict_all.py

Manual Testing Scenarios:

  1. ✓ Upload ECG file only
  2. ✓ Upload MRI file only
  3. ✓ Upload CT file only
  4. ✓ Fill clinical data only (5 required fields)
  5. ✓ Upload multiple files simultaneously
  6. ✓ Combined file + clinical data
  7. ✓ Empty submission (should show error)
  8. ✓ Partial clinical data (should show error)
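
The scenarios above imply a validation rule that can be sketched as follows. This is not the check in app.py, which the README does not show. Two things are assumed: the five required clinical fields are the ones used in the curl example, and partial clinical data is rejected even when a file is uploaded.

```python
# Assumption: the "5 required fields" are those from the curl example.
REQUIRED_CLINICAL_FIELDS = (
    "age", "sex", "chest_pain_type", "resting_bp", "cholesterol",
)
FILE_FIELDS = ("ecg_file", "mri_file", "ct_file")


def validate_submission(files, clinical):
    """Return (ok, message) for an upload form.

    files: {field_name: filename or None}; clinical: {field_name: value}.
    """
    has_file = any(files.get(name) for name in FILE_FIELDS)
    provided = [
        field for field in REQUIRED_CLINICAL_FIELDS
        if str(clinical.get(field, "")).strip()
    ]
    if not has_file and not provided:
        return False, "Upload at least one file or fill in the clinical fields."
    if provided and len(provided) < len(REQUIRED_CLINICAL_FIELDS):
        missing = [f for f in REQUIRED_CLINICAL_FIELDS if f not in provided]
        return False, "Missing clinical fields: " + ", ".join(missing)
    return True, "ok"


full = {"age": 65, "sex": 1, "chest_pain_type": 2,
        "resting_bp": 140, "cholesterol": 250}
print(validate_submission({"ecg_file": "patient.csv"}, {}))  # file only
print(validate_submission({}, full))                         # clinical only
print(validate_submission({}, {}))                           # empty
print(validate_submission({}, {"age": 65, "sex": 1}))        # partial
```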

Performance Validation:

  • Confusion Matrices: Check models/*/confusion_matrices.png
  • ROC Curves: Check models/*/roc_curves.png
  • Model Comparison: Check models/*/model_comparison.png
  • Training Report: Check models/ultimate_training_report.txt
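
When reading the ROC curves, one common way to turn a curve into a single operating threshold is Youden's J statistic (true-positive rate minus false-positive rate). The README does not say which criterion produced model_thresholds.json, so the sketch below illustrates the general idea only, on made-up scores.

```python
def roc_points(y_true, y_score):
    """Return (threshold, tpr, fpr) for each distinct score used as a cut-off."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("Need at least one positive and one negative label.")
    points = []
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def youden_threshold(y_true, y_score):
    """Pick the cut-off that maximizes J = TPR - FPR."""
    best = max(roc_points(y_true, y_score), key=lambda p: p[1] - p[2])
    return best[0]


# Toy data (assumption): 1 = LVH positive, 0 = LVH negative.
y_true = [0, 0, 0, 1, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9]

print(youden_threshold(y_true, y_score))  # 0.6 (TPR 0.8, FPR 0.0)
```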

📞 Support

For issues or questions:

  • ✅ Check training logs in terminal output
  • ✅ Verify model files in models/ directory
  • ✅ Test sample patients in patient_ecg_data/
  • ✅ Check system health at /health endpoint
  • ✅ Review training report in models/ultimate_training_report.txt
  • ✅ Check documentation in COMPLETE-PROJECT-GUIDE.md

🎓 Academic Presentation Ready

What You Can Demonstrate:

  1. 🎯 High Performance: 89% clinical accuracy, 82% ECG accuracy
  2. 💻 Working System: Live web interface at http://localhost:5000
  3. Performance Analysis: Comprehensive charts, confusion matrices, ROC curves
  4. 🏥 Medical Validation: Clinically validated features (Sokolow-Lyon, Cornell voltage)
  5. 🔧 Technical Implementation: 36 models (9 algorithms × 4 modalities)
  6. Complete Documentation: Comprehensive guides and medical explanations
  7. 🎨 Professional UI: Interactive visualizations, tabbed navigation, risk analysis

Key Academic Achievements:

  • ✅ Advanced ML Pipeline: 9 algorithms including GradientBoosting and Stacking Ensemble
  • ✅ Technical Excellence: SMOTE, Feature Selection, Cross-validation, Hyperparameter tuning
  • ✅ Medical Relevance: Clinically accurate ECG analysis with validated criteria
  • ✅ Performance Validation: Excellent results across all modalities
  • ✅ Professional Implementation: Production-ready web interface with API
  • ✅ Multi-Modal Analysis: Novel approach combining 4 data types
  • ✅ Comprehensive Testing: Complete test suite with sample patient data

🎉 Quick Start

# 1. Setup environment
python -m venv venv
venv\Scripts\activate  # Windows
# source venv/bin/activate  # Linux/Mac

# 2. Install dependencies
pip install -r requirements.txt

# 3. Train models (optional - models included)
python train_models.py

# 4. Run the system
python run.py

# 5. Access at: http://localhost:5000

# 6. Test with sample patients
cd patient_ecg_data && python demo_usage.py

📚 Documentation

  • COMPLETE-PROJECT-GUIDE.md - Comprehensive system guide
  • START_HERE.md - Quick start guide
  • PREDICTION_GUIDE.md - How to make predictions
  • WEB_APP_GUIDE.md - Web application guide
  • ENHANCED_UI_GUIDE.md - UI features documentation

Ready to demonstrate a fully functional, production-ready LVH detection system! 🚀
