Project video link: https://drive.google.com/file/d/1WzpJViGlcOEixZPE9SmLdYWEdGVfxgYb/view?usp=drive_link
A multimodal machine learning system for detecting Left Ventricular Hypertrophy (LVH) using ECG signals, MRI scans, CT images, and clinical parameters. It trains 9 ML algorithms per modality, with best-model accuracy ranging from 78.80% (CT) to 89.13% (clinical).
- 9 ML Algorithms: Random Forest, XGBoost, LightGBM, GradientBoosting, SVM, MLP, AdaBoost, Logistic Regression, Stacking Ensemble
- 36 Trained Models: 9 algorithms × 4 modalities with optimized hyperparameters
- Advanced Techniques: SMOTE for class balancing, Feature Selection, 5-fold Cross-validation
- Complete Training: All models trained successfully with optimal thresholds
- Comprehensive Visualizations: Performance plots, confusion matrices, ROC curves for each modality
- Clinical Models: Excellent (89.13% accuracy, 0.94 ROC-AUC) - GradientBoosting
- ECG Models: Excellent (82.00% accuracy, 0.90 ROC-AUC) - XGBoost
- MRI Models: Good (81.43% accuracy, 0.85 ROC-AUC) - SVM
- CT Models: Good (78.80% accuracy, 0.85 ROC-AUC) - Stacking Ensemble
| Component | Details |
|---|---|
| Total Models | 36 (9 algorithms × 4 modalities) |
| Algorithms | GradientBoosting, XGBoost, LightGBM, SVM, Random Forest, MLP, AdaBoost, Logistic Regression, Stacking Ensemble |
| Best Clinical | GradientBoosting (89.13%, ROC-AUC 0.94) |
| Best ECG | XGBoost (82.00%, ROC-AUC 0.90) |
| Best MRI | SVM (81.43%, ROC-AUC 0.85) |
| Best CT | Stacking Ensemble (78.80%, ROC-AUC 0.85) |
| Training Techniques | SMOTE, Feature Selection, Cross-validation, Hyperparameter Tuning |
| Web Interface | Flask with interactive UI, tabbed navigation, visualizations |
| API | RESTful endpoints for predictions and health checks |
| Status | Production Ready |
lvh-detection/
│
├── README.md  # This comprehensive guide
├── COMPLETE-PROJECT-GUIDE.md  # Detailed documentation
├── requirements.txt  # Python dependencies
├── config.py  # Configuration settings
├── run.py  # Main entry point - START HERE
├── app.py  # Flask application (2000+ lines)
├── download_data.py  # Data download script
├── train_models.py  # Model training script
├── process_data.py  # Data preprocessing
│
├── Analytics & Dashboard  # NEW: Analytics System
│   ├── dashboard_service.py  # Analytics backend service
│   ├── metrics_collector.py  # Background metrics collection
│   ├── dashboard_metrics.db  # System metrics database
│   ├── predictions_history.db  # Predictions tracking database
│   └── fix_analytics_display.py  # Analytics display fixes
│
├── templates/  # HTML templates
│   ├── index.html  # Home page
│   ├── upload.html  # Upload interface
│   ├── results.html  # Results display
│   ├── analytics.html  # NEW: Analytics dashboard
│   ├── api.html  # API documentation
│   └── document.html  # System documentation
│
├── static/  # Static files
│   ├── css/
│   │   ├── style.css  # Main styles
│   │   └── dashboard.css  # NEW: Dashboard styles
│   └── js/
│       ├── main.js  # Main JavaScript
│       └── dashboard.js  # NEW: Dashboard interactions
│
├── data/  # Data directory (created automatically)
│   ├── raw/  # Raw datasets
│   └── processed/  # Processed data
│
├── patient_ecg_data/  # Sample patient data
│   ├── patient1.csv to patient4.csv  # Individual patient ECG files
│   ├── all_patients_combined.csv  # Combined dataset
│   ├── patients_summary.csv  # Patient overview
│   └── demo_usage.py  # Usage examples
│
└── models/  # Trained models (36 total: 9 algorithms × 4 modalities)
    ├── clinical/  # Clinical models (9 algorithms)
    │   ├── GradientBoosting_Optimized.pkl  # Best: 89.13%
    │   ├── RandomForest_Optimized.pkl
    │   ├── XGBoost_Optimized.pkl
    │   ├── LightGBM_Optimized.pkl
    │   ├── SVM_Optimized.pkl
    │   ├── MLP_Optimized.pkl
    │   ├── AdaBoost_Optimized.pkl
    │   ├── LogisticRegression_Optimized.pkl
    │   ├── scaler.pkl
    │   ├── confusion_matrices.png
    │   ├── model_comparison.png
    │   └── roc_curves.png
    ├── ecg/  # ECG models (9 algorithms)
    │   ├── XGBoost_Optimized.pkl  # Best: 82.00%
    │   └── ... (same structure)
    ├── mri/  # MRI models (9 algorithms)
    │   ├── SVM_Optimized.pkl  # Best: 81.43%
    │   └── ... (same structure)
    ├── ct/  # CT models (9 algorithms)
    │   └── ... (same structure)
    ├── scalers/  # Feature scalers
    │   └── feature_scalers.pkl
    ├── best_lvh_model.pkl  # Best overall model
    ├── all_models.pkl  # All trained algorithms
    ├── all_optimized_models.pkl
    ├── ecg_clinical_stacking.pkl
    ├── model_thresholds.json  # Optimal thresholds for each model
    ├── training_report.txt  # Comprehensive results
    ├── ultimate_training_report.txt  # Ultimate results
    └── improved_training_report.txt  # Improved results
- Create project directory:
mkdir lvh-detection
cd lvh-detection
- Create virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Create Kaggle account and get API token from kaggle.com/settings
- Place kaggle.json in:
  - Linux/Mac: ~/.kaggle/kaggle.json
  - Windows: C:\Users\<username>\.kaggle\kaggle.json
- Set permissions (Linux/Mac):
chmod 600 ~/.kaggle/kaggle.json
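Before running the download script, it can help to confirm that the token file is in place and not readable by other users. The following is a minimal sketch for Linux/Mac only; the helper name `check_kaggle_token` is hypothetical and not part of this project.

```python
import stat
from pathlib import Path
from typing import List


def check_kaggle_token(path: Path) -> List[str]:
    """Return a list of problems found with the Kaggle API token file."""
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    # Any group/other permission bit means other users can access the token;
    # `chmod 600` clears all of those bits.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} has mode {oct(mode)}; expected 0o600")
    return problems


if __name__ == "__main__":
    token = Path.home() / ".kaggle" / "kaggle.json"
    for problem in check_kaggle_token(token):
        print("WARNING:", problem)
```

An empty list means the file exists and only its owner has access.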
python download_data.py
This will automatically:
- Download all 4 datasets from Kaggle
- Organize them in proper directory structure
- Verify data integrity
python process_data.py
This preprocesses the raw data for model training with improved balanced labeling.
python train_models.py
This trains all 36 ML models (9 algorithms × 4 modalities) with:
- SMOTE: Synthetic Minority Over-sampling for class balancing
- Feature Selection: SelectKBest for optimal feature selection
- Cross-Validation: 5-fold stratified cross-validation
- Hyperparameter Tuning: Optimized parameters for each algorithm
- Optimal Thresholds: Custom thresholds for each model
Training Time: ~15-20 minutes for complete pipeline
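This guide does not say how the per-model thresholds are chosen. One common approach, shown here purely as an illustrative assumption, is to pick the score cutoff that maximizes Youden's J statistic (true positive rate minus false positive rate) on validation data. The function name and sample data below are hypothetical.

```python
from typing import Sequence, Tuple


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = TPR - FPR.

    A sample is predicted positive when its score is >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    best_threshold, best_j = 0.5, -1.0
    # Only observed scores can change the confusion matrix, so test each one.
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        j = tp / positives - fp / negatives
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical validation labels and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
threshold, j = youden_threshold(y_true, scores)
print(f"threshold={threshold:.2f}, J={j:.2f}")
```

For a screening use case, a cutoff that favors recall over precision may be preferable to the J-optimal one; that trade-off is a design choice this sketch does not make for you.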
Alternative Training Scripts:
python train_models_improved.py # Improved version with optimization
python train_models_corrected.py # Corrected version
python train_ecg_quick.py # Quick ECG-only training
python run.py
Access the application at: http://localhost:5000
# Test with sample ECG patients
cd patient_ecg_data
python demo_usage.py
# Or test individual patients
python -c "
import pandas as pd
patient1 = pd.read_csv('patient1.csv')
print('Patient 1:', 'HIGH LVH' if patient1['lvh_label'].iloc[0] == 1 else 'LOW LVH')
print('Sokolow-Lyon:', patient1['sokolow_lyon'].iloc[0], 'mm')
"
# Test single file prediction
cd ..
python predict_single.py input_data/ecg/patient_001_ecg_features.csv ecg
# Batch predictions
python batch_predict_all.py
- Multimodal Analysis: ECG, MRI, CT, Clinical data support
- 9 Advanced ML Algorithms: Including GradientBoosting and Stacking Ensemble
- Smart Validation: Clinical data optional when files uploaded
- Web Interface: Professional Flask application with interactive UI
- Real-time Predictions: Instant LVH detection with confidence scores
- RESTful API: JSON API endpoints for integration
- Comprehensive Visualizations: Interactive charts, radar plots, performance metrics
- Analytics Dashboard: Real-time system analytics and prediction tracking
- Performance Monitoring: System metrics, usage analytics, and trend analysis
- Optimized Training Pipeline: SMOTE, Feature Selection, Cross-validation
- Sample Patient Data: Realistic ECG patients for testing
- Complete Visualizations: Confusion matrices, ROC curves, feature importance
- Medical Accuracy: Clinically validated ECG features (Sokolow-Lyon, Cornell)
- Balanced Datasets: Proper class distributions for all modalities
- Comprehensive Reporting: Detailed training results with optimal thresholds
- Enhanced UI: Tabbed navigation, risk factor analysis, downloadable reports
- Analytics System: Real-time dashboard with prediction history and system metrics
- Database Integration: SQLite databases for metrics and prediction tracking
- ECG: ECG Heartbeat Categorization Dataset + Generated Patient Data
  - 19 ECG features including Sokolow-Lyon and Cornell voltage
  - R-peak detection, QRS duration, heart rate variability
- MRI: Sunnybrook Cardiac MRI Dataset
  - Texture features (GLCM), shape descriptors
  - Cardiac chamber segmentation, wall thickness measurements
- CT: CT Heart Dataset
  - Density analysis (Hounsfield units)
  - Morphological features, texture patterns
- Clinical: Heart Failure Prediction Dataset
  - 11 clinical parameters: age, gender, blood pressure, cholesterol
  - Chest pain type, ECG results, exercise-induced angina
Clinical:
- Best Model: GradientBoosting
- Accuracy: 89.13%
- ROC-AUC: 0.9411
- Precision/Recall: 0.90/0.90
- Status: Production-ready
ECG:
- Best Model: XGBoost
- Accuracy: 82.00%
- ROC-AUC: 0.9024
- Precision/Recall: 0.77/0.92
- Status: Production-ready
MRI:
- Best Model: SVM
- Accuracy: 81.43%
- ROC-AUC: 0.8505
- Precision/Recall: 0.83/0.89
- Status: Production-ready
CT:
- Best Model: Stacking Ensemble
- Accuracy: 78.80%
- ROC-AUC: 0.8477
- Precision/Recall: 0.69/0.73
- Status: Production-ready
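The precision and recall figures above can be combined into a single F1 score (their harmonic mean) when comparing modalities. The sketch below only re-uses the numbers reported in this guide; F1 values are derived here and are not taken from the training reports.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) of the best model per modality, as reported above.
reported = {
    "Clinical": (0.90, 0.90),
    "ECG": (0.77, 0.92),
    "MRI": (0.83, 0.89),
    "CT": (0.69, 0.73),
}

for modality, (precision, recall) in reported.items():
    print(f"{modality}: F1 = {f1_score(precision, recall):.3f}")
```

Because the inputs are rounded to two decimals, the derived F1 values are approximate.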
- GradientBoosting - Best for Clinical (89.13%)
- XGBoost - Best for ECG (82.00%)
- SVM - Best for MRI (81.43%)
- Stacking Ensemble - Best for CT (78.80%)
- Random Forest - Excellent across all modalities
- LightGBM - Fast training, good performance
- MLP (Neural Network) - Deep learning approach
- AdaBoost - Boosting ensemble
- Logistic Regression - Baseline model
curl http://localhost:5000/health
# Response:
{
"status": "healthy",
"version": "2.2.0",
"models_loaded": true,
"available_modalities": ["ecg", "mri", "ct", "clinical"],
"total_models": 36,
"algorithms": 9
}
# Get system analytics
curl http://localhost:5000/api/dashboard/analytics
# Get recent predictions
curl http://localhost:5000/api/dashboard/recent-predictions
# Get system metrics
curl http://localhost:5000/api/dashboard/metrics
# Get performance metrics
curl http://localhost:5000/api/dashboard/performance-metrics
# Export analytics data as CSV
curl http://localhost:5000/api/dashboard/export/csv
# Export analytics data as JSON
curl http://localhost:5000/api/dashboard/export/json
curl -X POST http://localhost:5000/predict \
-F "ecg_file=@patient_001_ecg.csv" \
-F "age=65" \
-F "sex=1" \
-F "chest_pain_type=2" \
-F "resting_bp=140" \
-F "cholesterol=250"
# Response:
{
"prediction": "LVH Positive",
"confidence": 0.85,
"confidence_pct": "85.0%",
"modality": "ECG",
"risk_level": "High Risk",
"details": {...}
}
# Upload multiple files simultaneously
curl -X POST http://localhost:5000/predict \
-F "ecg_file=@patient_ecg.csv" \
-F "mri_file=@patient_mri.dcm" \
-F "ct_file=@patient_ct.dcm"
- Python 3.8+: Core programming language
- Flask 2.3.0: Web framework
- scikit-learn 1.3.0: Machine learning algorithms
- XGBoost 1.7.6: Gradient boosting
- LightGBM 4.0.0: Light gradient boosting
- TensorFlow 2.13.0: Deep learning (MLP)
- Pandas 2.0.2: Data manipulation
- NumPy 1.24.3: Numerical computing
- OpenCV 4.8.0: Image processing
- pydicom 2.4.2: DICOM file handling
- scipy 1.11.1: Scientific computing
- Matplotlib 3.7.2: Plotting
- Seaborn 0.12.2: Statistical visualization
- Plotly 5.15.0: Interactive plots
- HTML5/CSS3: Structure and styling
- JavaScript: Interactive features
- Bootstrap: Responsive design
- Chart.js: Data visualization
- ECG Signal Processing: R-peak detection, Sokolow-Lyon, Cornell voltage
- Medical Image Analysis: GLCM texture, morphological features
- Clinical Validation: Evidence-based diagnostic criteria
- 6 HIGH LVH patients with elevated Sokolow-Lyon voltage (>35mm)
- 4 LOW LVH patients with normal cardiac parameters
- 19 ECG features per patient including key LVH indicators
- Clinically accurate based on medical literature
- Sokolow-Lyon voltage (primary LVH diagnostic criterion)
- Cornell voltage (secondary LVH indicator)
- QRS duration (prolonged in LVH)
- R/S wave amplitudes (cardiac chamber size indicators)
- Patient demographics (age, gender)
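The two voltage criteria listed above are simple sums of wave amplitudes. As a minimal sketch: Sokolow-Lyon adds the S wave in V1 to the larger R wave of V5 or V6, and Cornell adds the R wave in aVL to the S wave in V3. The function and argument names are hypothetical (the project's CSV column names beyond `sokolow_lyon` and `lvh_label` are not documented here), and the Cornell cutoffs of 28 mm for men and 20 mm for women are the commonly cited values rather than something this project states.

```python
def sokolow_lyon_mm(s_v1: float, r_v5: float, r_v6: float) -> float:
    """Sokolow-Lyon voltage: S in V1 plus the larger of R in V5 / R in V6 (mm)."""
    return s_v1 + max(r_v5, r_v6)


def cornell_mm(r_avl: float, s_v3: float) -> float:
    """Cornell voltage: R in aVL plus S in V3 (mm)."""
    return r_avl + s_v3


def lvh_voltage_flags(s_v1: float, r_v5: float, r_v6: float,
                      r_avl: float, s_v3: float, male: bool) -> dict:
    """Evaluate both voltage criteria; amplitudes are positive magnitudes in mm."""
    sokolow = sokolow_lyon_mm(s_v1, r_v5, r_v6)
    cornell = cornell_mm(r_avl, s_v3)
    cornell_cutoff = 28.0 if male else 20.0  # assumed sex-specific cutoffs
    return {
        "sokolow_lyon": sokolow,
        "sokolow_elevated": sokolow > 35.0,  # same >35 mm cutoff used in this guide
        "cornell": cornell,
        "cornell_elevated": cornell > cornell_cutoff,
    }


# Hypothetical amplitudes for one patient.
print(lvh_voltage_flags(s_v1=20, r_v5=18, r_v6=16, r_avl=12, s_v3=10, male=False))
```

Voltage criteria are known to be specific but not very sensitive, which is one reason to treat them as features for a model rather than as a standalone decision rule.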
import pandas as pd
# Load patient data
patient = pd.read_csv('patient_ecg_data/patient1.csv')
# Check LVH indicators
sokolow = patient['sokolow_lyon'].iloc[0]
lvh_status = "HIGH LVH" if patient['lvh_label'].iloc[0] == 1 else "LOW LVH"
print(f"Patient 1: {lvh_status}")
print(f"Sokolow-Lyon: {sokolow}mm ({'ELEVATED' if sokolow > 35 else 'NORMAL'})")
GradientBoosting:
- Type: Gradient Boosting Ensemble
- Best Performance: Clinical (89.13% accuracy)
- Strengths: Highest accuracy, excellent for structured data
- Use Case: Primary clinical diagnosis
XGBoost:
- Type: Extreme Gradient Boosting
- Best Performance: ECG (82.00% accuracy)
- Strengths: Fast training, handles missing values
- Use Case: ECG signal analysis
SVM:
- Type: Kernel-based classifier (RBF kernel)
- Best Performance: MRI (81.43% accuracy)
- Strengths: Excellent for non-linear patterns
- Use Case: Medical image classification
Stacking Ensemble:
- Type: Meta-learning ensemble
- Best Performance: CT (78.80% accuracy)
- Strengths: Combines multiple models
- Use Case: Complex multi-feature analysis
Random Forest:
- Type: Bagging ensemble
- Strengths: Robust, interpretable, feature importance
- Use Case: All modalities, baseline comparison
LightGBM:
- Type: Light Gradient Boosting
- Strengths: Fast training, memory efficient
- Use Case: Large datasets, quick iterations
MLP:
- Type: Neural Network
- Strengths: Deep learning, complex patterns
- Use Case: Non-linear relationships
AdaBoost:
- Type: Adaptive Boosting
- Strengths: Focuses on misclassified samples
- Use Case: Weak learner combination
Logistic Regression:
- Type: Linear classifier
- Strengths: Fast, interpretable, baseline
- Use Case: Simple linear relationships
- Production Ready: All 36 models trained and optimized
- Excellent Performance: Clinical (89%), ECG (82%), MRI (81%), CT (79%)
- Medical Grade: Clinically validated features and algorithms
- Sample Data: Realistic patient data included for testing
- Complete Pipeline: End-to-end system with web interface and API
- Smart Validation: Clinical data optional when files uploaded
- Medical Disclaimer: For research/educational purposes only; not for clinical diagnosis
- Import Errors
pip install -r requirements.txt --force-reinstall
- Model Not Found
python train_models.py # Train models first
- Port Already in Use
# Windows
netstat -ano | findstr :5000
# Linux/Mac
lsof -i :5000
- Memory Error
# Reduce batch size in config.py
BATCH_SIZE = 16
# Check models
python -c "from pathlib import Path; print('Models found' if Path('models').exists() else 'Models missing')"
# Test app import
python -c "from app import app; print('App imports successfully')"
# Check health endpoint
curl http://localhost:5000/health
- Python: 3.8 or higher
- RAM: 8GB minimum (16GB recommended)
- Disk Space: 5GB free space
- OS: Windows 10/11, Linux (Ubuntu 18.04+), macOS 10.14+
python test_clinical_validation.py # Test clinical validation
python test_enhanced_ui.py # Test UI features
python test_web_app.py # Test web application
python test_fixes.py # Test fixes
python predict_single.py input_data/ecg/patient_001_ecg_features.csv ecg
python batch_predict_all.py
- Upload ECG file only
- Upload MRI file only
- Upload CT file only
- Fill clinical data only (5 required fields)
- Upload multiple files simultaneously
- Combined file + clinical data
- Empty submission (should show error)
- Partial clinical data (should show error)
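The scenarios above imply a small set of validation rules, sketched below. This is an illustration, not the logic in `app.py`: the five required clinical fields are assumed to be the ones used in the `curl` example (`age`, `sex`, `chest_pain_type`, `resting_bp`, `cholesterol`), and partially filled clinical data is assumed to be rejected even when a file is uploaded.

```python
from typing import Dict, Optional

# Assumed field names, taken from the curl examples in this guide.
REQUIRED_CLINICAL_FIELDS = ("age", "sex", "chest_pain_type", "resting_bp", "cholesterol")
FILE_FIELDS = ("ecg_file", "mri_file", "ct_file")


def validate_submission(files: Dict[str, object], form: Dict[str, str]) -> Optional[str]:
    """Return an error message, or None when the submission is acceptable."""
    has_file = any(files.get(name) for name in FILE_FIELDS)
    filled = [f for f in REQUIRED_CLINICAL_FIELDS if str(form.get(f, "")).strip()]

    if not has_file and not filled:
        return "Upload at least one file or fill in the clinical fields."
    if filled and len(filled) < len(REQUIRED_CLINICAL_FIELDS):
        missing = [f for f in REQUIRED_CLINICAL_FIELDS if f not in filled]
        return "Missing clinical fields: " + ", ".join(missing)
    return None


full_form = {"age": "65", "sex": "1", "chest_pain_type": "2",
             "resting_bp": "140", "cholesterol": "250"}
print(validate_submission({"ecg_file": "patient_001_ecg.csv"}, {}))  # file only
print(validate_submission({}, full_form))                            # clinical only
print(validate_submission({}, {}))                                   # empty
print(validate_submission({}, {"age": "65"}))                        # partial
```

In a Flask view, `files` and `form` would correspond to `request.files` and `request.form`.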
- Confusion Matrices: Check models/*/confusion_matrices.png
- ROC Curves: Check models/*/roc_curves.png
- Model Comparison: Check models/*/model_comparison.png
- Training Report: Check models/ultimate_training_report.txt
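To confirm that training produced every plot listed above, a short script can walk the per-modality folders. This is a minimal sketch; the helper name `missing_plots` is hypothetical, and the folder and file names are taken from the project structure in this guide.

```python
from pathlib import Path
from typing import Dict, List

MODALITIES = ("clinical", "ecg", "mri", "ct")
PLOTS = ("confusion_matrices.png", "roc_curves.png", "model_comparison.png")


def missing_plots(models_dir: Path) -> Dict[str, List[str]]:
    """Map each modality to the plot files that are absent from its folder."""
    missing = {}
    for modality in MODALITIES:
        absent = [name for name in PLOTS
                  if not (models_dir / modality / name).is_file()]
        if absent:
            missing[modality] = absent
    return missing


if __name__ == "__main__":
    report = missing_plots(Path("models"))
    if not report:
        print("All visualization files are present.")
    for modality, names in report.items():
        print(f"{modality}: missing {', '.join(names)}")
```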
For issues or questions:
- Check training logs in terminal output
- Verify model files in models/ directory
- Test sample patients in patient_ecg_data/
- Check system health at /health endpoint
- Review training report in models/ultimate_training_report.txt
- Check documentation in COMPLETE-PROJECT-GUIDE.md
- High Performance: 89% clinical accuracy, 82% ECG accuracy
- Working System: Live web interface at http://localhost:5000
- Performance Analysis: Comprehensive charts, confusion matrices, ROC curves
- Medical Validation: Clinically validated features (Sokolow-Lyon, Cornell voltage)
- Technical Implementation: 36 models (9 algorithms × 4 modalities)
- Complete Documentation: Comprehensive guides and medical explanations
- Professional UI: Interactive visualizations, tabbed navigation, risk analysis
- Advanced ML Pipeline: 9 algorithms including GradientBoosting and Stacking Ensemble
- Technical Excellence: SMOTE, Feature Selection, Cross-validation, Hyperparameter tuning
- Medical Relevance: Clinically accurate ECG analysis with validated criteria
- Performance Validation: Excellent results across all modalities
- Professional Implementation: Production-ready web interface with API
- Multi-Modal Analysis: Novel approach combining 4 data types
- Comprehensive Testing: Complete test suite with sample patient data
# 1. Setup environment
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/Mac
# 2. Install dependencies
pip install -r requirements.txt
# 3. Train models (optional - models included)
python train_models.py
# 4. Run the system
python run.py
# 5. Access at: http://localhost:5000
# 6. Test with sample patients
cd patient_ecg_data && python demo_usage.py
- COMPLETE-PROJECT-GUIDE.md - Comprehensive system guide
- START_HERE.md - Quick start guide
- PREDICTION_GUIDE.md - How to make predictions
- WEB_APP_GUIDE.md - Web application guide
- ENHANCED_UI_GUIDE.md - UI features documentation
- Web Interface: http://localhost:5000
- Analytics Dashboard: http://localhost:5000/analytics
- Help Guide: http://localhost:5000/help
- Health Check: http://localhost:5000/health
- API Documentation: http://localhost:5000/api
- Upload Page: http://localhost:5000/upload
- System Documentation: http://localhost:5000/document
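The health check URL above can also be polled from Python, for example in a deployment script. The sketch below uses only the standard library and assumes the response has the shape shown in the `/health` example in this guide; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

EXPECTED_MODALITIES = {"ecg", "mri", "ct", "clinical"}


def is_healthy(payload: dict) -> bool:
    """Check a decoded /health response for the fields documented in this guide."""
    return (
        payload.get("status") == "healthy"
        and payload.get("models_loaded") is True
        and EXPECTED_MODALITIES.issubset(payload.get("available_modalities", []))
    )


def fetch_health(base_url: str = "http://localhost:5000", timeout: float = 5.0) -> dict:
    """GET <base_url>/health and decode the JSON body (requires a running server)."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)


# Offline check against the sample response from this guide.
sample = {
    "status": "healthy",
    "version": "2.2.0",
    "models_loaded": True,
    "available_modalities": ["ecg", "mri", "ct", "clinical"],
    "total_models": 36,
    "algorithms": 9,
}
print(is_healthy(sample))
```

With the server running, `is_healthy(fetch_health())` performs the same check against the live endpoint.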
Ready to demonstrate a fully functional, production-ready LVH detection system!