A machine learning system that predicts whether a bacterial isolate will be resistant or sensitive to a given antibiotic, using clinical metadata, species information, prior antibiotic use, and genomic features where available.
It is designed to help clinicians choose an appropriate antibiotic faster, with explainable output.
- Preprocesses and engineers features from clinical AMR datasets
- Trains and compares Logistic Regression, Random Forest, and XGBoost models
- Evaluates model performance with appropriate metrics for imbalanced medical data
- Outputs interpretable predictions to support clinical decision-making
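
The training and comparison step listed above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification`, the hyperparameters are assumptions, and XGBoost is left out so the example depends only on scikit-learn (an `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the AMR dataset (NOT the real data):
# 42 features, roughly 60% class 0 (sensitive) and 40% class 1 (resistant).
X, y = make_classification(
    n_samples=2150, n_features=42, n_informative=10,
    weights=[0.6, 0.4], random_state=0,
)

# Stratify so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical hyperparameters, chosen only for illustration.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # estimated P(resistant)
    pred = model.predict(X_test)
    results[name] = {
        "auc_roc": roc_auc_score(y_test, proba),
        "f1": f1_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: AUC-ROC={scores['auc_roc']:.2f}, F1={scores['f1']:.2f}")
```

Scores from this synthetic data will not match the project's reported results; the point is only the shape of the comparison loop.
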
| Model | Accuracy | AUC-ROC | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| Logistic Regression | 78% | 0.81 | 0.75 | 0.70 | 0.72 |
| Random Forest | 85% | 0.88 | 0.83 | 0.81 | 0.82 |
| XGBoost | 89% | 0.91 | 0.88 | 0.85 | 0.86 |
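
As a quick consistency check on the table above, F1 can be recomputed from precision and recall with F1 = 2PR / (P + R). The sketch below copies the reported values and assumes that precision, recall, and F1 in each row refer to the same class, which the table does not state explicitly.

```python
# (precision, recall, reported F1) copied from the results table.
reported = {
    "Logistic Regression": (0.75, 0.70, 0.72),
    "Random Forest": (0.83, 0.81, 0.82),
    "XGBoost": (0.88, 0.85, 0.86),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {name: f1_from(p, r) for name, (p, r, _) in reported.items()}

for name, (_, _, f1) in reported.items():
    print(f"{name}: recomputed F1={recomputed[name]:.3f}, reported F1={f1:.2f}")
```

Under that assumption, each recomputed value agrees with the reported F1 to within rounding.
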
Dataset: BVBRC_genome_amr (Bacterial and Viral Bioinformatics Resource Center)

2,150 clinical samples | 42 features | Class balance: 60% sensitive, 40% resistant
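
The class balance matters when reading accuracy figures. As a rough illustration, a trivial classifier that always predicts "sensitive" already reaches about 60% accuracy while catching no resistant isolates. The per-class counts below are derived from the stated percentages and are approximations, not counts taken from the dataset.

```python
n_samples = 2150
n_resistant = round(n_samples * 0.40)  # derived from the stated 40%, approximate
n_sensitive = n_samples - n_resistant

# Trivial baseline: predict "sensitive" for every isolate.
true_negatives = n_sensitive    # every sensitive isolate is labeled correctly
false_negatives = n_resistant   # every resistant isolate is missed
true_positives = 0

baseline_accuracy = true_negatives / n_samples
baseline_recall = true_positives / (true_positives + false_negatives)

print(f"Baseline accuracy: {baseline_accuracy:.0%}")
print(f"Baseline recall on resistant isolates: {baseline_recall:.0%}")
```

This is why recall, F1, and AUC-ROC are more informative than accuracy alone for this kind of data.
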
Top 3 Most Important Features (SHAP):
- Prior antibiotic exposure - 28% impact
- Bacterial species type - 24% impact
- Patient age - 18% impact
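
One common way to express SHAP results as percentages is to take each feature's mean absolute SHAP value and divide by the total across features. Whether the figures above were computed this way is an assumption. The sketch below uses a small made-up SHAP matrix and hypothetical feature names to show the arithmetic only; it does not call the `shap` library.

```python
import numpy as np

# Hypothetical SHAP matrix: rows = isolates, columns = features.
# These numbers are invented for illustration and are not the project's values.
feature_names = ["prior_antibiotic_exposure", "species", "patient_age", "other"]
shap_values = np.array([
    [0.30, -0.20, 0.10, 0.05],
    [-0.25, 0.30, -0.20, 0.15],
    [0.35, -0.10, 0.15, -0.10],
])

# Mean absolute contribution per feature, then each feature's share of the total.
mean_abs = np.abs(shap_values).mean(axis=0)
share = 100 * mean_abs / mean_abs.sum()

ranking = sorted(zip(feature_names, share), key=lambda item: item[1], reverse=True)
for name, pct in ranking:
    print(f"{name}: {pct:.1f}% of total mean |SHAP|")
```

With a real model, `shap_values` would come from a SHAP explainer applied to the evaluation set rather than from a hand-written array.
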
Python, Scikit-learn, XGBoost, SHAP, Pandas, NumPy, Google Colab
Raw Clinical Data → Preprocessing → Feature Engineering → Model Training → Evaluation → Explainable Output
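
The preprocessing-to-prediction flow above can be sketched as a single scikit-learn `Pipeline`. Everything specific here is an assumption made for illustration: the column names (`species`, `patient_age`, `prior_antibiotic_use`), the toy records, and the choice of logistic regression as the final step.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy records; column names and values are invented.
df = pd.DataFrame({
    "species": ["E. coli", "K. pneumoniae", "E. coli", "S. aureus",
                "K. pneumoniae", "S. aureus", "E. coli", "K. pneumoniae"],
    "patient_age": [34, 71, 52, 45, 80, 29, 63, 58],
    "prior_antibiotic_use": [0, 1, 1, 0, 1, 0, 1, 1],
    "resistant": [0, 1, 1, 0, 1, 0, 0, 1],
})
X = df.drop(columns="resistant")
y = df["resistant"]

# Preprocessing: one-hot encode species, scale age, pass the binary flag through.
preprocess = ColumnTransformer([
    ("species", OneHotEncoder(handle_unknown="ignore"), ["species"]),
    ("age", StandardScaler(), ["patient_age"]),
    ("prior_use", "passthrough", ["prior_antibiotic_use"]),
])

# Preprocessing and model chained so both are fitted and applied together.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

new_isolate = pd.DataFrame({
    "species": ["E. coli"], "patient_age": [67], "prior_antibiotic_use": [1],
})
p_resistant = pipeline.predict_proba(new_isolate)[0, 1]
print(f"Estimated probability of resistance: {p_resistant:.2f}")
```

Chaining the steps this way keeps the preprocessing fitted on training data only, which avoids leaking information from evaluation data into the model.
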
# Open in Google Colab or locally
pip install scikit-learn xgboost shap pandas numpy
jupyter notebook Antibiotic_Resistence_preduction.ipynb

Mohammad Ayesha Summaiyya - msumaiya03579@gmail.com