A production-ready, end-to-end Machine Learning system for credit risk assessment
Built as part of the CodeAlpha Machine Learning Internship
Live Demo • Dataset • Installation • Documentation
This project implements a complete, professional Credit Scoring System that predicts whether a loan applicant is creditworthy or likely to default, using real financial and personal data.
It covers the full ML lifecycle:
- Real dataset ingestion from UCI ML Repository
- Data cleaning, EDA, and feature engineering
- Training & comparing 5 ML classification models
- Hyperparameter tuning with GridSearchCV
- SHAP-based model explainability
- A polished, interactive Streamlit web dashboard
- Downloadable prediction reports
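The model comparison and tuning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's `train_model.py`: it assumes synthetic data from `make_classification` in place of the real dataset, compares only two of the five models, and uses a small hypothetical parameter grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the credit data: 1,000 rows, 20 features, roughly 70/30 classes.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    weights=[0.3, 0.7], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Compare candidate models by mean 5-fold cross-validated ROC-AUC.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Tune a random forest with a small grid (the grid values are hypothetical).
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid, cv=5, scoring="roc_auc",
)
search.fit(X_train, y_train)

# ROC-AUC of the refitted best estimator on the held-out split.
test_score = search.score(X_test, y_test)
```

Scoring on a split that the grid search never saw keeps the reported number from being inflated by the tuning itself.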
| Feature | Description |
|---|---|
| Real Dataset | UCI German Credit Dataset: 1,000 real applicants, 20 features |
| Deep EDA | Correlation heatmaps, 3D scatter plots, risk trend analysis |
| Feature Engineering | Debt ratios, age groups, credit tiers, encoding & scaling |
| 5 ML Models | Logistic Regression, Decision Tree, Random Forest, GBM, XGBoost |
| Full Evaluation | Accuracy, Precision, Recall, F1, ROC-AUC, Confusion Matrix |
| Hyperparameter Tuning | GridSearchCV with 5-fold cross-validation |
| Live Prediction | Interactive form with risk gauge and confidence breakdown |
| Explainable AI | SHAP values + feature importance for every prediction |
| Export Reports | Download prediction as CSV or JSON |
| Dark Mode UI | Modern, polished Streamlit dashboard |
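As a rough illustration of the feature engineering row in the table above, the sketch below derives a debt-style ratio, age groups, and credit tiers with pandas. The column names, bin edges, and tier thresholds are assumptions made for this example and may differ from what `utils/data_loader.py` actually does.

```python
import pandas as pd

# Three illustrative applicants; column names are assumptions, not the repo's schema.
df = pd.DataFrame({
    "credit_amount": [1200, 6000, 15000],
    "duration_months": [12, 24, 60],
    "age": [23, 41, 67],
})

# Debt-ratio stand-in: amount owed per month of the loan term.
df["monthly_burden"] = df["credit_amount"] / df["duration_months"]

# Age groups with right-closed bins: (17, 25], (25, 40], (40, 60], (60, 120].
df["age_group"] = pd.cut(
    df["age"], bins=[17, 25, 40, 60, 120],
    labels=["young", "adult", "middle_aged", "senior"],
)

# Credit tiers by loan amount (thresholds are hypothetical).
df["credit_tier"] = pd.cut(
    df["credit_amount"], bins=[0, 2500, 10000, float("inf")],
    labels=["low", "medium", "high"],
)

# One-hot encode the categorical columns so models receive numeric input.
encoded = pd.get_dummies(df, columns=["age_group", "credit_tier"])
```

Binning with `pd.cut` keeps the group boundaries explicit, which makes the derived features easier to explain alongside SHAP output.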
Backend: Python 3.10+
ML Framework: Scikit-learn, XGBoost
Data: Pandas, NumPy, SciPy
Visualization: Plotly, Seaborn, Matplotlib
Web App: Streamlit
Explainability: SHAP
Model Saving: Joblib
German Credit Dataset, UCI Machine Learning Repository
Created by: Prof. Dr. Hans Hofmann, Universität Hamburg (1994)
| Property | Value |
|---|---|
| Samples | 1,000 |
| Features | 20 (financial + personal) |
| Target | 1 = Creditworthy, 0 = Default |
| Class Split | 70% Good / 30% Bad |
| Missing Values | None |
| Source | UCI ML Repository |
Key Features:
- Checking account status
- Credit history
- Loan purpose & amount
- Savings account balance
- Employment duration
- Age, housing, job category
- Number of dependents
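The raw `german.data` file is space-separated with no header row, and its label column uses 1 for good and 2 for bad credit. The sketch below shows one way to parse that layout and remap the label to the 1 / 0 convention in the table above. The column names are hypothetical, and two in-memory rows in the file's format stand in for the downloaded file.

```python
import io

import pandas as pd

# Hypothetical column names; the raw file has no header row.
COLUMNS = [
    "checking_status", "duration_months", "credit_history", "purpose",
    "credit_amount", "savings", "employment_since", "installment_rate",
    "personal_status", "other_debtors", "residence_since", "property",
    "age", "other_installment_plans", "housing", "existing_credits",
    "job", "num_dependents", "telephone", "foreign_worker", "target",
]

# Two rows in the file's space-separated layout, used here instead of a download.
sample = io.StringIO(
    "A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1\n"
    "A12 48 A32 A43 5951 A61 A73 2 A92 A101 2 A121 22 A143 A152 1 A173 1 A191 A201 2\n"
)
df = pd.read_csv(sample, sep=" ", header=None, names=COLUMNS)

# Remap the raw label (1 = good, 2 = bad) to 1 = creditworthy, 0 = default.
df["target"] = df["target"].map({1: 1, 2: 0})
```

The categorical codes such as `A11` or `A34` still need decoding or encoding before training; that step is left out here.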
- Python 3.10 or higher
- pip
git clone https://github.com/yourusername/CodeAlpha_CreditScoring.git
cd CodeAlpha_CreditScoring
python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
pip install -r requirements.txt
python train_model.py
This downloads the real dataset, trains 5 models, tunes the best one, and saves credit_model.pkl.
streamlit run app.py
Open http://localhost:8501 in your browser.
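The training step saves a model bundle with Joblib, which the app can then load. The sketch below shows one plausible save and load round trip. The bundle layout (a dict with `model` and `feature_names` keys) is an assumption for illustration, and a small logistic regression on synthetic data stands in for the tuned model.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a small stand-in model on synthetic data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Hypothetical bundle layout: the estimator plus metadata needed at inference time.
bundle = {"model": model, "feature_names": [f"f{i}" for i in range(X.shape[1])]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "credit_model.pkl")
    joblib.dump(bundle, path)      # write the bundle to disk
    restored = joblib.load(path)   # read it back, as the app would at startup

# The restored model should reproduce the original predictions.
same_predictions = (restored["model"].predict(X) == model.predict(X)).all()
```

Storing feature names next to the estimator helps the prediction form build its input columns in the same order used during training.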
CodeAlpha_CreditScoring/
│
├── app.py                          # Main Streamlit app (entry point)
├── train_model.py                  # Standalone training pipeline
├── credit_model.pkl                # Saved model bundle (after training)
├── requirements.txt                # Python dependencies
├── README.md                       # This file
│
├── utils/                          # Core ML modules
│   ├── __init__.py
│   ├── data_loader.py              # Dataset loading, cleaning, feature engineering
│   ├── model_trainer.py            # Training, evaluation, tuning, prediction
│   └── visualizations.py           # All Plotly chart functions
│
├── pages/                          # Streamlit page modules
│   ├── __init__.py
│   ├── home.py                     # Landing page
│   ├── dataset_insights.py         # Data preview & statistics
│   ├── eda.py                      # Exploratory data analysis
│   ├── model_training.py           # Training UI & metrics
│   ├── prediction.py               # Live prediction form
│   ├── explainability.py           # SHAP & feature importance
│   └── about.py                    # Project info & deployment
│
├── data/                           # Dataset files
│   ├── german.data                 # Raw UCI dataset (auto-downloaded)
│   └── german_credit_cleaned.csv
│
├── models/                         # Saved model files
│   ├── random_forest_bundle.pkl
│   └── model_metadata.json
│
├── notebooks/                      # Jupyter notebooks (EDA & experiments)
│   └── credit_scoring_eda.ipynb
│
├── screenshots/                    # App screenshots for README
└── assets/                         # Static assets
| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression | ~75% | ~0.83 | ~0.84 | ~0.84 | ~0.78 |
| Decision Tree | ~72% | ~0.80 | ~0.82 | ~0.81 | ~0.72 |
| Random Forest | ~79% | ~0.86 | ~0.87 | ~0.87 | ~0.83 |
| Gradient Boosting | ~78% | ~0.85 | ~0.86 | ~0.86 | ~0.82 |
| XGBoost | ~80% | ~0.87 | ~0.87 | ~0.87 | ~0.84 |
Results vary slightly based on random seed and tuning. XGBoost or Random Forest typically performs best.
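To make the metric columns concrete, the sketch below computes them from a single confusion matrix. The counts are hypothetical, chosen to resemble a 200-applicant test split with the dataset's 70/30 class balance; they are not results from this project.

```python
# Hypothetical confusion-matrix counts for 200 test applicants
# (140 creditworthy = positive class, 60 default = negative class).
tp, fn = 118, 22   # creditworthy applicants predicted correctly / incorrectly
fp, tn = 24, 36    # defaulting applicants predicted as creditworthy / as default

total = tp + tn + fp + fn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Baseline: always predicting "creditworthy" is correct for every positive case.
majority_baseline = (tp + fn) / total

print(
    f"accuracy={accuracy:.3f} precision={precision:.3f} "
    f"recall={recall:.3f} f1={f1:.3f} baseline={majority_baseline:.3f}"
)
```

With these counts, accuracy is 0.770 against a majority-class baseline of 0.700, while F1 is about 0.837. F1 sits above accuracy because the positive class is the majority, which is why ROC-AUC and the confusion matrix are worth reading alongside the headline accuracy.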
Add screenshots to the screenshots/ folder and link them here after running the app.
screenshots/
├── home_page.png
├── eda_charts.png
├── model_comparison.png
├── live_prediction.png
└── shap_explanation.png
- Push this repo to GitHub
- Go to share.streamlit.io
- Connect your repo, set app.py as the entry point
- Deploy!
# Start command:
streamlit run app.py --server.port $PORT --server.address 0.0.0.0
- Choose Streamlit SDK when creating a Space
- Upload project files or connect GitHub
- Add sdk: streamlit to the README YAML header
- Deep learning model (PyTorch/Keras) comparison
- SMOTE oversampling for class imbalance
- FastAPI REST endpoint for programmatic access
- Batch prediction (upload CSV of applicants)
- Real-time drift monitoring
- Docker containerization
- CI/CD with GitHub Actions
This project is open source and available under the MIT License.
CodeAlpha Internship Project
Machine Learning Track | Credit Scoring Task
- UCI ML Repository for the German Credit Dataset
- Prof. Dr. Hans Hofmann for creating the benchmark dataset
- CodeAlpha for the internship opportunity
- Scikit-learn, XGBoost, SHAP, Plotly, Streamlit open source communities