Nightcoder-26/CodeAlpha_Credit-Scoring-App


🏦 Credit Scoring AI: Machine Learning Project

Python Scikit-learn Streamlit XGBoost Plotly

A production-ready, end-to-end Machine Learning system for credit risk assessment
Built as part of the CodeAlpha Machine Learning Internship

🚀 Live Demo • 📊 Dataset • 🛠️ Installation • 📖 Documentation


📌 Project Overview

This project implements a Credit Scoring System that predicts whether a loan applicant is creditworthy or likely to default, using real financial and personal data from the UCI German Credit Dataset.

It covers the full ML lifecycle:

  • Real dataset ingestion from UCI ML Repository
  • Data cleaning, EDA, and feature engineering
  • Training & comparing 5 ML classification models
  • Hyperparameter tuning with GridSearchCV
  • SHAP-based model explainability
  • A polished, interactive Streamlit web dashboard
  • Downloadable prediction reports
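As a rough illustration of the feature-engineering step listed above, here is a minimal sketch. The `Applicant` class, the function names, and every cut-off value are assumptions made for this example; the project's actual logic lives in utils/data_loader.py and may differ.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    """Hypothetical subset of applicant fields used for derived features."""
    age: int            # years
    credit_amount: int  # requested loan amount
    duration: int       # loan duration in months


def age_group(age: int) -> str:
    """Bucket age into coarse groups (cut-offs are illustrative)."""
    if age < 25:
        return "young"
    if age < 60:
        return "adult"
    return "senior"


def credit_tier(amount: int) -> str:
    """Bucket the loan amount into tiers (cut-offs are illustrative)."""
    if amount < 2000:
        return "low"
    if amount < 5000:
        return "medium"
    return "high"


def engineer(a: Applicant) -> dict:
    """Return derived features for one applicant."""
    return {
        "age_group": age_group(a.age),
        "credit_tier": credit_tier(a.credit_amount),
        # Simple debt-style ratio: loan amount spread over the loan duration.
        "amount_per_month": round(a.credit_amount / a.duration, 2),
    }


features = engineer(Applicant(age=67, credit_amount=1169, duration=6))
print(features)
```

Derived columns like these would still need encoding and scaling before being passed to a model.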

✨ Features

| Feature | Description |
|---|---|
| 📊 Real Dataset | UCI German Credit Dataset: 1,000 real applicants, 20 features |
| 🔬 Deep EDA | Correlation heatmaps, 3D scatter plots, risk trend analysis |
| ⚙️ Feature Engineering | Debt ratios, age groups, credit tiers, encoding & scaling |
| 🤖 5 ML Models | Logistic Regression, Decision Tree, Random Forest, GBM, XGBoost |
| 📈 Full Evaluation | Accuracy, Precision, Recall, F1, ROC-AUC, Confusion Matrix |
| 🎛️ Hyperparameter Tuning | GridSearchCV with 5-fold cross-validation |
| 🔮 Live Prediction | Interactive form with risk gauge and confidence breakdown |
| 🧠 Explainable AI | SHAP values + feature importance for every prediction |
| 📥 Export Reports | Download prediction as CSV or JSON |
| 🌙 Dark Mode UI | Modern, polished Streamlit dashboard |
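The report export listed in the table above could be implemented along these lines. This is only a sketch using the standard library: the report keys and values are made up for illustration, and the helper names `to_json` and `to_csv` are hypothetical, not taken from the project code.

```python
import csv
import io
import json


def to_json(report: dict) -> str:
    """Serialize one prediction report as pretty-printed JSON."""
    return json.dumps(report, indent=2)


def to_csv(report: dict) -> str:
    """Serialize one prediction report as a header row plus one data row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(report))
    writer.writeheader()
    writer.writerow(report)
    return buf.getvalue()


# Illustrative values only; not real model output.
report = {"prediction": "Creditworthy", "probability_good": 0.82, "model": "XGBoost"}

json_text = to_json(report)
csv_text = to_csv(report)
print(json_text)
print(csv_text)
```

In a Streamlit page, strings like these can be handed to `st.download_button` as its `data` argument, together with a `file_name` and `mime` type.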

🛠️ Tech Stack

Backend:          Python 3.10+
ML Framework:     Scikit-learn, XGBoost
Data:             Pandas, NumPy, SciPy
Visualization:    Plotly, Seaborn, Matplotlib
Web App:          Streamlit
Explainability:   SHAP
Model Saving:     Joblib

📊 Dataset

German Credit Dataset, UCI Machine Learning Repository
Created by: Prof. Dr. Hans Hofmann, Universität Hamburg (1994)

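| Property | Value |
|---|---|
| Samples | 1,000 |
| Features | 20 (financial + personal) |
| Target | 1 = Creditworthy, 0 = Default (the raw UCI file codes these as 1 = good, 2 = bad) |
| Class Split | 70% Good / 30% Bad |
| Missing Values | None |
| Source | UCI ML Repository |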

Key Features:

  • Checking account status
  • Credit history
  • Loan purpose & amount
  • Savings account balance
  • Employment duration
  • Age, housing, job category
  • Number of dependents
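To make the data format concrete, here is a minimal sketch of parsing the raw german.data file, which stores one applicant per line as 20 space-separated attributes followed by the class label. The column names are hypothetical labels chosen for this example, and the two sample rows are illustrative lines in the file's format; the project's own loader in utils/data_loader.py may work differently.

```python
# Hypothetical names for the 20 attributes, in file order.
COLUMNS = [
    "checking_status", "duration", "credit_history", "purpose", "credit_amount",
    "savings", "employment", "installment_rate", "personal_status", "other_debtors",
    "residence_since", "property", "age", "other_installment_plans", "housing",
    "existing_credits", "job", "num_dependents", "telephone", "foreign_worker",
]
# Attributes stored as integers; the rest are categorical codes such as "A11".
NUMERIC = {
    "duration", "credit_amount", "installment_rate", "residence_since",
    "age", "existing_credits", "num_dependents",
}


def parse_line(line: str) -> dict:
    """Parse one row: 20 attributes plus the raw class label."""
    fields = line.split()
    if len(fields) != len(COLUMNS) + 1:
        raise ValueError(f"expected {len(COLUMNS) + 1} fields, got {len(fields)}")
    *values, raw_target = fields
    row = {name: int(v) if name in NUMERIC else v for name, v in zip(COLUMNS, values)}
    # Raw coding is 1 = good, 2 = bad; remap to 1 = creditworthy, 0 = default.
    row["target"] = 1 if raw_target == "1" else 0
    return row


# Illustrative rows in the file's format.
SAMPLE = """\
A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1
A12 48 A32 A43 5951 A61 A73 2 A92 A101 2 A121 22 A143 A152 1 A173 1 A191 A201 2
"""

rows = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
for r in rows:
    print(r["age"], r["credit_amount"], r["target"])
```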

🚀 Installation

Prerequisites

  • Python 3.10 or higher
  • pip

Step 1: Clone the repository

git clone https://github.com/Nightcoder-26/CodeAlpha_Credit-Scoring-App.git
cd CodeAlpha_Credit-Scoring-App

Step 2: Create virtual environment (recommended)

python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

Step 3: Install dependencies

pip install -r requirements.txt

Step 4: Train the model

python train_model.py

This downloads the real dataset, trains 5 models, tunes the best one, and saves credit_model.pkl.
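The compare, tune, and save flow described above can be sketched roughly as follows. This is not the contents of train_model.py: it uses a synthetic stand-in dataset, only two of the five model families, small illustrative parameter grids, and a hypothetical output file name in the system temp directory.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the cleaned data: 1,000 rows, 20 features, ~70% positives.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.3, 0.7], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# Compare candidates by 5-fold cross-validated ROC-AUC on the training split only.
scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

# Tune the winner with GridSearchCV (grids are illustrative, not the project's).
param_grids = {
    "logistic_regression": {"C": [0.1, 1.0, 10.0]},
    "random_forest": {"max_depth": [4, 8, None]},
}
search = GridSearchCV(
    candidates[best_name], param_grids[best_name], cv=5, scoring="roc_auc"
)
search.fit(X_train, y_train)

# Evaluate once on the held-out split, then save model and metadata as one bundle.
test_auc = roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:, 1])
bundle_path = os.path.join(tempfile.gettempdir(), "credit_model_sketch.pkl")
joblib.dump({"model": search.best_estimator_, "name": best_name}, bundle_path)
print(best_name, search.best_params_, round(test_auc, 3))
```

Comparing models by cross-validation on the training split keeps the test split untouched until the final evaluation, so the reported score is not biased by model selection.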

Step 5: Launch the app

streamlit run app.py

Open http://localhost:8501 in your browser.


📁 Project Structure

CodeAlpha_CreditScoring/
│
├── app.py                      # Main Streamlit app (entry point)
├── train_model.py              # Standalone training pipeline
├── credit_model.pkl            # Saved model bundle (after training)
├── requirements.txt            # Python dependencies
├── README.md                   # This file
│
├── utils/                      # Core ML modules
│   ├── __init__.py
│   ├── data_loader.py          # Dataset loading, cleaning, feature engineering
│   ├── model_trainer.py        # Training, evaluation, tuning, prediction
│   └── visualizations.py       # All Plotly chart functions
│
├── pages/                      # Streamlit page modules
│   ├── __init__.py
│   ├── home.py                 # Landing page
│   ├── dataset_insights.py     # Data preview & statistics
│   ├── eda.py                  # Exploratory data analysis
│   ├── model_training.py       # Training UI & metrics
│   ├── prediction.py           # Live prediction form
│   ├── explainability.py       # SHAP & feature importance
│   └── about.py                # Project info & deployment
│
├── data/                       # Dataset files
│   ├── german.data             # Raw UCI dataset (auto-downloaded)
│   └── german_credit_cleaned.csv
│
├── models/                     # Saved model files
│   ├── random_forest_bundle.pkl
│   └── model_metadata.json
│
├── notebooks/                  # Jupyter notebooks (EDA & experiments)
│   └── credit_scoring_eda.ipynb
│
├── screenshots/                # App screenshots for README
└── assets/                     # Static assets

📈 Model Results

| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC |
|---|---|---|---|---|---|
| Logistic Regression | ~75% | ~0.83 | ~0.84 | ~0.84 | ~0.78 |
| Decision Tree | ~72% | ~0.80 | ~0.82 | ~0.81 | ~0.72 |
| Random Forest | ~79% | ~0.86 | ~0.87 | ~0.87 | ~0.83 |
| Gradient Boosting | ~78% | ~0.85 | ~0.86 | ~0.86 | ~0.82 |
| XGBoost | ~80% | ~0.87 | ~0.87 | ~0.87 | ~0.84 |

Results vary slightly based on random seed and tuning. XGBoost or Random Forest typically performs best.
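For readers who want to see how the tabulated metrics relate to one another, here is a small worked example that derives accuracy, precision, recall, and F1 from a confusion matrix. The counts are made up for illustration (a 200-row test split with 140 creditworthy and 60 default applicants) and are not the project's actual results; the positive class is assumed to be 1 = Creditworthy.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard metrics for the positive class from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts: 140 actual positives (tp + fn), 60 actual negatives (fp + tn).
metrics = classification_metrics(tp=120, fp=20, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a 70/30 class split, a model that always predicts "creditworthy" already reaches 70% accuracy, so ROC-AUC and the per-class metrics are more informative than accuracy alone.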


🖥️ Screenshots

Add screenshots to the screenshots/ folder and link them here after running the app.

screenshots/
├── home_page.png
├── eda_charts.png
├── model_comparison.png
├── live_prediction.png
└── shap_explanation.png

🚀 Deployment

Streamlit Cloud (Recommended, Free)

  1. Push this repo to GitHub
  2. Go to share.streamlit.io
  3. Connect your repo, set app.py as the entry point
  4. Deploy!

Render

# Start command:
streamlit run app.py --server.port $PORT --server.address 0.0.0.0

HuggingFace Spaces

  • Choose Streamlit SDK when creating a Space
  • Upload project files or connect GitHub
  • Add sdk: streamlit to README YAML header

🔮 Future Improvements

  • Deep learning model (PyTorch/Keras) comparison
  • SMOTE oversampling for class imbalance
  • FastAPI REST endpoint for programmatic access
  • Batch prediction (upload CSV of applicants)
  • Real-time drift monitoring
  • Docker containerization
  • CI/CD with GitHub Actions

📄 License

This project is open source and available under the MIT License.


👨‍💻 Author

CodeAlpha Internship Project
Machine Learning Track | Credit Scoring Task


🙏 Acknowledgements

  • UCI ML Repository for the German Credit Dataset
  • Prof. Dr. Hans Hofmann for creating the benchmark dataset
  • CodeAlpha for the internship opportunity
  • Scikit-learn, XGBoost, SHAP, Plotly, Streamlit open source communities

⭐ Star this repo if you found it helpful!
CodeAlpha ML Internship
