
MLInf - ML Training & Inference Suite

Overview

A modular machine learning platform for classification and regression tasks with a Streamlit interface. Built with scikit-learn, XGBoost, and LightGBM.

MLInf covers the machine learning workflow end to end, from data loading to model export and inference, with cardinality-aware preprocessing, auto-generated hyperparameter configuration, training visualizations, and model explainability tools (SHAP, LIME).

Key Features

📊 Data Processing

  • Multiple Data Sources: Upload CSV, Excel, JSON, Parquet files, load from URLs, or use built-in scikit-learn datasets
  • Intelligent Preprocessing: Automatic encoding strategy for categorical features (one-hot vs frequency encoding based on cardinality)
  • Missing Value Handling: Multiple imputation strategies (mean, median, mode, constant, forward/backward fill)
  • Feature Scaling: StandardScaler, MinMaxScaler, RobustScaler
  • Data Validation: Automatic class balance checking, duplicate detection, missing value analysis
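The validation checks listed above can be sketched in a few lines of pandas. This is an illustrative sketch only, not the project's `validators.py`; the function name `profile_dataframe` and the `balance_ratio` measure are assumptions made for the example.

```python
import pandas as pd


def profile_dataframe(df: pd.DataFrame, target: str) -> dict:
    """Summarize duplicates, missing values, and class balance (hypothetical helper)."""
    class_share = df[target].value_counts(normalize=True)
    return {
        # Rows that exactly repeat an earlier row.
        "n_duplicates": int(df.duplicated().sum()),
        # Count of missing cells per column.
        "missing_per_column": df.isna().sum().to_dict(),
        # Rarest class share divided by most common class share; 1.0 is balanced.
        "balance_ratio": float(class_share.min() / class_share.max()),
    }


df = pd.DataFrame(
    {
        "age": [25, 25, 40, None, 31],
        "city": ["Rabat", "Rabat", "Fes", "Fes", None],
        "label": ["yes", "yes", "no", "yes", "yes"],
    }
)
report = profile_dataframe(df, target="label")
print(report)
```

A low `balance_ratio` could be used to warn about class imbalance before training; the cutoff for such a warning is a design choice not specified here.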

🤖 Machine Learning Models

Classification Models:

  • Logistic Regression
  • Random Forest Classifier
  • Support Vector Machine (SVM)
  • Gradient Boosting Classifier (scikit-learn)
  • XGBoost Classifier
  • LightGBM Classifier
  • Neural Network (MLP Classifier)

Regression Models:

  • Linear Regression
  • Ridge Regression
  • Lasso Regression
  • Random Forest Regressor
  • Support Vector Regression (SVR)
  • Gradient Boosting Regressor (scikit-learn)
  • XGBoost Regressor
  • LightGBM Regressor

⚙️ Training & Hyperparameters

  • Dynamic Hyperparameter UI: Automatically generated configuration widgets for all models
  • Training History Tracking: Real-time loss curves and convergence monitoring
  • Early Stopping: Configurable early stopping for supported models
  • Overfitting Detection: Automatic train-validation gap analysis
  • Cross-Validation: K-fold cross-validation support
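As a rough illustration of train-validation gap analysis and early stopping, the sketch below works on plain lists of per-epoch losses. The function names, the `patience` parameter, and the 0.1 gap threshold are assumptions for the example, not values taken from MLInf.

```python
from typing import Sequence


def overfitting_gap(train_loss: Sequence[float], val_loss: Sequence[float]) -> float:
    """Return the final validation loss minus the final training loss."""
    return val_loss[-1] - train_loss[-1]


def should_stop_early(val_loss: Sequence[float], patience: int = 3) -> bool:
    """True if the last `patience` epochs did not beat the earlier best loss."""
    if len(val_loss) <= patience:
        return False  # not enough history to judge
    best_before = min(val_loss[:-patience])
    return min(val_loss[-patience:]) >= best_before


train = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
val = [1.0, 0.7, 0.6, 0.65, 0.7, 0.8]

gap = overfitting_gap(train, val)
is_overfitting = gap > 0.1  # hypothetical alert threshold
stop = should_stop_early(val, patience=3)
print(f"gap={gap:.2f} overfitting={is_overfitting} stop={stop}")
```

Here the validation loss bottoms out at epoch 3 while the training loss keeps falling, so both the gap check and the patience check fire.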

📈 Evaluation & Visualization

  • Comprehensive Metrics:
    • Classification: Accuracy, Precision, Recall, F1-Score, ROC-AUC
    • Regression: MSE, RMSE, MAE, R², Adjusted R²
  • Rich Visualizations:
    • Confusion matrices
    • ROC curves and Precision-Recall curves
    • Feature importance plots
    • Training loss curves
    • Residual plots (regression)
    • Actual vs Predicted plots
  • Model Comparison: Side-by-side comparison of multiple trained models
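Of the regression metrics above, Adjusted R² is the one that is not available directly in scikit-learn. A minimal sketch using the standard definition, 1 - (1 - R²)(n - 1)/(n - p - 1) with n samples and p features, is shown below; the function names are illustrative and not taken from the project's `metrics.py`.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Penalize R² for the number of features used by the model."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 9.5, 10.5]

r2 = r2_score(y_true, y_pred)
adj = adjusted_r2(r2, n_samples=len(y_true), n_features=2)
print(f"R2={r2:.3f} adjusted R2={adj:.3f}")
```

Adjusted R² is never larger than R², and the difference grows as features are added without a matching gain in fit.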

🔮 Inference & Deployment

  • Single Prediction: Interactive form-based predictions
  • Batch Prediction: Upload files for bulk predictions with probability outputs
  • Model Export: Download trained models as zip files with metadata
  • Probability Outputs: Class probabilities for classification tasks
  • Feature Importance: Built-in feature importance from tree-based models

🏗️ Architecture

  • Modular Design: Plugin-based model registration system
  • Extensible: Add new models by simply dropping files in the models directory
  • Type-Safe: Comprehensive type hints throughout the codebase
  • Error Handling: Robust exception handling with detailed error messages
  • Session Management: Persistent state across UI interactions

Installation

Option 1: Docker Deployment (Recommended)

# Clone the repository
git clone https://github.com/el-Badr07/mlinf.git
cd mlinf

# Build and run with Docker Compose
docker-compose up -d

# Access the application at http://localhost:8501

Docker Benefits:

  • No dependency conflicts
  • Isolated environment
  • Easy deployment and scaling
  • Consistent behavior across systems
  • Automatic restarts

Docker Commands:

# Start the application
docker-compose up -d

# Stop the application
docker-compose down

# View logs
docker-compose logs -f

# Rebuild after code changes
docker-compose up -d --build

# Remove everything including volumes
docker-compose down -v

Option 2: Local Installation

# Clone the repository
git clone https://github.com/el-Badr07/mlinf.git
cd mlinf

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Launch the application
streamlit run ui/app.py

Project Structure

mlinf/
├── src/
│   ├── core/                 # Core abstractions and registry
│   │   ├── base_model.py    # Base model interface
│   │   └── registry.py      # Model registration system
│   ├── data/                # Data handling modules
│   │   ├── loaders.py       # File and URL loaders
│   │   ├── validators.py    # Data validation utilities
│   │   ├── preprocessors.py # Preprocessing pipelines
│   │   └── sklearn_datasets.py # Built-in dataset loader
│   ├── models/              # Model implementations
│   │   ├── classification/  # Classification models
│   │   └── regression/      # Regression models
│   ├── training/            # Training utilities
│   │   └── trainer.py       # Model training logic
│   ├── evaluation/          # Evaluation modules
│   │   ├── metrics.py       # Metric calculations
│   │   └── visualizations.py # Plotting utilities
│   ├── inference/           # Inference utilities
│   │   └── predictor.py     # Prediction logic
│   ├── explainability/      # Model explainability
│   │   ├── shap_explainer.py
│   │   └── lime_explainer.py
│   ├── persistence/         # Model saving/loading
│   │   └── model_saver.py
│   └── utils/               # Utility functions
├── ui/
│   ├── pages/               # Streamlit pages
│   │   ├── 1_📤_Data_Upload.py
│   │   ├── 2_🔧_Preprocessing.py
│   │   ├── 3_🎯_Model_Training.py
│   │   ├── 4_📊_Evaluation.py
│   │   └── 5_🔮_Inference.py
│   ├── ui_utils/            # UI utilities
│   │   ├── session_state.py
│   │   └── hyperparam_widgets.py
│   └── app.py               # Main application
├── configs/                 # Configuration files
├── tests/                   # Test suite
├── requirements.txt         # Python dependencies
└── README.md

Usage Workflow

1. Data Upload

  • Choose from three data sources:
    • Upload File: CSV, Excel, JSON, Parquet formats
    • Load from URL: Direct URL to dataset
    • Built-in Datasets: scikit-learn datasets (iris, wine, breast_cancer, digits, diabetes, california_housing, linnerud)
  • Automatic data profiling with statistics, missing-value analysis, and duplicate detection
  • Select target variable and features
  • Automatic task type detection (classification/regression)
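How the task type is detected is not documented here. One plausible heuristic, sketched below purely as an assumption, treats non-numeric targets and integer-valued targets with few distinct values as classification and everything else as regression. The function name and the `max_classes` cutoff are hypothetical.

```python
from numbers import Real
from typing import Any, Sequence


def detect_task_type(target: Sequence[Any], max_classes: int = 20) -> str:
    """Guess 'classification' or 'regression' from target values (heuristic sketch)."""
    values = [v for v in target if v is not None]
    # Strings, booleans, and other non-numeric labels imply classification.
    all_numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    if not all_numeric:
        return "classification"
    # Integer-valued targets with few distinct values look like class labels.
    all_integral = all(float(v).is_integer() for v in values)
    if all_integral and len(set(values)) <= max_classes:
        return "classification"
    return "regression"


print(detect_task_type(["spam", "ham", "spam"]))
print(detect_task_type([0, 1, 1, 0]))
print(detect_task_type([1.5, 2.7, 3.1]))
```

A heuristic like this can misclassify count-valued regression targets, which is one reason a manual override in the UI is useful.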

2. Preprocessing

  • Missing Values: Choose imputation strategy per feature
  • Categorical Encoding:
    • Auto mode, low cardinality (<20 unique values): One-hot encoding
    • Auto mode, high cardinality: Frequency encoding
    • Manual override available
  • Numerical Scaling: StandardScaler, MinMaxScaler, or RobustScaler
  • Train/Test Split: Configurable split ratio with stratification for classification
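The auto-encoding rule can be sketched with pandas as follows. The threshold of 20 unique values comes from the description above; the function name `encode_categorical` and its parameter are assumptions, and the project's `preprocessors.py` may be organized differently.

```python
import pandas as pd


def encode_categorical(
    df: pd.DataFrame, column: str, max_onehot_cardinality: int = 20
) -> pd.DataFrame:
    """One-hot encode low-cardinality columns, frequency-encode the rest."""
    if df[column].nunique() < max_onehot_cardinality:
        # One indicator column per category, named "<column>_<category>".
        dummies = pd.get_dummies(df[column], prefix=column)
        return pd.concat([df.drop(columns=[column]), dummies], axis=1)
    # Replace each category with its relative frequency in the column.
    frequencies = df[column].value_counts(normalize=True)
    encoded = df.copy()
    encoded[column] = encoded[column].map(frequencies)
    return encoded


df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

onehot = encode_categorical(df, "color")
# Lower the threshold to force the frequency-encoding branch on this tiny frame.
freq = encode_categorical(df, "color", max_onehot_cardinality=2)
print(onehot.columns.tolist())
print(freq["color"].tolist())
```

Frequency encoding keeps the column count fixed, which is why it suits high-cardinality features where one-hot encoding would add one column per category.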

3. Model Training

  • Model Selection: Choose one or multiple models to train
  • Hyperparameter Configuration:
    • Dynamic UI widgets auto-generated from model schemas
    • Model-specific parameters (trees, depth, learning rate, etc.)
    • Early stopping configuration for supported models
  • Training Execution:
    • Real-time training progress
    • Loss curve visualization during training
    • Overfitting detection alerts
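Before training, values coming from the generated widgets need to be merged with schema defaults and range-checked. The sketch below assumes a schema using the `type`, `default`, `min`, and `max` keys shown in the Adding Custom Models example; the function name and the `'float'` type are assumptions.

```python
from typing import Any, Dict, Optional

Schema = Dict[str, Dict[str, Any]]


def resolve_hyperparameters(
    schema: Schema, overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Merge user overrides with schema defaults and enforce min/max bounds."""
    overrides = overrides or {}
    unknown = set(overrides) - set(schema)
    if unknown:
        raise ValueError(f"Unknown hyperparameters: {sorted(unknown)}")

    resolved: Dict[str, Any] = {}
    for name, spec in schema.items():
        value = overrides.get(name, spec["default"])
        # Coerce to the declared type so widget output stays consistent.
        if spec["type"] == "int":
            value = int(value)
        elif spec["type"] == "float":
            value = float(value)
        if "min" in spec and value < spec["min"]:
            raise ValueError(f"{name}={value} is below minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            raise ValueError(f"{name}={value} is above maximum {spec['max']}")
        resolved[name] = value
    return resolved


schema: Schema = {
    "n_estimators": {"type": "int", "default": 100, "min": 10, "max": 1000},
    "learning_rate": {"type": "float", "default": 0.1, "min": 0.001, "max": 1.0},
}
params = resolve_hyperparameters(schema, {"n_estimators": 250})
print(params)
```

Rejecting unknown keys early gives a clearer error than letting an estimator constructor fail later with an unexpected keyword argument.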

4. Evaluation

  • Performance Metrics:
    • Classification: Confusion matrix, ROC-AUC, precision-recall curves
    • Regression: Actual vs predicted plots, residual analysis
  • Model Comparison: Compare metrics across all trained models
  • Feature Importance: View which features drive predictions

5. Inference

  • Single Prediction:
    • Interactive form with all feature inputs
    • Probability outputs for classification
  • Batch Prediction:
    • Upload new data file
    • Automatic preprocessing application
    • Download results with predictions and probabilities
  • Model Download: Export trained model as zip file with metadata
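A zip export with metadata can be approximated with the standard library alone. This sketch uses `pickle` as a stand-in for Joblib (which the project lists for persistence), and the archive member names `model.pkl` and `metadata.json` are assumptions rather than the project's actual layout.

```python
import io
import json
import pickle
import zipfile
from typing import Any, Dict, Tuple


def export_model_zip(model: Any, metadata: Dict[str, Any]) -> bytes:
    """Bundle a pickled model and JSON metadata into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("model.pkl", pickle.dumps(model))
        archive.writestr("metadata.json", json.dumps(metadata, indent=2))
    return buffer.getvalue()


def load_model_zip(data: bytes) -> Tuple[Any, Dict[str, Any]]:
    """Inverse of export_model_zip. Only unpickle archives from trusted sources."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        model = pickle.loads(archive.read("model.pkl"))
        metadata = json.loads(archive.read("metadata.json"))
    return model, metadata


# A plain dict stands in for a trained estimator in this example.
blob = export_model_zip(
    {"weights": [0.1, 0.2]},
    {"model_name": "demo", "task": "regression", "features": ["x1", "x2"]},
)
restored_model, restored_meta = load_model_zip(blob)
print(restored_meta["model_name"], restored_model)
```

Returning bytes rather than writing to disk fits a download button in a web UI, since the archive can be handed over without a temporary file.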

Adding Custom Models

The modular architecture makes it easy to add new models:

  1. Create a new file in src/models/classification/ or src/models/regression/
  2. Inherit from BaseModel
  3. Implement required methods and add hyperparameter schema
  4. The model will be automatically registered and appear in the UI

Example:

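from core import BaseModel, register_model
from typing import Dict, Any

@register_model  # adds the class to the model registry so the UI can list it
class MyCustomClassifier(BaseModel):
    model_name = "My Custom Classifier"
    model_type = "classification"

    @classmethod
    def get_hyperparameter_schema(cls) -> Dict[str, Dict[str, Any]]:
        # Each entry describes one hyperparameter; UI widgets are generated from it.
        # Keys must match the estimator's constructor arguments.
        return {
            'n_estimators': {
                'type': 'int',
                'default': 100,
                'min': 10,
                'max': 1000,
                'description': 'Number of trees in the ensemble'
            }
        }

    def build_model(self):
        # ExtraTreesClassifier is an illustrative choice; any estimator that
        # accepts the hyperparameters declared above can be used instead.
        from sklearn.ensemble import ExtraTreesClassifier
        return ExtraTreesClassifier(**self.hyperparameters)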
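For readers curious how decorator-based registration can work, here is a minimal self-contained sketch. It is not the project's `registry.py`: `MODEL_REGISTRY`, `models_for_task`, and this simplified `BaseModel` are assumptions made for illustration.

```python
from typing import Dict, List, Type

# Maps display name -> model class (hypothetical registry storage).
MODEL_REGISTRY: Dict[str, Type["BaseModel"]] = {}


class BaseModel:
    """Simplified stand-in for the project's base model interface."""

    model_name = "base"
    model_type = "unknown"

    def __init__(self, **hyperparameters):
        self.hyperparameters = hyperparameters


def register_model(cls: Type[BaseModel]) -> Type[BaseModel]:
    """Class decorator: record the class under its model_name and return it unchanged."""
    if cls.model_name in MODEL_REGISTRY:
        raise ValueError(f"Model already registered: {cls.model_name}")
    MODEL_REGISTRY[cls.model_name] = cls
    return cls


def models_for_task(task: str) -> List[str]:
    """List registered model names for 'classification' or 'regression'."""
    return sorted(name for name, cls in MODEL_REGISTRY.items() if cls.model_type == task)


@register_model
class DemoClassifier(BaseModel):
    model_name = "Demo Classifier"
    model_type = "classification"


print(models_for_task("classification"))
```

Registration happens when the module defining the class is imported, so a drop-in workflow also needs something that imports every file in the models directory (for example via `pkgutil` or `importlib`); how MLInf does this is not shown here.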

Built-in Datasets

Access popular machine learning datasets directly:

  • Iris - Iris flower classification (150 samples, 4 features, 3 classes)
  • Wine - Wine classification (178 samples, 13 features, 3 classes)
  • Breast Cancer - Cancer diagnosis (569 samples, 30 features, 2 classes)
  • Digits - Handwritten digit recognition (1797 samples, 64 features, 10 classes)
  • Diabetes - Diabetes progression regression (442 samples, 10 features)
  • California Housing - Housing price prediction (20640 samples, 8 features)
  • Linnerud - Multi-output regression (20 samples, 3 features, 3 targets)
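Outside the UI, the same datasets can be loaded straight from scikit-learn. The sketch below covers three of the datasets bundled with the library; the `LOADERS` mapping and `load_builtin` helper are illustrative names, not the API of the project's `sklearn_datasets.py`.

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine

# Hypothetical name -> loader mapping for a few bundled datasets.
LOADERS = {
    "iris": load_iris,
    "wine": load_wine,
    "breast_cancer": load_breast_cancer,
}


def load_builtin(name: str):
    """Return (features, target) arrays for a bundled scikit-learn dataset."""
    if name not in LOADERS:
        raise KeyError(f"Unknown dataset: {name!r}")
    return LOADERS[name](return_X_y=True)


shapes = {}
for name in LOADERS:
    X, y = load_builtin(name)
    shapes[name] = (X.shape, len(set(y.tolist())))
    print(name, X.shape, "classes:", shapes[name][1])
```

California Housing is different from the others in that scikit-learn downloads it on first use through `fetch_california_housing`, so it needs network access the first time.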

Technologies

  • Core ML: scikit-learn, XGBoost, LightGBM
  • UI Framework: Streamlit
  • Visualization: Plotly, Matplotlib, Seaborn
  • Explainability: SHAP, LIME
  • Data Processing: Pandas, NumPy
  • Model Persistence: Joblib

Requirements

  • Python 3.8+
  • See requirements.txt for full dependency list
