Multi-AutoML Interface


A unified interface for experimenting with AutoML, allowing you to compare multiple frameworks (AutoGluon, FLAML, H2O, TPOT, PyCaret, Lale, AutoKeras) with integrated MLOps via MLflow.


Important: The linked Hugging Face Spaces demo is provided for testing and visualization only; this project is intended to be run locally for real experiments and production use. See the Quick Start section below to run the application on your machine.

🆕 What's New (Recent)

  • Added user-selectable parallelism (n_jobs) in the Training UI: choose Auto (all cores) or Manual (select number of jobs).
  • Added support for headerless tabular datasets: upload CSV/Excel files without a header row and the app will auto-generate col_0, col_1, ....
  • Significant Streamlit performance improvements: heavy computations and plots are cached with @st.cache_data, server-init tasks run once with @st.cache_resource, and st.session_state initialization was consolidated to reduce rerun overhead.

🎯 Overview

The Multi-AutoML Interface is a web/desktop application that simplifies the use of AutoML frameworks, enabling:

  • Side-by-side comparison of different AutoML engines
  • Integrated MLOps with complete tracking via MLflow
  • Unified interface for training, evaluation, and prediction
  • Flexible deployment (web, Docker, desktop)
  • Support for Multiple ML Tasks: Classification, Regression, Multi-Label Classification, Time Series Forecasting, Computer Vision (Image Classification, Object Detection, Image Segmentation).
  • Detailed metrics and logging

✨ Key Features

🤖 Supported AutoML Frameworks:

  • AutoGluon (Amazon) - Exceptional performance
  • FLAML (Microsoft) - Fast and efficient
  • H2O AutoML (Enterprise) - Robust and comprehensive
  • TPOT (Open Source) - Pipelines generated by Genetic Algorithms
  • PyCaret (Open Source) - End-to-end low code ML platform
  • Lale (IBM) - Scikit-Learn compatible topology search with Hyperopt
  • AutoKeras (Open Source) - AutoML for deep learning based on Keras

📊 Integrated MLOps & Advanced Dashboard:

  • Explainable AI (XAI): Integrated SHAP for tabular data and Saliency Maps (Occlusion) for Computer Vision to understand model decisions.
  • Auto-EDA & Data Health: Real-time data profiling using ydata-profiling to identify imbalances and correlations before training.
  • Live Experiments Dashboard: Monitor multiple concurrent training runs with real-time logs and metrics.
  • ⚑ Non-Blocking Updates: Powered by Streamlit Fragments, the dashboard auto-refreshes without lagging the main UI.
  • Multi-Concurrent Training: Launch AutoGluon, FLAML, H2O, TPOT, PyCaret, Lale, and AutoKeras simultaneously.
  • Complete MLflow tracking: Metrics, parameters, and artifacts (models, leaderboards, reports).
  • Graceful Cancellation: Stop any running experiment instantly without crashing the application.
  • Automatic Code Generation: Every training generates a Python snippet for model consumption.
  • πŸš€ One-Click API Deployment: Generate a complete FastAPI + Docker package for any model in seconds.
  • Storage Management: Automatically cleans up local model files after MLflow sync.
  • Advanced Prediction: Batch processing via file upload or Manual Entry Form.
  • Unified ML Task Selector: Choose between Classification, Regression, Multi-Label Classification, Time Series Forecasting, and Computer Vision (Image Classification, Multi-Label, Object Detection, Image Segmentation).
  • Dynamic Framework Filtering: View only the AutoML engines that support your selected task.

🖥️ Multi-Deploy:

  • Web interface (Streamlit)
  • Docker container (production)
  • Desktop app (Electron)
  • Hugging Face Spaces (Live Demo)
  • Local development

Note: The Hugging Face Spaces entry above links to a demo deployment provided for quick preview and visualization. For reproducible experiments and real workloads, run the project locally using the Quick Start instructions.

πŸŽ›οΈ Advanced Interface:

  • Upload multiple datasets (Train, Validation, Test)
  • Advanced parameter configuration
  • Real-time monitoring
  • Results visualization
  • Interactive prediction

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend      β”‚    β”‚   Backend API    β”‚    β”‚   ML Engines    β”‚
β”‚                 β”‚    β”‚                  β”‚    β”‚                 β”‚
β”‚ β€’ Streamlit     │◄──►│ β€’ Python         │◄──►│ β€’ AutoGluon     β”‚
β”‚ β€’ Electron      β”‚    β”‚ β€’ FastAPI        β”‚    β”‚ β€’ FLAML         β”‚
β”‚ β€’ React         β”‚    β”‚ β€’ MLflow         β”‚    β”‚ β€’ H2O AutoML    β”‚
β”‚ β€’ Custom UI     β”‚    β”‚ β€’ Logging        β”‚    β”‚ β€’ TPOT          β”‚
β”‚                 β”‚    β”‚                  β”‚    β”‚ β€’ PyCaret       β”‚
β”‚                 β”‚    β”‚                  β”‚    β”‚ β€’ Lale          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                       β”‚                       β”‚
         β–Ό                       β–Ό                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Storage       β”‚    β”‚   Monitoring     β”‚    β”‚   Deployment    β”‚
β”‚                 β”‚    β”‚                  β”‚    β”‚                 β”‚
β”‚ β€’ File System   β”‚    β”‚ β€’ MLflow UI      β”‚    β”‚ β€’ Docker Hub    β”‚
β”‚ β€’ MLflow Artifactsβ”‚  β”‚ β€’ Logs           β”‚    β”‚ β€’ GitHub        β”‚
β”‚ β€’ Model Registryβ”‚    β”‚ β€’ Metrics        β”‚    β”‚ β€’ Electron Storeβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

🚀 Quick Start

📋 Prerequisites:

  • Python 3.11+
  • Node.js 16+ (for the desktop app)
  • Java 11+ (for H2O AutoML)
  • Git

🔧 Installation:

1. Clone the Repository:

git clone https://github.com/PedroM2626/Multi-AutoML-Interface.git
cd Multi-AutoML-Interface

2. Python Environment:

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (Mac/Linux)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

3. Start MLflow:

# Start MLflow server
mlflow server --host 0.0.0.0 --port 5000

4. Run the Application:

# Option 1: Web interface
streamlit run app.py --server.port 8501

# Option 1B: Independent Reflex interface
reflex run

# Option 2: Desktop app (requires Node.js)
npm install && npm run dev

# Option 3: Docker
docker-compose up

🧪 Developer Workflow (Quality Gates):

# Install dev tooling
pip install -r requirements-dev.txt

# Lint (critical rules)
ruff check .

# Syntax check
python -m compileall app.py run.py src tests

The repository includes a GitHub Actions pipeline in .github/workflows/ci.yml that runs two levels automatically:

  • quick-pr: lint + syntax + regression flow tests on push/PR
  • nightly-complete: full nightly best-effort run with optional integration stack

# Run critical regression flows locally
pytest -q tests/test_regression_flows.py

📖 User Guide

🎯 Basic Workflow:

Reflex Interface (Independent):

  • The project now includes a standalone Reflex interface in reflex_interface/.
  • It mirrors the Streamlit workflow with dedicated pages for Upload, Exploration, Training, Experiments, Prediction, and MLflow History.
  • It uses the same backend modules in src/, so training, monitoring, predictions, XAI, and MLflow actions are consistent across both UIs.

1. Data Upload & Exploration:

  • Supported formats: CSV, Excel
  • Auto-EDA Integrated: Generate comprehensive profiling reports with ydata-profiling to check for missing values and correlations.
  • Multiple splits supported: Train (mandatory), Validation (optional), and Test (optional)
  • Automatic type detection
  • Automatic Data Lake: When processing data, it is copied to the data_lake/ folder and versioned via DVC, generating hashes for version control.

2. Experiment Configuration:

  • Task Type: Classification, Regression, Multi-Label, Time Series, or Computer Vision (Image Classification, Object Detection, Segmentation).
  • Framework Agnostic: Support for AutoGluon, FLAML, H2O, TPOT, PyCaret, and Lale.
  • ONNX Integration: Universal model export and import via ONNX for cross-platform deployment.
  • Hugging Face Hub: One-click deployment and discovery of models on the HF Hub.
  • MLOps Ready: Integrated MLflow experiment tracking and DVC data versioning.
  • Advanced parameters: seed, time limits, folds, max textual features (TF-IDF), CV, forecasting horizon (TS), etc.

3. Training & Monitoring:

  • Experiments Tab: All training runs appear in a dedicated dashboard.
  • Live Logs: View framework outputs in real-time with O(1) performance optimization.
  • πŸ’“ Activity Heartbeat: Visual pulse indicator when new logs are being processed.
  • Background Execution: Training happens in threads; you can continue using the app while models train.
  • Graceful Stop: Cancel runs that appear to be diverging or taking too long.

4. Results Analysis & Comparison:

  • Comparative leaderboards per framework.
  • Side-by-side run comparison: Select multiple runs in the History tab to compare metrics (Accuracy, RMSE, etc.) visually.
  • Model Registry: Register specific versions of models for lifecycle management.
  • πŸš€ One-Click API Deployment: In the History tab, click "Generate FastAPI Deployment Package" to create a ready-to-use Dockerized web service for your model.

5. Prediction & Explainability:

  • File Upload: Predict on large datasets (CSV/Excel).
  • Manual Input: Real-time inference using a dynamically generated form.
  • XAI Support: Click "Explain Prediction (SHAP)" for tabular data or "Explain AI Decision (Saliency Map)" for CV to see why the model made a certain decision.
  • Consumption Sample: Copy the generated Python code to integrate the model into your own applications.

6. Maintenance & Storage Management:

  • Local Cleanup: Integrated tool to clear the models/ folder and reset local mlruns if needed.
  • Disk Monitoring: Live disk space warning in the experiments tab.
  • Auto-Sync: Models are synced to MLflow and local high-weight copies are removed automatically to preserve space.

πŸ› οΈ Advanced Configuration

βš™οΈ Framework Parameters:

AutoGluon:

{
    'presets': 'best_quality',
    'time_limit': 3600,
    'seed': 42,
    'num_bag_folds': 5,
    'num_bag_sets': 1
}

FLAML:

{
    'time_budget': 3600,
    'seed': 42,
    'ensemble': True,
    'metric': 'accuracy',
    'estimator_list': ['lgbm', 'xgboost', 'rf']
}

H2O AutoML:

{
    'max_runtime_secs': 3600,
    'max_models': 20,
    'seed': 42,
    'nfolds': 5,
    'balance_classes': True,
    'sort_metric': 'AUTO'
}

TPOT:

{
    'generations': 5,
    'population_size': 20,
    'cv': 5,
    'max_time_mins': 30,
    'config_dict': 'TPOT sparse',
    'tfidf_max_features': 500,
    'tfidf_ngram_range': (1, 2)
}

PyCaret:

{
    'time_limit': 1200,            # mapped to PyCaret's n_iter tuning budget
    'task_type': 'Classification', # or 'Regression' / 'Time Series Forecasting'
    'fh': 12,                      # forecasting horizon (Time Series only)
}

Lale:

{
    'time_limit': 1200,            # mapped to Hyperopt max_evals bounds
    'task_type': 'Classification', # or 'Regression'
}
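In practice, per-framework dicts like the ones above act as overrides on top of shared defaults, which plain dict unpacking expresses directly (an illustrative sketch, not the app's exact config loader; the default values shown are assumptions):

```python
DEFAULTS = {"seed": 42, "time_limit": 3600, "presets": "medium_quality"}

def build_config(user_params: dict) -> dict:
    """Merge user-supplied params over the defaults; later keys win."""
    return {**DEFAULTS, **user_params}

cfg = build_config({"presets": "best_quality", "num_bag_folds": 5})
print(cfg["presets"], cfg["seed"])  # best_quality 42
```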

πŸŽ›οΈ MLflow Configuration:

# Experiments
mlflow.set_experiment("AutoGluon_Experiments")
mlflow.set_experiment("FLAML_Experiments") 
mlflow.set_experiment("H2O_Experiments")

# Tracking
mlflow.log_param("framework", "autogluon")
mlflow.log_metric("accuracy", 0.95)
mlflow.log_artifact("model.pkl")

🐳 Deploy with Docker

📦 Build and Run:

1. Build Image:

docker build -t multi-automl:latest .

2. Docker Compose:

# Start all services
docker-compose up -d

# Logs
docker-compose logs -f

# Stop
docker-compose down

3. Ports:

  • 8501: Streamlit UI
  • 5000: MLflow UI
  • 54321: H2O Cluster

🖥️ Desktop App (Electron)

📦 Installation and Build:

1. Install Node.js:

# Download: https://nodejs.org/
node --version
npm --version

2. Install Dependencies:

npm install

3. Development Mode:

npm run dev

4. Production Build:

# Windows
npm run build-win

# Mac
npm run build-mac

# Linux
npm run build-linux

5. Desktop Features:

  • Native window (without browser)
  • Professional menu with shortcuts
  • Native file dialogs
  • System integration
  • Offline mode

📊 Performance and Benchmarks

🏆 Framework Comparison:

Framework   Speed    Performance   Memory    Ease of Use
AutoGluon   ⚡⚡⚡    🏆🏆          🏆🏆      🏆🏆🏆
FLAML       ⚡⚡⚡⚡  🏆🏆          🏆🏆🏆    🏆🏆
H2O         ⚡⚡      🏆🏆🏆        🏆        🏆
TPOT        ⚡        🏆🏆🏆        🏆🏆      🏆
PyCaret     ⚡⚡⚡    🏆🏆          🏆🏆      🏆🏆🏆
Lale        ⚡⚡      🏆🏆          🏆🏆      🏆

📈 Performance Metrics:

Test Dataset (10k rows, 50 columns):

AutoGluon: 2.5 min, 94.2% accuracy
FLAML: 1.8 min, 93.8% accuracy  
H2O: 4.2 min, 94.0% accuracy

Memory Usage:

AutoGluon: ~2GB RAM
FLAML: ~1.5GB RAM
H2O: ~3GB RAM
TPOT: ~1GB RAM (Optimized)

🔧 Troubleshooting

❌ Common Issues:

"Java not found" (H2O):

# Windows: Add JAVA_HOME
set "JAVA_HOME=C:\Program Files\Java\jdk-11"

# Mac/Linux: Export variable
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk

"Port already in use":

# Check ports
netstat -an | findstr 8501

# Kill process
taskkill /PID <PID> /F

# Use another port
streamlit run app.py --server.port 8502

"Memory error":

# Increase H2O memory
export H2O_MAX_MEM_SIZE="8G"

# Or reduce dataset

"MLflow connection error" / "Missing mlruns":

# In the new version, the mlruns/.trash directory is automatically healed and recreated if broken.
# For other issues:
mlflow server --host 0.0.0.0 --port 5000

🧪 Testing

📋 Test Suite:

1. Integration Tests:

# Test H2O integration
python tests/test_h2o_integration.py

# Test MLflow integration  
python tests/test_mlflow_integration.py

2. Unit Tests:

# Test utils
pytest tests/test_utils.py

# Test interface
pytest tests/test_interface.py

3. Performance Tests:

# Benchmark frameworks
python tests/benchmark_frameworks.py

πŸ“ Project Structure

Multi-AutoML-Interface/
β”œβ”€β”€ πŸ“ src/                    # Main source code
β”‚   β”œβ”€β”€ πŸ“„ autogluon_utils.py  # AutoGluon integration
β”‚   β”œβ”€β”€ πŸ“„ flaml_utils.py      # FLAML integration
β”‚   β”œβ”€β”€ πŸ“„ h2o_utils.py        # H2O integration
β”‚   β”œβ”€β”€ πŸ“„ tpot_utils.py       # TPOT integration 
β”‚   β”œβ”€β”€ πŸ“„ pycaret_utils.py    # PyCaret integration
β”‚   β”œβ”€β”€ πŸ“„ lale_utils.py       # Lale integration
β”‚   β”œβ”€β”€ πŸ“„ xai_utils.py        # SHAP and Saliency Map integration
β”‚   β”œβ”€β”€ πŸ“„ mlflow_utils.py     # MLflow helpers and auto-healing
β”‚   β”œβ”€β”€ πŸ“„ mlflow_cache.py     # Cache optimization
β”‚   β”œβ”€β”€ πŸ“„ code_gen_utils.py   # Model consumption code generator
β”‚   β”œβ”€β”€ πŸ“„ data_utils.py       # Data processing & DVC integration
β”‚   └── πŸ“„ log_utils.py        # Logging utilities
β”œβ”€β”€ πŸ“ tests/                  # Automated tests (Integrations & Fixes)
β”‚   β”œβ”€β”€ πŸ“„ test_h2o_integration.py
β”‚   β”œβ”€β”€ πŸ“„ test_tpot_integration.py
β”‚   β”œβ”€β”€ πŸ“„ test_pycaret_utils.py
β”‚   β”œβ”€β”€ πŸ“„ test_lale_utils.py
β”‚   └── ...                    # Specific fixes and simulations
β”œβ”€β”€ πŸ“ deploy_[run_id]/        # Generated API deployment packages
β”‚   β”œβ”€β”€ πŸ“„ main.py             # FastAPI application
β”‚   β”œβ”€β”€ πŸ“„ Dockerfile          # Container configuration
β”‚   └── πŸ“„ requirements.txt    # API-specific dependencies
β”œβ”€β”€ πŸ“ electron/               # Desktop app (Electron)
β”‚   β”œβ”€β”€ πŸ“„ main.js             # Main process
β”‚   β”œβ”€β”€ πŸ“„ preload.js          # Security bridge
β”‚   β”œβ”€β”€ πŸ“„ renderer.js         # UI enhancements
β”‚   └── πŸ“ assets/             # Icons and resources
β”œβ”€β”€ 🐳 Dockerfile              # Docker configuration
β”œβ”€β”€ 🐳 docker-compose.yml      # Multi-service setup
β”œβ”€β”€ πŸ“„ requirements.txt        # Python dependencies
└── πŸ“„ README.md               # This file

🤝 Contributing

🎯 How to Contribute:

1. Fork and Clone:

git clone https://github.com/PedroM2626/Multi-AutoML-Interface.git
cd Multi-AutoML-Interface

2. Create Branch:

git checkout -b feature/new-feature

3. Develop:

  • Follow existing code style
  • Add tests
  • Document changes

4. Commit and Push:

git add .
git commit -m "feat: add new feature"
git push origin feature/new-feature

5. Pull Request:

  • Describe changes
  • Link issues
  • Await review

πŸ“ Guidelines:

  • Python: PEP 8
  • JavaScript: ESLint
  • Commits: Conventional Commits
  • Docs: Clear Markdown

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Credits and Acknowledgements

πŸ€– Frameworks:

  • AutoGluon - Amazon Web Services
  • FLAML - Microsoft Research
  • H2O AutoML - H2O.ai
  • TPOT - Rhodes Lab
  • PyCaret - PyCaret
  • Lale - IBM
  • MLflow - Databricks

πŸ› οΈ Technologies:

  • Streamlit - Web interface
  • Electron - Desktop app
  • Docker - Containerization
  • FastAPI - Backend API

📚 Resources:

  • AutoML Documentation
  • MLflow Tracking
  • Streamlit Components
  • Electron Security

πŸ—ΊοΈ Future Roadmap

πŸš€ Upcoming Features

  • Auto-sklearn (meta-learning)
  • Advanced visualizations (3D clusters, interactive ROC)
  • Batch processing queue (Distributed training)

🌐 Live Demo:

Hugging Face Spaces - Multi-AutoML Interface (demo only, for visualization/testing; run locally for real experiments).


🎉 Conclusion

The Multi-AutoML Interface represents a complete and professional solution for AutoML experimentation, combining:

  • πŸ€– Multiple frameworks in a unified interface
  • πŸ“Š Integrated MLOps with full tracking
  • πŸ–₯️ Flexible deployment (web, desktop, container)
  • πŸŽ›οΈ Intuitive interface for technical users
  • πŸ”§ Advanced configuration for experts
  • πŸ“ˆ Optimized performance for production

Ideal for:

  • Data Scientists wanting to compare frameworks
  • Researchers experimenting with different approaches
  • Students learning about AutoML

Developed by Pedro Morato Lahoz
