bindpratapsingh/AI-process-migration-OS

AI-Assisted Process Migration for Load-Aware Distributed Systems

This project simulates a distributed operating system across three virtual nodes and demonstrates AI-driven, load-aware process migration using Docker and CRIU.

It combines:

  • Virtualization — Vagrant + VirtualBox
  • Containerization — Docker
  • Live process checkpointing — CRIU
  • Real-time telemetry — Python (psutil) + Flask
  • Visualization — Streamlit + Matplotlib
  • Machine Learning — Random Forest (scikit-learn)

The system continuously monitors CPU, memory, and load across nodes.
An ML model predicts overloads and triggers live container migration from an overloaded node to a healthier one.

Built as an academic systems project (Jan–Apr 2025).


🧠 Architecture Overview

Nodes

  • Node1 – 192.168.33.11
    Central control plane:

    • Receives metrics from all nodes
    • Stores telemetry in merged_metrics.csv
    • Hosts Flask server
    • Runs Streamlit dashboard
    • Trains AI model
    • Executes migration orchestrator
  • Node2 – 192.168.33.12
    Worker node:

    • Runs stress container (load_gen)
    • Sends metrics to Node1
    • Acts as source of migration
  • Node3 – 192.168.33.13
    Worker node:

    • Receives checkpoint
    • Restores container
    • Becomes destination of migration

Data & Control Flow


Node2 ─┐
       ├── HTTP Metrics ──> Node1 (Flask + CSV + Streamlit)
Node3 ─┘
Node1 ── AI Decision ──> Node2 (Checkpoint via CRIU)
Node2 ── SCP ─────────> Node3
Node3 ── Restore ─────> Container continues execution
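
The "AI Decision" arrow can be read as a small selection step: given the latest sample per node, find an overloaded source and a healthier destination. The sketch below is illustrative and is not taken from ai_orchestrator.py. The metric keys (`cpu`, `mem`, `load1`), the 85% cutoff, and the `threshold_rule` stand-in for the trained model are all assumptions.

```python
from typing import Callable, Dict, Optional, Tuple

Metrics = Dict[str, float]  # hypothetical keys: "cpu", "mem", "load1"


def threshold_rule(m: Metrics) -> bool:
    """Stand-in for the trained model: flag a node above 85% CPU (assumed cutoff)."""
    return m["cpu"] > 85.0


def pick_migration(
    latest: Dict[str, Metrics],
    is_overloaded: Callable[[Metrics], bool] = threshold_rule,
) -> Optional[Tuple[str, str]]:
    """Return (source, destination), or None when no migration is warranted."""
    hot = [n for n, m in latest.items() if is_overloaded(m)]
    cool = [n for n, m in latest.items() if not is_overloaded(m)]
    if not hot or not cool:
        return None  # nothing overloaded, or nowhere healthy to move to
    source = max(hot, key=lambda n: latest[n]["cpu"])        # busiest overloaded node
    destination = min(cool, key=lambda n: latest[n]["cpu"])  # least busy healthy node
    return source, destination


latest = {
    "node2": {"cpu": 97.0, "mem": 61.0, "load1": 3.8},
    "node3": {"cpu": 12.0, "mem": 30.0, "load1": 0.2},
}
print(pick_migration(latest))  # ('node2', 'node3')
```

Passing the predictor in as a callable keeps the selection logic testable without a trained model; a wrapper around the classifier's prediction could replace `threshold_rule`.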


📁 Repository Structure


ai-process-migration/
│
├── Vagrantfile
│
├── node_common/
│   ├── Dockerfile
│   ├── load_generator.py
│   └── requirements.txt
│
├── node2_node3/
│   └── metrics_sender.py
│
├── node1_server/
│   ├── metrics_server.py
│   ├── live_dashboard.py
│   ├── metrics_plotter.py
│   ├── train_ai_model.py
│   └── ai_orchestrator.py


🚀 Setup & Execution Guide

1️⃣ Bring Up the Cluster (Host Machine)

vagrant up

Open three terminals and connect to one node in each:

vagrant ssh node1
vagrant ssh node2
vagrant ssh node3

2️⃣ Node1 – Metrics Server & Dashboard

cd ~/app
pip3 install flask pandas matplotlib streamlit scikit-learn joblib psutil requests
python3 metrics_server.py

In another terminal on Node1:

streamlit run live_dashboard.py --server.address=192.168.33.11

Open in browser:

http://192.168.33.11:8501
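
For orientation, here is a minimal sketch of what a metrics receiver along the lines of metrics_server.py might look like: a Flask route that appends each incoming sample to merged_metrics.csv. The `/metrics` endpoint, the column layout, and the JSON field names are assumptions, not details confirmed by the repository.

```python
import csv
import os
import time

FIELDS = ["timestamp", "node", "cpu", "mem", "load1"]  # assumed column layout


def append_metrics(path, payload, now=None):
    """Append one sample to the CSV, writing the header on first use."""
    row = {
        "timestamp": now if now is not None else time.time(),
        "node": payload["node"],
        "cpu": float(payload["cpu"]),
        "mem": float(payload["mem"]),
        "load1": float(payload["load1"]),
    }
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
    return row


def create_app(csv_path="merged_metrics.csv"):
    """Build the Flask app; Flask is imported here so the helper above stays stdlib-only."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/metrics", methods=["POST"])  # hypothetical endpoint name
    def receive():
        append_metrics(csv_path, request.get_json(force=True))
        return jsonify(status="ok")

    return app
```

To serve it, something like `create_app().run(host="0.0.0.0", port=5000)` would do; the port is also an assumption.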

3️⃣ Node2 & Node3 – Workers

On both Node2 and Node3:

cd ~/app
pip3 install psutil requests
python3 metrics_sender.py &
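
As a rough illustration of the sender side, the sketch below samples CPU, memory, and load with psutil and posts them to Node1 with requests. It is not the repository's metrics_sender.py: the server port, the `/metrics` path, the field names, and the two-second interval are assumptions.

```python
import os
import socket
import time

SERVER_URL = "http://192.168.33.11:5000/metrics"  # port and path are assumptions


def build_payload(node, cpu, mem, load1):
    """Shape one sample as a JSON-serialisable dict."""
    return {
        "node": node,
        "cpu": round(cpu, 1),
        "mem": round(mem, 1),
        "load1": round(load1, 2),
    }


def collect(node=None):
    """Read CPU %, memory %, and the 1-minute load average from the local machine."""
    import psutil  # third-party, installed by the pip3 step above

    return build_payload(
        node or socket.gethostname(),
        psutil.cpu_percent(interval=1),
        psutil.virtual_memory().percent,
        os.getloadavg()[0],
    )


def run(interval=2.0):
    """Post a sample every `interval` seconds; keep going if Node1 is unreachable."""
    import requests

    while True:
        try:
            requests.post(SERVER_URL, json=collect(), timeout=3)
        except requests.RequestException as exc:
            print(f"send failed: {exc}")
        time.sleep(interval)
```

Catching `requests.RequestException` keeps the worker reporting after a temporary outage on Node1 instead of exiting on the first failed POST.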

Build and run the load generator:

docker build -t load_gen .
docker run -d --name generator load_gen
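
The contents of load_generator.py are not shown here, so the following is only a guess at the kind of workload such a container might run: a busy loop that keeps one core occupied and prints a running counter. The function name and reporting interval are hypothetical.

```python
import time


def burn(seconds, report_every=1.0):
    """Spin the CPU for `seconds`, printing a running counter as visible state."""
    count = 0
    start = time.monotonic()
    next_report = start + report_every
    while True:
        now = time.monotonic()
        if now - start >= seconds:
            break
        count += 1  # the busy work: one increment per loop iteration
        if now >= next_report:
            print(f"iterations so far: {count}", flush=True)
            next_report = now + report_every
    return count
```

A container entrypoint could call `burn` with a large duration. If a restore succeeds, the counter should pick up from its last value rather than from zero, which gives a simple sign that in-memory state survived.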

4️⃣ Train the AI Model (Node1)

After some metrics accumulate:

python3 train_ai_model.py

This produces:

load_classifier.joblib
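
A training step of this shape could be sketched as follows. This is not the project's train_ai_model.py: the feature column names and the rule that labels a row as overloaded (CPU above 85%) are assumptions, since the source does not say how labels are derived.

```python
import csv

FEATURES = ["cpu", "mem", "load1"]  # assumed column names in merged_metrics.csv


def load_samples(path, cpu_cutoff=85.0):
    """Read the CSV and label each row 1 (overloaded) or 0 using an assumed CPU cutoff."""
    X, y = [], []
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            feats = [float(row[name]) for name in FEATURES]
            X.append(feats)
            y.append(1 if feats[0] > cpu_cutoff else 0)
    return X, y


def train(path="merged_metrics.csv", out="load_classifier.joblib"):
    """Fit a Random Forest on the labelled samples and save it with joblib."""
    import joblib
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_samples(path)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)
    joblib.dump(model, out)
    return model
```

With labels derived from the same CPU column the model sees, the forest mostly relearns the cutoff; it is shown this way only to keep the example small.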

5️⃣ Trigger AI-Driven Migration (Node1)

python3 ai_orchestrator.py

What happens:

  1. Latest metrics are read

  2. AI model predicts overload

  3. If overloaded:

    • Node2 checkpoints container using CRIU
    • Checkpoint is SCP-ed to Node3
    • Node3 restores container
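
The three migration steps can be sketched as a list of shell commands issued from Node1 over SSH. This is an assumption about how an orchestrator might drive them, not a copy of ai_orchestrator.py. The checkpoint name, checkpoint directory, and `vagrant` user are hypothetical. The sketch relies on Docker's experimental checkpoint support, and the restore step presumes a container named `generator` already exists (created but not running) on the destination.

```python
import shlex
import subprocess

CONTAINER = "generator"        # container name from the docker run step
CHECKPOINT = "cp1"             # hypothetical checkpoint name
CKPT_DIR = "/tmp/checkpoints"  # hypothetical directory on both workers


def migration_plan(src="192.168.33.12", dst="192.168.33.13", user="vagrant"):
    """Return the three steps as argv lists, in execution order."""
    checkpoint = (
        f"docker checkpoint create --checkpoint-dir={CKPT_DIR} {CONTAINER} {CHECKPOINT}"
    )
    copy = f"scp -r {CKPT_DIR}/{CHECKPOINT} {user}@{dst}:{CKPT_DIR}/"
    restore = (
        f"docker start --checkpoint-dir={CKPT_DIR} --checkpoint={CHECKPOINT} {CONTAINER}"
    )
    return [
        ["ssh", f"{user}@{src}", checkpoint],  # Node2: freeze and dump state via CRIU
        ["ssh", f"{user}@{src}", copy],        # Node2 -> Node3: ship the checkpoint
        ["ssh", f"{user}@{dst}", restore],     # Node3: resume from the checkpoint
    ]


def migrate(dry_run=True):
    """Print each step; run it only when dry_run is False."""
    for argv in migration_plan():
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Keeping `dry_run=True` as the default lets the plan be inspected before anything touches the workers; `check=True` stops the sequence at the first failing step.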

📊 What You Can Demonstrate

  • Live CPU/memory/load graphs in Streamlit
  • Real-time metrics flowing over HTTP
  • Stress containers consuming resources
  • Checkpoint creation on Node2
  • Transfer of CRIU state
  • Restore attempt on Node3
  • AI model training & inference

The CRIU restore on Node3 can fail because of kernel limitations. Even then, the other stages of the pipeline (telemetry, prediction, checkpoint, and transfer) run end to end.


🧩 Key Learnings

  • Distributed OS design using virtualization
  • Container internals and live process state
  • Real-time telemetry pipelines
  • OS-level checkpoint/restore
  • AI-driven orchestration logic
  • Debugging low-level Linux systems

This project bridges Operating Systems, Distributed Systems, and Machine Learning into one cohesive system.


📌 Tech Stack

| Component         | Technology Used              |
|-------------------|------------------------------|
| Virtual Machines  | Vagrant + VirtualBox         |
| Containers        | Docker                       |
| Process Migration | CRIU                         |
| Monitoring        | psutil + Flask               |
| Visualization     | Streamlit + Matplotlib       |
| AI Model          | Random Forest (scikit-learn) |

👥 Team

This project was built collaboratively by:

  • Bind Pratap Singh
  • Krishna Garg
  • Parth Agrawal
  • Ayush Tiwari

“This project explores a future where the operating system itself becomes intelligent, predicting overloads and moving live workloads across machines in real time.”
