AI-Assisted Process Migration for Load-Aware Distributed Systems

This project simulates a distributed operating system across three virtual nodes and demonstrates AI-driven, load-aware process migration using Docker and CRIU.

It combines:

- Virtualization: Vagrant + VirtualBox
- Containerization: Docker
- Live process checkpointing: CRIU
- Real-time telemetry: Python (psutil) + Flask
- Visualization: Streamlit + Matplotlib
- Machine Learning: Random Forest (scikit-learn)

The system continuously monitors CPU, memory, and load across nodes.
An ML model predicts overloads and triggers live container migration from an overloaded node to a healthier one.
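The README does not say how the "healthier" node is chosen; the demo flow always migrates from Node2 to Node3. A minimal sketch of one way to generalise that choice is to score each candidate's latest sample and pick the lowest. The field names (cpu_percent, mem_percent, load_1m), the equal weighting and the cpu_count default are all assumptions, not the project's orchestrator logic.

```python
from typing import Dict

# Hypothetical per-node sample; field names are illustrative, not the project's schema.
Sample = Dict[str, float]


def health_score(sample: Sample, cpu_count: int = 1) -> float:
    """Lower is healthier: mean of CPU %, memory % and 1-minute load as a % of cpu_count cores."""
    load_pct = 100.0 * sample["load_1m"] / cpu_count
    return (sample["cpu_percent"] + sample["mem_percent"] + load_pct) / 3.0


def choose_destination(latest: Dict[str, Sample], source: str) -> str:
    """Pick the least-loaded node other than the overloaded source node."""
    candidates = {node: s for node, s in latest.items() if node != source}
    if not candidates:
        raise ValueError("no destination node available")
    return min(candidates, key=lambda node: health_score(candidates[node]))


if __name__ == "__main__":
    latest = {
        "node2": {"cpu_percent": 97.0, "mem_percent": 80.0, "load_1m": 3.5},
        "node3": {"cpu_percent": 12.0, "mem_percent": 35.0, "load_1m": 0.2},
    }
    print(choose_destination(latest, source="node2"))
```

With only two workers this always returns the other node, which matches the demo; the scoring only matters once more nodes are added.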

Built as an academic systems project (Jan–Apr 2025).

🧠 Architecture Overview

Nodes

Node1 – 192.168.33.11
Central control plane:
- Receives metrics from all nodes
- Stores telemetry in merged_metrics.csv
- Hosts Flask server
- Runs Streamlit dashboard
- Trains AI model
- Executes migration orchestrator
Node2 – 192.168.33.12
Worker node:
- Runs stress container (load_gen)
- Sends metrics to Node1
- Acts as the migration source
Node3 – 192.168.33.13
Worker node:
- Receives the checkpoint
- Restores the container
- Acts as the migration destination

Data & Control Flow


Node2 ─┐
       ├── HTTP Metrics ──> Node1 (Flask + CSV + Streamlit)
Node3 ─┘
Node1 ── AI Decision ──> Node2 (Checkpoint via CRIU)
Node2 ── SCP ─────────> Node3
Node3 ── Restore ─────> Container continues execution
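On Node1, the "HTTP Metrics ──> CSV" hop amounts to validating each incoming JSON record and appending it as one row of merged_metrics.csv. The following is a minimal standard-library sketch of that step, assuming a flat record with the hypothetical columns node, timestamp, cpu_percent, mem_percent and load_1m; the real metrics_server.py may store different fields. In the project this function would be called from a Flask route handler.

```python
import csv
import os
from typing import Dict, Union

# Hypothetical column layout for merged_metrics.csv.
FIELDS = ["node", "timestamp", "cpu_percent", "mem_percent", "load_1m"]


def append_metrics(path: str, record: Dict[str, Union[str, float]]) -> None:
    """Append one metrics record to the CSV, writing the header if the file is new or empty."""
    missing = [f for f in FIELDS if f not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        # extrasaction="ignore" drops any keys a sender adds beyond FIELDS.
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerow(record)
```

Inside a Flask view this could be used as append_metrics("merged_metrics.csv", request.get_json()), returning HTTP 400 when it raises ValueError.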

📁 Repository Structure


ai-process-migration/
│
├── Vagrantfile
│
├── node_common/
│   ├── Dockerfile
│   ├── load_generator.py
│   └── requirements.txt
│
├── node2_node3/
│   └── metrics_sender.py
│
└── node1_server/
    ├── metrics_server.py
    ├── live_dashboard.py
    ├── metrics_plotter.py
    ├── train_ai_model.py
    └── ai_orchestrator.py

🚀 Setup & Execution Guide

1️⃣ Bring Up the Cluster (Host Machine)

vagrant up

Open three terminals on the host and SSH into one node from each:

vagrant ssh node1
vagrant ssh node2
vagrant ssh node3

2️⃣ Node1 – Metrics Server & Dashboard

cd ~/app
pip3 install flask pandas matplotlib streamlit scikit-learn joblib psutil requests
python3 metrics_server.py

In another terminal on Node1:

streamlit run live_dashboard.py --server.address=192.168.33.11

Open in browser:

http://192.168.33.11:8501
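A live dashboard only needs the most recent window of samples per node. One way to get that from the CSV is a bounded deque per node, as in this sketch. It assumes merged_metrics.csv has a header row with a node column; the other column names and the window size are assumptions, and this is not the actual live_dashboard.py code.

```python
import csv
from collections import defaultdict, deque
from typing import Deque, Dict, List


def recent_by_node(path: str, window: int = 60) -> Dict[str, List[dict]]:
    """Return the last `window` CSV rows for each node, oldest first."""
    buffers: Dict[str, Deque[dict]] = defaultdict(lambda: deque(maxlen=window))
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            # A deque with maxlen silently discards its oldest row once full.
            buffers[row["node"]].append(row)
    return {node: list(rows) for node, rows in buffers.items()}
```

Each dashboard refresh can re-read the file and plot float(row["cpu_percent"]) per node; the newest sample for a node is the last element of its list. Re-reading the whole file is linear in its size, which is fine for a demo but worth revisiting if the CSV grows large.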

3️⃣ Node2 & Node3 – Workers

On both Node2 and Node3:

cd ~/app
pip3 install psutil requests
python3 metrics_sender.py &
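The worker-side sender boils down to sampling CPU, memory and load, then POSTing the sample as JSON to Node1. This sketch uses psutil (which the README installs) plus urllib from the standard library instead of requests. The endpoint path /metrics and port 5000 (Flask's default) are assumptions, as are the field names; the real metrics_sender.py may differ.

```python
import json
import os
import time
import urllib.request

# Hypothetical endpoint; the real route and port are defined in metrics_server.py.
SERVER_URL = "http://192.168.33.11:5000/metrics"


def build_payload(node: str, cpu_percent: float, mem_percent: float, load_1m: float) -> dict:
    """Package one sample as a flat, JSON-serialisable record."""
    return {
        "node": node,
        "timestamp": time.time(),
        "cpu_percent": cpu_percent,
        "mem_percent": mem_percent,
        "load_1m": load_1m,
    }


def collect_sample(node: str) -> dict:
    """Read CPU %, memory % and the 1-minute load average on this machine."""
    import psutil  # imported here so the other helpers work without psutil installed

    cpu = psutil.cpu_percent(interval=1)  # blocks for 1 s while measuring
    mem = psutil.virtual_memory().percent
    load_1m = os.getloadavg()[0]  # Unix only
    return build_payload(node, cpu, mem, load_1m)


def post_payload(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run_forever(node: str, url: str = SERVER_URL, period: float = 5.0) -> None:
    """Send one sample roughly every `period` seconds; failures are logged, not fatal."""
    while True:
        started = time.monotonic()
        try:
            post_payload(url, collect_sample(node))
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            print(f"send failed: {exc}", flush=True)
        time.sleep(max(0.0, period - (time.monotonic() - started)))
```

On Node2 this would be started with run_forever("node2"). Catching OSError keeps the sender alive while Node1's server is restarting.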

Build and run the load generator:

docker build -t load_gen .
docker run -d --name generator load_gen
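The README does not show load_generator.py. Below is a hypothetical stand-in: a CPU-bound loop that keeps a running counter in memory. That counter makes it a convenient CRIU target, because a successful restore should show up as a count that continues rather than restarting from zero.

```python
import time


def burn_cpu(seconds: float) -> int:
    """Keep one core busy for roughly `seconds` and return how many loop iterations ran."""
    deadline = time.monotonic() + seconds
    iterations = 0
    x = 0
    while time.monotonic() < deadline:
        x = (x * 31 + 7) % 1_000_003  # cheap arithmetic so the loop stays CPU-bound
        iterations += 1
    return iterations


def main() -> None:
    """Loop forever, printing a running total after each second of busy work."""
    total = 0
    while True:
        total += burn_cpu(1.0)
        # total lives in process memory, so after a CRIU restore the printed
        # count should carry on from the checkpointed value instead of zero.
        print(f"iterations so far: {total}", flush=True)
```

If the container's command calls main(), comparing docker logs on Node2 before the checkpoint with docker logs on Node3 after the restore gives a quick check that state survived the migration.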

4️⃣ Train the AI Model (Node1)

After some metrics accumulate:

python3 train_ai_model.py

This produces:

load_classifier.joblib
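A minimal sketch of the training step with scikit-learn's RandomForestClassifier, saved with joblib under the README's file name. The labelling rule (overloaded when CPU > 80 %, memory > 90 % or 1-minute load > 2.0) and the three-feature layout are assumptions for illustration; train_ai_model.py may label and featurise the data differently. The synthetic rows only let the sketch run without a real merged_metrics.csv.

```python
import random
from typing import List, Tuple

import joblib
from sklearn.ensemble import RandomForestClassifier

Row = Tuple[float, float, float]  # (cpu_percent, mem_percent, load_1m)


def label_overloaded(cpu: float, mem: float, load_1m: float) -> int:
    """Hypothetical labelling rule: 1 = overloaded, 0 = healthy."""
    return int(cpu > 80.0 or mem > 90.0 or load_1m > 2.0)


def train(rows: List[Row]) -> RandomForestClassifier:
    """Fit a random forest on (cpu, mem, load) rows labelled by label_overloaded."""
    X = [list(r) for r in rows]
    y = [label_overloaded(*r) for r in rows]
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)
    return model


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for real telemetry.
    rows = [(rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 4)) for _ in range(500)]
    model = train(rows)
    joblib.dump(model, "load_classifier.joblib")
    print(model.predict([[95.0, 70.0, 3.5], [10.0, 30.0, 0.2]]))
```

If labels come from a fixed threshold like this, the forest mostly relearns the threshold; the model adds more value when labels reflect something the raw numbers do not encode directly, such as a node becoming overloaded shortly afterwards.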

5️⃣ Trigger AI-Driven Migration (Node1)

python3 ai_orchestrator.py

What happens:

- The latest metrics are read.
- The AI model predicts whether the node is overloaded.
- If it is:
  - Node2 checkpoints the container using CRIU.
  - The checkpoint is copied to Node3 over SCP.
  - Node3 restores the container, which continues execution.
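The checkpoint, copy and restore steps above map onto real Docker and SCP commands. This sketch builds them as SSH invocations that the orchestrator on Node1 could run after a positive prediction, and prints them as a dry run by default. docker checkpoint create and docker start --checkpoint are experimental Docker features and need CRIU on both workers. The checkpoint name, directory, SSH user and the restored container's name are assumptions; so is passwordless SSH from Node1 to both workers and from Node2 to Node3, with the checkpoint files readable by the SSH user.

```python
import shlex
import subprocess
from typing import List

# CONTAINER and IMAGE match the README's docker commands; the rest are assumptions.
CONTAINER = "generator"
IMAGE = "load_gen"
RESTORED = "generator_restored"  # avoids clashing with a "generator" already on Node3
CHECKPOINT = "cp1"
CKPT_DIR = "/tmp/criu_ckpt"
SSH_USER = "vagrant"


def ssh(host: str, command: str) -> List[str]:
    """Build an argv that runs `command` on `host` over SSH."""
    return ["ssh", f"{SSH_USER}@{host}", command]


def migration_plan(src: str, dst: str) -> List[List[str]]:
    """Steps to checkpoint on src, copy the state to dst and restore it there."""
    return [
        # 1. CRIU checkpoint through Docker; the container stops once it is checkpointed.
        ssh(src, f"sudo docker checkpoint create --checkpoint-dir={CKPT_DIR} {CONTAINER} {CHECKPOINT}"),
        # 2. The source node pushes the checkpoint directory to the destination with scp.
        ssh(src, f"scp -r {CKPT_DIR} {SSH_USER}@{dst}:{CKPT_DIR}"),
        # 3. Create a stopped container from the same image on the destination.
        ssh(dst, f"sudo docker create --name {RESTORED} {IMAGE}"),
        # 4. Start it from the checkpoint so execution resumes instead of restarting.
        ssh(dst, f"sudo docker start --checkpoint-dir={CKPT_DIR} --checkpoint={CHECKPOINT} {RESTORED}"),
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, execute it and stop at the first failure."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Node2 -> Node3, as in the data-flow diagram.
    run_plan(migration_plan("192.168.33.12", "192.168.33.13"), dry_run=True)
```

Printing the plan before executing it makes it easy to replay a failed step by hand, which helps when the restore step hits kernel or CRIU limitations.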

📊 What You Can Demonstrate

- Live CPU/memory/load graphs in Streamlit
- Real-time metrics flowing over HTTP
- Stress containers consuming resources
- Checkpoint creation on Node2
- Transfer of CRIU state to Node3
- Restore attempt on Node3
- AI model training and inference

Even where CRIU restore on Node3 runs into kernel limitations, every earlier stage of the pipeline (telemetry, prediction, checkpointing, and state transfer) runs for real.
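Before a demo, it can help to confirm whether each node's kernel supports CRIU at all. criu check is CRIU's own self-test and normally needs root. The sketch below wraps it so the result can be logged, and reports a missing binary or a timeout instead of crashing. The function name and the 30-second timeout are assumptions.

```python
import shutil
import subprocess
from typing import Tuple


def criu_check() -> Tuple[bool, str]:
    """Run `criu check` and return (passed, combined output)."""
    if shutil.which("criu") is None:
        return False, "criu binary not found on PATH"
    try:
        result = subprocess.run(
            ["criu", "check"], capture_output=True, text=True, timeout=30
        )
    except subprocess.TimeoutExpired:
        return False, "criu check timed out"
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output


if __name__ == "__main__":
    ok, message = criu_check()
    print("CRIU ready" if ok else "CRIU not ready", "-", message)
```

Running this with sudo on Node2 and Node3 separates kernel or permission problems from bugs in the migration pipeline itself.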

🧩 Key Learnings

- Distributed OS design using virtualization
- Container internals and live process state
- Real-time telemetry pipelines
- OS-level checkpoint/restore
- AI-driven orchestration logic
- Debugging low-level Linux systems

This project bridges Operating Systems, Distributed Systems, and Machine Learning into one cohesive system.

📌 Tech Stack

| Component | Technology Used |
| --- | --- |
| Virtual Machines | Vagrant + VirtualBox |
| Containers | Docker |
| Process Migration | CRIU |
| Monitoring | psutil + Flask |
| Visualization | Streamlit + Matplotlib |
| AI Model | Random Forest (scikit-learn) |

👥 Team

This project was built collaboratively by:

Bind Pratap Singh
Krishna Garg
Parth Agrawal
Ayush Tiwari

“This project explores a future where the operating system itself becomes intelligent, predicting overloads and moving live workloads across machines in real time.”
