InferenceMesh

A decentralized, peer-to-peer pipeline-parallel inference network written in Rust. Inspired by Petals, optimized for low-latency systems-level performance, zero-copy memory operations, and robust connection pooling.

Architecture

InferenceMesh splits large language models (LLMs) across a heterogeneous network of nodes. Each node hosts a subset of transformer blocks and participates in a pipeline-parallel forward pass:

[Client] → (Embeddings) → [Node A: Layers 1-10] → [Node B: Layers 11-20] → [Node C: Layers 21-30] → [Client] (LM Head)

Core Flow

Client tokenizes input, runs the embedding layer locally, and produces hidden states.
Route Resolution via Kademlia DHT — discover which peers host the required layers.
Pipeline Forward Pass — hidden states are piped through the chain over QUIC connections.
Client receives final hidden states, runs the LM head, and samples the next token.

Tech Stack

Component	Crate	Purpose
P2P Networking	`rust-libp2p`	Identity, NAT traversal, Kademlia DHT, encrypted transport
Data Transport	QUIC (via libp2p)	Multiplexed, low-latency UDP streams
Serialization	`capnp` (Cap'n Proto)	Zero-copy tensor serialization
Inference Engine	`candle-core` / `candle-nn`	Pure-Rust ML framework with CUDA/Metal/CPU backends
Async Runtime	`tokio`	Multi-threaded async executor

Project Structure

src/
├── protocol/   # Cap'n Proto schemas, codec, request/response types
├── compute/    # Candle-based transformer block execution engine
├── p2p/        # libp2p swarm, DHT, routing, provider cache
└── cli/        # Binary entry point (bootstrap, worker, client modes)

tests/          # Integration tests (loopback, chaos, concurrency, math parity, memory)
docs/           # Architecture documentation and specs
examples/       # Usage examples

Building

cargo build --workspace

Running

Bootstrap Node

cargo run -- bootstrap --port 9000

Worker Node

cargo run -- worker --bootstrap /ip4/BOOTSTRAP_IP/tcp/9000 --layers 1-10 --port 9001

With model weights:

cargo run -- worker --bootstrap /ip4/BOOTSTRAP_IP/tcp/9000 --layers 1-10 --model-path ./model/ --device cuda --port 9001

Client

cargo run -- client --bootstrap /ip4/BOOTSTRAP_IP/tcp/9000 --chain 1,2,3,4,5

Testing

# All tests
cargo test --workspace

# Specific integration tests
cargo test --test loopback_test
cargo test --test math_parity_test
cargo test --test memory_test
cargo test --test chaos_test
cargo test --test concurrency_test

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
src		src
tests		tests
.gitignore		.gitignore
Cargo.toml		Cargo.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

InferenceMesh

Architecture

Core Flow

Tech Stack

Project Structure

Building

Running

Bootstrap Node

Worker Node

Client

Testing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

InferenceMesh

Architecture

Core Flow

Tech Stack

Project Structure

Building

Running

Bootstrap Node

Worker Node

Client

Testing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages