A digital board game based on the Red-Blue Pebble Game from compiler theory (Hong & Kung, 1981), designed to build intuition for memory-efficient computation scheduling.
Play it solo to find optimal memory strategies, or pit AI agents against each other in tournaments.
On a real H100 GPU, a floating-point operation costs roughly 0.3 picojoules — but fetching an operand from HBM costs 500+ picojoules. Most of the energy in AI training goes to moving data, not computing. The pebbling game models this problem: given a computation DAG and a memory hierarchy, find the schedule that minimizes total energy cost.
This game was built for the Sutro Group, a volunteer research collective exploring energy-efficient alternatives to backpropagation. It operationalizes the "pebbling game" concept discussed in Yaroslav Bulatov's article Fitting Larger Networks into Memory.
The React artifact in web/pebbling-game.jsx can be played directly in Claude.ai by sharing the conversation, or deployed as a static site:
```sh
# Option 1: Deploy with Vite
cd web
npm create vite@latest pebbling-app -- --template react
cp pebbling-game.jsx pebbling-app/src/App.jsx
cd pebbling-app && npm install && npm run dev

# Option 2: Share via Claude.ai artifact link (no deploy needed)
```

```sh
# No dependencies required for the core engine
cd src

# Run the built-in demo
python pebbling_engine.py

# Play with the heuristic agent
python pebbling_agent.py --agent heuristic --nodes 12 --depth 5 --seed 42

# Run a tournament (random vs heuristic)
python pebbling_agent.py --agent tournament --nodes 12 --depth 5 --games 10

# Play with Claude as the agent (requires API key)
pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
python pebbling_agent.py --agent llm --nodes 10 --depth 4 --seed 42
```

You're given a computation DAG (directed acyclic graph) where:
- Top nodes = inputs (freely available data)
- Bottom nodes = outputs (must all be computed to win)
- Edges = dependencies (a node can only be computed if ALL predecessors are in memory)
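As a toy illustration (not the engine's internal data structure), such a DAG can be written as a hypothetical adjacency map from each node to its predecessors:

```python
# Hypothetical 5-node DAG: nodes 0 and 1 are inputs, node 4 is the output.
# Each node maps to the list of predecessors it depends on.
dag = {
    0: [],        # input (source)
    1: [],        # input (source)
    2: [0, 1],
    3: [1],
    4: [2, 3],    # output
}

def computable(node, pebbled):
    """A node can be computed only if ALL its predecessors hold pebbles."""
    return all(p in pebbled for p in dag[node])
```

Sources trivially satisfy the rule (they have no predecessors), which is why they can be pebbled freely.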
You place pebbles representing data stored in different memory tiers:
| Tier | Symbol | Cost | Capacity | Real H100 Equivalent |
|---|---|---|---|---|
| Register | R | 1 pJ | 2–3 slots | RF (0.3 pJ) |
| L1 SRAM | S | 5 pJ | 3–5 slots | L1 Cache (5 pJ) |
| HBM/DRAM | H | 100 pJ | Unlimited | HBM3 (500+ pJ) |
Rules:
- Source nodes can be pebbled freely
- Other nodes require ALL predecessors to currently have pebbles
- Pebbles can be removed at any time (frees capacity)
- Pebbles can be transferred between tiers (costs the destination tier's I/O price)
Goal: Pebble all output nodes with the minimum total energy cost.
Strategy: Keep data in registers as long as possible. When registers are full, spill to SRAM (5 pJ), not HBM (100 pJ). Sometimes it's cheaper to recompute a node than to store it in HBM. Free registers by removing pebbles from nodes whose successors are all computed.
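The recompute-vs-store trade-off is plain arithmetic under the costs above. A sketch with illustrative numbers (not engine code), assuming the node's two inputs are free sources:

```python
REG, SRAM, HBM = 1, 5, 100  # pJ per placement, from the tier table

# Option A: spill the value to HBM, then transfer it back to a register
# (transfers cost the destination tier's price).
spill_and_restore = HBM + REG   # 101 pJ

# Option B: drop the value and recompute it later: re-place both source
# inputs plus the node itself in registers.
recompute = 3 * REG             # 3 pJ

print(spill_and_restore, recompute)  # 101 3
```

Deep dependency chains change the picture: if the inputs themselves must be recomputed, recomputation costs compound and spilling can win.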
```
pebbling-game/
├── src/
│   ├── pebbling_engine.py    # Core game engine (gym-style API)
│   └── pebbling_agent.py     # Agent implementations + tournament runner
├── web/
│   └── pebbling-game.jsx     # Interactive React game (browser)
├── notebooks/
│   └── pebbling_colab.ipynb  # Google Colab notebook
└── README.md
```
The engine exposes a standard gym-style interface:
```python
from pebbling_engine import PebblingGame, Action, ActionType

# Create a game
game = PebblingGame.create(
    num_nodes=12, depth=5, max_fan_in=3,
    register_slots=3, sram_slots=4,
    hardware_profile="h100",  # or "tutorial", "extreme"
    seed=42,
)

# Observe state
obs = game.observe()
print(obs.summary())

# Take an action
result = game.step(Action(ActionType.PLACE, node_id=0, tier="register"))
# result -> StepResult(observation, reward, done, info)

# Get all legal moves
valid = game.valid_actions()

# Render for debugging
game.render()

# Generate prompt for LLM agents
prompt = game.prompt_for_agent()
```

Implement a class with `reset()` and `get_action(game)`:
```python
from pebbling_engine import PebblingGame, Action, ActionType

class MyAgent:
    def reset(self):
        pass  # called at start of each game

    def get_action(self, game: PebblingGame) -> Action:
        valid = game.valid_actions()
        # Your strategy here...
        return valid[0]

# Run it
from pebbling_agent import run_agent_game
results = run_agent_game(MyAgent(), num_nodes=12, depth=5, seed=42)
print(f"Total cost: {results['total_cost']} pJ")
```

Compare agents head-to-head:
```python
from pebbling_agent import run_tournament, HeuristicAgent, RandomAgent

results = run_tournament(
    agents={
        "random": RandomAgent(),
        "heuristic": HeuristicAgent(),
        "my_agent": MyAgent(),
    },
    num_games=20,
    num_nodes=12,
    depth=5,
)
```

Results of a 10-game tournament on 12-node, depth-5 DAGs (3 registers, 4 SRAM, H100 profile):
| Agent | Avg Cost (pJ) | Best (pJ) | Worst (pJ) | vs Random |
|---|---|---|---|---|
| Random | 9,498 | 1,043 | 27,522 | 1× |
| Heuristic | 118 | 29 | 233 | 80× better |
| Your agent? | ? | ? | ? | ?× |
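As a starting point for that last row, here is a sketch of a greedy agent that always takes the cheapest legal action. The tier costs are hardcoded to mirror the table above, and treating removals as free is an assumption; the real engine may expose per-action costs directly.

```python
TIER_COST = {"register": 1, "sram": 5, "hbm": 100}  # pJ, from the tier table

def cheapest(actions):
    """Prefer the action with the cheapest destination tier; actions
    without a tier (e.g. removals) count as free."""
    return min(actions, key=lambda a: TIER_COST.get(getattr(a, "tier", None), 0))

class GreedyAgent:
    def reset(self):
        pass  # no per-game state to clear

    def get_action(self, game):
        return cheapest(game.valid_actions())
```

This ignores which node to pebble and when to free registers, so it won't beat the heuristic baseline, but it follows the `reset()`/`get_action()` interface above and can run in a tournament unchanged.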
```
python pebbling_agent.py [OPTIONS]

Options:
  --agent {llm,heuristic,random,tournament}
                  Agent type (default: heuristic)
  --nodes N       Number of DAG nodes (default: 10)
  --depth D       Max DAG depth (default: 4)
  --fan-in F      Max fan-in per node (default: 2)
  --registers R   Register slots (default: 3)
  --sram S        SRAM slots (default: 4)
  --seed S        Random seed (default: 42)
  --games G       Tournament games (default: 5)
  --model M       Claude model for LLM agent (default: claude-sonnet-4-5-20250929)
```
- Interactive browser game (React)
- Gym-style Python engine
- Heuristic baseline agent
- LLM agent (Claude via Anthropic API)
- Tournament runner
- Colab notebook with walkthrough
- Visualization of agent replays
- Multi-level memory hierarchies (4+ tiers)
- Multiplayer competitive mode
- RL agent (PPO/DQN on the action space)
- Integration with Sutro's agentic pipeline for co-optimizing algorithms + schedules
- Fitting Larger Networks into Memory — Yaroslav Bulatov (OpenAI/TensorFlow)
- I/O Complexity: The Red-Blue Pebble Game — Hong & Kung, 1981
- On the Hardness of Red-Blue Pebble Games — Papp & Wattenhofer, 2020
- Complete Register Allocation Problems — Sethi, 1975
MIT