michaelkeating/pebbling

The Pebbling Game

A digital board game based on the Red-Blue Pebble Game from compiler theory (Hong & Kung, 1981), designed to build intuition for memory-efficient computation scheduling.

Play it solo to find optimal memory strategies, or pit AI agents against each other in tournaments.

Python 3.10+ · MIT License

Why This Exists

On a real H100 GPU, a floating-point operation costs ~0.3 picojoules — but fetching its operands from HBM costs 500+ picojoules per access. Most of AI training's energy goes to moving data, not computing. The pebbling game models this problem: given a computation DAG and a memory hierarchy, find the schedule that minimizes total energy cost.
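As a back-of-envelope illustration of that ratio (using the approximate figures above; real numbers vary by chip, precision, and access pattern):

```python
# Rough energy accounting for one operation whose two operands must
# come from HBM. Figures are the approximations quoted above.
COMPUTE_PJ = 0.3      # ~energy for one math op on an H100-class chip
HBM_FETCH_PJ = 500.0  # ~energy for one HBM access

operands_from_hbm = 2
total = COMPUTE_PJ + operands_from_hbm * HBM_FETCH_PJ
movement_share = operands_from_hbm * HBM_FETCH_PJ / total

print(f"data movement: {movement_share:.2%} of total energy")
# -> data movement: 99.97% of total energy
```

In this worst case virtually all the energy is spent on the memory system, which is exactly the imbalance the game asks you to schedule around.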

This game was built for the Sutro Group, a volunteer research collective exploring energy-efficient alternatives to backpropagation. It operationalizes the "pebbling game" concept discussed in Yaroslav Bulatov's article Fitting Larger Networks into Memory.

Quick Start

Play the Interactive Game (Browser)

The React artifact in web/pebbling-game.jsx can be played directly in Claude.ai by sharing the conversation, or deployed as a static site:

# Option 1: Deploy with Vite
cd web
npm create vite@latest pebbling-app -- --template react
cp pebbling-game.jsx pebbling-app/src/App.jsx
cd pebbling-app && npm install && npm run dev

# Option 2: Share via Claude.ai artifact link (no deploy needed)

Run the Python Engine & Agents

# No dependencies required for the core engine
cd src

# Run the built-in demo
python pebbling_engine.py

# Play with the heuristic agent
python pebbling_agent.py --agent heuristic --nodes 12 --depth 5 --seed 42

# Run a tournament (random vs heuristic)
python pebbling_agent.py --agent tournament --nodes 12 --depth 5 --games 10

# Play with Claude as the agent (requires API key)
pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
python pebbling_agent.py --agent llm --nodes 10 --depth 4 --seed 42

How to Play

You're given a computation DAG (directed acyclic graph) where:

  • Top nodes = inputs (freely available data)
  • Bottom nodes = outputs (must all be computed to win)
  • Edges = dependencies (a node can only be computed if ALL predecessors are in memory)

You place pebbles representing data stored in different memory tiers:

Tier       Symbol   Cost     Capacity    Real H100 Equivalent
Register   R        1 pJ     2–3 slots   RF (0.3 pJ)
L1 SRAM    S        5 pJ     3–5 slots   L1 cache (5 pJ)
HBM/DRAM   H        100 pJ   Unlimited   HBM3 (500+ pJ)

Rules:

  1. Source nodes can be pebbled freely
  2. Other nodes require ALL predecessors to currently have pebbles
  3. Pebbles can be removed at any time (frees capacity)
  4. Pebbles can be transferred between tiers (costs the destination tier's I/O price)
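Rule 2 is the heart of the game. A minimal sketch of that legality check on a toy DAG (a plain dictionary here, not the engine's actual internals):

```python
# Toy DAG: node -> list of predecessor nodes. Nodes with no
# predecessors are sources and can always be pebbled (rule 1).
preds = {
    "a": [], "b": [],    # sources (inputs)
    "c": ["a", "b"],     # c needs both a and b in memory
    "d": ["c"],          # output
}

def can_pebble(node, pebbled):
    """A node is pebbleable iff every predecessor currently holds a pebble."""
    return all(p in pebbled for p in preds[node])

pebbled = {"a", "b"}
print(can_pebble("c", pebbled))  # True: both predecessors are pebbled
print(can_pebble("d", pebbled))  # False: c holds no pebble yet
```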

Goal: Pebble all output nodes with the minimum total energy cost.

Strategy: Keep data in registers as long as possible. When registers are full, spill to SRAM (5 pJ), not HBM (100 pJ). Sometimes it's cheaper to recompute a node than to store it in HBM. Free registers by removing pebbles from nodes whose successors are all computed.
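The recompute-versus-store trade-off can be made concrete with the tier costs above. A sketch using the game's nominal placement costs, assuming the node's two inputs are free sources and ignoring slot bookkeeping:

```python
REGISTER_PJ, SRAM_PJ, HBM_PJ = 1, 5, 100  # per-placement costs from the table

# Option 1: spill the value to HBM now, transfer it back to a register
# later (rule 4: a transfer costs the destination tier's price).
store_in_hbm = HBM_PJ + REGISTER_PJ        # 100 + 1 = 101 pJ

# Option 2: drop the pebble and recompute later: re-place the two
# source inputs in registers, then place the result in a register.
recompute = 2 * REGISTER_PJ + REGISTER_PJ  # 2 + 1 = 3 pJ

print(store_in_hbm, recompute)  # 101 3
```

Here recomputing is ~34× cheaper than an HBM round trip, which is why "recompute instead of spill" is often the winning move.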

Architecture

pebbling-game/
├── src/
│   ├── pebbling_engine.py    # Core game engine (gym-style API)
│   └── pebbling_agent.py     # Agent implementations + tournament runner
├── web/
│   └── pebbling-game.jsx     # Interactive React game (browser)
├── notebooks/
│   └── pebbling_colab.ipynb  # Google Colab notebook
└── README.md

Agent API

The engine exposes a standard gym-style interface:

from pebbling_engine import PebblingGame, Action, ActionType

# Create a game
game = PebblingGame.create(
    num_nodes=12, depth=5, max_fan_in=3,
    register_slots=3, sram_slots=4,
    hardware_profile="h100",  # or "tutorial", "extreme"
    seed=42,
)

# Observe state
obs = game.observe()
print(obs.summary())

# Take an action
result = game.step(Action(ActionType.PLACE, node_id=0, tier="register"))
# result -> StepResult(observation, reward, done, info)

# Get all legal moves
valid = game.valid_actions()

# Render for debugging
game.render()

# Generate prompt for LLM agents
prompt = game.prompt_for_agent()
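A typical agent loop over this interface looks like the sketch below. To keep the snippet self-contained it runs against a tiny stand-in class with the same method names; the stand-in's behavior (fixed reward, ends after 5 steps) is invented for illustration, and the real `PebblingGame` drops in unchanged:

```python
import random

class FakeGame:
    """Stand-in with the engine's interface shape; ends after 5 steps."""
    def __init__(self):
        self.steps = 0
    def valid_actions(self):
        return ["place", "remove", "transfer"]
    def step(self, action):
        self.steps += 1
        # The real StepResult carries (observation, reward, done, info).
        return {"reward": -1.0, "done": self.steps >= 5}

def play(game, policy=random.choice):
    """Drive any gym-style game to completion with a choice policy."""
    total_reward = 0.0
    while True:
        action = policy(game.valid_actions())
        result = game.step(action)
        total_reward += result["reward"]
        if result["done"]:
            return total_reward

print(play(FakeGame()))  # -5.0 with the stand-in's fixed per-step reward
```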

Building Your Own Agent

Implement a class with reset() and get_action(game):

from pebbling_engine import PebblingGame, Action, ActionType

class MyAgent:
    def reset(self):
        pass  # called at start of each game

    def get_action(self, game: PebblingGame) -> Action:
        valid = game.valid_actions()
        # Your strategy here...
        return valid[0]

# Run it
from pebbling_agent import run_agent_game
results = run_agent_game(MyAgent(), num_nodes=12, depth=5, seed=42)
print(f"Total cost: {results['total_cost']} pJ")

Tournament Mode

Compare agents head-to-head:

from pebbling_agent import run_tournament, HeuristicAgent, RandomAgent

results = run_tournament(
    agents={
        "random": RandomAgent(),
        "heuristic": HeuristicAgent(),
        "my_agent": MyAgent(),
    },
    num_games=20,
    num_nodes=12,
    depth=5,
)

Benchmark Results

10-game tournament on 12-node, depth-5 DAGs (3 registers, 4 SRAM, H100 profile):

Agent        Avg Cost   Best    Worst    vs Random
Random       9,498 pJ   1,043   27,522   baseline
Heuristic    118 pJ     29      233      80× better
Your agent?  ?          ?       ?        ?

CLI Reference

python pebbling_agent.py [OPTIONS]

Options:
  --agent {llm,heuristic,random,tournament}  Agent type (default: heuristic)
  --nodes N          Number of DAG nodes (default: 10)
  --depth D          Max DAG depth (default: 4)
  --fan-in F         Max fan-in per node (default: 2)
  --registers R      Register slots (default: 3)
  --sram S           SRAM slots (default: 4)
  --seed S           Random seed (default: 42)
  --games G          Tournament games (default: 5)
  --model M          Claude model for LLM agent (default: claude-sonnet-4-5-20250929)

Roadmap

  • Interactive browser game (React)
  • Gym-style Python engine
  • Heuristic baseline agent
  • LLM agent (Claude via Anthropic API)
  • Tournament runner
  • Colab notebook with walkthrough
  • Visualization of agent replays
  • Multi-level memory hierarchies (4+ tiers)
  • Multiplayer competitive mode
  • RL agent (PPO/DQN on the action space)
  • Integration with Sutro's agentic pipeline for co-optimizing algorithms + schedules

Background Reading

License

MIT
