
🧑‍🌾 PaperFarm: Planting GPUs & APIs 🌱, Harvesting Papers & SOTAs 🌾


🔬 Point it at any repo — sow ideas, run experiments, and harvest better code autonomously

🌱 Sow ideas. 🚜 Run experiments. 🌾 Harvest evidence. 📄

Quick Start · How It Works · Agents · TUI Dashboard · CLI Reference · Configuration · Examples


🌾 Key Features

  • 🚀 One run Command: paperfarm run bootstraps a new workflow when .research/ is missing, or resumes an existing workflow when it already exists.

  • 🤖 Multi-Agent Support: Works with Claude Code, Codex CLI, Aider, OpenCode, Kimi CLI, and Gemini CLI — auto-detects the first installed agent, or pick your own.

  • 🔬 Scout → Prepare → Review → Experiment Flow: The AI agent analyzes your codebase, resolves install/data/smoke bootstrap steps, then runs the research-v1 loop — keeping what works, discarding what doesn't.

  • 🖥️ Research Command Center TUI: A 3-tab Execution / Metrics / Logs dashboard with frontier table, parallel worker status, trend chart with results table, and color-coded event stream.

  • 🛡️ Safety First: Every experiment is an isolated git commit. Failed experiments auto-roll back. A timeout watchdog, crash counter, and max-experiments limit keep things under control.

  • 🧭 Research-v1 Runtime: A single Scout → Manager → Critic → Experiment loop keeps research state explicit and reviewable.

  • 📡 Headless Mode: Run without the TUI — outputs structured JSON Lines to stdout, perfect for scripts, CI, or monitoring with external tools.

  • ⚡ Parallel Workers: Run experiments across multiple GPUs in isolated git worktrees — workers can't interfere with each other.


🌱 Quick Start

One-Command Workflow (Recommended)

pip install PaperFarm

cd your-project
paperfarm run

This launches a 4-phase flow. Plant the first seed with paperfarm run, then let the field work:

  1. Scout — survey the field: analyze your codebase, search related work, and design evaluation metrics
  2. Prepare — prepare the soil: resolve a local Python env, install command, data/setup step, and a readiness smoke check
  3. Review — inspect the crop plan: review the analysis and prepare results in an interactive TUI, then confirm or edit the plan
  4. Experiment — plant, test, and harvest: Manager → Critic → Experiment runs the research loop autonomously, keeping what improves metrics

If you want to inspect exactly what run will use before it touches the repo, use:

paperfarm run --dry-run
paperfarm doctor

Headless Mode

Run without the TUI — perfect for scripts, CI, or monitoring with external tools:

paperfarm run --mode headless --goal "reduce val_loss below 0.3" --max-experiments 20

Outputs structured JSON Lines to stdout, one event per line:

{"ts": "2026-03-10T12:34:56Z", "level": "info", "phase": "scouting", "event": "scout_started"}
{"ts": "2026-03-10T12:40:00Z", "level": "info", "phase": "preparing", "event": "prepare_step_completed", "step": "smoke", "status": "completed"}
{"ts": "2026-03-10T12:45:00Z", "level": "info", "phase": "experimenting", "event": "experiment_completed", "idea": "idea-001", "metric_value": 0.95, "experiment_num": 3, "max_experiments": 20}
{"ts": "2026-03-10T12:50:00Z", "level": "info", "phase": "done", "event": "limit_reached", "detail": "Max experiments (20) reached"}

Also writes to .research/events.jsonl for persistent logging. Interactive mode now writes the same canonical event stream, so TUI and headless share one runtime log.
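For scripting against a headless run, the event stream can be post-processed with a few lines of Python. A minimal sketch (the helper names are illustrative, not part of PaperFarm's API; field names follow the sample events above):

```python
import json
from pathlib import Path

def iter_events(path=".research/events.jsonl"):
    """Yield parsed runtime events, skipping blank or half-written lines."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially flushed trailing line

def completed_metrics(path=".research/events.jsonl"):
    """Collect metric values from experiment_completed events."""
    return [e["metric_value"] for e in iter_events(path)
            if e.get("event") == "experiment_completed"]
```

The same approach works for live monitoring if you re-read the file periodically, or follow it with tail -f piped into jq.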

Manual Step-by-Step

pip install PaperFarm

cd your-project
paperfarm init                      # Initialize .research/ directory
paperfarm run --agent claude-code   # Launch with TUI dashboard
# Go to sleep. Check results in the morning:
paperfarm status --sparkline
paperfarm results --chart primary

Try the interactive demo β€” no agent or API key needed:

paperfarm demo              # run in terminal
paperfarm demo --serve      # open in browser at http://localhost:8000
paperfarm demo --serve --port 9000

🚜 How It Works

PaperFarm generates a .research/ directory in your repo with everything needed for autonomous research.

📂 .research/ Directory Structure

| File | Purpose |
| --- | --- |
| scout_program.md | Scout agent instructions — project analysis phase |
| .internal/role_programs/*.md | Internal runtime role prompts (manager / critic / experiment), auto-managed |
| config.yaml | Mode, metrics, timeout, experiment limits, agent settings, and bootstrap.* overrides |
| project-understanding.md | Agent fills: what the project does |
| research-strategy.md | Agent fills: research direction and focus areas |
| literature.md | Agent fills: related work and prior art |
| evaluation.md | Agent fills: how to measure improvement |
| bootstrap_state.json | Canonical install/data/smoke state for repo readiness |
| prepare.log | Raw logs from env install, data prep, and smoke execution |
| idea_pool.json | Projected experiment backlog with priority, status, and worker claim metadata |
| results.tsv | Experiment log (timestamp, commit, metrics, status) |
| events.jsonl | Canonical runtime event stream for research + control |
| research_graph.json | Canonical hypothesis / experiment / evidence graph |
| research_memory.json | Repo prior, ideation, and experiment memory |
| control.json | Compatibility snapshot of pause/resume/skip state |
| activity.json | Real-time agent status for TUI display |
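Since results.tsv is a plain tab-separated log, the best experiment can be pulled out with the standard csv module. A rough sketch under assumed column headers (the status and metric names below are hypothetical; check the header row of your generated file):

```python
import csv

def best_kept_result(path=".research/results.tsv", higher_is_better=True):
    """Return the kept experiment row with the best primary metric, or None.

    Assumes a header row with 'status' and 'metric' columns; adjust to
    match the actual results.tsv produced in your repo.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh, delimiter="\t"))
    kept = [r for r in rows if r.get("status") == "kept"]
    if not kept:
        return None
    pick = max if higher_is_better else min
    return pick(kept, key=lambda r: float(r["metric"]))
```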
🔄 The Scout → Prepare → Review → Experiment Flow

Phase 0: Bootstrap
  └─ Auto-init .research/ if needed, load config

Phase 1: Goal Input
  └─ Optional research goal (TUI modal or --goal flag)

Phase 2: Scout Analysis
  ├─ Read codebase → project-understanding.md
  ├─ Search related work → literature.md
  ├─ Define strategy → research-strategy.md
  └─ Design evaluation + bootstrap hints → evaluation.md + config.yaml

Phase 3: Repository Prepare
  ├─ Resolve local Python env
  ├─ Resolve install_command / data_command / smoke_command
  ├─ Run install/data/smoke with logs in .research/prepare.log
  └─ Persist readiness state in .research/bootstrap_state.json

Phase 4: Human Review (TUI only, auto-confirmed in headless)
  ├─ Review all Scout outputs
  ├─ Review bootstrap resolution and readiness
  └─ Confirm, edit, or re-analyze

Phase 5: Research-v1 Loop
  ├─ Manager proposes/refines hypotheses and frontier rows
  ├─ Critic reviews experiment specs before execution
  ├─ Experiment agent implements, tests, and evaluates → results.tsv
  ├─ Critic records evidence and claim updates into research_graph.json
  └─ Repeat until no runnable frontier remains or --max-experiments reached

Each experiment is a git commit. Successful experiments stay; failed ones are rolled back. Everything is logged in results.tsv.
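The commit-per-experiment pattern can be sketched in plain git. This is an illustrative reconstruction in a throwaway repo, not PaperFarm's actual implementation:

```shell
set -eu
repo=$(mktemp -d)                                 # throwaway repo for the sketch
git -C "$repo" init -q
git -C "$repo" -c user.email=bot@example.com -c user.name=bot \
    commit -q --allow-empty -m "baseline"
before=$(git -C "$repo" rev-parse HEAD)           # remember the pre-experiment commit
echo "lr=0.01" > "$repo/experiment.cfg"           # agent edits files, runs the experiment...
git -C "$repo" add -A
git -C "$repo" -c user.email=bot@example.com -c user.name=bot \
    commit -qm "experiment: idea-001"             # each attempt becomes its own commit
git -C "$repo" reset -q --hard "$before"          # a failed attempt rolls back cleanly
test "$(git -C "$repo" rev-parse HEAD)" = "$before"
```

Because every attempt lands in its own commit before evaluation, a kept experiment is simply a commit that never gets reset away.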

🧰 Auto-Prepare Resolution Rules

paperfarm run now tries to make a local Python repo runnable before the research loop starts.

  • Python env priority: explicit bootstrap.python → active virtualenv → repo .venv → auto-create .venv
  • Install priority: explicit bootstrap.install_command → uv sync → poetry install → python -m pip install -r requirements.txt → python -m pip install -e .
  • Data/setup priority: explicit bootstrap.data_command → make setup|prepare|data|download-data → scripts/prepare*.py / scripts/download*.py / data/*/prepare.py
  • Smoke priority: explicit bootstrap.smoke_command → first runnable command block from .research/evaluation.md → pytest -q → make test

If a command cannot be resolved safely, run stops before the review/runtime stage and records the failure in .research/bootstrap_state.json.


🛡️ Field Safety & Runtime Controls

| Feature | Description |
| --- | --- |
| Isolated git commits | Every experiment is a separate commit — nothing is lost |
| Auto-rollback | Failed experiments are automatically rolled back via git reset |
| Timeout watchdog | Kills experiments exceeding the configured time limit |
| Crash counter | Auto-pauses after N consecutive crashes (default: 3) |
| Max experiments | Stops after N experiments (--max-experiments or config.yaml) |
| Control plane | Pause / resume / skip commands are event-backed in events.jsonl, with control.json kept as a compatibility snapshot |
| Failure memory | Persistent ledger of past failures, ranked by recovery success |
| Phase gate | In collaborative mode, pauses between phase transitions |
| Parallel workers | Run experiments across multiple GPUs in isolated worktrees |

🤖 Supported Agents

| Agent | Command | Status |
| --- | --- | --- |
| Claude Code | --agent claude-code | Supported |
| Codex CLI | --agent codex | Supported |
| Aider | --agent aider | Supported |
| OpenCode | --agent opencode | Supported |
| Kimi CLI | --agent kimi-cli | Supported |
| Gemini CLI | --agent gemini-cli | Supported |

Auto-detection: If you don't specify --agent, PaperFarm uses the first installed agent it finds.

⚙️ Agent Configuration

Customize agent parameters in .research/config.yaml:

agents:
  claude-code:
    model: "claude-sonnet-4-5-20250514"   # override model
    allowed_tools: "Edit,Write,Bash,Read,Glob,Grep"
    extra_flags: ["--max-turns", "50"]
  codex:
    model: "gpt-5.2"                      # override default
    sandbox: "workspace-write"            # workspace-write | read-only | danger-full-access | full-auto
  aider:
    model: "gpt-4o"
    extra_flags: ["--no-git"]
  opencode:
    model: "openai/gpt-5"
    agent: "builder"
    extra_flags: ["--share"]
  kimi-cli:
    model: ""                       # optional model override
    agent: "okabe"                  # optional built-in agent profile
    agent_file: ""                  # custom agent file path (optional)
    extra_flags: ["--thinking"]
  gemini-cli:
    model: "gemini-3.1-pro"          # override default model
    sandbox: ""                       # optional sandbox mode
    extra_flags: []

📊 Interactive TUI Dashboard

The interactive TUI is a research command center built around the runtime state in .research/: frontier items, experiment results, worker status, and the event stream. Three tabs — Execution (frontier + workers), Metrics (summary stats + trend chart + results table), and Logs (color-coded event stream). Supports human-in-the-loop checkpoints — review hypotheses, override results, inject ideas, and edit goals without leaving the terminal.

Screenshots

Execution tab — frontier + parallel workers

Execution: frontier table sorted by priority with colored status, parallel workers running on multiple GPUs.

Metrics tab — experiment trend chart

Metrics: summary stats (kept/discarded/best/mean/latest), braille trend chart, and scrollable results table.

Logs tab — multi-round event stream

Logs: color-coded event stream with aligned prefixes — SKILL / DONE / W+ / W- / RES / WAIT / REVW / INJ / GOAL events across rounds.

Hypothesis review modal

Hypothesis Review: human-in-the-loop checkpoint — toggle, approve all, or reject frontier items before the next round.

Paused state

Paused: one-key pause/resume with bold indicator on the status bar.

Completed state

Completed: all phases checked off, final frontier state with best metric displayed.

🖼️ More Screenshots

Result review modal

Result Review: override AI keep/discard decisions and add constraints for the next round.

Inject experiment modal

Inject Experiment: add a human-authored idea to the frontier with priority.

Goal edit modal

Edit Goal: update research constraints and direction mid-run.

Stress test — 10 rounds, 6 workers

Stress Test: round 10, 8 frontier items, 6 parallel workers across GPUs — scales smoothly.

Idle initial state

Idle: clean initial state before any research round starts.

Failed state

Failed: bold red indicator when the research loop encounters an unrecoverable error.

📡 3 Tabs & Keyboard Shortcuts

  • Execution — Frontier table (sorted by priority, colored status) + Workers panel (GPU, frontier assignment, live status)
  • Metrics — Summary stats bar (kept/discarded/best/mean/latest + trend arrow) + braille trend chart + scrollable results table
  • Logs — Color-coded event stream: SKILL / DONE / OUT / W+ / W- / RES / WAIT / REVW / INJ / GOAL

Keyboard shortcuts: p pause, r resume, s skip, g edit goal, i inject idea, q quit.

🔎 Human-in-the-Loop Checkpoints

  • Hypothesis Review — After the manager proposes ideas, review frontier items: toggle keep/reject, approve all, or skip.
  • Result Review — After experiments complete, review AI decisions (keep/discard) and override any result.
  • Inject Idea (i key) — Add a human-authored experiment to the frontier at any time.
  • Edit Goal (g key) — Update research constraints and direction mid-run.
  • Pause/Resume (p/r keys) — Temporarily halt the research loop.

🚜 Installation

PaperFarm supports Linux, macOS, and Windows. Python 3.10+ is required.

Option A: pip install (recommended)

pip install PaperFarm

# Try the demo first (no agent or API key needed)
paperfarm demo                   # run in terminal
paperfarm demo --serve           # open in browser at http://localhost:8000

# Install browser support (optional)
pip install "PaperFarm[serve]"

# Then use it for real
cd your-project
paperfarm run

Option B: From source (for development)

🐧 Linux / 🍎 macOS / 💻 Windows
git clone https://github.com/shatianming5/PaperFarm.git
cd PaperFarm
make dev    # install with dev dependencies
make test   # run tests
make test-cov      # run tests with coverage gate (>=75%)
make lint   # run linter
make package-check # build wheel + install + CLI smoke test
make ci     # full local CI: lint + test + coverage + package smoke

🖥️ CLI Reference

All commands: paperfarm <command>

⚡ Core Commands

| Command | What It Does |
| --- | --- |
| run | Primary command: bootstrap if needed, otherwise run the existing workflow |
| run --mode headless --goal "..." --max-experiments N | Headless JSON Lines mode |
| run --workers N | Set experiment worker count for serial or parallel execution |
| init [--tag NAME] | Initialize .research/ directory |
| demo | Try the TUI with sample data (no agent needed) |
| demo --serve [--port N] | Serve the demo TUI in a browser (requires PaperFarm[serve]) |

Hidden compatibility alias: start still works for older scripts, but it is deprecated. Use run.

📈 Monitoring & Results

| Command | What It Does |
| --- | --- |
| status [--sparkline] | Show experiment progress |
| results [--chart primary] [--json] | Print results table or chart |
| logs [--follow] [--errors] | View agent logs |
| export | Export markdown report |
💡 Idea Management

| Command | What It Does |
| --- | --- |
| ideas list | Inspect the projected backlog currently derived from research_graph.json |
| ideas add "description" | Compatibility command; refuses mutation under research-v1 |
| ideas delete IDEA_ID | Compatibility command; refuses mutation under research-v1 |
| ideas prioritize | Compatibility command; refuses mutation under research-v1 |
🔧 Utilities & Diagnostics

| Command | What It Does |
| --- | --- |
| config show | View/validate configuration |
| doctor | Health-check the environment |

⚙️ Configuration

Edit .research/config.yaml:

🎛️ Full Configuration Reference
mode: autonomous              # autonomous | collaborative

experiment:
  timeout: 600                # seconds per experiment before kill
  max_consecutive_crashes: 3  # pause after N consecutive crashes
  max_experiments: 0          # 0 = unlimited; set to N to stop after N experiments
  max_parallel_workers: 0     # 0 = auto (one per GPU), 1 = serial
  worker_agent: ""            # agent for sub-workers (default: same as master)

metrics:
  primary:
    name: ""                  # filled by agent (e.g., "val_loss")
    direction: ""             # higher_is_better | lower_is_better

environment: |
  # Free-form notes for agents. Runtime execution uses bootstrap.* below.

bootstrap:
  auto_prepare: true          # run install/data/smoke before review/runtime
  working_dir: "."            # relative to repo root
  python: ""                  # explicit python path if needed
  install_command: ""         # explicit dependency install command
  data_command: ""            # explicit dataset/setup command
  smoke_command: ""           # explicit readiness check command
  expected_paths: []          # files/dirs that data/setup must materialize
  requires_gpu: false         # fail prepare if GPU is required but unavailable

research:
  protocol: research-v1
  manager_batch_size: 3
  critic_repro_policy: best_or_surprising

memory:
  ideation: true
  experiment: true
  repo_type_prior: true

roles:
  scout_agent: ""             # optional override
  manager_agent: ""           # optional override
  critic_agent: ""            # optional override
  experiment_agent: ""        # optional override

gpu:
  remote_hosts: []            # optional remote GPU allocation hosts

agents:                       # per-agent overrides (optional)
  claude-code:
    model: ""
    allowed_tools: "Edit,Write,Bash,Read,Glob,Grep"

🏡 Project Structure

🎯 Core System

| Module | Description |
| --- | --- |
| cli.py | CLI entry point, all commands (Typer) |
| run_cmd.py | Unified workflow entrypoint: bootstrap flow + existing-workflow runner |
| headless.py | Headless mode (JSON Lines output) |
| init_cmd.py | Initialize .research/ directory |
| config.py | Configuration parsing |
🤖 Agent Adapters (agents/)

| Module | Description |
| --- | --- |
| base.py | AgentAdapter abstract base class |
| claude_code.py | Claude Code adapter |
| codex.py | Codex CLI adapter |
| aider.py | Aider adapter |
| opencode.py | OpenCode adapter |
| kimi.py | Kimi CLI adapter |
| gemini.py | Gemini CLI adapter |
📊 TUI Components (tui/)

| Module | Description |
| --- | --- |
| app.py | Main Textual application for the research command center |
| widgets.py | Command, execution, logs, docs, lineage, frontier, and detail drawer widgets |
| view_model.py | TUI-specific aggregation layer from graph / memory / results / events into renderable state |
| review.py | Post-Scout review TUI |
| modals.py | Modal dialogs (AddIdea, GPUStatus, Log) |
| tui_runner.py | Shared Textual session lifecycle for bootstrap and existing-workflow entrypoints |
| styles.css | CSS styling |
⚙️ Runtime Engine

| Module | Description |
| --- | --- |
| idea_pool.py | Serial idea backlog plus parallel claim handling for workers |
| research_loop.py | Shared Scout → Manager → Critic → Experiment core loop |
| research_events.py | Typed event contract shared by TUI and headless |
| event_journal.py | Shared JSONL journal for runtime and control events |
| control_plane.py | Runtime control (pause/resume/skip) |
| failure_memory.py | Failure memory ledger (categorize, improve fixes) |
| worker.py | Parallel worker management (multi-GPU) |
| worktree.py | Git worktree management (worker isolation) |
| gpu_manager.py | GPU allocation (local/remote) |
| watchdog.py | Timeout watchdog (kill runaway experiments) |
| crash_counter.py | Crash counter (auto-pause after N failures) |
| phase_gate.py | Phase gate (collaborative mode confirmation) |
| activity.py | Activity monitor (real-time agent status) |

🌽 Examples

See examples/ for complete setups:

  • nanoGPT — Reduce validation loss in character-level language model training
  • Liger-Kernel — Optimize Triton GPU kernels
  • HF GLUE — Improve HuggingFace Transformers fine-tuning
  • CIFAR-10 Speedrun — Maximize CIFAR-10 image classification accuracy
  • YOLO Tiny — Optimize YOLOv8 object detection on COCO8
  • Whisper Fine-tune — Reduce Whisper speech recognition word error rate
  • CartPole RL — Maximize CartPole-v1 reinforcement learning reward
  • Code Perf — Optimize Python JSON parser throughput (non-ML)

🧑‍🌾 Contributing

Contributions are welcome! Please follow these steps:

  1. Open an issue to discuss the proposed change
  2. Fork the repository and create your feature branch
  3. Submit a pull request with a clear description

See CONTRIBUTING.md for guidelines and CHANGELOG.md for version history.

📄 License

This project is licensed under the MIT License.

