Skip to content

riverho/agent-playbook

Repository files navigation

Agent-Playbook

Done is an exit code, not prose. The kernel is a pb record --status done that re-runs each task's acceptance_checks (shell commands) and refuses on failure. Anchoring, the North Star (north_star), the cycle brief, and carry-on portability all support that verification gate — they do not replace it. If a check is tautological, the gate is hollow; see scripts/check-hollow.mjs.

A portable, agent-first playbook. Drop it into any folder and an agent can run it in a loop without friction: orient on a master file, pick a task, do the work, prove it with executable acceptance checks, record, and roll the records up into a human-readable report. Everything lives inside the folder — copy it anywhere and it still works (carry-on).

Why

Agents lose the thread between sessions, drift from process, and — worst of all — declare work "done" without proof. This playbook fixes all three with the minimum machinery that actually works:

  1. One master everything re-anchors to (playbook.yaml, the "fixation"), kept salient by cheap re-injection (pb anchor + runtime hooks), so long context and compaction never lose the plot.
  2. Enforced done. A task's acceptance_checks are shell commands. pb record --status done runs them and refuses to record if any fail. Exit codes keep the loop honest — process documents don't.
  3. Durable state on disk (backlog, append-only journal), so context loss never means work loss.

That's the whole thesis. No specs pipeline, no DAG scheduler, no debt ledger — the playbook earns complexity only when a real workload demands it.

Layout

playbook.yaml      THE MASTER — indexes everything; loop contract; guardrails
SKILL.md           How any agent operates the playbook (read first)
AGENTS.md          Pointer for cross-tool compatibility
scripts/pb.mjs     The loop CLI (status | next | record | report | validate | anchor | checkpoint | loop | learn | run | ps | stop | list | scaffold | init | bootstrap)
processes/         Canonical, ordered workflows (+ index.yaml)
skills/            Short "how-to"s that route to processes (+ index.yaml)
memory/            project-memory.md · backlog.yaml · journal.ndjson  (durable, agent-first)
artifacts/reports/ Generated human-facing rollups

Quick start

npm install                       # one dependency: js-yaml
node scripts/pb.mjs bootstrap     # first empty install only: seed minimal run-task skill/process
node scripts/pb.mjs status        # orient
node scripts/pb.mjs next --claim  # pick + claim the next task (prints its acceptance checks)
# ...do the work via the skill it names...
node scripts/pb.mjs validate --task T1            # run the task's checks on demand
node scripts/pb.mjs record --task T1 --action execute --status done --notes "did the thing"
#   ^ re-runs the checks; refuses to record done if any fail
node scripts/pb.mjs report        # writes artifacts/reports/report-<date>.md

There are npm aliases too: npm run status, npm run next, npm run validate, npm run report.

The loop

orient → select → act → verify → record → report → repeat. One command per step:

Step Command
Orient node scripts/pb.mjs status
Select node scripts/pb.mjs next --claim
Act open skills/<id>/SKILL.md, follow processes/<id>.yaml
Verify node scripts/pb.mjs validate + validate --task <id>
Record node scripts/pb.mjs record ... (done is enforced)
Report node scripts/pb.mjs report

See SKILL.md for the full contract and skills-first routing.

Done is enforced, not declared

Tasks in memory/backlog.yaml carry executable checks:

- id: T7
  title: Add a sitemap generator
  status: todo
  skill: run-task
  priority: 1
  acceptance_checks:
    - node scripts/generate-sitemap.mjs --dry-run
    - node scripts/pb.mjs validate

Each check runs with cwd = the playbook root; exit 0 = pass. pb record --status done runs them all and exits 1 on any failure, telling the agent to fix the work or record blocked instead. --skip-checks exists as an escape hatch, but the skip is stamped on the journal entry and flagged in reports (⚠checks-skipped) — it can't be hidden.

A task without checks is verified on the agent's honor only, and pb next says so when claiming it.

Tasks may also declare dependencies: [T1, T2] — a task isn't claimable until its dependencies are done.

Hardening (context-loss survival)

State lives on disk, never only in chat. Two commands keep the playbook in an agent's attention:

  • pb anchor [--brief] — prints the tiny constitution; cheap enough to re-inject every turn.
  • pb checkpoint [--snapshot] — heartbeat: re-anchors, detects drift (multiple claims, claimed work with no record, red guardrails), and --snapshot writes memory/RESUME.md for cold resume.

Wire them into runtime hooks so the agent never has to remember (Claude Code example): SessionStartpb anchor, UserPromptSubmitpb anchor --brief, PreCompactpb checkpoint --snapshot. See the harden skill.

Loop epochs and lessons

Use pb loop new to open a durable loop epoch. New pb record entries are stamped with the active loop_id, and loop-scoped artifacts live under artifacts/loops/<loop_id>/.

Close clean loops with pb loop close --status done. Close contaminated runs with pb loop close --status failed --reason "..."; that writes a quarantine artifact and blocks the next pb loop new until a lesson is recorded with pb learn --loop <id> --source user --notes "...". Promote reusable lessons into project memory, backlog tasks, or skills/processes.

Make it your own

  • Add tasks to memory/backlog.yaml — with executable acceptance_checks whenever possible.
  • Add a workflow: processes/<id>.yaml (+ register in processes/index.yaml) and skills/<id>/SKILL.md (+ register in skills/index.yaml).
  • Record durable facts in memory/project-memory.md.

Drop into another project

node <engine>/scripts/pb.mjs scaffold --target <repo>/.agent-playbook

Copy-don't-clobber: existing files are never overwritten (except pb.mjs itself, which is the engine). Then npm install and node scripts/pb.mjs bootstrap. The CLI resolves paths relative to its own folder, so it travels intact. Full lifecycle in INSTALL.md.

Guardrails

node scripts/pb.mjs validate checks the master + indices parse, every referenced file exists, skills point to real processes, the backlog (statuses, dependencies, check declarations) and journal are well-formed. It exits non-zero on failure, so it drops cleanly into CI or a pre-commit hook. validate --task <id> runs one task's acceptance checks.

Dependencies

One: js-yaml. Node >= 18.

What was deliberately cut

Earlier versions carried a spec/Work-Map layer (DAG scheduling, gates, waves, debt ledgers). It was planning metadata the CLI never executed — bureaucracy cosplaying as machinery. It now lives in attic/ and the engine is ~40% smaller. If a real workload ever needs orchestration, build it against demonstrated need, not anticipation.

About

A carry-on playbook for AI agents: drop one folder into any repo and agents loop — orient, claim, act, verify — where "done" isn't declared, it's proven by exit codes.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors