Done is an exit code, not prose. The kernel is a `pb record --status done` that re-runs each task's `acceptance_checks` (shell commands) and refuses on failure. Anchoring, the North Star (`north_star`), the cycle brief, and carry-on portability all support that verification gate; they do not replace it. If a check is tautological, the gate is hollow; see `scripts/check-hollow.mjs`.
A portable, agent-first playbook. Drop it into any folder and an agent can run it in a loop without friction: orient on a master file, pick a task, do the work, prove it with executable acceptance checks, record, and roll the records up into a human-readable report. Everything lives inside the folder — copy it anywhere and it still works (carry-on).
Agents lose the thread between sessions, drift from process, and — worst of all — declare work "done" without proof. This playbook fixes all three with the minimum machinery that actually works:
- One master everything re-anchors to (`playbook.yaml`, the "fixation"), kept salient by cheap re-injection (`pb anchor` + runtime hooks), so long context and compaction never lose the plot.
- Enforced done. A task's `acceptance_checks` are shell commands. `pb record --status done` runs them and refuses to record if any fail. Exit codes keep the loop honest; process documents don't.
- Durable state on disk (backlog, append-only journal), so context loss never means work loss.
That's the whole thesis. No specs pipeline, no DAG scheduler, no debt ledger — the playbook earns complexity only when a real workload demands it.
playbook.yaml THE MASTER — indexes everything; loop contract; guardrails
SKILL.md How any agent operates the playbook (read first)
AGENTS.md Pointer for cross-tool compatibility
scripts/pb.mjs The loop CLI (status | next | record | report | validate | anchor | checkpoint | loop | learn | run | ps | stop | list | scaffold | init | bootstrap)
processes/ Canonical, ordered workflows (+ index.yaml)
skills/ Short "how-to"s that route to processes (+ index.yaml)
memory/ project-memory.md · backlog.yaml · journal.ndjson (durable, agent-first)
artifacts/reports/ Generated human-facing rollups
npm install # one dependency: js-yaml
node scripts/pb.mjs bootstrap # first empty install only: seed minimal run-task skill/process
node scripts/pb.mjs status # orient
node scripts/pb.mjs next --claim # pick + claim the next task (prints its acceptance checks)
# ...do the work via the skill it names...
node scripts/pb.mjs validate --task T1 # run the task's checks on demand
node scripts/pb.mjs record --task T1 --action execute --status done --notes "did the thing"
# ^ re-runs the checks; refuses to record done if any fail
node scripts/pb.mjs report # writes artifacts/reports/report-<date>.md

There are npm aliases too: `npm run status`, `npm run next`, `npm run validate`, `npm run report`.
orient → select → act → verify → record → report → repeat. One command per step:
| Step | Command |
|---|---|
| Orient | node scripts/pb.mjs status |
| Select | node scripts/pb.mjs next --claim |
| Act | open skills/<id>/SKILL.md, follow processes/<id>.yaml |
| Verify | node scripts/pb.mjs validate + validate --task <id> |
| Record | node scripts/pb.mjs record ... (done is enforced) |
| Report | node scripts/pb.mjs report |
See SKILL.md for the full contract and skills-first routing.
Tasks in memory/backlog.yaml carry executable checks:
- id: T7
  title: Add a sitemap generator
  status: todo
  skill: run-task
  priority: 1
  acceptance_checks:
    - node scripts/generate-sitemap.mjs --dry-run
    - node scripts/pb.mjs validate

Each check runs with cwd = the playbook root; exit 0 = pass. `pb record --status done` runs them
all and exits 1 on any failure, telling the agent to fix the work or record `blocked` instead.
--skip-checks exists as an escape hatch, but the skip is stamped on the journal entry and
flagged in reports (⚠checks-skipped) — it can't be hidden.
A task without checks is verified on the agent's honor only, and pb next says so when claiming it.
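
The gate described above can be sketched in a few lines. This is a minimal illustration, not the code in `scripts/pb.mjs`: `gateDone`, the injected `runCheck` callback, and the shape of the returned entry (including `checks_skipped` and `honor_only`) are hypothetical names. A real runner would presumably wrap something like `spawnSync(cmd, { shell: true, cwd: playbookRoot }).status` from `node:child_process`; it is injected here so the sketch runs without touching a shell.

```javascript
// Hypothetical sketch of the "enforced done" gate; not the real pb.mjs code.
// runCheck(cmd) must return the command's exit code (0 = pass).
function gateDone(task, runCheck, { skipChecks = false } = {}) {
  const checks = task.acceptance_checks ?? [];

  if (skipChecks) {
    // The escape hatch is allowed, but the skip is stamped on the entry.
    return {
      recorded: true,
      exitCode: 0,
      entry: { task: task.id, status: "done", checks_skipped: true },
    };
  }

  // Run every check and collect the ones that exit non-zero.
  const failed = checks.filter((cmd) => runCheck(cmd) !== 0);
  if (failed.length > 0) {
    // Refuse: nothing is recorded, and the caller should exit 1.
    return { recorded: false, exitCode: 1, failed };
  }

  return {
    recorded: true,
    exitCode: 0,
    // With no checks at all, "done" rests on the agent's honor only.
    entry: { task: task.id, status: "done", honor_only: checks.length === 0 },
  };
}

// Usage with a stub runner in which the second check fails.
const sampleTask = {
  id: "T7",
  acceptance_checks: [
    "node scripts/generate-sitemap.mjs --dry-run",
    "node scripts/pb.mjs validate",
  ],
};
const outcome = gateDone(sampleTask, (cmd) => (cmd.endsWith("validate") ? 1 : 0));
console.log(outcome.recorded, outcome.failed);
```

Injecting the runner keeps the decision logic (pass, refuse, or stamped skip) separate from process spawning, which also makes the gate itself easy to test.
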
Tasks may also declare dependencies: [T1, T2] — a task isn't claimable until its dependencies
are done.
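
The dependency rule can be illustrated with a small selection function. This is a sketch under stated assumptions: `claimable` is a hypothetical helper, and the ordering (lower `priority` number first) is a guess that the source does not confirm.

```javascript
// Hypothetical sketch: which tasks are claimable right now?
// Assumption: lower `priority` numbers are picked first.
function claimable(tasks) {
  const done = new Set(tasks.filter((t) => t.status === "done").map((t) => t.id));
  return tasks
    .filter((t) => t.status === "todo")
    // A task is claimable only when every dependency is already done.
    .filter((t) => (t.dependencies ?? []).every((dep) => done.has(dep)))
    .sort(
      (a, b) =>
        (a.priority ?? Number.MAX_SAFE_INTEGER) - (b.priority ?? Number.MAX_SAFE_INTEGER)
    );
}

const sampleBacklog = [
  { id: "T1", status: "done", priority: 1 },
  { id: "T2", status: "todo", priority: 2, dependencies: ["T1"] },
  { id: "T3", status: "todo", priority: 1, dependencies: ["T2"] }, // waits on T2
];
console.log(claimable(sampleBacklog).map((t) => t.id)); // [ 'T2' ]
```

Here `T3` has the better priority but stays unclaimable until `T2` is recorded as done.
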
State lives on disk, never only in chat. Two commands keep the playbook in an agent's attention:
- `pb anchor [--brief]`: prints the tiny constitution; cheap enough to re-inject every turn.
- `pb checkpoint [--snapshot]`: heartbeat: re-anchors, detects drift (multiple claims, claimed work with no record, red guardrails), and `--snapshot` writes `memory/RESUME.md` for cold resume.
Wire them into runtime hooks so the agent never has to remember (Claude Code example):
SessionStart → pb anchor, UserPromptSubmit → pb anchor --brief, PreCompact →
pb checkpoint --snapshot. See the harden skill.
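
Two of the drift signals named above (multiple claims, claimed work with no record) can be sketched as a pure function over the backlog and journal. The function name `detectDrift`, the `"claimed"` status value, and the `{ task, status }` journal entry shape are assumptions made for illustration; guardrail checks are left out because the source does not describe them.

```javascript
// Hypothetical sketch of two drift checks a heartbeat could run.
// Assumptions: a "claimed" task status, and journal entries shaped { task, status }.
function detectDrift(tasks, journal) {
  const claimed = tasks.filter((t) => t.status === "claimed");
  const recorded = new Set(journal.map((entry) => entry.task));
  const warnings = [];

  if (claimed.length > 1) {
    warnings.push(`multiple claims: ${claimed.map((t) => t.id).join(", ")}`);
  }
  for (const t of claimed) {
    if (!recorded.has(t.id)) warnings.push(`claimed with no record: ${t.id}`);
  }
  return warnings;
}

const driftWarnings = detectDrift(
  [
    { id: "T1", status: "claimed" },
    { id: "T2", status: "claimed" },
    { id: "T3", status: "todo" },
  ],
  [{ task: "T1", status: "in-progress" }]
);
console.log(driftWarnings);
```
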
Use pb loop new to open a durable loop epoch. New pb record entries are stamped with the active
loop_id, and loop-scoped artifacts live under artifacts/loops/<loop_id>/.
Close clean loops with pb loop close --status done. Close contaminated runs with
pb loop close --status failed --reason "..."; that writes a quarantine artifact and blocks the
next pb loop new until a lesson is recorded with pb learn --loop <id> --source user --notes "...".
Promote reusable lessons into project memory, backlog tasks, or skills/processes.
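
The quarantine rule (a failed loop blocks the next `pb loop new` until a lesson is recorded) can be modelled in memory as follows. `LoopLedger` and its method names are hypothetical, and the real CLI persists this state on disk rather than in an object.

```javascript
// Hypothetical in-memory model of the loop rules; not the real pb.mjs code.
class LoopLedger {
  constructor() {
    this.loops = new Map(); // loop id -> { status, reason, lesson }
  }

  open(id) {
    // A failed loop without a recorded lesson blocks any new loop.
    for (const [otherId, loop] of this.loops) {
      if (loop.status === "failed" && !loop.lesson) {
        throw new Error(`loop ${otherId} is quarantined; record a lesson first`);
      }
    }
    this.loops.set(id, { status: "open" });
  }

  close(id, status, reason) {
    Object.assign(this.get(id), { status, reason });
  }

  learn(id, notes) {
    this.get(id).lesson = notes;
  }

  get(id) {
    const loop = this.loops.get(id);
    if (!loop) throw new Error(`unknown loop ${id}`);
    return loop;
  }
}

const ledger = new LoopLedger();
ledger.open("loop-1");
ledger.close("loop-1", "failed", "checks were skipped");
try {
  ledger.open("loop-2"); // blocked: loop-1 has no lesson yet
} catch (err) {
  console.log(err.message);
}
ledger.learn("loop-1", "never skip checks on release tasks");
ledger.open("loop-2"); // allowed once the lesson is recorded
console.log(ledger.get("loop-2").status);
```
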
- Add tasks to `memory/backlog.yaml`, with executable `acceptance_checks` whenever possible.
- Add a workflow: `processes/<id>.yaml` (+ register in `processes/index.yaml`) and `skills/<id>/SKILL.md` (+ register in `skills/index.yaml`).
- Record durable facts in `memory/project-memory.md`.
node <engine>/scripts/pb.mjs scaffold --target <repo>/.agent-playbook

Copy-don't-clobber: existing files are never overwritten (except `pb.mjs` itself, which is the
engine). Then run `npm install` and `node scripts/pb.mjs bootstrap`. The CLI resolves paths relative
to its own folder, so it travels intact. The full lifecycle is in INSTALL.md.
`node scripts/pb.mjs validate` checks that the master and indices parse, that every referenced file
exists, that skills point to real processes, and that the backlog (statuses, dependencies, check
declarations) and journal are well-formed. It exits non-zero on failure, so it drops cleanly into CI
or a pre-commit hook. `validate --task <id>` runs one task's acceptance checks.
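
As a rough illustration of the backlog portion of that validation, the sketch below checks statuses, dependency references, and check declarations. It is not the CLI's implementation: `validateBacklog` is a hypothetical name, and the status list is an assumption (only `todo`, `done`, and `blocked` appear in this document; `claimed` is a guess).

```javascript
// Hypothetical sketch of backlog well-formedness checks; not the real pb.mjs code.
// Assumption: the set of valid statuses ("claimed" in particular is a guess).
const KNOWN_STATUSES = new Set(["todo", "claimed", "done", "blocked"]);

function validateBacklog(tasks) {
  const ids = new Set(tasks.map((t) => t.id));
  const errors = [];

  for (const t of tasks) {
    if (!KNOWN_STATUSES.has(t.status)) {
      errors.push(`${t.id}: unknown status "${t.status}"`);
    }
    for (const dep of t.dependencies ?? []) {
      if (!ids.has(dep)) errors.push(`${t.id}: unknown dependency ${dep}`);
    }
    const checks = t.acceptance_checks ?? [];
    const wellFormed =
      Array.isArray(checks) &&
      checks.every((c) => typeof c === "string" && c.trim() !== "");
    if (!wellFormed) errors.push(`${t.id}: acceptance_checks must be non-empty strings`);
  }
  return errors;
}

const problems = validateBacklog([
  { id: "T1", status: "done" },
  { id: "T2", status: "todo", dependencies: ["T9"], acceptance_checks: [""] },
]);
// A CLI would map this to its exit status: non-zero when any error exists.
console.log(problems, problems.length > 0 ? 1 : 0);
```
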
One dependency: js-yaml. Requires Node >= 18.
Earlier versions carried a spec/Work-Map layer (DAG scheduling, gates, waves, debt ledgers).
It was planning metadata the CLI never executed — bureaucracy cosplaying as machinery. It now
lives in attic/ and the engine is ~40% smaller. If a real workload ever needs orchestration,
build it against demonstrated need, not anticipation.