From 25173662a326c25d6584fa52d336f34188a16b00 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Fri, 27 Mar 2026 19:29:01 -0700 Subject: [PATCH 01/18] docs(guide): add user guides with SDD workflow doc Add docs/guide/ section for practical usage guidance, separate from design specs. Includes an incremental spec-driven development workflow guide and placeholder files for design specs, prompts, and configuration guides. --- docs/guide/README.md | 12 ++++++ docs/guide/configuration.md | 3 ++ docs/guide/design-specs.md | 3 ++ docs/guide/prompts.md | 3 ++ docs/guide/workflow.md | 86 +++++++++++++++++++++++++++++++++++++ 5 files changed, 107 insertions(+) create mode 100644 docs/guide/README.md create mode 100644 docs/guide/configuration.md create mode 100644 docs/guide/design-specs.md create mode 100644 docs/guide/prompts.md create mode 100644 docs/guide/workflow.md diff --git a/docs/guide/README.md b/docs/guide/README.md new file mode 100644 index 0000000..3f127f2 --- /dev/null +++ b/docs/guide/README.md @@ -0,0 +1,12 @@ +# Lorah Guides + +Practical guides for using Lorah effectively. These cover _how to use_ Lorah — for _how Lorah is built_, see the [design specifications](../design/README.md). 
+ +## Index + +| Guide | Description | +| ------------------------------------ | ---------------------------------------------------------- | +| [workflow.md](workflow.md) | An incremental spec-driven development workflow pattern | +| [design-specs.md](design-specs.md) | How to write design specs that agents can reliably execute | +| [prompts.md](prompts.md) | How to write effective prompt files for the agent loop | +| [configuration.md](configuration.md) | Setting up a .lorah project directory | diff --git a/docs/guide/configuration.md b/docs/guide/configuration.md new file mode 100644 index 0000000..b670211 --- /dev/null +++ b/docs/guide/configuration.md @@ -0,0 +1,3 @@ +# Guide: Configuration + +TODO diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md new file mode 100644 index 0000000..a3a97e6 --- /dev/null +++ b/docs/guide/design-specs.md @@ -0,0 +1,3 @@ +# Guide: Writing Design Specs + +TODO diff --git a/docs/guide/prompts.md b/docs/guide/prompts.md new file mode 100644 index 0000000..71287ea --- /dev/null +++ b/docs/guide/prompts.md @@ -0,0 +1,3 @@ +# Guide: Writing Prompts + +TODO diff --git a/docs/guide/workflow.md b/docs/guide/workflow.md new file mode 100644 index 0000000..26253a9 --- /dev/null +++ b/docs/guide/workflow.md @@ -0,0 +1,86 @@ +# Workflow: Incremental Spec-Driven Development + +Lorah provides the loop. How you structure the work inside that loop is up to you. There are many valid approaches — this document presents one pattern that works well for spec-driven development. + +## Overview + +This is a spec-driven development (SDD) workflow. Unlike test-driven development where tests drive the design, here the spec drives the design — tests verify the spec, and code satisfies the tests. + +The workflow has two parts: a one-time scoping step, then a repeating loop of task selection, testing, and implementation. 
+ +``` +Scope the work + └─► Loop + ├─ Select next task + ├─ Write tests + ├─ Implement + └─ Repeat until done +``` + +Each step is handled by a fresh agent. Agents maintain continuity through git history and a living scope document — not shared memory. + +## Phase 1: Scope the work + +Before the loop begins, an agent reviews the design specs and produces a scope document. This is not a full task breakdown. It defines: + +- **What is being built** — the boundaries of this unit of work. +- **What done looks like** — concrete, verifiable acceptance criteria. + +The scope document is the contract between the human and the agent loop. It should be specific enough that an agent can determine whether the work is complete by checking git state and test results. Avoid subjective criteria. + +This step runs once. The loop handles everything else. + +## Phase 2: Select the next task + +Each iteration begins with an agent reviewing: + +- The design specs (authoritative source of truth). +- The scope document (boundaries and definition of done). +- Current git state (what has already been built). + +Based on this, the agent identifies and documents the single next task to work on. It does not plan beyond the immediate next step. + +This is where the workflow diverges from upfront planning. Instead of decomposing all work at the start, each task is chosen with full knowledge of what exists now. This means: + +- **Ordering adapts** to what was actually built, not what was predicted. +- **No plan drift** — there is no detailed plan to become stale. +- **The agent naturally sequences** foundational work before dependent work, because it can see what is missing. + +The quality of task selection depends on the quality of the design specs. If the specs clearly define boundaries and behavior, the agent has a deterministic contract to work against. Ambiguity in specs propagates into ambiguity in task selection. 
+ +## Phase 3: Write tests + +An agent writes tests for the selected task based on the design spec. The spec defines the intended behavior; the tests encode it as a verifiable contract between this agent and the implementation agent that follows. A passing test suite means the task is complete. + +Test quality is the bottleneck of the entire workflow. If the tests are shallow or misinterpret the spec, the implementation agent will write code that passes bad tests. + +## Phase 4: Implement + +An agent writes code to pass the tests. It can see the tests, the specs, and the full git history. When the tests pass, it exits. + +If the implementation agent encounters an issue — an ambiguous spec, a flawed test, or a dependency it cannot resolve — it documents the issue in the scope document before exiting. The next task selection agent picks this up. + +## Loop + +Return to Phase 2. The task selection agent checks the scope document's definition of done against current state. If all acceptance criteria are met, the work is complete and the loop ends. + +## Key properties + +**Agent isolation with continuity.** Each agent starts fresh, but git history and the scope document provide full context. This prevents context pollution while maintaining coherence across iterations. + +**Tests as contract.** Tests are the handoff mechanism between agents. They encode the spec as verifiable assertions, removing ambiguity about what "done" means for each task. + +**Specs are the quality ceiling.** The entire workflow is only as good as the design specs. Clear boundaries and unambiguous behavior definitions produce reliable agent output. Vague specs produce drift. + +**Incremental over upfront.** Planning the next task is a simpler, more reliable problem than planning all tasks. Each decision is made with maximum information. + +## Alternatives + +This is one workflow among many. Other valid patterns include: + +- **Upfront task planning** — decompose all work before the loop begins. 
Simpler coordination, but plans can drift as implementation diverges from predictions. +- **No formal testing phase** — the implementation agent writes its own tests. Faster iteration, but loses the contract between test and implementation agents. +- **Parallel execution** — run independent tasks concurrently. Higher throughput when tasks are truly independent. +- **Single-agent iterations** — one agent handles task selection, testing, and implementation in a single loop iteration. Less overhead, but larger context per agent. + +The right workflow depends on the nature of the work, the specificity of the specs, and how much work you trust a single agent context to handle. From a819b6e6daab15839b8dae52eb6afc943d5f0b67 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Fri, 27 Mar 2026 22:24:36 -0700 Subject: [PATCH 02/18] docs(guide): add design specs writing guide Covers the spec-writing process (6-step engineer-agent collaboration) and a quality reference checklist for agent consumption, testability, decision rationale, and spec organization. --- docs/guide/design-specs.md | 167 ++++++++++++++++++++++++++++++++++++- 1 file changed, 166 insertions(+), 1 deletion(-) diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md index a3a97e6..37b89ec 100644 --- a/docs/guide/design-specs.md +++ b/docs/guide/design-specs.md @@ -1,3 +1,168 @@ # Guide: Writing Design Specs -TODO +A design spec is a behavioral contract — it defines what the system does, not how to code it or how to use it. Specs are the single source of truth for their domain. If the spec and the code disagree, one of them has a bug. Specs are stable during execution — modifying a spec mid-loop invalidates the scope document, existing tests, and completed work derived from it. Changes happen between units of work, not during them. Specs are not tutorials, READMEs, API reference docs, or implementation plans. 
+ +## Part 1: The Spec-Writing Process + +A spec emerges from a structured conversation between the engineer and the agent. The engineer holds the intent; the agent drives the structure. At each step, the agent drafts content, presents it to the engineer for feedback, and iterates until aligned. + +### Step 1. Establish intent + +The engineer describes what they want to build. The agent restates this in behavioral terms — what the system does, not how it works — and confirms alignment before proceeding. + +### Step 2. Drive discovery + +The agent's primary job during discovery is to surface what the engineer has not yet articulated. Three categories of questions drive this: + +- **Boundary questions.** For each input, ask: "What happens when this is empty? Missing? Malformed? The wrong type?" Walk every input systematically. Unasked boundary questions become unspecified edge cases. +- **Interaction questions.** "What other components read or write this data? What breaks if this format changes?" These surface cross-boundary contracts — the shared schemas, storage formats, and data flows that must be specified because multiple consumers depend on them. +- **Negative questions.** "What does this component explicitly not do?" These drive Non-Goals, which are frequently absent from first drafts. An engineer who says "it just handles routing" has implicit Non-Goals that need to be made explicit. + +Discovery is sufficient when all four conditions are met: + +- Every identified input has defined behavior for empty, missing, and invalid cases. +- Every identified output has a defined format and error representation. +- Cross-boundary contracts (shared data formats, storage schemas) are identified. +- Non-Goals have been explicitly discussed. + +If the engineer gives a vague answer or wants to skip ahead, the agent flags the specific risk — unspecified edge cases become coin flips in implementation — and asks targeted follow-ups. + +### Step 3. 
Draft the overview + +The agent drafts the Overview section — Purpose, Goals, and Non-Goals — and gets explicit agreement on scope before detailing behavior. Misalignment here compounds in every later section. + +Use this scaffold as the starting structure: + +```markdown +# <Title> Specification + +--- + +## 1. Overview + +### Purpose + +What this component does and why it exists — one paragraph. + +### Goals + +- Bulleted list of what this spec defines. + +### Non-Goals + +- Bulleted list of active exclusions. + +--- + +## 2–N. [Topic-specific sections] + +The middle sections define the component's behavior. Their shape +depends on what the component does: + +- If it has a user-facing interface (CLI, API), define it first — + commands, flags, endpoints, parameters. +- If it has distinct behavioral modes or lifecycle phases, give + each its own section. +- If it has data structures or storage, specify the schema. +- If it has internal rules or algorithms, describe them precisely + enough to test against. + +Use tables, code blocks, and subsections as the content demands. + +--- + +## N+1. Examples + +5–15 concrete input/output examples. These become test cases. + +--- + +## N+2. Related Specifications + +- Links to specs that interact with this one. +``` + +The invariant parts — Overview, Examples, Related Specifications — appear in every spec. The middle sections are where each spec diverges based on what it describes. Their shape is dictated by the content, not a template. + +### Step 4. Fill behavioral sections + +The agent drafts each behavioral section using present-tense declarative statements — "The CLI exits 1 on unknown command," not "should exit" or "ideally exits." Define observable behavior at the boundary (inputs, outputs, error cases, side effects), not internal implementation. As each section takes shape, apply the testability check: if a claim cannot be imagined as a test assertion, it is too vague — fix it now, not later. 
After drafting each section, consult the Part 2 checklist to review tone, structure, and testability. + +The difference between a testable spec and a vague one is concrete, observable values: + +- Vague: "The program handles Ctrl+C gracefully." + Precise: "First SIGINT sets a stopping flag and lets the current iteration complete. Second SIGINT calls os.Exit(0) immediately." +- Vague: "Long tool inputs are truncated." + Precise: "Tool inputs with more than one line display the first line followed by `... +N lines` where N is the remaining line count." +- Vague: "The system creates a default file if none exists." + Precise: "If tasks.json does not exist on Load, return an empty TaskList with Version 1.0. Do not create the file on disk until the first Save." + +Each vague version reads as reasonable prose. Each precise version can be directly encoded as a test assertion. The gap between them is where implementations silently diverge from intent. + +### Step 5. Add concrete examples + +The agent drafts five to fifteen input/output pairs. Examples surface specification gaps — if an example is hard to write, the underlying behavior is underspecified. Return to the relevant behavioral section and tighten it. + +### Step 6. Check readiness + +The agent checks readiness against these criteria: + +- Every behavioral claim has an imaginable test assertion. +- Every input has defined behavior for empty, missing, and invalid cases. +- Non-Goals explicitly exclude the most likely scope creep. + +If any criterion fails, the agent returns to the relevant step. + +The process is not strictly linear. Examples frequently reveal gaps that send the conversation back to discovery or behavioral drafting. This is expected — each cycle tightens the spec. + +## Part 2: Spec Quality Reference + +A review checklist for the agent to apply after drafting each section. These properties define what makes a spec effective for agent consumption and execution. 
+ +### Writing for agent consumption + +Agents parse structure, match patterns, and extract requirements. How a spec is written directly affects how reliably agents can execute against it. + +**Prescriptive tone.** Use present-tense declarative statements. "The CLI exits 1 on unknown command" — not "should exit" or "ideally exits." Hedging language creates ambiguity that agents cannot resolve. If the behavior is defined, state it as fact. + +**Scannable structure.** Each spec uses numbered top-level sections with horizontal rule dividers. An agent working on a specific concern can jump to the relevant section without parsing everything above. Within sections: + +- Tables for reference data (flags, exit codes, field schemas, commands). +- Code blocks for function signatures, JSON formats, and CLI invocations. +- Subsection headings for distinct behavioral areas. + +Maintain consistent structure across specs. When agents learn the pattern from one spec, they can efficiently navigate all others. + +**Behavior at the boundary, not implementation behind it.** Specs define what a component does as observed from the outside — its inputs, outputs, error cases, and side effects. They do not prescribe internal implementation. How a function achieves its result is the implementer's decision, not the spec's. The exception is when an internal detail becomes a cross-boundary concern: shared data formats, storage schemas, or contracts that multiple components depend on. These must be specified because changing them affects more than one consumer. + +**Explicit scope boundaries.** Every spec needs Goals and Non-Goals. Goals define what to build. Non-Goals are equally important — they define what to not build, preventing scope creep and gold-plating. Non-Goals are active exclusions, not a "future work" list. + +**Defined vocabulary.** Define terms once in a shared glossary and use them consistently. Agents treat synonyms as distinct concepts. 
If the glossary says "loop iteration," do not alternate with "cycle" or "run" elsewhere. + +### Testability + +The spec-driven workflow depends on agents writing tests directly from the spec. A spec that cannot be tested cannot be verified. + +Every behavioral claim in a spec should be verifiable by a test. If you cannot imagine the assertion, the spec is too vague. Prefer concrete values over abstract descriptions — "exits 1" is testable, "exits with an error code" is not. + +Examples are test cases. A rich examples section gives agents concrete input/output pairs to encode as assertions. Five to fifteen examples per spec is typical. Skimping on examples forces agents to invent test cases, which means inventing behavior the spec did not define. + +Edge cases belong in the spec. If the spec does not say what happens on empty input, an unknown flag, or a missing file, the agent will guess. Every unspecified edge case is a coin flip in the implementation. See Step 4 for concrete examples of vague vs. precise specification. + +### Decision rationale + +Agents make judgment calls at the edges of every spec. When they understand why a design choice was made, they make better decisions about cases the spec does not explicitly cover. + +Record why, not just what. A sentence of rationale per non-obvious decision prevents agents from optimizing away intentional constraints. Keep the rationale inline, close to the decision it explains — not in a separate document the agent may not read. + +This is the one area where conversational tone is appropriate in a spec. "Uses a switch statement instead of a command registry because there are only two commands and simplicity outweighs extensibility" gives an agent the information it needs to preserve that choice. + +### Spec organization + +Specs must support efficient partial consumption — a fresh agent each iteration reads the specs to select its next task. 
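Purely as an illustration — borrowing the spec names this repository's `docs/design/` already uses — a layout split by behavioral domain might look like:

```text
docs/design/
├── cli.md      # command dispatch, flags, exit codes
├── run.md      # loop behavior and signal handling
├── output.md   # stream-JSON parsing and formatting
└── task.md     # task types and storage schema
```

Each file covers one independently testable domain; the annotations here are illustrative, not a contract.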
+ +**Cross-reference, do not duplicate.** When two specs interact, link between them. Duplicated content diverges over time, and agents cannot know which copy is authoritative. + +**Self-contained sections.** An agent working on output formatting should be able to read the relevant section of the output spec without reading every section that precedes it. Each section should establish its own context. + +**One spec per logical unit.** A logical unit is a behavioral domain that can be independently tested. Split specs by what a component _does_ — its observable behavior — not by file or package. If two behaviors can be tested without referencing each other, they belong in separate specs. If testing one requires understanding the other, they either belong together or need an explicit cross-reference. Focused specs reduce noise and context waste. From cfd05a8ce46c16d51917245c40c93e9c928eb257 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Fri, 27 Mar 2026 22:43:34 -0700 Subject: [PATCH 03/18] docs(guide): simplify AGENTS.md to minimal instructions --- AGENTS.md | 54 +++--------------------------------------------------- 1 file changed, 3 insertions(+), 51 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 2c1ae2d..6eef817 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,52 +1,4 @@ -# Lorah +If you encounter something surprising or confusing in this project, flag it as a comment. -## Project Overview - -Lorah is a simple infinite-loop harness for long-running autonomous coding agents. It runs Claude Code CLI in a continuous loop, parsing stream-JSON output and formatting it nicely. Includes a task management system for structured agent workflow coordination. The agent manages its own workflow — Lorah just provides the loop, error recovery, output formatting, and task tracking. Follows the Ralph pattern. Distributed as a single self-contained binary with no external runtime dependencies. 
- -## Commands - -```bash -make build # Build binary -go run . run PROMPT.md # Development run -lorah run PROMPT.md [flags...] # Run loop (all flags after prompt passed to claude CLI) -lorah task <subcommand> [args...] # Task management -``` - -Use TDD — write tests before implementation. Use `make fmt` and `make lint`. - -## Architecture - -``` -main.go CLI router: subcommand dispatch, help text, version -internal/loop/ - loop.go Run() entry point, signal handling, infinite loop - claude.go Subprocess execution - output.go Stream-JSON parsing and formatted output - constants.go ANSI colors, buffer size, retry delay -internal/task/ - task.go Core types: Phase, Section, Task, TaskStatus, TaskList, Filter - storage.go Storage interface - json_storage.go JSONStorage implementation (.lorah/tasks.json) - format.go Output formatters: json, markdown - cmd.go CLI subcommand handlers -docs/design/ Design specifications (authoritative reference) -``` - -## Design Principles - -**Ralph Philosophy**: The agent is smart enough to manage its own workflow. Don't orchestrate — provide a simple loop and trust the model. - -**Radical Simplicity**: Every line of code is overhead. The simplest solution that works is the best solution. Prefer deleting code over adding it. - -**Agent is in Control**: The harness provides the loop and nice output. The agent reads the codebase, decides what to do, and makes progress. No phase management needed. - -**No Ceremony**: No config files, session state, lock files, or scaffolding commands. Just a prompt file and a loop. - -**Filesystem as State**: No session files. Git commits show progress. Agent reads files to understand context. - -**Design Specifications**: Authoritative design docs live in `docs/design/`. When in doubt about intended behavior, consult the specs: `cli.md`, `run.md`, `output.md`, `task.md`. - -## Dependencies - -No external runtime dependencies. All functionality uses the Go standard library. 
The `claude` CLI (separate install) is the only runtime requirement. +- `make fmt` +- `make lint` From 870749c41b1f8176e5b32085e6b3fdeb8ed8f9cb Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Sat, 28 Mar 2026 10:35:55 -0700 Subject: [PATCH 04/18] docs(guide): restructure design specs from prescriptive to descriptive Replace the step-by-step spec-writing process with a properties-based reference. The guide now defines what a good spec looks like through nine properties with concrete examples and a readiness checklist, rather than prescribing how to produce one. The engineer-agent pair determines their own path to a spec that meets the criteria. --- docs/guide/design-specs.md | 132 ++++++++++++++++--------------------- 1 file changed, 57 insertions(+), 75 deletions(-) diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md index 37b89ec..7cdd2e9 100644 --- a/docs/guide/design-specs.md +++ b/docs/guide/design-specs.md @@ -2,36 +2,11 @@ A design spec is a behavioral contract — it defines what the system does, not how to code it or how to use it. Specs are the single source of truth for their domain. If the spec and the code disagree, one of them has a bug. Specs are stable during execution — modifying a spec mid-loop invalidates the scope document, existing tests, and completed work derived from it. Changes happen between units of work, not during them. Specs are not tutorials, READMEs, API reference docs, or implementation plans. -## Part 1: The Spec-Writing Process +A spec emerges through iteration between an engineer and an agent. The engineer holds the intent; the agent drives the structure. The properties below guide each pass — they are not a checklist to complete once. How the engineer-agent pair reaches a spec that meets these properties is up to them. -A spec emerges from a structured conversation between the engineer and the agent. The engineer holds the intent; the agent drives the structure. 
At each step, the agent drafts content, presents it to the engineer for feedback, and iterates until aligned. +## Spec Structure -### Step 1. Establish intent - -The engineer describes what they want to build. The agent restates this in behavioral terms — what the system does, not how it works — and confirms alignment before proceeding. - -### Step 2. Drive discovery - -The agent's primary job during discovery is to surface what the engineer has not yet articulated. Three categories of questions drive this: - -- **Boundary questions.** For each input, ask: "What happens when this is empty? Missing? Malformed? The wrong type?" Walk every input systematically. Unasked boundary questions become unspecified edge cases. -- **Interaction questions.** "What other components read or write this data? What breaks if this format changes?" These surface cross-boundary contracts — the shared schemas, storage formats, and data flows that must be specified because multiple consumers depend on them. -- **Negative questions.** "What does this component explicitly not do?" These drive Non-Goals, which are frequently absent from first drafts. An engineer who says "it just handles routing" has implicit Non-Goals that need to be made explicit. - -Discovery is sufficient when all four conditions are met: - -- Every identified input has defined behavior for empty, missing, and invalid cases. -- Every identified output has a defined format and error representation. -- Cross-boundary contracts (shared data formats, storage schemas) are identified. -- Non-Goals have been explicitly discussed. - -If the engineer gives a vague answer or wants to skip ahead, the agent flags the specific risk — unspecified edge cases become coin flips in implementation — and asks targeted follow-ups. - -### Step 3. Draft the overview - -The agent drafts the Overview section — Purpose, Goals, and Non-Goals — and gets explicit agreement on scope before detailing behavior. 
Misalignment here compounds in every later section. - -Use this scaffold as the starting structure: +A spec has three invariant parts — Overview, Examples, Related Specifications — and topic-specific behavioral sections in between. The middle sections diverge based on what the component does; their shape is dictated by the content, not a template. ```markdown # <Title> Specification @@ -82,87 +57,94 @@ Use tables, code blocks, and subsections as the content demands. - Links to specs that interact with this one. ``` -The invariant parts — Overview, Examples, Related Specifications — appear in every spec. The middle sections are where each spec diverges based on what it describes. Their shape is dictated by the content, not a template. +**One spec per logical unit.** A logical unit is a behavioral domain that can be independently tested. Split specs by what a component _does_ — its observable behavior — not by file or package. If two behaviors can be tested without referencing each other, they belong in separate specs. If testing one requires understanding the other, they either belong together or need an explicit cross-reference. -### Step 4. Fill behavioral sections +**Cross-reference, do not duplicate.** When two specs interact, link between them. Duplicated content diverges over time, and agents cannot know which copy is authoritative. -The agent drafts each behavioral section using present-tense declarative statements — "The CLI exits 1 on unknown command," not "should exit" or "ideally exits." Define observable behavior at the boundary (inputs, outputs, error cases, side effects), not internal implementation. As each section takes shape, apply the testability check: if a claim cannot be imagined as a test assertion, it is too vague — fix it now, not later. After drafting each section, consult the Part 2 checklist to review tone, structure, and testability. 
+**Self-contained sections.** An agent working on output formatting should be able to read the relevant section of the output spec without reading every section that precedes it. Each section should establish its own context. -The difference between a testable spec and a vague one is concrete, observable values: +## Properties of a Good Spec -- Vague: "The program handles Ctrl+C gracefully." - Precise: "First SIGINT sets a stopping flag and lets the current iteration complete. Second SIGINT calls os.Exit(0) immediately." -- Vague: "Long tool inputs are truncated." - Precise: "Tool inputs with more than one line display the first line followed by `... +N lines` where N is the remaining line count." -- Vague: "The system creates a default file if none exists." - Precise: "If tasks.json does not exist on Load, return an empty TaskList with Version 1.0. Do not create the file on disk until the first Save." +These properties define what makes a spec effective. When properties conflict, prioritize testability and boundary-completeness over scannability. -Each vague version reads as reasonable prose. Each precise version can be directly encoded as a test assertion. The gap between them is where implementations silently diverge from intent. +### Behavioral, not implementational -### Step 5. Add concrete examples +Specs define what a component does as observed from the outside — its inputs, outputs, error cases, and side effects. They do not prescribe internal implementation. How a function achieves its result is the implementer's decision, not the spec's. -The agent drafts five to fifteen input/output pairs. Examples surface specification gaps — if an example is hard to write, the underlying behavior is underspecified. Return to the relevant behavioral section and tighten it. +The exception is when an internal detail becomes a cross-boundary concern: shared data formats, storage schemas, or contracts that multiple components depend on. 
These must be specified because changing them affects more than one consumer. -### Step 6. Check readiness +### Prescriptive tone -The agent checks readiness against these criteria: +Use present-tense declarative statements. "The CLI exits 1 on unknown command" — not "should exit" or "ideally exits." Hedging language creates ambiguity that agents cannot resolve. If the behavior is defined, state it as fact. -- Every behavioral claim has an imaginable test assertion. -- Every input has defined behavior for empty, missing, and invalid cases. -- Non-Goals explicitly exclude the most likely scope creep. +### Testable -If any criterion fails, the agent returns to the relevant step. +Every behavioral claim in a spec should be verifiable by a test. If you cannot imagine the assertion, the spec is too vague. Prefer concrete values over abstract descriptions — "exits 1" is testable, "exits with an error code" is not. -The process is not strictly linear. Examples frequently reveal gaps that send the conversation back to discovery or behavioral drafting. This is expected — each cycle tightens the spec. +The difference between a testable spec and a vague one is concrete, observable values: -## Part 2: Spec Quality Reference +- Vague: "The program handles Ctrl+C gracefully." + Precise: "First SIGINT sets a stopping flag and lets the current iteration complete. Second SIGINT calls os.Exit(0) immediately." +- Vague: "Long tool inputs are truncated." + Precise: "Tool inputs with more than one line display the first line followed by `... +N lines` where N is the remaining line count." +- Vague: "The system creates a default file if none exists." + Precise: "If tasks.json does not exist on Load, return an empty TaskList with Version 1.0. Do not create the file on disk until the first Save." -A review checklist for the agent to apply after drafting each section. These properties define what makes a spec effective for agent consumption and execution. 
+Each vague version reads as reasonable prose. Each precise version can be directly encoded as a test assertion. The gap between them is where implementations silently diverge from intent. -### Writing for agent consumption +If a claim is hard to make precise, the behavior is underspecified — return to it and tighten it before moving on. -Agents parse structure, match patterns, and extract requirements. How a spec is written directly affects how reliably agents can execute against it. +### Boundary-complete -**Prescriptive tone.** Use present-tense declarative statements. "The CLI exits 1 on unknown command" — not "should exit" or "ideally exits." Hedging language creates ambiguity that agents cannot resolve. If the behavior is defined, state it as fact. +Every input has defined behavior for empty, missing, and invalid cases. Every output has a defined format and error representation. Unspecified edge cases become coin flips in implementation. -**Scannable structure.** Each spec uses numbered top-level sections with horizontal rule dividers. An agent working on a specific concern can jump to the relevant section without parsing everything above. Within sections: +Three categories of questions surface gaps: -- Tables for reference data (flags, exit codes, field schemas, commands). -- Code blocks for function signatures, JSON formats, and CLI invocations. -- Subsection headings for distinct behavioral areas. +- **Boundary questions.** For each input: "What happens when this is empty? Missing? Malformed? The wrong type?" Walk every input systematically. Unasked boundary questions become unspecified edge cases. +- **Interaction questions.** "What other components read or write this data? What breaks if this format changes?" These surface cross-boundary contracts — the shared schemas, storage formats, and data flows that must be specified because multiple consumers depend on them. Cross-boundary contracts are a common blind spot. 
+- **Negative questions.** "What does this component explicitly not do?" These drive Non-Goals, which are frequently absent from first drafts. An engineer who says "it just handles routing" has implicit Non-Goals that need to be made explicit. -Maintain consistent structure across specs. When agents learn the pattern from one spec, they can efficiently navigate all others. +### Explicitly scoped -**Behavior at the boundary, not implementation behind it.** Specs define what a component does as observed from the outside — its inputs, outputs, error cases, and side effects. They do not prescribe internal implementation. How a function achieves its result is the implementer's decision, not the spec's. The exception is when an internal detail becomes a cross-boundary concern: shared data formats, storage schemas, or contracts that multiple components depend on. These must be specified because changing them affects more than one consumer. +Every spec needs Goals and Non-Goals. Goals define what to build. Non-Goals are equally important — they define what to not build, preventing scope creep and gold-plating. Non-Goals are active exclusions, not a "future work" list. They should exclude the most likely scope creep. -**Explicit scope boundaries.** Every spec needs Goals and Non-Goals. Goals define what to build. Non-Goals are equally important — they define what to not build, preventing scope creep and gold-plating. Non-Goals are active exclusions, not a "future work" list. +### Concrete examples -**Defined vocabulary.** Define terms once in a shared glossary and use them consistently. Agents treat synonyms as distinct concepts. If the glossary says "loop iteration," do not alternate with "cycle" or "run" elsewhere. +A rich examples section gives agents concrete input/output pairs to encode as test assertions. Five to fifteen examples per spec is typical. Skimping on examples forces agents to invent test cases, which means inventing behavior the spec did not define. 
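+
+To make "encode as test assertions" concrete: the truncation example above ("the first line followed by `... +N lines`") translates directly into executable checks. A minimal Go sketch — the function name, package layout, and the single-space separator are illustrative assumptions, not taken from any real spec:
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// truncateInput renders a tool input per the precise spec wording above:
+// inputs with more than one line display the first line followed by
+// "... +N lines", where N is the remaining line count.
+func truncateInput(s string) string {
+	lines := strings.Split(s, "\n")
+	if len(lines) <= 1 {
+		return s
+	}
+	return fmt.Sprintf("%s ... +%d lines", lines[0], len(lines)-1)
+}
+
+func main() {
+	fmt.Println(truncateInput("single line"))          // single line
+	fmt.Println(truncateInput("first\nsecond\nthird")) // first ... +2 lines
+}
+```
+
+Each assertion against this function is a direct restatement of a spec example — no invented behavior required.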
-### Testability +Examples that are hard to write indicate underspecified behavior — the underlying behavioral section needs tightening. -The spec-driven workflow depends on agents writing tests directly from the spec. A spec that cannot be tested cannot be verified. +### Decision rationale -Every behavioral claim in a spec should be verifiable by a test. If you cannot imagine the assertion, the spec is too vague. Prefer concrete values over abstract descriptions — "exits 1" is testable, "exits with an error code" is not. +Agents make judgment calls at the edges of every spec. When they understand why a design choice was made, they make better decisions about cases the spec does not explicitly cover. -Examples are test cases. A rich examples section gives agents concrete input/output pairs to encode as assertions. Five to fifteen examples per spec is typical. Skimping on examples forces agents to invent test cases, which means inventing behavior the spec did not define. +Record why, not just what. A sentence of rationale per non-obvious decision prevents agents from optimizing away intentional constraints. Apply to non-obvious constraints and rejected alternatives — self-evident decisions don't need rationale. Keep the rationale inline, close to the decision it explains — not in a separate document the agent may not read. -Edge cases belong in the spec. If the spec does not say what happens on empty input, an unknown flag, or a missing file, the agent will guess. Every unspecified edge case is a coin flip in the implementation. See Step 4 for concrete examples of vague vs. precise specification. +This is the one area where conversational tone is appropriate in a spec. "Uses a switch statement instead of a command registry because there are only two commands and simplicity outweighs extensibility" gives an agent the information it needs to preserve that choice. -### Decision rationale +### Defined vocabulary -Agents make judgment calls at the edges of every spec. 
When they understand why a design choice was made, they make better decisions about cases the spec does not explicitly cover. +Define terms once in a shared glossary and use them consistently. Agents treat synonyms as distinct concepts. If the glossary says "loop iteration," do not alternate with "cycle" or "run" elsewhere. -Record why, not just what. A sentence of rationale per non-obvious decision prevents agents from optimizing away intentional constraints. Keep the rationale inline, close to the decision it explains — not in a separate document the agent may not read. +### Scannable structure -This is the one area where conversational tone is appropriate in a spec. "Uses a switch statement instead of a command registry because there are only two commands and simplicity outweighs extensibility" gives an agent the information it needs to preserve that choice. +Each spec uses numbered top-level sections with horizontal rule dividers. An agent working on a specific concern can jump to the relevant section without parsing everything above. Within sections: + +- Tables for reference data (flags, exit codes, field schemas, commands). +- Code blocks for function signatures, JSON formats, and CLI invocations. +- Subsection headings for distinct behavioral areas. -### Spec organization +Maintain consistent structure across specs. When agents learn the pattern from one spec, they can efficiently navigate all others. -Specs must support efficient partial consumption — a fresh agent each iteration reads the specs to select its next task. +## Readiness Checklist -**Cross-reference, do not duplicate.** When two specs interact, link between them. Duplicated content diverges over time, and agents cannot know which copy is authoritative. +A spec is ready for implementation when all of these hold: -**Self-contained sections.** An agent working on output formatting should be able to read the relevant section of the output spec without reading every section that precedes it. 
Each section should establish its own context. +- [ ] Can you write a test assertion for every behavioral claim? +- [ ] Are all inputs covered for empty, missing, and invalid cases? +- [ ] Does every output have a defined format and error representation? +- [ ] Are cross-boundary contracts identified and specified? +- [ ] Do Non-Goals actively exclude the most likely scope creep? +- [ ] Do 5–15 concrete examples exist and were they easy to write? -**One spec per logical unit.** A logical unit is a behavioral domain that can be independently tested. Split specs by what a component _does_ — its observable behavior — not by file or package. If two behaviors can be tested without referencing each other, they belong in separate specs. If testing one requires understanding the other, they either belong together or need an explicit cross-reference. Focused specs reduce noise and context waste. +If any criterion fails, the spec needs more work. This is expected — specs tighten through iteration. From cc801af318f4c8c6d3383093291bcb22d51c4afd Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Sat, 28 Mar 2026 11:30:39 -0700 Subject: [PATCH 05/18] docs(guide): sharpen design specs guide for agent clarity Add cross-boundary contract heuristic, glossary location pointer, and expand readiness checklist to cover all properties. --- docs/guide/design-specs.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md index 7cdd2e9..eb68f34 100644 --- a/docs/guide/design-specs.md +++ b/docs/guide/design-specs.md @@ -57,7 +57,7 @@ Use tables, code blocks, and subsections as the content demands. - Links to specs that interact with this one. ``` -**One spec per logical unit.** A logical unit is a behavioral domain that can be independently tested. Split specs by what a component _does_ — its observable behavior — not by file or package. 
If two behaviors can be tested without referencing each other, they belong in separate specs. If testing one requires understanding the other, they either belong together or need an explicit cross-reference. +**One spec per logical unit.** A logical unit is a behavioral domain that can be tested in isolation. Split specs by what a component _does_ — its observable behavior — not by file or package. If two behaviors can be tested without referencing each other, they belong in separate specs. If testing one requires understanding the other, they either belong together or need an explicit cross-reference. **Cross-reference, do not duplicate.** When two specs interact, link between them. Duplicated content diverges over time, and agents cannot know which copy is authoritative. @@ -71,7 +71,7 @@ These properties define what makes a spec effective. When properties conflict, p Specs define what a component does as observed from the outside — its inputs, outputs, error cases, and side effects. They do not prescribe internal implementation. How a function achieves its result is the implementer's decision, not the spec's. -The exception is when an internal detail becomes a cross-boundary concern: shared data formats, storage schemas, or contracts that multiple components depend on. These must be specified because changing them affects more than one consumer. +The exception is when an internal detail becomes a cross-boundary concern: shared data formats, storage schemas, or contracts that multiple components depend on. These must be specified because changing them affects more than one consumer. A useful test: if a test in a _different_ spec would assert on this detail, it is a cross-boundary contract and belongs in the spec. If it is consumed only within this component, it is an implementation decision. ### Prescriptive tone @@ -124,7 +124,7 @@ This is the one area where conversational tone is appropriate in a spec. 
"Uses a ### Defined vocabulary -Define terms once in a shared glossary and use them consistently. Agents treat synonyms as distinct concepts. If the glossary says "loop iteration," do not alternate with "cycle" or "run" elsewhere. +Define terms once in a shared glossary and use them consistently. The glossary lives in `docs/design/README.md`. Agents treat synonyms as distinct concepts. If the glossary says "loop iteration," do not alternate with "cycle" or "run" elsewhere. ### Scannable structure @@ -140,11 +140,15 @@ Maintain consistent structure across specs. When agents learn the pattern from o A spec is ready for implementation when all of these hold: +- [ ] Does the spec define observable behavior without prescribing internal implementation? - [ ] Can you write a test assertion for every behavioral claim? - [ ] Are all inputs covered for empty, missing, and invalid cases? - [ ] Does every output have a defined format and error representation? - [ ] Are cross-boundary contracts identified and specified? - [ ] Do Non-Goals actively exclude the most likely scope creep? - [ ] Do 5–15 concrete examples exist and were they easy to write? +- [ ] Does the spec use present-tense declarative statements without hedging? +- [ ] Do non-obvious decisions include inline rationale? +- [ ] Are terms defined in the glossary and used consistently? If any criterion fails, the spec needs more work. This is expected — specs tighten through iteration. From acb9043d15771ec114d901f28cf6e5a1884090ab Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Sat, 28 Mar 2026 13:41:34 -0700 Subject: [PATCH 06/18] docs(guide): add cross-boundary and side-effect examples to design specs guide Add concrete examples for the two gaps that would cause inconsistent agent output: cross-boundary contract identification (with a borderline case teaching stabilization intent) and testable side-effect specification (file creation with permissions, atomicity, error cases). 
--- docs/guide/design-specs.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md index eb68f34..434f332 100644 --- a/docs/guide/design-specs.md +++ b/docs/guide/design-specs.md @@ -73,6 +73,12 @@ Specs define what a component does as observed from the outside — its inputs, The exception is when an internal detail becomes a cross-boundary concern: shared data formats, storage schemas, or contracts that multiple components depend on. These must be specified because changing them affects more than one consumer. A useful test: if a test in a _different_ spec would assert on this detail, it is a cross-boundary contract and belongs in the spec. If it is consumed only within this component, it is an implementation decision. +Examples: + +- Cross-boundary: "Tasks are persisted as a `TaskList` JSON object in `tasks.json` with the schema defined in §3." — Multiple specs depend on this format. +- Not cross-boundary: "The router uses a switch statement to dispatch subcommands." — Only this spec's implementation cares. +- Borderline: "The `Storage` interface defines `Load`, `Save`, `Get`, `List`, `Create`, `Update`, `Delete`." — Today only the JSON backend implements it, but the spec declares it as an abstraction point for future backends. Specify it now if the intent is to stabilize it for multiple consumers; leave it as an implementation detail if the interface is still in flux. + ### Prescriptive tone Use present-tense declarative statements. "The CLI exits 1 on unknown command" — not "should exit" or "ideally exits." Hedging language creates ambiguity that agents cannot resolve. If the behavior is defined, state it as fact. @@ -89,6 +95,8 @@ The difference between a testable spec and a vague one is concrete, observable v Precise: "Tool inputs with more than one line display the first line followed by `... +N lines` where N is the remaining line count." 
- Vague: "The system creates a default file if none exists." Precise: "If tasks.json does not exist on Load, return an empty TaskList with Version 1.0. Do not create the file on disk until the first Save." +- Vague: "The system writes results to an output file." + Precise: "On successful completion, the command writes the result JSON to `{dir}/output.json` with 0644 permissions. If the file exists, it is overwritten atomically via write-to-temp-then-rename. If the directory does not exist, the command returns an error — it does not create parent directories." Each vague version reads as reasonable prose. Each precise version can be directly encoded as a test assertion. The gap between them is where implementations silently diverge from intent. From b2eba10b5049bde864f9af54647ae099c0489b00 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Sat, 28 Mar 2026 14:44:52 -0700 Subject: [PATCH 07/18] docs(guide): trim design specs guide to reference weight Redundant explanatory prose in the properties section restated what property names and checklist items already convey, creating a document that invited endless iteration without convergence. Cut teaching prose, kept load-bearing material (cross-boundary examples, vague/precise pairs, diagnostic instructions), and removed the "Concrete examples" property entirely as redundant with the template and checklist. --- docs/guide/design-specs.md | 46 +++++++------------------------------- 1 file changed, 8 insertions(+), 38 deletions(-) diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md index 434f332..00f9ae3 100644 --- a/docs/guide/design-specs.md +++ b/docs/guide/design-specs.md @@ -69,24 +69,18 @@ These properties define what makes a spec effective. When properties conflict, p ### Behavioral, not implementational -Specs define what a component does as observed from the outside — its inputs, outputs, error cases, and side effects. They do not prescribe internal implementation. 
How a function achieves its result is the implementer's decision, not the spec's. - -The exception is when an internal detail becomes a cross-boundary concern: shared data formats, storage schemas, or contracts that multiple components depend on. These must be specified because changing them affects more than one consumer. A useful test: if a test in a _different_ spec would assert on this detail, it is a cross-boundary contract and belongs in the spec. If it is consumed only within this component, it is an implementation decision. - -Examples: +Specs define what a component does as observed from the outside — its inputs, outputs, error cases, and side effects. They do not prescribe internal implementation. The exception is cross-boundary contracts: shared data formats, storage schemas, or contracts that multiple components depend on. A useful test: if a test in a _different_ spec would assert on this detail, it is a cross-boundary contract and belongs in the spec. - Cross-boundary: "Tasks are persisted as a `TaskList` JSON object in `tasks.json` with the schema defined in §3." — Multiple specs depend on this format. - Not cross-boundary: "The router uses a switch statement to dispatch subcommands." — Only this spec's implementation cares. -- Borderline: "The `Storage` interface defines `Load`, `Save`, `Get`, `List`, `Create`, `Update`, `Delete`." — Today only the JSON backend implements it, but the spec declares it as an abstraction point for future backends. Specify it now if the intent is to stabilize it for multiple consumers; leave it as an implementation detail if the interface is still in flux. +- Borderline: "The `Storage` interface defines `Load`, `Save`, `Get`, `List`, `Create`, `Update`, `Delete`." — Specify it now if the intent is to stabilize it for multiple consumers; leave it as an implementation detail if the interface is still in flux. ### Prescriptive tone -Use present-tense declarative statements. 
"The CLI exits 1 on unknown command" — not "should exit" or "ideally exits." Hedging language creates ambiguity that agents cannot resolve. If the behavior is defined, state it as fact. +Use present-tense declarative statements. "The CLI exits 1 on unknown command" — not "should exit" or "ideally exits." ### Testable -Every behavioral claim in a spec should be verifiable by a test. If you cannot imagine the assertion, the spec is too vague. Prefer concrete values over abstract descriptions — "exits 1" is testable, "exits with an error code" is not. - The difference between a testable spec and a vague one is concrete, observable values: - Vague: "The program handles Ctrl+C gracefully." @@ -98,51 +92,27 @@ The difference between a testable spec and a vague one is concrete, observable v - Vague: "The system writes results to an output file." Precise: "On successful completion, the command writes the result JSON to `{dir}/output.json` with 0644 permissions. If the file exists, it is overwritten atomically via write-to-temp-then-rename. If the directory does not exist, the command returns an error — it does not create parent directories." -Each vague version reads as reasonable prose. Each precise version can be directly encoded as a test assertion. The gap between them is where implementations silently diverge from intent. - If a claim is hard to make precise, the behavior is underspecified — return to it and tighten it before moving on. ### Boundary-complete -Every input has defined behavior for empty, missing, and invalid cases. Every output has a defined format and error representation. Unspecified edge cases become coin flips in implementation. - -Three categories of questions surface gaps: - -- **Boundary questions.** For each input: "What happens when this is empty? Missing? Malformed? The wrong type?" Walk every input systematically. Unasked boundary questions become unspecified edge cases. 
-- **Interaction questions.** "What other components read or write this data? What breaks if this format changes?" These surface cross-boundary contracts — the shared schemas, storage formats, and data flows that must be specified because multiple consumers depend on them. Cross-boundary contracts are a common blind spot. -- **Negative questions.** "What does this component explicitly not do?" These drive Non-Goals, which are frequently absent from first drafts. An engineer who says "it just handles routing" has implicit Non-Goals that need to be made explicit. +Every input has defined behavior for empty, missing, and invalid cases. Every output has a defined format and error representation. ### Explicitly scoped -Every spec needs Goals and Non-Goals. Goals define what to build. Non-Goals are equally important — they define what to not build, preventing scope creep and gold-plating. Non-Goals are active exclusions, not a "future work" list. They should exclude the most likely scope creep. - -### Concrete examples - -A rich examples section gives agents concrete input/output pairs to encode as test assertions. Five to fifteen examples per spec is typical. Skimping on examples forces agents to invent test cases, which means inventing behavior the spec did not define. - -Examples that are hard to write indicate underspecified behavior — the underlying behavioral section needs tightening. +Every spec needs Goals and Non-Goals. Non-Goals are active exclusions, not a "future work" list. ### Decision rationale -Agents make judgment calls at the edges of every spec. When they understand why a design choice was made, they make better decisions about cases the spec does not explicitly cover. - -Record why, not just what. A sentence of rationale per non-obvious decision prevents agents from optimizing away intentional constraints. Apply to non-obvious constraints and rejected alternatives — self-evident decisions don't need rationale. 
Keep the rationale inline, close to the decision it explains — not in a separate document the agent may not read. - -This is the one area where conversational tone is appropriate in a spec. "Uses a switch statement instead of a command registry because there are only two commands and simplicity outweighs extensibility" gives an agent the information it needs to preserve that choice. +Record why, not just what. Apply to non-obvious constraints and rejected alternatives — self-evident decisions don't need rationale. Keep rationale inline, close to the decision it explains. ### Defined vocabulary -Define terms once in a shared glossary and use them consistently. The glossary lives in `docs/design/README.md`. Agents treat synonyms as distinct concepts. If the glossary says "loop iteration," do not alternate with "cycle" or "run" elsewhere. +Define terms once in `docs/design/README.md` and use them consistently. Agents treat synonyms as distinct concepts. ### Scannable structure -Each spec uses numbered top-level sections with horizontal rule dividers. An agent working on a specific concern can jump to the relevant section without parsing everything above. Within sections: - -- Tables for reference data (flags, exit codes, field schemas, commands). -- Code blocks for function signatures, JSON formats, and CLI invocations. -- Subsection headings for distinct behavioral areas. - -Maintain consistent structure across specs. When agents learn the pattern from one spec, they can efficiently navigate all others. +Each spec uses numbered top-level sections with horizontal rule dividers. Use tables for reference data, code blocks for formats, and subsection headings for distinct behavioral areas. 
## Readiness Checklist From b1ed73aa632cdb6101c4cfaa43a2793026bf5091 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Sat, 28 Mar 2026 21:09:35 -0700 Subject: [PATCH 08/18] docs(guide): add configuration guide and consolidate with prompts Combine the planned prompts and configuration guides into a single configuration guide covering the full .lorah directory setup: plan file, prompt file, settings, and CLI flags. --- docs/guide/README.md | 1 - docs/guide/configuration.md | 81 ++++++++++++++++++++++++++++++++++++- docs/guide/prompts.md | 3 -- 3 files changed, 80 insertions(+), 5 deletions(-) delete mode 100644 docs/guide/prompts.md diff --git a/docs/guide/README.md b/docs/guide/README.md index 3f127f2..f0585d5 100644 --- a/docs/guide/README.md +++ b/docs/guide/README.md @@ -8,5 +8,4 @@ Practical guides for using Lorah effectively. These cover _how to use_ Lorah — | ------------------------------------ | ---------------------------------------------------------- | | [workflow.md](workflow.md) | An incremental spec-driven development workflow pattern | | [design-specs.md](design-specs.md) | How to write design specs that agents can reliably execute | -| [prompts.md](prompts.md) | How to write effective prompt files for the agent loop | | [configuration.md](configuration.md) | Setting up a .lorah project directory | diff --git a/docs/guide/configuration.md b/docs/guide/configuration.md index b670211..1c6b778 100644 --- a/docs/guide/configuration.md +++ b/docs/guide/configuration.md @@ -1,3 +1,82 @@ # Guide: Configuration -TODO +A `.lorah` directory contains the files that define a unit of work for the agent loop. It sits at the project root by default (overridable with `--dir`). 
+ +``` +.lorah/ +├── plan.md # scope and acceptance criteria +├── prompt.md # agent instructions for each iteration +└── settings.json # Claude Code CLI settings +``` + +## Plan file + +The plan file is the output of the scoping step — [Phase 1](workflow.md#phase-1-scope-the-work) in the spec-driven workflow. It defines what is being built and what done looks like. The agent loop uses it as the contract between the human and the agents. + +A plan file contains: + +- **Scope** — what is being built, at the level of a brief description and a list of capabilities. Reference the design specs rather than duplicating them. +- **Boundaries** — constraints and invariants that apply across the work (e.g., "stdlib only", "no external dependencies"). +- **Acceptance criteria** — concrete, verifiable conditions that define when the work is complete. An agent should be able to check each criterion against git state and test results. + +A plan file does not contain individual tasks. Task selection happens inside the loop, where each agent picks the next task based on current state. + +## Prompt file + +The prompt file is a markdown file piped to Claude Code on each loop iteration. It defines the agent's role, workflow, and constraints. This is the primary lever for controlling agent behavior. + +A prompt file typically contains: + +- **Role** — what the agent is and what it does in one sentence. +- **Workflow steps** — the sequence the agent follows each iteration: orient (check git history), select (pick next task), execute, verify, commit, exit. +- **Rules** — hard constraints the agent must follow (e.g., one task per invocation, strict TDD boundary). +- **Blocked workflow** — what to do when the agent encounters an issue it cannot resolve. + +The prompt does not need to be elaborate. Agents are capable of self-managing within clear constraints. Focus on boundaries and invariants rather than detailed instructions for every scenario. 
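+
+As a sketch of these elements together — the wording and task names are illustrative, not a canonical prompt:
+
+```markdown
+# Implementer
+
+You are the implementer for this project. Complete exactly one task
+per invocation.
+
+## Workflow
+
+1. Orient — read `plan.md` and recent git history to find current state.
+2. Select — pick the single next unfinished task.
+3. Execute — write tests from the design specs, then make them pass.
+4. Verify — run the full test suite.
+5. Commit and exit.
+
+## Rules
+
+- Exactly one task per invocation.
+- Do not modify tests after the implementation step begins.
+
+## If blocked
+
+Record the blocker in a commit message and exit without further changes.
+```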
+ +The prompt references the plan file for scope and the design specs for behavioral details. It should not duplicate either. + +## Settings + +`settings.json` is a standard Claude Code CLI settings file. Pass it via the `--settings` flag: + +```sh +lorah run prompt.md --settings .lorah/settings.json +``` + +Common fields: + +```json +{ + "model": "sonnet", + "permissions": { + "defaultMode": "bypassPermissions" + }, + "sandbox": { + "enabled": true, + "autoAllowBashIfSandboxed": true + }, + "includeCoAuthoredBy": false +} +``` + +- **model** — which Claude model to use. +- **permissions** — `bypassPermissions` is typical for autonomous loops where no human is approving each action. +- **sandbox** — enables sandboxed execution. `autoAllowBashIfSandboxed` avoids permission prompts for shell commands when sandboxing is on. + +See the [Claude Code documentation](https://docs.anthropic.com/en/docs/claude-code) for all available settings. + +## Claude flags + +Additional Claude CLI flags can be passed after the prompt file: + +```sh +lorah run prompt.md --settings .lorah/settings.json --model claude-opus-4-6 --max-turns 50 +``` + +Flags are passed through to the `claude` CLI unchanged. 
Common flags: + +- `--settings <file>` — path to settings file +- `--model <model>` — override the model (takes precedence over settings.json) +- `--max-turns <n>` — limit the number of agent turns per iteration +- `--allowedTools <tools>` — restrict which tools the agent can use diff --git a/docs/guide/prompts.md b/docs/guide/prompts.md deleted file mode 100644 index 71287ea..0000000 --- a/docs/guide/prompts.md +++ /dev/null @@ -1,3 +0,0 @@ -# Guide: Writing Prompts - -TODO From 40a8a76cd8974fb9a032b331c3f1dea69a15775c Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Mon, 30 Mar 2026 20:39:39 -0700 Subject: [PATCH 09/18] docs: simplify AGENTS.md to defer to CLAUDE.md Replace project-specific linting commands and comment flagging with a single directive to surface non-obvious discoveries via CLAUDE.md. --- AGENTS.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 6eef817..2f37e07 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,4 +1 @@ -If you encounter something surprising or confusing in this project, flag it as a comment. - -- `make fmt` -- `make lint` +If you discover something non-obvious about this project, ask if it should be noted in CLAUDE.md. From df30faae562241ab2722d3c5ebf93ab043f80fd2 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Mon, 30 Mar 2026 21:21:06 -0700 Subject: [PATCH 10/18] docs(guide): add prompt, task, and plan templates with router pattern Split configuration guide into focused pages: prompts.md for router and phase prompt templates, tasks.md for task file format. Add plan file template to configuration.md. Align "scope document" terminology in workflow.md to "plan file" for consistency across guides. 
--- docs/guide/README.md | 2 + docs/guide/configuration.md | 47 +++++++++----- docs/guide/prompts.md | 120 ++++++++++++++++++++++++++++++++++++ docs/guide/tasks.md | 47 ++++++++++++++ docs/guide/workflow.md | 14 ++--- 5 files changed, 209 insertions(+), 21 deletions(-) create mode 100644 docs/guide/prompts.md create mode 100644 docs/guide/tasks.md diff --git a/docs/guide/README.md b/docs/guide/README.md index f0585d5..96e92ea 100644 --- a/docs/guide/README.md +++ b/docs/guide/README.md @@ -9,3 +9,5 @@ Practical guides for using Lorah effectively. These cover _how to use_ Lorah — | [workflow.md](workflow.md) | An incremental spec-driven development workflow pattern | | [design-specs.md](design-specs.md) | How to write design specs that agents can reliably execute | | [configuration.md](configuration.md) | Setting up a .lorah project directory | +| [prompts.md](prompts.md) | Router and phase prompt templates for the agent loop | +| [tasks.md](tasks.md) | Task file format and status lifecycle | diff --git a/docs/guide/configuration.md b/docs/guide/configuration.md index 1c6b778..4a3173d 100644 --- a/docs/guide/configuration.md +++ b/docs/guide/configuration.md @@ -4,9 +4,16 @@ A `.lorah` directory contains the files that define a unit of work for the agent ``` .lorah/ -├── plan.md # scope and acceptance criteria -├── prompt.md # agent instructions for each iteration -└── settings.json # Claude Code CLI settings +├── plan.md # scope and acceptance criteria +├── prompt.md # orient + route to phase prompt +├── prompts/ +│ ├── plan.md # task selection +│ ├── test.md # write tests for selected task +│ └── implement.md # make tests pass +├── tasks/ +│ ├── 01-<task>.md # one file per task +│ └── ... +└── settings.json # Claude Code CLI settings ``` ## Plan file @@ -21,20 +28,30 @@ A plan file contains: A plan file does not contain individual tasks. Task selection happens inside the loop, where each agent picks the next task based on current state. 
-## Prompt file +```markdown +# <Project/Feature Name> -The prompt file is a markdown file piped to Claude Code on each loop iteration. It defines the agent's role, workflow, and constraints. This is the primary lever for controlling agent behavior. +## Scope -A prompt file typically contains: +What is being built — brief description and list of capabilities. +Reference the design specs rather than duplicating them. -- **Role** — what the agent is and what it does in one sentence. -- **Workflow steps** — the sequence the agent follows each iteration: orient (check git history), select (pick next task), execute, verify, commit, exit. -- **Rules** — hard constraints the agent must follow (e.g., one task per invocation, strict TDD boundary). -- **Blocked workflow** — what to do when the agent encounters an issue it cannot resolve. +## Boundaries -The prompt does not need to be elaborate. Agents are capable of self-managing within clear constraints. Focus on boundaries and invariants rather than detailed instructions for every scenario. +- Constraints and invariants that apply across the work. -The prompt references the plan file for scope and the design specs for behavioral details. It should not duplicate either. +## Acceptance Criteria + +- [ ] Concrete, verifiable conditions. +``` + +## Prompt files + +The prompt structure splits into a router prompt and phase-specific prompts. This keeps each agent's context small and focused. See the [prompt files guide](prompts.md) for templates. + +## Task files + +Each task gets its own file in `.lorah/tasks/`. The planning agent creates one task file per iteration; the testing and implementation agents update it as they work. See the [task files guide](tasks.md) for the template and status values. 
## Settings @@ -56,7 +73,7 @@ Common fields: "enabled": true, "autoAllowBashIfSandboxed": true }, - "includeCoAuthoredBy": false + "attribution": { "commit": "", "pr": "" } } ``` @@ -64,7 +81,7 @@ Common fields: - **permissions** — `bypassPermissions` is typical for autonomous loops where no human is approving each action. - **sandbox** — enables sandboxed execution. `autoAllowBashIfSandboxed` avoids permission prompts for shell commands when sandboxing is on. -See the [Claude Code documentation](https://docs.anthropic.com/en/docs/claude-code) for all available settings. +See the [Claude Code settings reference](https://code.claude.com/docs/en/settings) for all available settings. ## Claude flags @@ -80,3 +97,5 @@ Flags are passed through to the `claude` CLI unchanged. Common flags: - `--model <model>` — override the model (takes precedence over settings.json) - `--max-turns <n>` — limit the number of agent turns per iteration - `--allowedTools <tools>` — restrict which tools the agent can use + +See the [Claude Code CLI reference](https://code.claude.com/docs/en/cli-reference) for all available flags. diff --git a/docs/guide/prompts.md b/docs/guide/prompts.md new file mode 100644 index 0000000..b631bb1 --- /dev/null +++ b/docs/guide/prompts.md @@ -0,0 +1,120 @@ +# Guide: Prompt Files + +The prompt file is a markdown file piped to Claude Code on each loop iteration. Rather than a single monolithic prompt, the structure splits into a router prompt and phase-specific prompts. This keeps each agent's context small and focused. + +## Router prompt + +The main `prompt.md` orients the agent and routes it to the correct phase prompt. It is the only file piped to Claude Code — phase prompts are read by the agent during execution. + +```markdown +# <Role Title> + +You are a <role> for the <project> project. Your job is to complete +exactly one task per invocation. + +--- + +## Workflow + +1. 
**Orient** — Run `git log --oneline -10` to understand what was + done in prior iterations. + +2. **Route** — Scan `.lorah/tasks/` for task files. + - If no task file has `status: in_progress`, read and follow + `.lorah/prompts/plan.md`. + - If a task has `status: in_progress` and no tests exist for it, + read and follow `.lorah/prompts/test.md`. + - If a task has `status: in_progress` and tests exist, read and + follow `.lorah/prompts/implement.md`. + +3. **Exit** — Stop. Do not proceed to the next task. + +--- + +## Rules + +- One task per invocation: complete one task, commit, exit. +- Design specs are authoritative: `docs/design/` defines the target + behavior. +``` + +## Phase prompts + +Each phase prompt lives in `.lorah/prompts/` and defines the workflow for a single phase. The agent reads exactly one per iteration. + +**`prompts/plan.md`** — Select the next task from the design specs and current state, then create a task file. + +```markdown +# Planning Phase + +## Workflow + +1. Read `.lorah/plan.md` for scope and acceptance criteria. +2. Read the design specs in `docs/design/` for behavioral details. +3. Review git history and completed tasks in `.lorah/tasks/` to + understand what has been built. +4. Identify the single next task — the smallest unit of work that + moves toward acceptance criteria. +5. Create a new task file in `.lorah/tasks/` using the task file + format. Set status to `in_progress`. Add planning notes to the + Log. +6. Commit the new task file. +``` + +**`prompts/test.md`** — Write tests for the in-progress task. + +```markdown +# Testing Phase + +## Workflow + +1. Read the in-progress task file in `.lorah/tasks/`. +2. Read the relevant design spec section(s) referenced in the task. +3. Write tests that verify the behavior described in the task's + acceptance criteria. Do not write any production code. Add stubs + or interface definitions only if required to make tests + compilable. +4. Verify: run the test suite. 
Failures are expected (no + implementation yet), but panics and compilation errors must be + fixed. +5. Update the task file's Testing log with files created and edge + cases covered. +6. Commit. + +## Blocked workflow + +If the design spec is ambiguous or contradicts the task file, add a +note to the task file explaining the issue, set status to `blocked`, +and exit without committing test code. +``` + +**`prompts/implement.md`** — Make the tests pass. + +```markdown +# Implementation Phase + +## Workflow + +1. Read the in-progress task file in `.lorah/tasks/`. +2. Read the tests written in the testing phase. +3. Write production code to make the tests pass. Do not write new + tests. +4. Verify: run the full test suite. All tests must pass. +5. Update the task file: set status to `completed`, add + implementation notes to the Log. +6. Commit. + +## Blocked workflow + +If the existing tests conflict with the design spec: + +1. Discard uncommitted changes. +2. Set the task status to `blocked` with notes explaining the + conflict. +3. Exit without committing. + +The next iteration will route back to the testing phase to fix the +tests. +``` + +The prompts above are starting points. Adapt the role, rules, and workflow steps to match your project. Focus on boundaries and invariants rather than detailed instructions for every scenario. diff --git a/docs/guide/tasks.md b/docs/guide/tasks.md new file mode 100644 index 0000000..b2f1395 --- /dev/null +++ b/docs/guide/tasks.md @@ -0,0 +1,47 @@ +# Guide: Task Files + +Each task gets its own file in `.lorah/tasks/`. The planning agent creates one task file per iteration; the testing and implementation agents update it as they work. This keeps context small — an agent only reads the task it is working on. + +Task files use sequential numbering as a prefix (e.g., `01-parse-cli-args.md`) to preserve ordering. 
+ +```markdown +--- +status: pending +--- + +# Task: <title> + +## Behavior + +What this task implements — reference the relevant spec section(s). + +## Acceptance Criteria + +- Concrete, testable conditions. + +## Context + +Relevant files, prior task decisions, or anything the next agent +needs. + +## Log + +### Planning + +- ... + +### Testing + +- ... + +### Implementation + +- ... +``` + +## Status values + +- `pending` — created but not yet started. +- `in_progress` — actively being worked by the current or most recent iteration. +- `completed` — done. Tests pass, code is committed. +- `blocked` — cannot proceed. See notes in Log for details. diff --git a/docs/guide/workflow.md b/docs/guide/workflow.md index 26253a9..a7dc81e 100644 --- a/docs/guide/workflow.md +++ b/docs/guide/workflow.md @@ -17,16 +17,16 @@ Scope the work └─ Repeat until done ``` -Each step is handled by a fresh agent. Agents maintain continuity through git history and a living scope document — not shared memory. +Each step is handled by a fresh agent. Agents maintain continuity through git history and the plan file — not shared memory. ## Phase 1: Scope the work -Before the loop begins, an agent reviews the design specs and produces a scope document. This is not a full task breakdown. It defines: +Before the loop begins, an agent reviews the design specs and produces a plan file. This is not a full task breakdown. It defines: - **What is being built** — the boundaries of this unit of work. - **What done looks like** — concrete, verifiable acceptance criteria. -The scope document is the contract between the human and the agent loop. It should be specific enough that an agent can determine whether the work is complete by checking git state and test results. Avoid subjective criteria. +The plan file is the contract between the human and the agent loop. It should be specific enough that an agent can determine whether the work is complete by checking git state and test results. 
Avoid subjective criteria. This step runs once. The loop handles everything else. @@ -35,7 +35,7 @@ This step runs once. The loop handles everything else. Each iteration begins with an agent reviewing: - The design specs (authoritative source of truth). -- The scope document (boundaries and definition of done). +- The plan file (boundaries and definition of done). - Current git state (what has already been built). Based on this, the agent identifies and documents the single next task to work on. It does not plan beyond the immediate next step. @@ -58,15 +58,15 @@ Test quality is the bottleneck of the entire workflow. If the tests are shallow An agent writes code to pass the tests. It can see the tests, the specs, and the full git history. When the tests pass, it exits. -If the implementation agent encounters an issue — an ambiguous spec, a flawed test, or a dependency it cannot resolve — it documents the issue in the scope document before exiting. The next task selection agent picks this up. +If the implementation agent encounters an issue — an ambiguous spec, a flawed test, or a dependency it cannot resolve — it documents the issue in the plan file before exiting. The next task selection agent picks this up. ## Loop -Return to Phase 2. The task selection agent checks the scope document's definition of done against current state. If all acceptance criteria are met, the work is complete and the loop ends. +Return to Phase 2. The task selection agent checks the plan file's definition of done against current state. If all acceptance criteria are met, the work is complete and the loop ends. ## Key properties -**Agent isolation with continuity.** Each agent starts fresh, but git history and the scope document provide full context. This prevents context pollution while maintaining coherence across iterations. +**Agent isolation with continuity.** Each agent starts fresh, but git history and the plan file provide full context. 
This prevents context pollution while maintaining coherence across iterations. **Tests as contract.** Tests are the handoff mechanism between agents. They encode the spec as verifiable assertions, removing ambiguity about what "done" means for each task. From 75c4834b70d7e7e9ad2edffa815a536c6f893803 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Mon, 30 Mar 2026 21:45:35 -0700 Subject: [PATCH 11/18] docs(guide): fix router prompt and add blocked task handling Remove role-playing framing from router template. Add routing for blocked tasks to plan phase, where they are revised before new work is selected. Add design spec reading to implement phase workflow for consistency with its blocked workflow. --- docs/guide/prompts.md | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/docs/guide/prompts.md b/docs/guide/prompts.md index b631bb1..26c8f4d 100644 --- a/docs/guide/prompts.md +++ b/docs/guide/prompts.md @@ -7,10 +7,9 @@ The prompt file is a markdown file piped to Claude Code on each loop iteration. The main `prompt.md` orients the agent and routes it to the correct phase prompt. It is the only file piped to Claude Code — phase prompts are read by the agent during execution. ```markdown -# <Role Title> +# <Project Name> -You are a <role> for the <project> project. Your job is to complete -exactly one task per invocation. +Complete exactly one task per invocation. --- @@ -20,12 +19,13 @@ exactly one task per invocation. done in prior iterations. 2. **Route** — Scan `.lorah/tasks/` for task files. - - If no task file has `status: in_progress`, read and follow - `.lorah/prompts/plan.md`. + - If a task has `status: blocked`, read its Log to understand the + issue, then read and follow `.lorah/prompts/plan.md`. - If a task has `status: in_progress` and no tests exist for it, read and follow `.lorah/prompts/test.md`. 
- If a task has `status: in_progress` and tests exist, read and follow `.lorah/prompts/implement.md`. + - Otherwise, read and follow `.lorah/prompts/plan.md`. 3. **Exit** — Stop. Do not proceed to the next task. @@ -53,12 +53,15 @@ Each phase prompt lives in `.lorah/prompts/` and defines the workflow for a sing 2. Read the design specs in `docs/design/` for behavioral details. 3. Review git history and completed tasks in `.lorah/tasks/` to understand what has been built. -4. Identify the single next task — the smallest unit of work that +4. Check for a blocked task in `.lorah/tasks/`. If one exists, read + its Log and revise the task to address the issue. Set status to + `in_progress`, add notes to the Log, and skip to step 7. +5. Identify the single next task — the smallest unit of work that moves toward acceptance criteria. -5. Create a new task file in `.lorah/tasks/` using the task file +6. Create a new task file in `.lorah/tasks/` using the task file format. Set status to `in_progress`. Add planning notes to the Log. -6. Commit the new task file. +7. Commit. ``` **`prompts/test.md`** — Write tests for the in-progress task. @@ -97,12 +100,13 @@ and exit without committing test code. 1. Read the in-progress task file in `.lorah/tasks/`. 2. Read the tests written in the testing phase. -3. Write production code to make the tests pass. Do not write new +3. Read the relevant design spec section(s) referenced in the task. +4. Write production code to make the tests pass. Do not write new tests. -4. Verify: run the full test suite. All tests must pass. -5. Update the task file: set status to `completed`, add +5. Verify: run the full test suite. All tests must pass. +6. Update the task file: set status to `completed`, add implementation notes to the Log. -6. Commit. +7. Commit. ## Blocked workflow @@ -113,8 +117,8 @@ If the existing tests conflict with the design spec: conflict. 3. Exit without committing. 
-The next iteration will route back to the testing phase to fix the -tests. +The next iteration will route to the planning phase to reassess the +task. ``` -The prompts above are starting points. Adapt the role, rules, and workflow steps to match your project. Focus on boundaries and invariants rather than detailed instructions for every scenario. +The prompts above are starting points. Adapt the rules and workflow steps to match your project. Focus on boundaries and invariants rather than detailed instructions for every scenario. From 1df5d7c5b3e46584e1206db1ba77b9e98837305e Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Mon, 30 Mar 2026 22:02:04 -0700 Subject: [PATCH 12/18] docs(guide): fix cross-file inconsistencies - Fix workflow.md to use task file (not plan file) for blocked issues - Add blocked task recovery to Phase 2 description - Remove unused `pending` status from tasks.md - Add attribution field explanation to configuration.md - Note that .lorah/ should be committed to git - Add cross-reference from design-specs.md to workflow.md --- docs/guide/configuration.md | 3 ++- docs/guide/design-specs.md | 2 +- docs/guide/tasks.md | 3 +-- docs/guide/workflow.md | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/guide/configuration.md b/docs/guide/configuration.md index 4a3173d..3043c79 100644 --- a/docs/guide/configuration.md +++ b/docs/guide/configuration.md @@ -1,6 +1,6 @@ # Guide: Configuration -A `.lorah` directory contains the files that define a unit of work for the agent loop. It sits at the project root by default (overridable with `--dir`). +A `.lorah` directory contains the files that define a unit of work for the agent loop. It sits at the project root by default (overridable with `--dir`). Commit `.lorah/` to git — the workflow depends on git history for state and continuity, and task files are committed as part of the loop. 
``` .lorah/ @@ -80,6 +80,7 @@ Common fields: - **model** — which Claude model to use. - **permissions** — `bypassPermissions` is typical for autonomous loops where no human is approving each action. - **sandbox** — enables sandboxed execution. `autoAllowBashIfSandboxed` avoids permission prompts for shell commands when sandboxing is on. +- **attribution** — text added to commit messages (as git trailers) and PR descriptions. Empty strings disable attribution; omitting the field uses Claude Code's defaults. See the [Claude Code settings reference](https://code.claude.com/docs/en/settings) for all available settings. diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md index 00f9ae3..420469e 100644 --- a/docs/guide/design-specs.md +++ b/docs/guide/design-specs.md @@ -1,6 +1,6 @@ # Guide: Writing Design Specs -A design spec is a behavioral contract — it defines what the system does, not how to code it or how to use it. Specs are the single source of truth for their domain. If the spec and the code disagree, one of them has a bug. Specs are stable during execution — modifying a spec mid-loop invalidates the scope document, existing tests, and completed work derived from it. Changes happen between units of work, not during them. Specs are not tutorials, READMEs, API reference docs, or implementation plans. +A design spec is a behavioral contract — it defines what the system does, not how to code it or how to use it. Specs are the foundation of the [incremental spec-driven workflow](workflow.md) and the single source of truth for their domain. If the spec and the code disagree, one of them has a bug. Specs are stable during execution — modifying a spec mid-loop invalidates the scope document, existing tests, and completed work derived from it. Changes happen between units of work, not during them. Specs are not tutorials, READMEs, API reference docs, or implementation plans. A spec emerges through iteration between an engineer and an agent. 
The engineer holds the intent; the agent drives the structure. The properties below guide each pass — they are not a checklist to complete once. How the engineer-agent pair reaches a spec that meets these properties is up to them. diff --git a/docs/guide/tasks.md b/docs/guide/tasks.md index b2f1395..d539a80 100644 --- a/docs/guide/tasks.md +++ b/docs/guide/tasks.md @@ -6,7 +6,7 @@ Task files use sequential numbering as a prefix (e.g., `01-parse-cli-args.md`) t ```markdown --- -status: pending +status: in_progress --- # Task: <title> @@ -41,7 +41,6 @@ needs. ## Status values -- `pending` — created but not yet started. - `in_progress` — actively being worked by the current or most recent iteration. - `completed` — done. Tests pass, code is committed. - `blocked` — cannot proceed. See notes in Log for details. diff --git a/docs/guide/workflow.md b/docs/guide/workflow.md index a7dc81e..c15a879 100644 --- a/docs/guide/workflow.md +++ b/docs/guide/workflow.md @@ -38,7 +38,7 @@ Each iteration begins with an agent reviewing: - The plan file (boundaries and definition of done). - Current git state (what has already been built). -Based on this, the agent identifies and documents the single next task to work on. It does not plan beyond the immediate next step. +Based on this, the agent identifies and documents the single next task to work on. It does not plan beyond the immediate next step. If a prior task was marked `blocked`, the planning agent reassesses it first — revising the task to address the issue before moving on. This is where the workflow diverges from upfront planning. Instead of decomposing all work at the start, each task is chosen with full knowledge of what exists now. This means: @@ -58,7 +58,7 @@ Test quality is the bottleneck of the entire workflow. If the tests are shallow An agent writes code to pass the tests. It can see the tests, the specs, and the full git history. When the tests pass, it exits. 
-If the implementation agent encounters an issue — an ambiguous spec, a flawed test, or a dependency it cannot resolve — it documents the issue in the plan file before exiting. The next task selection agent picks this up. +If the implementation agent encounters an issue — an ambiguous spec, a flawed test, or a dependency it cannot resolve — it sets the task status to `blocked` with notes in the task's Log before exiting. The next iteration routes to the planning phase to reassess. ## Loop From 6c7e0358220fc98510b95710b7646eb368e21814 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Tue, 31 Mar 2026 20:03:14 -0700 Subject: [PATCH 13/18] docs(guide): clarify ambiguities and missing context across guides - Distinguish --dir as a Lorah flag, not a Claude CLI flag - Introduce the lorah CLI before its first usage in examples - Break dense opening paragraph in design-specs into three - Generalize glossary location reference - Add blocked task handling section to tasks.md - Make single-active-task invariant explicit in router prompt --- docs/guide/configuration.md | 4 +++- docs/guide/design-specs.md | 8 ++++++-- docs/guide/prompts.md | 3 ++- docs/guide/tasks.md | 4 ++++ 4 files changed, 15 insertions(+), 4 deletions(-) diff --git a/docs/guide/configuration.md b/docs/guide/configuration.md index 3043c79..718cf74 100644 --- a/docs/guide/configuration.md +++ b/docs/guide/configuration.md @@ -1,6 +1,6 @@ # Guide: Configuration -A `.lorah` directory contains the files that define a unit of work for the agent loop. It sits at the project root by default (overridable with `--dir`). Commit `.lorah/` to git — the workflow depends on git history for state and continuity, and task files are committed as part of the loop. +A `.lorah` directory contains the files that define a unit of work for the agent loop. It sits at the project root by default (overridable with Lorah's `--dir` flag). 
Commit `.lorah/` to git — the workflow depends on git history for state and continuity, and task files are committed as part of the loop. ``` .lorah/ @@ -16,6 +16,8 @@ A `.lorah` directory contains the files that define a unit of work for the agent └── settings.json # Claude Code CLI settings ``` +The `lorah` CLI wraps Claude Code to run the agent loop. It accepts a prompt file and forwards additional flags to `claude`. + ## Plan file The plan file is the output of the scoping step — [Phase 1](workflow.md#phase-1-scope-the-work) in the spec-driven workflow. It defines what is being built and what done looks like. The agent loop uses it as the contract between the human and the agents. diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md index 420469e..0d47c8a 100644 --- a/docs/guide/design-specs.md +++ b/docs/guide/design-specs.md @@ -1,6 +1,10 @@ # Guide: Writing Design Specs -A design spec is a behavioral contract — it defines what the system does, not how to code it or how to use it. Specs are the foundation of the [incremental spec-driven workflow](workflow.md) and the single source of truth for their domain. If the spec and the code disagree, one of them has a bug. Specs are stable during execution — modifying a spec mid-loop invalidates the scope document, existing tests, and completed work derived from it. Changes happen between units of work, not during them. Specs are not tutorials, READMEs, API reference docs, or implementation plans. +A design spec is a behavioral contract — it defines what the system does, not how to code it or how to use it. Specs are the foundation of the [incremental spec-driven workflow](workflow.md) and the single source of truth for their domain. If the spec and the code disagree, one of them has a bug. + +Specs are stable during execution — modifying a spec mid-loop invalidates the scope document, existing tests, and completed work derived from it. Changes happen between units of work, not during them. 
+ +Specs are not tutorials, READMEs, API reference docs, or implementation plans. A spec emerges through iteration between an engineer and an agent. The engineer holds the intent; the agent drives the structure. The properties below guide each pass — they are not a checklist to complete once. How the engineer-agent pair reaches a spec that meets these properties is up to them. @@ -108,7 +112,7 @@ Record why, not just what. Apply to non-obvious constraints and rejected alterna ### Defined vocabulary -Define terms once in `docs/design/README.md` and use them consistently. Agents treat synonyms as distinct concepts. +Define terms once in a central glossary and use them consistently. Agents treat synonyms as distinct concepts. ### Scannable structure diff --git a/docs/guide/prompts.md b/docs/guide/prompts.md index 26c8f4d..4f96793 100644 --- a/docs/guide/prompts.md +++ b/docs/guide/prompts.md @@ -18,7 +18,8 @@ Complete exactly one task per invocation. 1. **Orient** — Run `git log --oneline -10` to understand what was done in prior iterations. -2. **Route** — Scan `.lorah/tasks/` for task files. +2. **Route** — Scan `.lorah/tasks/` for task files. At most one + task is non-completed at any time. - If a task has `status: blocked`, read its Log to understand the issue, then read and follow `.lorah/prompts/plan.md`. - If a task has `status: in_progress` and no tests exist for it, diff --git a/docs/guide/tasks.md b/docs/guide/tasks.md index d539a80..2ddb0e8 100644 --- a/docs/guide/tasks.md +++ b/docs/guide/tasks.md @@ -44,3 +44,7 @@ needs. - `in_progress` — actively being worked by the current or most recent iteration. - `completed` — done. Tests pass, code is committed. - `blocked` — cannot proceed. See notes in Log for details. + +## Blocked task handling + +When the planning agent encounters a blocked task, it revises the existing task file — updating the Behavior, Acceptance Criteria, or Context as needed to address the issue noted in the Log. 
It sets status back to `in_progress` and adds notes to the Planning log explaining the revision. No new task file is created. From 651fe786cfce2b338f2075fcb42c1b39cde4af12 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Tue, 31 Mar 2026 20:37:09 -0700 Subject: [PATCH 14/18] docs(guide): add glossary location, task invariant, and plan constraints - Specify glossary belongs in a shared file or specs README - State single-active-task invariant in tasks.md - Add constraints to plan file description in workflow.md --- docs/guide/design-specs.md | 2 +- docs/guide/tasks.md | 2 +- docs/guide/workflow.md | 1 + 3 files changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md index 0d47c8a..4f8fa0c 100644 --- a/docs/guide/design-specs.md +++ b/docs/guide/design-specs.md @@ -112,7 +112,7 @@ Record why, not just what. Apply to non-obvious constraints and rejected alterna ### Defined vocabulary -Define terms once in a central glossary and use them consistently. Agents treat synonyms as distinct concepts. +Define terms once in a central glossary — a shared file (e.g., `glossary.md`) or a section in the specs directory README — and use them consistently. Agents treat synonyms as distinct concepts. ### Scannable structure diff --git a/docs/guide/tasks.md b/docs/guide/tasks.md index 2ddb0e8..4b2d63d 100644 --- a/docs/guide/tasks.md +++ b/docs/guide/tasks.md @@ -1,6 +1,6 @@ # Guide: Task Files -Each task gets its own file in `.lorah/tasks/`. The planning agent creates one task file per iteration; the testing and implementation agents update it as they work. This keeps context small — an agent only reads the task it is working on. +Each task gets its own file in `.lorah/tasks/`. The planning agent creates one task file per iteration; the testing and implementation agents update it as they work. This keeps context small — an agent only reads the task it is working on. 
At most one task is non-completed at any time. Task files use sequential numbering as a prefix (e.g., `01-parse-cli-args.md`) to preserve ordering. diff --git a/docs/guide/workflow.md b/docs/guide/workflow.md index c15a879..9965fdc 100644 --- a/docs/guide/workflow.md +++ b/docs/guide/workflow.md @@ -24,6 +24,7 @@ Each step is handled by a fresh agent. Agents maintain continuity through git hi Before the loop begins, an agent reviews the design specs and produces a plan file. This is not a full task breakdown. It defines: - **What is being built** — the boundaries of this unit of work. +- **Constraints** — invariants and boundaries that apply across the work. - **What done looks like** — concrete, verifiable acceptance criteria. The plan file is the contract between the human and the agent loop. It should be specific enough that an agent can determine whether the work is complete by checking git state and test results. Avoid subjective criteria. From 7bb89f8b344757cefdefa23d370bba14d59ec031 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Tue, 31 Mar 2026 20:45:45 -0700 Subject: [PATCH 15/18] docs(guide): add plan termination step and fix inconsistencies Add completion check to plan prompt template so the agent exits when all acceptance criteria are met. Fix "scope document" to "plan file", add task files to continuity claim, and clarify task log section reference. --- docs/guide/design-specs.md | 2 +- docs/guide/prompts.md | 15 +++++++++------ docs/guide/workflow.md | 2 +- 3 files changed, 11 insertions(+), 8 deletions(-) diff --git a/docs/guide/design-specs.md b/docs/guide/design-specs.md index 4f8fa0c..c8d95fe 100644 --- a/docs/guide/design-specs.md +++ b/docs/guide/design-specs.md @@ -2,7 +2,7 @@ A design spec is a behavioral contract — it defines what the system does, not how to code it or how to use it. 
Specs are the foundation of the [incremental spec-driven workflow](workflow.md) and the single source of truth for their domain. If the spec and the code disagree, one of them has a bug. -Specs are stable during execution — modifying a spec mid-loop invalidates the scope document, existing tests, and completed work derived from it. Changes happen between units of work, not during them. +Specs are stable during execution — modifying a spec mid-loop invalidates the plan file, existing tests, and completed work derived from it. Changes happen between units of work, not during them. Specs are not tutorials, READMEs, API reference docs, or implementation plans. diff --git a/docs/guide/prompts.md b/docs/guide/prompts.md index 4f96793..16f236d 100644 --- a/docs/guide/prompts.md +++ b/docs/guide/prompts.md @@ -56,13 +56,16 @@ Each phase prompt lives in `.lorah/prompts/` and defines the workflow for a sing understand what has been built. 4. Check for a blocked task in `.lorah/tasks/`. If one exists, read its Log and revise the task to address the issue. Set status to - `in_progress`, add notes to the Log, and skip to step 7. -5. Identify the single next task — the smallest unit of work that + `in_progress`, add notes to the Log, and skip to step 8. +5. Check the plan file's acceptance criteria against current git + state and test results. If all criteria are met, exit — the work + is complete. +6. Identify the single next task — the smallest unit of work that moves toward acceptance criteria. -6. Create a new task file in `.lorah/tasks/` using the task file +7. Create a new task file in `.lorah/tasks/` using the task file format. Set status to `in_progress`. Add planning notes to the Log. -7. Commit. +8. Commit. ``` **`prompts/test.md`** — Write tests for the in-progress task. @@ -81,8 +84,8 @@ Each phase prompt lives in `.lorah/prompts/` and defines the workflow for a sing 4. Verify: run the test suite. 
Failures are expected (no implementation yet), but panics and compilation errors must be fixed. -5. Update the task file's Testing log with files created and edge - cases covered. +5. Update the Testing section of the task file's Log with files + created and edge cases covered. 6. Commit. ## Blocked workflow diff --git a/docs/guide/workflow.md b/docs/guide/workflow.md index 9965fdc..06193f3 100644 --- a/docs/guide/workflow.md +++ b/docs/guide/workflow.md @@ -17,7 +17,7 @@ Scope the work └─ Repeat until done ``` -Each step is handled by a fresh agent. Agents maintain continuity through git history and the plan file — not shared memory. +Each step is handled by a fresh agent. Agents maintain continuity through git history, the plan file, and task files — not shared memory. ## Phase 1: Scope the work From 8bb3eb200142f31c1b292f0633ea98da49bd5339 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Tue, 31 Mar 2026 21:10:37 -0700 Subject: [PATCH 16/18] docs(guide): use consistent h1 prefix in workflow guide --- docs/guide/workflow.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/guide/workflow.md b/docs/guide/workflow.md index 06193f3..7201939 100644 --- a/docs/guide/workflow.md +++ b/docs/guide/workflow.md @@ -1,4 +1,4 @@ -# Workflow: Incremental Spec-Driven Development +# Guide: Incremental Spec-Driven Development Lorah provides the loop. How you structure the work inside that loop is up to you. There are many valid approaches — this document presents one pattern that works well for spec-driven development. 
From 2d8edb83487532b41f7c651454adf83aef2b8863 Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Tue, 31 Mar 2026 21:22:37 -0700 Subject: [PATCH 17/18] docs(guide): relabel workflow sections to distinguish setup from loop MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sequential "Phase 1-4" numbering implied linear execution, but only scoping runs once — the other three steps repeat. Renamed to "Setup" and "Loop step 1-3" so headings self-document the flow. Folded the standalone Loop section's termination condition into Loop step 1. --- docs/guide/configuration.md | 2 +- docs/guide/workflow.md | 14 +++++--------- 2 files changed, 6 insertions(+), 10 deletions(-) diff --git a/docs/guide/configuration.md b/docs/guide/configuration.md index 718cf74..4d1dc26 100644 --- a/docs/guide/configuration.md +++ b/docs/guide/configuration.md @@ -20,7 +20,7 @@ The `lorah` CLI wraps Claude Code to run the agent loop. It accepts a prompt fil ## Plan file -The plan file is the output of the scoping step — [Phase 1](workflow.md#phase-1-scope-the-work) in the spec-driven workflow. It defines what is being built and what done looks like. The agent loop uses it as the contract between the human and the agents. +The plan file is the output of the scoping step — [Setup](workflow.md#setup-scope-the-work) in the spec-driven workflow. It defines what is being built and what done looks like. The agent loop uses it as the contract between the human and the agents. A plan file contains: diff --git a/docs/guide/workflow.md b/docs/guide/workflow.md index 7201939..d7cb733 100644 --- a/docs/guide/workflow.md +++ b/docs/guide/workflow.md @@ -19,7 +19,7 @@ Scope the work Each step is handled by a fresh agent. Agents maintain continuity through git history, the plan file, and task files — not shared memory. 
-## Phase 1: Scope the work +## Setup: Scope the work Before the loop begins, an agent reviews the design specs and produces a plan file. This is not a full task breakdown. It defines: @@ -31,9 +31,9 @@ The plan file is the contract between the human and the agent loop. It should be This step runs once. The loop handles everything else. -## Phase 2: Select the next task +## Loop step 1: Select the next task -Each iteration begins with an agent reviewing: +Each iteration begins with an agent checking the plan file's definition of done against current state. If all acceptance criteria are met, the work is complete and the loop ends. Otherwise, the agent reviews: - The design specs (authoritative source of truth). - The plan file (boundaries and definition of done). @@ -49,22 +49,18 @@ This is where the workflow diverges from upfront planning. Instead of decomposin The quality of task selection depends on the quality of the design specs. If the specs clearly define boundaries and behavior, the agent has a deterministic contract to work against. Ambiguity in specs propagates into ambiguity in task selection. -## Phase 3: Write tests +## Loop step 2: Write tests An agent writes tests for the selected task based on the design spec. The spec defines the intended behavior; the tests encode it as a verifiable contract between this agent and the implementation agent that follows. A passing test suite means the task is complete. Test quality is the bottleneck of the entire workflow. If the tests are shallow or misinterpret the spec, the implementation agent will write code that passes bad tests. -## Phase 4: Implement +## Loop step 3: Implement An agent writes code to pass the tests. It can see the tests, the specs, and the full git history. When the tests pass, it exits. If the implementation agent encounters an issue — an ambiguous spec, a flawed test, or a dependency it cannot resolve — it sets the task status to `blocked` with notes in the task's Log before exiting. 
The next iteration routes to the planning phase to reassess. -## Loop - -Return to Phase 2. The task selection agent checks the plan file's definition of done against current state. If all acceptance criteria are met, the work is complete and the loop ends. - ## Key properties **Agent isolation with continuity.** Each agent starts fresh, but git history and the plan file provide full context. This prevents context pollution while maintaining coherence across iterations. From fad7932d20da9eee53bc2729424d510156d7606d Mon Sep 17 00:00:00 2001 From: Christopher Plain <me@christopherplain.com> Date: Fri, 3 Apr 2026 20:12:45 -0700 Subject: [PATCH 18/18] docs(examples): add latest lorah configuration example Include a complete .lorah scaffold with settings, main prompt, task template, and phase-specific workflow prompts (plan, test, implement). --- examples/latest/.lorah/prompt.md | 34 +++++++++++++++++++ examples/latest/.lorah/prompts/implement.md | 32 +++++++++++++++++ examples/latest/.lorah/prompts/plan.md | 28 +++++++++++++++ examples/latest/.lorah/prompts/test.md | 26 ++++++++++++++ examples/latest/.lorah/settings.json | 14 ++++++++ .../latest/.lorah/tasks/00-task-template.md | 33 ++++++++++++++++++ 6 files changed, 167 insertions(+) create mode 100644 examples/latest/.lorah/prompt.md create mode 100644 examples/latest/.lorah/prompts/implement.md create mode 100644 examples/latest/.lorah/prompts/plan.md create mode 100644 examples/latest/.lorah/prompts/test.md create mode 100644 examples/latest/.lorah/settings.json create mode 100644 examples/latest/.lorah/tasks/00-task-template.md diff --git a/examples/latest/.lorah/prompt.md b/examples/latest/.lorah/prompt.md new file mode 100644 index 0000000..4041b7a --- /dev/null +++ b/examples/latest/.lorah/prompt.md @@ -0,0 +1,34 @@ +# Planning API — Local Dev Scaffold + +Complete exactly one task per invocation. + +--- + +## Workflow + +1. 
**Orient** — Run `git log --oneline -10` to understand what was
+   done in prior iterations.
+
+2. **Route** — Scan `.lorah/tasks/` for task files. At most one
+   task is non-completed at any time.
+   - If a task has `status: blocked`, read its Log to understand the
+     issue, then read and follow `.lorah/prompts/plan.md`.
+   - Else if a task has `status: test`, read and follow
+     `.lorah/prompts/test.md`.
+   - Else if a task has `status: implement`, read and follow
+     `.lorah/prompts/implement.md`.
+   - Else, read and follow `.lorah/prompts/plan.md`.
+
+3. **Exit** — Stop. Do not proceed to the next task.
+
+---
+
+## Rules
+
+- One task per invocation: complete one task, commit, exit.
+- Design specs are authoritative: `docs/design/` defines the target
+  behavior.
+- Task files use an incrementing numeric prefix with a kebab-case
+  name (e.g., `01-docker-compose.md`, `02-knexfile.md`).
+- Each invocation must start in a clean git state. If uncommitted
+  changes exist, discard them before proceeding.
diff --git a/examples/latest/.lorah/prompts/implement.md b/examples/latest/.lorah/prompts/implement.md
new file mode 100644
index 0000000..52122ba
--- /dev/null
+++ b/examples/latest/.lorah/prompts/implement.md
@@ -0,0 +1,32 @@
+# Implementation Phase
+
+## Workflow
+
+1. Read `.lorah/plan.md` for scope and acceptance criteria.
+2. Read the relevant design specs indexed in `docs/design/README.md`
+   for behavioral details.
+3. Review git history and current task file in `.lorah/tasks/`.
+4. If tests were written in the testing phase, read them. Otherwise
+   skip to step 6.
+5. Read the relevant design spec section(s) referenced in the task.
+6. Write production code to satisfy the acceptance criteria (and make
+   tests pass, if they exist). Do not write new tests.
+7. Verify: if tests exist, run the full test suite — all tests must
+   pass. Otherwise, verify acceptance criteria directly (e.g., run
+   commands, check file contents).
+8. Update the task file: set status to `completed`, add
+   implementation notes to the Log.
+9. If acceptance criteria have been met, update `.lorah/plan.md`.
+10. Commit.
+
+## Blocked workflow
+
+If the existing tests conflict with the design spec:
+
+1. Discard uncommitted changes.
+2. Set the task status to `blocked` with notes explaining the
+   conflict.
+3. Exit without committing.
+
+The next iteration will route to the planning phase to reassess the
+task.
diff --git a/examples/latest/.lorah/prompts/plan.md b/examples/latest/.lorah/prompts/plan.md
new file mode 100644
index 0000000..681f992
--- /dev/null
+++ b/examples/latest/.lorah/prompts/plan.md
@@ -0,0 +1,28 @@
+# Planning Phase
+
+## Workflow
+
+1. Read `.lorah/plan.md` for scope and acceptance criteria.
+2. Read the relevant design specs indexed in `docs/design/README.md`
+   for behavioral details.
+3. Review git history and completed tasks in `.lorah/tasks/` to
+   understand what has been built.
+4. Check for a blocked task in `.lorah/tasks/`. If one exists, read
+   its Log and revise the task to address the issue. Set status to
+   `test` or `implement` (same criteria as step 7), add notes to the
+   Log, and skip to step 8.
+5. Check the plan file's acceptance criteria against current git
+   state and test results. If all criteria are met, exit — the work
+   is complete.
+6. Identify the single next task — the smallest unit of work that
+   moves toward acceptance criteria.
+7. Create a new task file in `.lorah/tasks/` using the task file
+   format. Set the task status based on whether it has testable
+   behavior:
+   - `test` — the task implements logic, endpoints, or behavior that
+     benefits from test-first development.
+   - `implement` — the task is pure configuration, scaffolding, or
+     boilerplate with no behavioral logic to test. Note the rationale
+     in the task file's Log > Planning section.
+   Add planning notes to the Log.
+8. Commit.
diff --git a/examples/latest/.lorah/prompts/test.md b/examples/latest/.lorah/prompts/test.md new file mode 100644 index 0000000..0bb3416 --- /dev/null +++ b/examples/latest/.lorah/prompts/test.md @@ -0,0 +1,26 @@ +# Testing Phase + +## Workflow + +1. Read `.lorah/plan.md` for scope and acceptance criteria. +2. Read the relevant design specs indexed in `docs/design/README.md` + for behavioral details. +3. Read the current task file in `.lorah/tasks/`. +4. Read the relevant design spec section(s) referenced in the task. +5. Write tests that verify the behavior described in the task's + acceptance criteria. Do not write any production code. Add stubs + or interface definitions only if required to make tests + compilable. +6. Verify: run the test suite. Failures are expected (no + implementation yet), but panics and compilation errors must be + fixed. +7. Update the Testing section of the task file's Log with files + created and edge cases covered. +8. Update the task status from `test` to `implement`. +9. Commit. + +## Blocked workflow + +If the design spec is ambiguous or contradicts the task file, add a +note to the task file explaining the issue, set status to `blocked`, +and exit without committing test code. 
diff --git a/examples/latest/.lorah/settings.json b/examples/latest/.lorah/settings.json new file mode 100644 index 0000000..d988115 --- /dev/null +++ b/examples/latest/.lorah/settings.json @@ -0,0 +1,14 @@ +{ + "model": "opus", + "permissions": { + "defaultMode": "bypassPermissions" + }, + "sandbox": { + "enabled": true, + "autoAllowBashIfSandboxed": true, + "network": { + "allowedDomains": ["registry.npmjs.org"] + } + }, + "attribution": { "commit": "", "pr": "" } +} diff --git a/examples/latest/.lorah/tasks/00-task-template.md b/examples/latest/.lorah/tasks/00-task-template.md new file mode 100644 index 0000000..cb9e703 --- /dev/null +++ b/examples/latest/.lorah/tasks/00-task-template.md @@ -0,0 +1,33 @@ +--- +status: test +--- + +<!-- Valid statuses: test | implement | blocked | completed --> + +# Task: <title> + +## Behavior + +What this task implements — reference the relevant spec section(s). + +## Acceptance Criteria + +- Concrete, testable conditions. + +## Context + +Relevant files, prior task decisions, or anything the next agent needs. + +## Log + +### Planning + +- ... + +### Testing + +- ... + +### Implementation + +- ...