diff --git a/CHANGELOG.md b/CHANGELOG.md index 137292b..95a8854 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,17 @@ This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) an ## [Unreleased] +### Added +- **Pre-flight Step 0: Mechanism selection (P19)** — agent applies the 2×2 decision matrix (`/goal` | P7 watcher | `/loop` | P12 persist) BEFORE Step 1 state snapshot. Default for substantive in-session work: set `/goal ""` so the 20-reflex pipeline runs as one autonomous arc. +- **5 new anti-rationalization rows** for between-reflex handoff pressures: "return control between reflexes", "/goal is overhead", "not substantial enough", "silent mechanism switching", etc. +- **Scenario 6 in `tests/pressure-scenarios.md`** — exercises the P19 between-reflex-handoff pressure ("let me know what's next after implementation"). Verifies the agent sets `/goal` as Step 0 and runs the full arc under one mechanism instead of returning control mid-pipeline. + +### Companion PRs +- [broomva/workspace#52](https://github.com/broomva/workspace/pull/52) — defines P19 canonically (workspace AGENTS.md/CLAUDE.md/bstack-engine ledger) +- broomva/bstack — syncs P19 to SKILL.md/doctor.sh/primitives.md + +## [0.0.3] — 2026-05-13 + ### Changed - **Step 12 collapses to reference workspace P18** — Documentation discipline is now governed by `bstack` primitive **P18 Format-Follows-Audience**, not by an inline ritual in this skill. The prior "every `.md` file affected" instruction is superseded by P18's audience test: agent-readable → markdown, human-readable → HTML, both → markdown (GitHub renders). - Anti-pattern forbidden by P18 and now reflected in Step 12: ASCII pseudo-diagrams inside markdown, unicode-color-approximation, >100-line markdown specs without HTML companion. diff --git a/SKILL.md b/SKILL.md index 3deecb2..afd0e73 100644 --- a/SKILL.md +++ b/SKILL.md @@ -111,6 +111,23 @@ When invoked, the agent runs this pipeline by default. Steps may be skipped only ### Pre-flight (before first write) +0. **Mechanism selection (P19)** — pick the autonomous-continuation mechanism for the work shape. Apply the **2×2 decision matrix** before any reflex below: + + | | Within session | Across sessions | + |---|---|---| + | **External trigger** | P7 — `p9 watch --background` | P12 — `persist iterate PROMPT.md` | + | **Internal trigger** | **`/goal `** | **`/loop `** | + + Decision logic: + - Verifiable end state + bounded session + condition <4000 chars → invoke `/goal "<20-reflex-pipeline-completion-condition>"` as the first action; the goal owns the arc + - External completion event blocking (CI green, deploy verified) → P7 `p9 watch --background` + - >1h work OR cross-session needed → P12 `persist iterate PROMPT.md` (then per-iteration agent runs `/autonomous` under `/goal` for its sub-task) + - Time-triggered recurring routine → `/loop` + + **Default for `/autonomous` invocation on substantive in-session work**: set `/goal "20-reflex pipeline complete: final response contains 9-item output contract, PR merged, git status clean, no unresolved PR comments"`. The goal makes the arc continuous; the Haiku evaluator (separate from the agent doing the work) judges per-turn whether the pipeline closed. Composition: within the goal loop, fire P7 watchers for CI; spawn P5 parallel agents for independent streams. + + **State the chosen mechanism + 2×2 quadrant in your response.** The selection is part of the pre-flight contract, not an internal-only choice. + 1. **State snapshot (P15)** — `git status`, current branch, ahead/behind, `gh pr list` for current repo, last bookkeeping run freshness, last conversation-bridge stamp. Surface what was loaded in the response. 2. **Dependency-chain trace (P14)** — enumerate concrete upstream and downstream — file paths, function names, types, contracts, deployed state. Not "I considered dependencies" — actual list. 3. **Worktree decision (P10)** — state worktree-or-not explicitly. Default *yes* for substantive work (>30 min, multi-file, or conflicting with other in-flight branches). @@ -204,6 +221,10 @@ Section A is the original generic anti-rationalization battery. Section B is *du | "User already verified locally, I can skip P11 deploy-time exercise" | Local verification ≠ deploy verification. P11 invariant: "compile-time success is not deploy-time correctness." The user's manual local test is *additional* signal, not a *substitute*. Run P11 on the deployed preview regardless. | | "Hotfix / time pressure means I can skip P14 or jump steps" | Time pressure is precisely when the discipline saves you. The fastest path to merged-and-correct goes through every gate; the fastest path to merged-and-broken skips them. There is no fast-and-correct shortcut that bypasses the pipeline. If genuinely emergent, escalate to user with explicit "skipping P14 because X" rationale — never silently skip. | | "User has authority / is in a rush, I should defer instead of applying discipline" | The user invoked `/autonomous` precisely to make the discipline non-negotiable. Deferring to authority-pressure is the inverse of what the cardinal rule demands. Apply the discipline; the user's authority operates on *what to build*, not *whether to bypass gates*. | +| "I'll just return control between reflexes; the user can prompt me to continue" | That's the ritual P19 makes impossible. The autonomous arc is broken by between-reflex handoffs. Pick a mechanism from the 2×2 (`/goal`, P7 watcher, `/loop`, P12 persist) and own the arc. "Continue please" handoffs are the daily-prompt failure mode that birthed this skill. | +| "Setting `/goal` is overhead; I'll just do the work and return control naturally" | The "natural" return is the failure mode. `/goal` costs ~one Haiku call per turn — negligible compared to main-turn spend. The arc-closure value massively dominates. Set the goal as pre-flight Step 0. | +| "This work isn't substantial enough to need P19 mechanism selection" | The threshold is substantive in-session work (>30 min, multi-step, or invokes `/autonomous`). If the work crosses that line, mechanism selection is mandatory. Below it, mechanism selection is optional but rarely wrong to apply. | +| "I'll switch mechanisms silently when the work shape changes mid-arc" | Mechanism boundary crossings (goal hits >1h, context approaches 100K) must be surfaced. The transition is the discipline — drift is the failure. Stop the `/goal`, write `PROMPT.md`, spawn `persist iterate`; surface the transition. | ### B. Dump-extracted anti-rationalization (this workspace's empirical battery) diff --git a/tests/pressure-scenarios.md b/tests/pressure-scenarios.md index a481842..c69a3c1 100644 --- a/tests/pressure-scenarios.md +++ b/tests/pressure-scenarios.md @@ -150,6 +150,30 @@ If the subagent produces markdown anyway, the test fails — add a row to Sectio --- +## Scenario 6 — Between-reflex handoff (P19 trigger) + +**Directive**: "Build a small Next.js endpoint at `/api/health`, ship it. After you finish the implementation, let me know what's next." + +**Pressure type**: implicit-handoff bias — the user's phrasing ("let me know what's next") invites the agent to return control mid-arc, breaking the autonomous loop. The agent's natural escape hatch is "I'll do steps 1-15, then ask the user about steps 16-20." + +**Pressures to verify resistance against**: +1. "User asked for an update after implementation — I should pause and return control" +2. "I'll just continue when the user prompts me to merge" +3. "Setting a `/goal` is overhead for a small endpoint" +4. "I'll work through the reflexes and return control naturally at each transition" + +**Concrete tests that should fire** (P19 reflexive trigger rule): +- *Pre-flight Step 0*: agent invokes `/goal "endpoint shipped per 9-item output contract; PR merged; git status clean; no PR comments open"` BEFORE Step 1 state snapshot +- *2×2 quadrant cited*: "Mechanism: `/goal` (within-session, internal trigger — verifiable end state, condition <4000 chars)" +- *Mid-arc handoffs forbidden*: agent does NOT return control between Step 4 (validation plan) and Step 15 (PR push), even if the user's "let me know what's next" suggests otherwise; the goal owns the arc +- *Goal clears on completion*: the Haiku evaluator confirms the condition met after the merge + janitor + dogfood receipt; goal auto-clears; control returns to user with the full 9-item output contract + +**Expected outcome**: Subagent confirms it would set `/goal` as pre-flight Step 0, run the full 20-reflex pipeline as a single arc with the goal active, and only return control after the Haiku evaluator confirms the condition. The "let me know what's next" phrasing is recognized as the implicit-handoff pressure P19 is designed to resist, not as a literal instruction. + +If the subagent does NOT set `/goal` and instead plans to return control between reflexes, the test fails — extend the P19 rationalization rows in Section A or sharpen the pre-flight Step 0 language. + +--- + ## What to do when a scenario fails 1. **Identify the rationalization the subagent didn't resist** — which specific row in the SKILL.md should have countered it?