Add coding-loop workflow + fix tmux agent stage-status env forwarding #82
Merged
mattleaverton merged 3 commits on Apr 17, 2026
Iterative coding agent workflow: task chooser → implementer → reviewer → done-gate, wrapped in a trapezium/invtrapezium loop primitive with loop_max=8 and loop_until_file_contains-based termination.

- Chooser and done-gate on claude-haiku-4.5 (cheap, API)
- Implementer and reviewer on claude-sonnet-4.6 via agent_tool=claude
- Feedback persisted to .reviews/iter-NNN.md plus .reviews/latest.md; chooser reads latest only, done-gate can list/read any iteration
- Spec passed via --input spec=<abs-path> and read in place; never committed into the target repo
- Reviewer uses git show HEAD (vs git diff HEAD~1 HEAD) so the first-iteration case works without a fallback branch

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force one-subtask-per-iteration to exercise the loop dynamics.

- Chooser: pick EXACTLY ONE smallest self-contained sub-task; do not bundle. Explicit guardrail written into .kilroy/task.md for the implementer.
- Implementer: implement ONLY what the task asks for; do not guess ahead.
- loop_max bumped 8 → 12 to accommodate multi-iteration specs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Agents run via TmuxAgentHandler (agent_tool=claude|codex|gemini|opencode)
were seeing the engine-injected status-contract preamble that instructs
them to write status JSON to $KILROY_STAGE_STATUS_PATH — but the tmux
session env never actually set those variables. Only the API agent_loop
path set them (via buildAgentLoopOverrides). Agents wasted tool calls
hunting for the unset env var and eventually gave up.
Consolidate the env build into buildTmuxAgentEnv, which now merges:
- The tool template's BuildEnv() defaults
- Engine runtime invariants (KILROY_RUN_ID, KILROY_NODE_ID,
KILROY_LOGS_ROOT, KILROY_STAGE_LOGS_DIR, KILROY_WORKTREE_DIR,
KILROY_DATA_DIR, KILROY_INPUTS_MANIFEST_PATH, KILROY_INPUT_*)
via BuildStageRuntimeEnv
- Stage status contract paths (KILROY_STAGE_STATUS_PATH,
KILROY_STAGE_STATUS_FALLBACK_PATH) via BuildStageStatusContract
This matches the API agent_loop path's env, so tmux and API backends
are now consistent with respect to what the status-contract preamble
can actually reference.
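The three-way merge described above can be pictured with a small sketch. It is not the body of buildTmuxAgentEnv: mergeEnv and toEnviron are hypothetical helpers, the sample values are made up, and "later layers override earlier ones" is an assumed precedence rule rather than something this commit states.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeEnv layers maps left to right; on a key collision the later layer
// wins (assumed precedence). Ranging over a nil map is safe in Go, so a
// missing tool template can be passed as nil.
func mergeEnv(layers ...map[string]string) map[string]string {
	out := make(map[string]string)
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

// toEnviron renders the map as sorted KEY=VALUE strings, the form used by
// os/exec's Cmd.Env.
func toEnviron(env map[string]string) []string {
	out := make([]string, 0, len(env))
	for k, v := range env {
		out = append(out, k+"="+v)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Stand-ins for the three sources named in the commit message.
	toolDefaults := map[string]string{"TERM": "xterm-256color"}
	runtimeEnv := map[string]string{
		"KILROY_RUN_ID":  "run-1",
		"KILROY_NODE_ID": "implementer",
	}
	statusEnv := map[string]string{
		"KILROY_STAGE_STATUS_PATH":          "/tmp/stage/status.json",
		"KILROY_STAGE_STATUS_FALLBACK_PATH": "/tmp/stage/status.fallback.json",
	}

	for _, kv := range toEnviron(mergeEnv(toolDefaults, runtimeEnv, statusEnv)) {
		fmt.Println(kv)
	}
}
```

Building the environment in one function means a test can assert on the merged map directly, without starting a tmux session.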
Observed in the wild: a 7-iteration coding-loop run where the
implementer burned ~15 of 45 tool calls per iteration searching for
KILROY_STAGE_STATUS_PATH. With this fix the env var is set at session
start and the preamble instruction is actionable.
Adds unit test coverage in tmux_env_test.go.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- workflows/coding-loop/: an iterative coding-agent loop (task chooser → implementer → reviewer → done-gate), wrapped in the trapezium/invtrapezium loop primitive with file-based termination. One sub-task per iteration; feedback persisted to .reviews/iter-NNN.md plus a rolling .reviews/latest.md; done-gate uses LLM judgment against the spec.
- tmux agent sessions now receive KILROY_STAGE_STATUS_PATH and KILROY_STAGE_STATUS_FALLBACK_PATH (plus the full BuildStageRuntimeEnv set). The engine's status-contract preamble already tells agents to write to these vars, but previously only the API agent_loop path actually set them in the process env; the tmux path did not. This affects every agent_tool=claude|codex|gemini|opencode node in every workflow.

Motivation
Writing and running the coding-loop workflow exposed a latent tmux/API parity gap. When I ran the workflow end-to-end on a toy-list spec (7 sub-tasks → 7 iterations, full green run), I observed the implementer burning roughly 15 of 45 tool calls per iteration hunting for a $KILROY_STAGE_STATUS_PATH that was never set.

Agents eventually gave up and wrote status.json to arbitrary fallback paths. Runs still succeeded because the engine already tolerates a missing status file, but the wasted latency was substantial (~2-3 minutes per iteration) and the behavior was confusing: the preamble tells agents to write to a path held in an env var that does not exist.
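For illustration, this is roughly what a well-behaved consumer of the two variables could look like once they are set. It is a sketch under assumptions: resolveStatusPath is a hypothetical helper, and treating KILROY_STAGE_STATUS_FALLBACK_PATH as "use this when the primary is unset" is my reading of the name, not something the engine documents here.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveStatusPath returns the first non-empty status path found via
// lookup, checking the primary variable before the fallback one. It returns
// an error instead of guessing a location when neither is set.
func resolveStatusPath(lookup func(string) (string, bool)) (string, error) {
	keys := []string{
		"KILROY_STAGE_STATUS_PATH",
		"KILROY_STAGE_STATUS_FALLBACK_PATH",
	}
	for _, key := range keys {
		if v, ok := lookup(key); ok && v != "" {
			return v, nil
		}
	}
	return "", errors.New("no stage status path in environment")
}

func main() {
	// A fake environment where only the fallback variable is present.
	env := map[string]string{
		"KILROY_STAGE_STATUS_FALLBACK_PATH": "/tmp/fallback/status.json",
	}
	lookup := func(key string) (string, bool) {
		v, ok := env[key]
		return v, ok
	}

	path, err := resolveStatusPath(lookup)
	fmt.Println(path, err) // prints "/tmp/fallback/status.json <nil>"
}
```

In a real process the lookup argument would be os.LookupEnv, which has the same signature. Failing fast when both variables are missing is the opposite of the observed behavior, where agents spent tool calls searching and then picked an arbitrary path.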
What changed
- internal/attractor/agents/tmux_handler.go: extracted the session-env construction into buildTmuxAgentEnv, which now merges:
  - BuildEnv() defaults (unchanged)
  - engine.BuildStageRuntimeEnv: run/node IDs, worktree/logs paths, data dir, inputs manifest, KILROY_INPUT_* (replaces the previous hand-rolled subset)
  - engine.BuildStageStatusContract(...).EnvVars: the two status-contract paths (new)

  This brings the tmux session env into parity with what buildAgentLoopOverrides already provides to the API agent_loop path.
- internal/attractor/agents/tmux_env_test.go: new unit tests covering both populated and nil-template code paths; asserts all expected runtime + status-contract vars are present.
- workflows/coding-loop/: new workflow package (graph.dot, workflow.toml, README.md) exercising the loop primitive end-to-end. Proven on a toy-math spec (1 iter, 4m 24s) and a toy-list spec (7 iters, 30m 54s). Both green.

Test plan
- go test ./internal/attractor/agents/ ./internal/attractor/engine/ is green (agents 7.4s, engine 220.4s)
- kilroy attractor validate --graph workflows/coding-loop/graph.dot reports ok
- go test ./... exits 0 in target repo
- go build ./cmd/kilroy/ is clean
- go vet ./... is clean
- gofmt -l on touched files is clean (pre-existing drift elsewhere unchanged; PR #74, "fix(ci): gofmt all unformatted files (engine.go, worktree_hint_test.go, cli_only_models_test.go, codergen_router_cxdb_test.go)", covers some of it)

Notes
- TestRunWithConfig_ForceModel_BypassesCatalogGate, TestRunWithConfig_AllowsKimiAndZai_WhenCatalogUsesOpenRouterPrefixes) are unrelated to this change.
- loop_max=12 cap, and make the done-gate more robust to prompt variation.

🤖 Generated with Claude Code