kres-repl: halt on N consecutive task errors and surface them to the … by leitao · Pull Request #3 · masoncl/kres

leitao · 2026-04-27T17:17:41Z

Observed symptom — every fast/slow call returns 400 immediately:

API status 400: max_tokens: 164000 > 128000, which is the maximum
allowed number of output tokens for claude-sonnet-4-6

400 isn't in is_retryable_status, so the call returns Err right away and the task lands in Errored with empty analysis. The reaper then feeds that empty analysis to the todo agent, which reads "no analysis" as "task didn't run" and re-queues the same item forever with reason "Prior execution returned empty analysis; must re-run." Session 6f3f0daf-… spun like that for 50 min — 269 failed slow calls, 0 successes, output dir untouched, tasks #42 through #625 reshuffling the same 11 todo items.

Add a consecutive-error watchdog to the reaper: after K=3 reaped Errored tasks in a row, print a loud banner with the last error verbatim, drain pending/blocked todos to /followup so /continue can resume after the operator fixes the root cause, and (under --one) cancel the root shutdown. Reset on any Done — a single success means the pipeline is alive.

Also pass [task errored: ] (instead of "") to the todo agent when r.state == Errored, so even before the watchdog trips the agent has something concrete to react to instead of treating the empty string as "nothing happened, queue it again."

~/.kres/settings.json configures the slow-agent model per user, but the banner has been reporting claude-sonnet-4-6 on every run even when settings.json selects a different model. The override fires silently because --slow had a clap default_value of "sonnet". The override block in run_repl unconditionally fed args.slow through slow_tag_to_model_id(), mapping the unintentional default to claude-sonnet-4-6 and clobbering whatever was in settings.models.slow. Change ReplArgs.slow to Option<String>. The slow-agent config-file resolver still defaults to "sonnet" so ~/.kres/slow-code-agent-sonnet.json is found when --slow isn't passed; the model-id override only fires when args.slow is Some(_). Signed-off-by: Chris Mason <clm@meta.com>

/edit launches $EDITOR (vim by default) on a scratch file. On this host Esc produced on-screen garbage instead of reaching vim, making the editor unusable. kres installs a DECSTBM scroll region at REPL startup (kres-repl/src/status.rs:50, ESC[1;{bottom}r reserves rows H-1 and H for the status line + prompt). cmd_edit spawns the editor without resetting that region first, so the editor takes over a terminal whose bottom two rows are excluded from scroll. Its cursor math and input decoding drift against that constraint; Esc and other key sequences echo as visible escape-sequence text instead of reaching the editor. Call crate::status::restore() before the spawn_blocking that runs the editor and crate::status::install() afterwards. The editor now starts with a full-height scroll region (same shape it would see if launched from the operator's shell directly), and kres's status row is re-established on return so the REPL prompt and background status poller keep working. Signed-off-by: Chris Mason <clm@meta.com>

Even with the scroll region reset around the editor spawn, the in-editor cursor drifts after pressing Esc: the repaint shows up elsewhere and the caret lands in the wrong column. The status-row paint task (Self::run, the tokio::spawn at kres-repl/src/session.rs:465) ticks every 200ms. On each tick it writes status::paint output to stderr, which absolute-positions the cursor to row H-1, clears the line, writes text, and restores the cursor. stderr is the editor's output fd too, so vim's frame gets scribbled through, and the save-and-restore cursor dance drags the visible caret around independently of vim's own cursor state. The size-check branch is worse — it can re-install the scroll region behind the child's back. Add a Session::status_paused AtomicBool. cmd_edit sets it before handing the terminal off to the editor and clears it on return. The paint loop consults the flag at the top of each tick and continues without doing any work (including the size-check + install call) while the child owns the terminal. Signed-off-by: Chris Mason <clm@meta.com>

Even with the status painter paused and the scroll region reset around the editor spawn, the "> " prompt keeps redrawing on top of vim's frame while the editor is open. read_stdin (kres-repl/src/session.rs) runs on a spawn_blocking thread in a tight `loop { readline("> ") ... tx.send(line) }`. As soon as the line that typed "/edit" is handed to the main task via tx.send, the reader loop iterates and calls readline again — which paints "> " at the current cursor row — before the main task has even dispatched Command::Edit. Any subsequent completion / history / size-report repaint from rustyline lands on top of vim too. Add a Session::editor_lock tokio Mutex<()>. The reader loop takes it via blocking_lock() on each iteration and holds it across readline(); cmd_edit takes it with .lock().await before handing the terminal to the editor and drops it on return. That way the reader thread blocks inside blocking_lock() for the whole editor lifetime and never calls readline while the child owns the tty. The rustyline-init-failed fallback (read_stdin_plain) doesn't print a prompt, so it stays unchanged. Signed-off-by: Chris Mason <clm@meta.com>

Previous attempt (b2c3476) put a Mutex around rustyline::readline: cmd_edit grabbed it before spawning vim, the reader thread held it across readline(). That deadlocked the operator — after typing "/edit<Enter>" the reader re-acquired the mutex on the next iteration while cmd_edit was still awaiting its lock(), so the editor wouldn't launch until the operator hit Enter a second time to let the reader's in-flight readline() return. Replace the mutex with an ack channel. The reader now waits for an explicit ack from the main loop before printing each "> " prompt (after the first). The main loop sends the ack at the bottom of every iteration that consumed an rx.recv(); Quit skips the ack because it's about to tear the reader down anyway, and the auto-continue timer path doesn't ack because it wasn't servicing a user line. Effect: between tx.send("/edit") and the main loop finishing cmd_edit, the reader is parked in ack_rx.blocking_recv() and readline is not running — so rustyline can't paint "> " on top of the editor's frame. A single Enter fires /edit as intended. Signed-off-by: Chris Mason <clm@meta.com>

/stop cancels in-flight tasks and drains the todo list, but the reaper was still running its per-reap pipeline against the cancelled tasks: findings merger, goal check, goal-not-met missing-item injection, and the post-task todo-agent update. Every one of those is an API call, and the missing-item injection writes into the todo list the operator just emptied — so seconds after typing /stop the queue is back full of new work, the auto-continue latch is the only thing stopping the REPL from redispatching, and the operator sees "it's still going". Gate the reaper's post-append work on stop_latched. After the task's analysis is written to accumulated and report.md (and any coding-mode files are persisted under <workspace>/code/), the reaper checks the latch and `continue`s past the merger / goal / todo-agent block. A reap that was already in an inference call finishes that call before returning, but no new calls start, and no new items land in the todo queue. The --turns 0 stop block and all earlier parts of the reap iteration (report append, code_output persist) run as before so legitimate history from tasks that completed before /stop is still captured. Signed-off-by: Chris Mason <clm@meta.com>

Session a85dbc41 (2026-04-21): operator asked "read report.md, identify the first bug reported, code a fix". The pipeline loaded drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c as a 13 KB inline symbol and the slow-coding agent emitted code_output[0] = {path: "fix-bnxt-xdp-redirect-frag-leak.patch", content: "From: … diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c … @@ -297,6 +297,7 @@ …"}. The artifact sat next to the unchanged source; the operator had to manually verify every line number and hunk context against the real tree before it was usable. Add a FIXES AND PATCHES block to the slow-coding system prompt: - fix tasks (the word "fix", "patch", "apply", etc. in the task) require the verbatim current file contents in symbols/context before any output is produced. If the excerpt the agent was given doesn't cover the exact line being changed, the agent must request a `read` followup and WAIT for the next turn rather than reconstruct from memory. - code_output is never allowed to be a standalone .patch / .diff file. The repair goes either through an edit-tool followup (when available) or as code_output whose `path` IS the file being fixed and whose `content` is the full post-fix body copied from the input. - Line numbers and surrounding context in the output must match the file on disk exactly; reconstruction from a summary is explicitly called out as a bug, with the session id as the anchor. Signed-off-by: Chris Mason <clm@meta.com>

Coding-mode fixes have had no way to land surgically on a file. code_output writes (or rewrites) a whole file, which is fine for a fresh reproducer but a bad fit for a one-line fix: the model has to re-emit the entire target file from memory, burning tokens and opening the door to transcription errors. Session a85dbc41 (2026-04-21) fell off this edge and produced a .patch file with hand-reconstructed hunk context. Add an `edit` tool that matches Claude Code's Edit primitive: {type: "edit", file_path, old_string, new_string, replace_all} old_string is looked up literally in the current file contents and must appear exactly once unless replace_all is true. The file is rewritten via tmp + rename for crash safety. Field names match Claude Code (file_path, old_string, new_string, replace_all) on purpose — models already know that shape, and the `old_string` anchor forces them to quote bytes from the real file rather than reconstruct them. `path` and `file` are accepted aliases for `file_path`. Plumbing: - kres-agents/src/tools.rs: EditArgs + edit_file. Rejects empty old_string, old_string == new_string, zero or ambiguous matches, and workspace-escape paths. Returns a short preview of 5 lines centred on the first replacement so the fast agent has cheap evidence. - kres-agents/src/main_agent.rs: `"edit" =>` branch in dispatch_non_mcp and action_label. - kres-core mode.rs: CodeEdit lives here next to CodeFile so TaskOutcome and ReapedTask can carry it without depending on kres-agents. kres-agents re-exports it under the old name. - kres-agents response.rs + pipeline.rs + reaper: code_edits flows from the slow-coding reply through TaskSummary and TaskOutcome; the reaper calls a new apply_code_edits that loops edit_file over each entry, logging per-edit results. - completed_run_count now counts a coding task as produced when code_edits is non-empty even if code_output and analysis are empty. Prompts: - main-agent.system.md documents the "edit" action. - slow-code-agent-coding.system.md FIXES-AND-PATCHES now points at code_edits as the primary surgical-fix channel and keeps code_output full-file rewrite as a fallback. Tests: six new tokio tests cover unique-match / missing / ambiguous / replace-all / traversal / empty-and-identity. Full workspace test run is clean across all 11 bins. Signed-off-by: Chris Mason <clm@meta.com>

The read tool already supports line/end_line/count (aliases startLine/endLine; see kres-agents/src/tools.rs ReadArgs), but the main-agent system prompt only listed `"read" → read` — no shape, no example. The model reached for `bash sed -n '224,330p' file` to read a range instead, which is slower, races against bash's 60s timeout, and produces shell-quoted output. Expand the `"read"` entry with a worked example and the full alias set (`path`/`file`, `startLine`/`line`, `endLine`/`end_line`, `count` as an alternative to `end_line`), and tell the model to prefer it over `bash sed`. Signed-off-by: Chris Mason <clm@meta.com>

The prompt-text path scanner was eating the leading `.` from relative-path tokens, because strip_token_punctuation trimmed `.` symmetrically from both ends. A prompt like read ../linux.net/kres-tues/report.md, write a fix ... became, after trim: `/linux.net/kres-tues/report.md` — both leading dots gone. That path doesn't exist, resolve_candidate's std::fs::metadata probe failed, no grant was added, and the subsequent `read` tool call got the bare "escapes workspace" error with no hint that the operator HAD in fact named the directory. Trim asymmetrically. Leading characters are limited to chars that can never legitimately start a path (quote-likes, opening brackets); `.`, `,`, `:`, `;`, `!`, `?` are stripped only from the right end where sentence punctuation actually lives. `./foo` and `../foo` survive intact. Regression-tested by a scanner test that builds a sibling dir under tmp and feeds the scanner `"read ../sibling/report.md, write a fix"`; the old code failed to grant, the new code grants the sibling directory. Signed-off-by: Chris Mason <clm@meta.com>

Two small tool-dispatch gaps, both of which burned turns in real sessions: 1) The `find` dispatch only read `name` as the -name glob. Session 9fee284e (2026-04-21 17:11) emitted `{"type":"find","pattern":"report.md"}` — `pattern` is how `grep` names this argument and is the word the model naturally reaches for. The dispatcher silently dropped it, `args.name` came back None, and find(1) ran with no filter and dumped the entire workspace tree. The model then fell back to `bash find`. 2) The `read` dispatch only read `endLine` (camelCase). The main-agent system prompt advertises `end_line` as the canonical name with `endLine` as an alias — exactly backwards from what the dispatcher accepted. The snake_case form went straight through to None. Fix by adding the obvious aliases in main_agent.rs' dispatch_non_mcp: `pattern` and `glob` as aliases for `find`'s `name`, and `end_line` as the primary lookup for `read`'s end line (falling back to `endLine`). No API surface change to the underlying tools — just the JSON field lookup in the action-parsing layer. Prompt: document find's arg shape (name / path / kind) and warn that a missing `name` dumps the whole tree; document git's shape (`command` as one string); mention grep's `glob` and `limit`. Same approach that was just applied to `read` — every in-tree tool now has a worked example in the action list rather than relying on the model to guess the field names from the followup-schema word. Tests: two new tokio tests in main_agent::tests exercise the two dispatch paths directly — find with `pattern` alias must produce a filtered result, read with `end_line` must clip at the requested line. Workspace test run is clean. Signed-off-by: Chris Mason <clm@meta.com>

apply_code_edits landed edits (or failed to) and logged one line per result to stderr, then threw the outcomes away. If the slow agent emitted an edit whose `old_string` was stale — either reconstructed from memory or invalidated by an earlier edit in the same batch — the failure died on stderr. The next slow-agent turn read the accumulated ledger and report.md and saw NOTHING about the failure, so it kept re-emitting the same broken anchor. Thread the results back. - apply_code_edits now returns Vec<AppliedEdit> carrying per-file Ok(preview)/Err(message). stderr logs are unchanged for the operator; the structured vec is what the reaper uses. - format_applied_edits_trailer renders the vec into a short markdown section ("Edits applied (N/M, K FAILED):") with one bullet per edit. Failed entries are prefixed "[FAILED]" with the verbatim error text — same anchor text the slow agent needs to re-emit a corrected edit. - The reaper now runs persist_code_output and apply_code_edits BEFORE building effective_analysis, so the trailer can be folded in before the analysis lands in last_analysis, the accumulated ledger, and report.md. Follow-on tasks and /summary now see the same truth the operator sees on stderr. - The old post-analysis persist/apply pair is removed — duplication would double-write files. Multi-edit ordering, which was previously undocumented: - slow-code-agent-coding.system.md now states that code_edits apply in emission order against the file state AFTER prior edits in the same batch have landed, and that failures are NOT retried — they surface in the analysis trailer under [FAILED] so the next turn can re-emit. Guidance: prefer one edit per file per turn unless anchors are known not to collide. Tests: three unit tests cover the trailer formatter (failure markers, empty input, all-success no-FAILED-marker). Full workspace test run is clean. Signed-off-by: Chris Mason <clm@meta.com>

The main-agent read dispatch was returning a header-only string to the turn log: Read drivers/net/.../bnxt_xdp.c:270-371 (2587 chars), symbol 'bnxt_xdp.c:270-371' …with zero bytes of the actual file content. The bytes were attached to the symbol pool that forwards to the SLOW agent, but the MAIN agent — the one choosing its next action — saw only the ack. When it wanted lines 270-370 of a file to reason about the XDP redirect path, it called `read` with `end_line` (worked), got back a size header with no content, tried again with `count` (worked), got the same header with no content, then gave up and ran `bash sed -n '270,370p' ...` which DOES put the bytes in [stdout]. Session 04365bed (2026-04-21, turns 1-3) is the anchor. The prompt already says "prefer read over bash sed"; the tool was the one failing to hold up its end. Fix: append the file body to the turn-log text after the header. Truncate at TOOL_OUTPUT_CAP_GREP_FIND (20k chars) — the same envelope grep and find use — so a whole-file read with no range can't blow the turn budget. The symbol attachment is unchanged, so the slow agent still receives the full content via its symbol pool and the dedupe cache continues to function. Test: `read_text_result_contains_file_body` reads lines 3-5 of a tmp file via dispatch_non_mcp and asserts both "line 3" and "line 5" appear in the returned text alongside the existing header. Full workspace test run is clean. Signed-off-by: Chris Mason <clm@meta.com>

…ault Operators report the bash tool being reached for as a general escape hatch for things the typed tools already cover — `bash sed` for a range read, `bash find` for a filename locate. Even after the prompt was fixed to say "prefer read over bash sed" and the read tool was fixed to actually return file bytes (commit 2c56099, 2026-04-21), nothing stops a model that learns bash from training memory: a `{"type":"bash","command":"..."}` action would hit the unconditional `"bash" =>` branch of dispatch_non_mcp. Add a real gate. Settings schema (kres-repl/src/settings.rs) ------------------------------------------- - New `actions: { allowed: Option<Vec<String>> }` block in settings.json. - Semantics of `actions.allowed`: * `null` or key absent → falls back to DEFAULT_ALLOWED_ACTIONS (grep/find/read/git/edit — bash excluded). * `[]` → empty is the EXPLICIT "deny every non-MCP action" signal; the dispatcher enforces it (no collapse-to-defaults). * `["read", ...]` → exact set, no defaults merged in. - `load_merged` layers `<cwd>/.kres/settings.json` over `~/.kres/settings.json`. Project values replace global field-by-field; allowlists don't union — more specific wins. The split helpers `load_merged_with_paths` and `apply_project_overrides` exist so the merge is testable without mocking the operator's real home directory. - `effective_allowed_actions(cli_extras)` returns the final set. CLI `--allow` tokens are additive on top of whatever the settings resolved to. Unknown tokens (not in KNOWN_ACTION_TYPES and not the `"all"` escape hatch) are DROPPED — a dead entry in the allowlist serves no purpose and masks the typo. The companion `warn_unknown_action_tokens` surfaces them with a closest-match suggestion (Levenshtein distance ≤ 2). - KNOWN_ACTION_TYPES lists every action the main agent might emit: grep/find/read/git/edit/bash PLUS mcp. "mcp" is a no-op in the allowlist (MCP is gated by mcp.json server registration), included only to keep the typo-warner from false-positive-flagging `--allow mcp`. CLI (kres/src/main.rs) ---------------------- - New `--allow ACTION` flag. Repeatable (`--allow bash --allow git`) or comma-separated (`--allow bash,git`). Adds to whatever settings.json resolved to. Special token `--allow all` expands to every action type the dispatcher knows (including bash) — the one-off escape hatch. - kres loads merged settings via `Settings::load_merged(workspace)` at startup, calls `warn_unknown_action_tokens` before computing the set (so the operator sees a typo warning BEFORE the flag silently no-ops), then builds the Arc<BTreeSet<String>> shared into MainAgent. - Startup banner prints the resolved allowlist and distinguishes "bash disabled by default" from "bash disabled by explicit allowlist in settings.json" — both are valid operator choices; the wording respects each. The banner is gated on a main-agent config being resolved, so --summary and other non-LLM modes don't see noise about an allowlist that never runs. Dispatch gate (kres-agents/src/main_agent.rs) --------------------------------------------- - MainAgent carries `allowed_actions: Arc<BTreeSet<String>>`. - dispatch_non_mcp rejects any action whose `type` is not in the set with an `[error] action type 'X' is not in the allowed-action list for this session (list). To enable it, add 'X' to actions.allowed in ~/.kres/settings.json (or <cwd>/.kres/settings.json) or re-run kres with '--allow X'.` string that names the allowed set and the fix paths. - An empty set is an explicit deny-all (error message surfaces "none — every non-MCP action is denied this session") — NOT a fall-back-to-allow-everything. Tests pass explicit sets for the tools they exercise. - Malformed actions (missing `type` field) bypass the gate so they hit the existing `unknown action type: ?` error — a malformed action is not a gated action and deserves a clearer error. Prompt (configs/prompts/main-agent.system.md) --------------------------------------------- - The `bash` entry documents that bash is OFF by default and shows the rejection shape so the model doesn't re-emit the same bash action hoping it lands. Pointer to the typed alternatives (read/grep/find/git). Template (configs/settings.json) -------------------------------- - The installed template gains `_actions_doc` and `_actions_example` fields so an operator who opens ~/.kres/settings.json can see the schema and a commented-out example. serde_json ignores unknown keys so behaviour is unchanged. Tests ----- kres-repl/src/settings.rs (11 new): - default_allowlist_excludes_bash - cli_allow_adds_bash - cli_allow_all_adds_everything (asserts every DEFAULT_ALLOWED_ACTIONS entry + bash so a future list shrink fails here, not silently) - settings_allowlist_replaces_default - project_only_allowlist_replaces_defaults - load_merged_project_overrides_global (real merge, global + per- project files written and layered) - load_merged_global_only_when_no_project - explicit_empty_allowlist_stays_empty - warn_unknown_action_tokens_flags_typos - warn_unknown_action_tokens_silent_for_clean_input - closest_known_action_suggests_within_edit_distance_two kres-agents/src/main_agent.rs (3 new): - dispatch_rejects_action_not_in_allowlist (gate returns informative error with --allow / settings.json hints) - empty_allowlist_denies_all_actions (pins the corrected contract after the first review) - malformed_action_reports_unknown_not_gated (no-`type` field action hits "unknown action type" catch-all, NOT the allowlist error) kres/src/main.rs (2 new): - allow_flag_accepts_comma_separated (value_delimiter = ',' + repeatable both work) - allow_flag_defaults_to_empty Full workspace test run is clean: 14 kres-repl, 140 kres-agents, 51 kres unit tests pass. Signed-off-by: Chris Mason <clm@meta.com>

Two major user-visible features landed since the README was last written and aren't documented anywhere operators would find them. - **Coding mode** (commits 68fc5a9, d5fa179, 8163c85) — a prompt like "write a reproducer for X" or "fix the bug in Y" is classified by the goal agent as coding mode instead of analysis. The slow agent emits a single call whose output is source code via two channels: `code_output` (full files written under `<workspace>/code/<path>`) and `code_edits` (surgical Edit- primitive-style `{file_path, old_string, new_string, replace_all}` records applied in-place atomically). Failed edits fold into the analysis trailer under `[FAILED]` so the next turn can correct them. Coding mode can also emit `bash` followups for compile/run verification — mentioning the `--allow bash` requirement makes the dependency explicit. - **Action allowlist** (commit 86aca4f) — the main agent's non- MCP tools are gated behind a session-wide allowlist. Document the three precedence levels (CLI `--allow`, per-project `<cwd>/.kres/settings.json`, global `~/.kres/settings.json`), the defaults (grep/find/read/git/edit, bash OFF), the explicit empty-list "lock it down" signal, and the typo-detection behaviour. Include worked-example JSON blocks for both the permanent-enable and the tight-lockdown configs. CLI synopsis block at the bottom gains `--allow ACTION`. Signed-off-by: Chris Mason <clm@meta.com>

Signed-off-by: Chris Mason <clm@meta.com>

…prompt Builds on 3e9c3b4 (embed agent *.system.md files). That commit still read operator overrides from ~/.kres/prompts/<basename>, so a fresh install after an upgrade would keep reading stale files setup.sh wrote on the prior install — the exact shadow-by-stale- file problem the embed was meant to eliminate. This commit: 1. Moves the override directory to ~/.kres/system-prompts/ so leftover ~/.kres/prompts/*.system.md files from older installs are never consulted. 2. Extends the embed + override treatment to the bug-summary templates so /summary and `kres --summary` get the same "rebuild refreshes, optional override under system-prompts/" behaviour the agent prompts have. The split is consistent: `~/.kres/system-prompts/` holds operator overrides for compiled-in LLM system prompts (agent roles + bug-summary). `~/.kres/prompts/` continues to hold user-authored prompt TEMPLATES — review-template.md and any `<word>-template.md` — which have nothing to do with LLM system prompts. Code changes ------------ - configs/*.json: the five shipped agent configs now reference `system_file: "system-prompts/<name>.system.md"`. Path is resolved relative to the config file's directory, so at runtime it becomes ~/.kres/system-prompts/<name>.system.md. - kres-agents/src/embedded_prompts.rs: the TABLE gains two entries — `bug-summary.md` and `bug-summary-markdown.md` — alongside the six existing agent prompts. New unit test `bug_summary_templates_are_present` pins both keys. Module docstring rewritten to make clear the module covers every LLM system prompt (not just agent prompts) and to document that only the review/`<word>`-template files stay on disk. - kres-repl/src/summary.rs: the previous direct `include_str!` for BUG_SUMMARY_TEMPLATE and BUG_SUMMARY_MARKDOWN_TEMPLATE is replaced by `bug_summary_template()` / `bug_summary_markdown_template()` functions that read from `kres_agents::embedded_prompts` (unwrap-expect because the table's test guarantees the keys). `default_template_path()` and `default_markdown_template_path()` now point at ~/.kres/system-prompts/ instead of ~/.kres/prompts/. The module and SummaryInputs docstrings updated to match. - kres-repl/src/session.rs: load_prompt_disk_then_embedded (the loader used for coding/generic slow-mode prompts) joins ~/.kres/system-prompts/<basename>. ReplConfig.template_path and the REPL's /summary helper both reference the new path. - setup.sh: prompts-install step now skips both `*.system.md` AND `bug-summary{,-markdown}.md` — everything with an embedded copy. Comment block spells out the split between embedded LLM system prompts and disk-based prompt-templates, and explains the leftover-file rationale for the new directory name. - kres/src/main.rs: --template docstring mentions the ~/.kres/system-prompts/bug-summary.md override path and the embedded fallback. Docs ---- - README NEWS entry notes the move and the leftover-file-ignored behaviour for both `*.system.md` and `bug-summary*.md`. - README "System prompts" section: load order now covers both prompt classes (agent configs and /summary), lists the embedded basenames explicitly, and explains that stale files under ~/.kres/prompts/ are safe to delete. - README "Summary output" bullet updated — bug-summary is no longer installed under ~/.kres/prompts/. - CLAUDE.md config-directory table renamed the override entry to `system-prompts/*.system.md` with a clearer description that the directory is empty by default. Tests ----- - Existing config.rs tests continue to build tmp configs with explicit sibling paths and do not depend on the directory- name rule. - New embedded_prompts test (`bug_summary_templates_are_present`) pins that the new entries land in the shared table. - Full workspace test run clean: 14 / 146 / 58 / 63 / 16 / 51 unit tests across the six crates. Signed-off-by: Chris Mason <clm@meta.com>

…summary-markdown) Turns the review template into a first-class slash command and introduces `~/.kres/commands/` as the override surface for every slash-command template. Three shipped commands — `review`, `summary`, `summary-markdown` — are embedded in the kres binary and can be replaced or added to by dropping a file at `~/.kres/commands/<name>.md`. The review template was previously reached only via `--prompt "review: target"` through a path-based lookup under `~/.kres/prompts/review-template.md`. That path is kept as a back-compat fallback for operators with custom `<word>-template.md` files, but the primary lookup is now `~/.kres/commands/<name>.md` → embedded body. CLI: `--prompt "word: extra"` and `--prompt "/word extra"` ----------------------------------------------------------- `resolve_prompt_arg` now recognises both forms and resolves the command name through the new `user_commands::lookup`. Both examples are equivalent and produce the same composed prompt: kres --prompt 'review: fs/btrfs/ctree.c' kres --prompt '/review fs/btrfs/ctree.c' The splitter accepts `alphanum / - / _` for the command word; anything else falls through to inline-prompt. The `/word` variant uses the first whitespace run as the separator; the `word:` variant uses the first colon. New module: kres-agents::user_commands --------------------------------------- - TABLE keyed by command name (not filename) populated via include_str! for review-template.md, bug-summary.md, and bug-summary-markdown.md. - `lookup(name) -> Option<String>` checks `~/.kres/commands/<name>.md` on disk, falls back to the embedded table, else None. - Four unit tests: non-empty body for every entry, all three expected names present, unknown-name returns None, review body contains the `[investigate]` lens markers (catches a silent include_str! miswire). summary.rs now reads through user_commands ------------------------------------------ - `bug_summary_template()` and `bug_summary_markdown_template()` now return `String` and go through `user_commands::lookup` instead of `embedded_prompts::lookup`. Panic-message updated. - `default_template_path()` and `default_markdown_template_path()` return paths under `~/.kres/commands/` (was `~/.kres/system-prompts/` from the previous commit — the slash-command-templates override surface is distinct from the agent-system-prompts one). - Module docstring rewritten to describe the commands/summary[.md,-markdown.md] resolution path. - The run_summary caller now types `fallback_text` as `String` to match the new return type of the template fns. embedded_prompts table trimmed ------------------------------ `bug-summary.md` and `bug-summary-markdown.md` are removed from `kres-agents::embedded_prompts`; user_commands owns them now. The test that pinned their presence is dropped. embedded_prompts is back to its original scope (agent *.system.md only), which the module docstring now states. setup.sh -------- The prompts-install loop now skips `review-template.md` on top of the existing skips for `*.system.md` / `bug-summary{,-markdown}.md`. Comment rewritten to describe BOTH override directories (`system-prompts/` for agent prompts, `commands/` for slash commands) and the `<word>-template.md` back-compat lane. Dep: `dirs` added to kres-agents' runtime deps so user_commands can resolve $HOME. Matches the workspace pin. Docs ---- - README NEWS entry rewritten for the `~/.kres/commands/` path and the two override directories. - README `--prompt 'review: ...'` section rewritten to show both invocation forms as equivalent and explain the user_commands::lookup resolution order. New section `## Slash-command templates` added under "System prompts" with an invocation table and a list of the three shipped commands. - README "Parallel lenses" section updated: customising the review template now means dropping `~/.kres/commands/review.md`, not editing `~/.kres/prompts/review-template.md`. - README "Summary output" bullet updated to point at `~/.kres/commands/summary.md` as the override. - CLAUDE.md config-directory table gains a `commands/<name>.md` entry alongside the existing `system-prompts/` row. Full workspace test run is clean: 14 / 146 / 58 / 63 / 16 / 51 unit tests (+ new user_commands tests already counted in the kres-agents bucket). Signed-off-by: Chris Mason <clm@meta.com>

All issues identified in the follow-up review of 6a7ba38 addressed in one commit — behaviour fixes, test coverage, and doc sweeps. Behavior fixes -------------- - `/review <target>` and `/summary-markdown [FILE]` REPL commands added. The original 6a7ba38 landed the CLI equivalence (`--prompt "/review ..."` == `--prompt "review: ..."`) but left the REPL slash-command half of the user's ask unimplemented; typing `/review fs/btrfs/ctree.c` at the prompt hit `Command::Unknown("review")`. Now: * `/review <target>` composes the `review` template via `user_commands::compose` (same path the CLI uses) and submits the result as a new task. * `/summary-markdown [FILE]` behaves like `/summary` but selects the `summary-markdown` template and defaults the output filename to `bug-report.md`. * `cmd_summary` now takes a `markdown: bool` so both /summary and /summary-markdown share one code path. - `user_commands::lookup` rejects names that are not `[a-zA-Z0-9_-]+`. The CLI entry point already filtered that character set, but lookup is a public API and a future caller could hand it a path-traversal string like `"../etc/passwd"` which would otherwise resolve outside the commands directory. The guard lives in `is_valid_name` and applies to both `lookup` and the new `lookup_with_root` helper. - `user_commands::lookup_with_root(commands_dir, name)` split out so disk-override behaviour is testable without mocking `$HOME`. Mirrors the pattern used for `Settings::load_merged`. - `user_commands::compose(name, extra)` consolidates the "lookup + prepend extra" logic. Used by both `resolve_prompt_arg` in main.rs and the new `cmd_review` in session.rs so CLI and REPL flow through one implementation. Stale docs swept ---------------- Four docstrings pointed at the previous override path (`~/.kres/system-prompts/bug-summary{,-markdown}.md`); the code moved to `~/.kres/commands/<name>.md` in 6a7ba38 but the comments were missed. Updated: - `kres/src/main.rs:~170` — `--template` docstring - `kres-repl/src/summary.rs:~76` — SummaryInputs.template_path - `kres-repl/src/session.rs:~51` — ReplConfig.template_path - `kres-repl/src/session.rs:~2230` — `/summary --template` hint in the REPL cmd_summary comment README + CLAUDE.md sweeps ------------------------- - README invocation table rebuilt. The previous three-row table had a header mismatch (titled "How to invoke review" but the third row was `/summary`). Replaced with a per-command matrix covering review / summary / summary-markdown across CLI and REPL. - "Parallel lenses" note reworded from "prompt starts with `review:`" to the precise "colon-terminated leading word" form, with an explicit example of what doesn't trigger. - REPL command list at the bottom of the README adds `/summary-markdown [FILE]` and `/review <target>`. - CLAUDE.md REPL-commands table: `/summary` row expanded into three rows covering /summary, /summary-markdown, /review. Tests ----- - `user_commands` gains five tests (10 total in the module): * `disk_override_wins_over_embedded` — write a tmp file, assert lookup_with_root reads it instead of the embedded body. * `disk_override_empty_falls_through_to_embedded` — whitespace-only file doesn't brick the command. * `traversal_name_is_rejected` — path-traversal strings, empty / `.` / `..` all return None. * `compose_prepends_extra_to_body` — `compose("review", "X")` puts `X\n\n` at the front of the template body. * `compose_empty_extra_returns_bare_body` — empty extra text yields the body with no leading blank. * `compose_unknown_name_returns_none`. - `kres-repl::commands` gains two tests: * `parses_summary_markdown` — covers `/summary-markdown` with and without a filename arg. * `parses_review_with_target` and `review_without_target_is_unknown` — the target is required; missing it surfaces a helpful Unknown message. - `kres` main.rs gains four tests around `resolve_prompt_arg`: * `resolve_prompt_arg_word_colon_form_hits_user_commands` * `resolve_prompt_arg_slash_form_equivalent_to_colon_form` (pins the user's explicit CLI equivalence ask) * `resolve_prompt_arg_slash_unknown_command_falls_to_inline` * `resolve_prompt_arg_inline_colon_not_misparsed` Full workspace test run is clean: 156 kres-core, 58 kres-repl, 63 kres-llm, 16 kres-mcp, 54 kres unit tests pass. Signed-off-by: Chris Mason <clm@meta.com>

Signed-off-by: Chris Mason <clm@meta.com>

`user_commands::compose()` builds the review prompt as `format!("{extra}\n\n{body}")` (kres-agents/src/user_commands.rs:112), so the operator-supplied target lands *above* the template body. The opening line said "the target below", so when an operator invoked `/review` with no target but an extra paragraph that itself ended "... the target below", the fast agent found nothing below, emitted only a type:question asking for the target, the main agent replied with prose (no `<actions>`), gather returned empty, and every lens slow agent emitted "SCOPE CHECK FAILED — no source code was provided" (linux.net session b686e702-f6af). Fix by rewriting the opening paragraph to describe what actually precedes it (file path, function name, commit ref, diff, or snippet) so the fast agent latches onto the prepended target. Signed-off-by: Chris Mason <clm@meta.com>

Two parity gaps between `run_once_with_ctx` and the lens-mode `gather` let the fast loop burn rounds without making progress: 1. `gather` ignored `skill_reads`. `run_once_with_ctx` has `apply_skill_reads(&mut live_skills, &parsed.skill_reads)` at the top of the parse-handling block so any skill file the fast agent requests mid-loop reaches the slow agent. `gather` instead threaded `self.skills` straight through, so a lens-mode fast agent that emitted `skill_reads` got its request silently dropped and the lens slow agents saw the pre-gather skills payload. 2. Neither loop bailed when every followup had `kind == "question"`. Question-type followups are clarifications for the operator, not fetchable data; `MainAgent::fetch` turns them into prose with no `<actions>` (kres-agents/src/main_agent.rs:260-269), so `fetched.symbols` / `fetched.context` come back empty and the loop re-asks. Session b686e702-f6af on linux.net spent every fast round and every lens call on a prompt with no target. Fix by cloning `self.skills` into a `live_skills` binding inside `gather`, calling `apply_skill_reads` on it, returning it from `gather`, and passing it into the lens `CodePrompt`; and by breaking out of both the `run_once_with_ctx` and `gather` loops when `parsed.followups.iter().all(|f| f.kind == "question")`. Signed-off-by: Chris Mason <clm@meta.com>

semcode-mcp emits `notifications/message` log lines on stdout during a `tools/call`, so a single call produces two lines: the notification and the real response. kres-mcp's read loop tried to deserialize every line as a `Response`, whose `#[serde(flatten)] result: ResponseResult` demands either a `result` or an `error` top-level key. Notifications have neither, so serde rejected them and the caller surfaced "mcp server `semcode` produced malformed JSON: data did not match any variant of untagged enum ResponseResult at line 1 column 131" — the real response on the next line was then dropped, and callers like `find_function(mark_stack_read)` saw an error even though the tool had succeeded. Reproduced by piping a synthetic `initialize` + `tools/call` at semcode-mcp: line 2 of stdout is exactly 131 chars long and starts with `"jsonrpc":"2.0","method":"notifications/message"` — the same length serde reported as column 131. Fix by parsing each line as a raw `Value` first; if it has a `method` field and no `result`/`error`, treat it as a notification and continue the read loop. Only non-notification lines are deserialized into `Response`. Signed-off-by: Chris Mason <clm@meta.com>

Add a paragraph directing the slow agents to report every bug they find in the target area, not just the worst one. The prior wording (focus + chains-of-events) encouraged depth per finding but said nothing about breadth, so a lens that surfaced one use-after-free sometimes stopped there even when adjacent call sites had the same shape. The merger already deduplicates and ranks severity, so the cost of over-reporting is low and the cost of missing distinct bugs of the same class is high. Wording references downstream merger behaviour so the agent understands why duplication is cheap relative to omission. Signed-off-by: Chris Mason <clm@meta.com>

Add `kres_core::Plan { prompt, goal, mode, steps[], created_at }` produced by a new `define_plan` goal-agent task that runs right after `define_goal` on every operator-typed prompt. Each `TodoItem` grows a `step_id` pointing at the plan step it executes; `Plan::sync_from_todo` unions the step_id up-link with the legacy todo_ids down-link and rolls step status up from the linked todos' statuses. Every downstream agent sees the plan every turn: - fast + slow via `CodePrompt.plan` — serialised into the user JSON, but NOT in `CACHED_PREFIX_FIELDS` so plan rewrites do not bust the big prefix cache. - main via `DataFetcher::fetch(&followups, plan)` — passed per-call so concurrent tasks with different plans cannot clobber each other through a shared slot. - goal judge via `check_goal(..., plan)`. - todo agent via `update_todo_via_agent_with_logger(..., plan)`. Three writers can reshape the plan after the initial `define_plan`: the first slow call per top-level prompt (gated on `RunContext.allow_plan_rewrite`), the todo agent on every completed task, and the goal-not-met todo-agent injection. Rewrites emit `{steps: [...]}` on the wire via `kres_core::PlanRewrite` — the plan's identifying metadata (`prompt`, `goal`, `mode`, `created_at`) is inherited from the prior plan via `PlanRewrite::apply_to`, so a forgotten metadata field in an LLM reply cannot silently drop the rewrite. Step ids are kebab-case slugs (`audit-ring-buffer-init`) rather than positional tags (`s1`/`s2`). Slugs survive reorder and make reassignment to a different meaning obvious. The synthesis path uses `slugify_step_id(title)`; collisions walk `-2`, `-3`, … . The four planner/rewriter prompts (goal.txt, todo.txt + the three slow-agent system prompts) all teach the LLM the slug convention and forbid positional ids. `PlanStep.id` and `PlanStep.title` have serde defaults, and `PlanRewrite::apply_to` pipes the rewrite's steps through `normalize_steps` — missing-id / blank-title / collided rows get repaired rather than producing a corrupt plan. `TaskManager::set_plan` reconciles orphan step_ids: when a rewrite drops an id, any todo whose `step_id` referenced it gets the field cleared so `sync_from_todo` does not keep pointing at dead weight. The session persistence layer writes Plan into `<results>/session.json` alongside `todo`, `deferred`, `completed_run_count`, and `last_prompt`. Atomic save (tmp + fsync + rename + parent-dir fsync); a content-hash latch in the reaper throttles idle rewrites. `InProgress` normalisation on load plus `reset_in_progress_to_pending` on drain paths (ctrl-c, --turns cap, goal-met) so no todo gets orphaned at exit. Operator-facing surface: - `/plan` — shows current plan with per-step status and linked todos via both linkage directions. - `log_plan_change` at the four rewrite sites shows `+`/`-`/`~` diffs; `log_plan_status_transitions` logs `pending → done` etc. every reaper tick that changed a status. Tests: 88 kres-core, 168 kres-agents, 63 kres-repl. All green. Signed-off-by: Chris Mason <clm@meta.com>

Session persistence shipped as "point at an existing --results dir and kres picks up where it left off". That default bled prior sessions' plans, todos, and counters into a new run whenever the operator re-used a results dir. Gate resume behind an explicit opt-in: - `--resume` CLI flag. Without it, kres starts clean even when `--results` points at a directory with a `session.json`. When `session.json` is missing but `session.json.prev` exists (from the backup path below), `--resume` falls back to the backup and logs the swap. - On startup without `--resume`, any existing `session.json` in the results dir is renamed to `session.json.prev` BEFORE the reaper starts writing — the prior snapshot survives the first 250 ms reaper tick that would otherwise overwrite it. - `/resume [PATH]` REPL command. With PATH: load that file. Without: prefer `<results>/session.json.prev`, else the live `session.json`. Calls the same `resume_state_from` helper `--resume` uses, so the `InProgress → Pending` normalisation and the deferred-list seeding are identical. Operators now see an explicit choice at startup ("note: prior session snapshot moved to DIR/session.json.prev; starting clean") and can recover mid-session with `/resume` when they realise they wanted the prior state after all. Signed-off-by: Chris Mason <clm@meta.com>

When the kernel skill directs the fast agent to read subsystem.md and load matching subsystem guides, the paths inside that file are relative to @REVIEW_PROMPTS@/kernel/subsystem/ — not the top-level @REVIEW_PROMPTS@/kernel/ directory. Spell it out so the agent adjusts the paths when following the guide chain. Signed-off-by: Chris Mason <clm@meta.com>

README.md had grown to 736 lines covering the agent flow, parallel-lens review, `--turns` semantics, coding mode, action allowlist, review-prompts repo integration, semcode MCP, configuration, system-prompt overrides, slash-command templates, CLI reference, workspace layout, and pre-commit hook. Readers hit a wall when all they wanted was how to get a review running. Split into one file per topic: - NEWS.md (news section). - docs/agents.md (fast / main / slow / todo / merger flow + "Building up a larger review"). - docs/review-template.md (the `/review` slash-command and its five-lens fan-out). - docs/coding-tasks.md (reproducers, `code_output`, `code_edits`, `bash` verify). - docs/summary.md (`/summary`, `--summary`, bug-report format). - docs/turns-and-follow.md (`--turns`, `--follow`, stop modes). - docs/action-allowlist.md (`--allow`, precedence, typo detect). - docs/configuration.md (`~/.kres/`, model precedence, system prompt overrides). - docs/commands.md (slash-command templates + disk overrides). - docs/review-prompts.md (external review-prompts repo). - docs/semcode.md (semcode-mcp integration). - docs/cli.md (every CLI flag + REPL command). - docs/development.md (workspace layout, build/test/lint, pre-commit hook, wire-format references). README.md drops to 105 lines: title, a "Why kres exists" paragraph that names each role and what it's for, three-step quick start, and a link hub into the docs. Content is not rewritten — each section keeps its existing prose so the git history stays traceable. Signed-off-by: Chris Mason <clm@meta.com>

The topic guides split out of README.md carried over padding from the original long-form README, and several file:line citations had drifted against the current source (setup.sh line numbers in review-prompts.md and semcode.md, a task.rs range in turns-and-follow.md, a symbol.rs range in semcode.md). Prose across action-allowlist, agents, coding-tasks, commands, configuration, review-prompts, review-template, semcode, summary, and turns-and-follow compressed to fact-per-paragraph; stale numeric ranges either updated or dropped in favour of function/symbol names that do not rot. No content dropped - every fact reachable from the longer form is still there. Signed-off-by: Chris Mason <clm@meta.com>

agents.md referenced a TBD docs/planning.md that was never written; the only file at that path on disk is an untracked stray git-log dump. Drop the parenthetical so readers don't chase a broken link. commands.md claimed all three shipped slash-commands compose a template body with a trailing target via user_commands::compose. That only describes /review — /summary and /summary-markdown call user_commands::lookup (kres-repl/src/summary.rs:141) to fetch the body and pass it as the system prompt for a fast-agent call over report.md + findings.json, with no target composition. Split the bullet list so the two roles are distinct. Signed-off-by: Chris Mason <clm@meta.com>

Improve error reporting for network errors

The focus was too narrow, missing some bugs. Signed-off-by: Chris Mason <clm@meta.com>

A successful git query that matches nothing (e.g. `log <range> -- <paths>` over a window with no commits touching those paths) returns exit 0 with empty stdout and stderr. The main agent wraps each tool result in a `--- git ... ---\n<body>` envelope, so an empty body is indistinguishable from "the tool never ran" and the fast/slow agents loop, re-requesting the same query. Observed in kres session 81d6b079: the fast agent emitted the same `git log <window> -- include/net/ip_tunnels.h ...` five times and each empty reply was read as MISSING by the goal-check agent ("The git log returned no output (MISSING) ... the required operation was not completed"). Running that query on the tree by hand confirmed the range genuinely has zero matching commits; the tool was not blocked. Fix by appending "(no output)\n" when both stdout and stderr come back empty so callers can tell "ran, produced nothing" apart from "never ran." Signed-off-by: Chris Mason <clm@meta.com>

Operators who want to share or review individual findings currently have to grep through findings.json or carve up report.md by hand. Add a --export DIR flag that iterates findings.json (honouring --results / --findings like --summary does) and writes one subfolder per finding. Each entry is DIR/<tag>/, where <tag> is the finding id sanitized for directory use. Collisions after sanitizing get a numeric suffix so ids that only differ in punctuation don't silently overwrite. Each folder carries a meta.yaml (id, title, severity, status, workspace git HEAD sha + subject, cross-refs, symbol and file-section locations, open_questions) and a FINDING.md with the full body — summary, mechanism, reproducer, impact, fix sketch, open questions, relevant symbol definitions, file section contents, and the per-task analysis details captured in finding.details. Signed-off-by: Chris Mason <clm@meta.com>

Migrate the REPL's UI from the rustyline prompt + DECSTBM status bar to ratatui. The rustyline path remains reachable via --no-tui as an escape hatch, and --stdio stays unchanged for redirected output. Signed-off-by: Chris Mason <clm@meta.com>

The TUI pane printed the slow-agent analysis body as one big wall of text — fenced code blocks, prose, and indented listings all ran together. Carry the block through a dedicated sink so the render path can style it, and keep the non-TUI paths unchanged: * kres_core::io grows a markdown-block sink slot alongside the existing printer slot. install_markdown_sink installs a TUI sink that brackets the body with MD_BLOCK_START / MD_BLOCK_END sentinel lines before pushing it into the scrollback. The non-TUI path never installs a sink, so async_println_markdown folds back to async_println and --stdio output is byte- identical. * report_reaped now emits the analysis body via async_println_markdown so the TUI sees a marked block while --stdio and rustyline paths keep their old plaintext. * tui::render_markdown_block is a tiny in-house renderer that emits ratatui Line/Span with fenced code in Cyan, fence markers in dim Cyan, 4-space-indented code in Cyan, and inline \`backtick\` spans in Cyan. Everything else is a plain Span. No new deps — tui-markdown 0.3 requires ratatui-core::Line and 0.2 requires ratatui 0.28, neither of which extend into our Vec<ratatui::prelude::Line> without a cross-version shim. The render loop walks the windowed scrollback, pipes bracketed runs through render_markdown_block (drops sentinel lines), and renders everything else verbatim. A window slice that cuts through a bracketed region falls back to plain rendering for the visible half — acceptable for a first pass. Signed-off-by: Chris Mason <clm@meta.com>

Four fixes from the review of c861ed8: 1. Stale tui_markdown references in two comments (install_tui printer at kres-repl/src/tui.rs:189 and the render loop around :1036) now name render_markdown_block, matching reality after the dep was removed. 2. install_markdown_sink in kres-core/src/io.rs is now install- if-absent (returns Err on collision) and the replace variant lives separately as replace_markdown_sink, mirroring the install_printer / replace_printer split in the same module. install_tui_printer switched to replace_markdown_sink deliberately — the TUI wants the replace semantics for the same reason it uses replace_printer a line earlier. 3. Pin the --stdio byte-identical promise with a test: async_println_markdown with no sink installed produces the same captured output as async_println, and never emits sentinels. Plus a test for the install_if_absent contract. 4. Drop the unused `bytes` binding in split_inline_code; only bytes.len() was read, which is identical to line.len(). Signed-off-by: Chris Mason <clm@meta.com>

The --turns 0 stop condition (goal met, no-progress streak, or no-goal-agent batch finished) leaves the REPL open waiting for further operator input. A redirected/piped invocation has no operator on the other end, so the process sits forever after the work is done. Add ReplConfig::exit_on_idle, set it from main.rs to !stdout.is_terminal(), and have the reaper's --turns 0 stop branch set turns_exhausted and cancel root_shutdown when the flag is on. Mirrors the existing --turns N exit path so summary.txt is auto-rendered before teardown. Signed-off-by: Chris Mason <clm@meta.com>

metadata.yaml's only file pointers were buried inside relevant_symbols and relevant_file_sections list items, so a reader scanning a tree of finding folders had no top-level signal for which file each finding lived under. Emit a canonical filename: at the top of metadata.yaml, populated from the first relevant_symbol (falling back to the first relevant_file_section, then empty). Reserve a subsystem: slot in the same block — the Finding schema has no subsystem field today, so it renders empty for now and a follow-up todo will derive it from filename via a path-prefix rule. Signed-off-by: Chris Mason <clm@meta.com>

persist_code_output rejected every absolute path outright, so a coding task asked to write $DIR/summary.md (where $DIR is an absolute folder named in the prompt) had no way to put the file there: emitting an absolute path got dropped, and emitting a relative path landed at <workspace>/summary.md instead of the bug folder. Operators were forced to either re-launch kres with --workspace pointed at the target dir or work around it with a bash followup. Match the rule edit_file already uses (kres-agents/src/tools.rs:resolve_workspace): accept an absolute path when it resolves under the workspace OR under a directory the operator named in a prompt this session — consent::grant_paths_from_text already canonicalises and stores those mentions. Keep rejecting any path that contains '..' regardless of rooting; that's how a malformed reply would try to slip past both gates. Signed-off-by: Chris Mason <clm@meta.com>

On a real TTY, kres stays open after \`--turns N\` exhausts or \`--turns 0\` hits goal-met / no-progress / no-goal-batch-finished, waiting for the next operator prompt. Batch-style invocations want the process to end at that point without dropping into the redirected-stdout escape hatch. Add a \`--one\` bool flag (default off) that ORs into the existing exit_on_idle gate alongside the \`!stdout.is_terminal()\` check, so the reaper takes the same exit + auto-summary path either way. Signed-off-by: Chris Mason <clm@meta.com>

Two issues showed up while wiring a triage prompt to write summary.md back into the per-finding export folder. First, the template lived as an operator-installed file at \`~/.kres/prompts/triage-template.md\`, which made distribution and discoverability worse than the existing \`review\`/\`summary\` slash commands. Second, prose in the template like "\`relevant_symbols\` / \`relevant_file_sections\`" tokenised under \`split_whitespace\` into a bare \`/\`, which \`grant_paths_from_text\` resolved to filesystem root and added to the consent store — defeating the consent gate that \`persist_code_output\` and \`edit_file\` rely on, since \`is_allowed\` then returned true for every absolute path. Embed \`triage-template.md\` as a fourth slash-command via \`user_commands::TABLE\`, mirroring \`review\`/\`summary\`/\`summary-markdown\` — overridable by dropping a file at \`~/.kres/commands/triage.md\`. Skip the file in \`setup.sh\` so the legacy \`~/.kres/prompts/\` location can no longer shadow it. In \`consent::looks_like_path\`, reject bare \`/\`, \`.\`, and \`..\` tokens that come from prose conjunctions ("\`yes\` / \`no\`", "foo / bar"); \`grant_from_mention\` separately refuses any path whose parent is None as belt-and-braces for future callers. Replace the two bare-\`/\` separators in the template prose so the embedded copy can never trigger this even if a future caller bypasses the new guard. Signed-off-by: Chris Mason <clm@meta.com>

An earlier in-flight edit converted the top of bug-summary.md from question phrasing ("Does this code corrupt memory?") to plain statements ("This code corrupts memory."), but the rest of the file still demanded question form: the hard-rule banner, the ask-short-questions paragraph, the AVOID/USE INSTEAD examples, the structure section's step 5/6, and the worked sample at the bottom all kept the prior style, so a model following the template would still emit questions and contradict the new framing. Finish the conversion through the rest of the file: rename the banner enumeration, drop "ask short questions" in favour of stating the bug plainly, rewrite the widget_claim/widget_destroy worked examples in declarative form, replace "a series of statements followed by a question" with the punchline framing the rule was reaching for, and swap step 5's "phrased as a question where possible" for a flat "concise plain statement of the bug" with step 6 updated to match. Signed-off-by: Chris Mason <clm@meta.com>

The plain-text bug-summary template moved from question phrasing to flat statements (commit f1e6653); the markdown variant still demanded question form in the same spots — banner enumeration, ask-short-questions paragraph, AVOID/USE INSTEAD examples, structure step 5/6, and the worked sample at the bottom. Without this follow-up, --summary-markdown would emit questions while --summary emits statements. Apply the same conversion to bug-summary-markdown.md, preserving the markdown-only bits (\`backtick\`-wrapped identifiers, \`\`\`c fenced code blocks). The user-preserved "don't say: Did you corrupt memory here?" negative example stays as the one remaining \`?\`. Signed-off-by: Chris Mason <clm@meta.com>

Signed-off-by: Chris Mason <clm@meta.com>

On a clean \`--turns N\` run the REPL was rendering summary.txt right before teardown via cmd_summary, on top of the deferred-list banner and the "exiting REPL" line. The render is unwanted noise during batch invocations: the operator runs /summary themselves before quitting, or \`kres --summary\` against the results dir afterwards. Remove the post-loop auto-summary block. The companion \`turns_exhausted\` and \`any_coding_task\` AtomicBool fields had no other readers, so drop the field definitions, the two constructor inits, the reaper-side store sites, the now-stale comments mentioning the gate, and the submit_prompt latch that fed any_coding_task. The exit_on_idle goal-met branch keeps cancelling root_shutdown — same exit, just no auto-render. Signed-off-by: Chris Mason <clm@meta.com>

The default session-id was \`chrono::Utc::now().format("%Y%m%dT%H%M%SZ")\` — seconds resolution. A bulk-launch wrapper (the in-progress triage-all.py runs 20 kres processes at a time) had every parallel process collide on the same timestamp, multiple kres instances all sharing the same ~/.kres/sessions/<ts>/ dir, which then races on session.json / findings.json / report.md / prompt.md and crashes the Rust side with exit 101. Append \`-<pid>\` so the session id becomes e.g. 20260425T180000Z-12345. Two concurrent kres processes get distinct dirs; the timestamp prefix still keeps \`ls -lt ~/.kres/sessions/\` ordered. Signed-off-by: Chris Mason <clm@meta.com>

INDEX.md showed Severity / Date / Status / ID / Title; the new triage flow fills in metadata.yaml's subsystem field per finding, but the index table didn't surface it, so an operator scanning INDEX.md still had to drill into each FINDING.md to see where the bug lived. Adding the column lets the same overview answer "which subsystems do my findings cluster in?" without leaving the index page. Read `subsystem:` via the existing top_level_scalar helper, store it as Option<String> on IndexRow with empty values folded to None, and render it between Severity and Date so the bug-attribute facets sit together. Empty / absent values render as the em-dash placeholder already used for missing dates. Signed-off-by: Chris Mason <clm@meta.com>

Several recent commits in this branch shipped 200+ char single-line paragraphs through the `-m` flag, ignoring the 72-char wrap rule documented in the project's commit-message style guide. A hook makes the requirement enforceable instead of advisory. The hook lives at `.githooks/commit-msg` (so it ships with the repo; opt-in via `git config core.hooksPath .githooks`). It scans each non-comment line of the message file with awk and exits 1 if any line exceeds 100 characters, printing the offending line number, length, and content. 100 is deliberately looser than the 72-char wrap rule so quoted code / shas / URLs that legitimately exceed 72 pass without a fight, while wall-of-text `-m` paragraphs do not. Signed-off-by: Chris Mason <clm@meta.com>

`--search` only handled implicit-AND lists of `key:value` clauses with exact-string matching. Operators couldn't pivot a 100+ finding tree by area without scanning INDEX.md by hand, couldn't OR clauses together, and couldn't approximate-match subsystem strings the LLM wrote freeform into metadata.yaml. Three changes: * `--generate` now also emits `INDEX-<subsystem>.md` per distinct non-empty subsystem value alongside INDEX.md / index.html. Rows with a blank subsystem stay in the umbrella INDEX.md only. Stale `INDEX-*.md` files (from a previous run whose subsystems no longer exist) are removed so a renamed subsystem doesn't leave a dangling file. The filename slug strips non-alphanumeric chars to `-`. * `--search QUERY` parses a boolean expression — `clause`, `clause -a clause` (explicit AND, same as adjacency), `clause -o clause` (OR), and `( … )` for grouping. Stdlib only: no expression-parser library fit the bundled-into-kres / no-pip constraint, so a ~70-line recursive-descent parser does the job. Parens must be whitespace-separated; a literal paren in a regex value goes through `[(]` / `[)]`. * Every clause value except `since:` is now a case-insensitive regex over the matching column (severity / subsystem / status / regex). `since:` keeps its YYYY-MM-DD date-bound semantics. `collect_rows` also picks up a flat-layout fallback: if the export dir has no `findings/` subtree, scan the top level. The row dict gains a `tag_path` field so the markdown / html row-link form emits the right relative path in either layout. Tested against ~/local/kernel-bugs/findings/ (166 rows, 79 distinct subsystem strings — heterogeneity is a metadata.yaml data-quality issue, not a tool bug). All boolean / regex / paren forms exercised; malformed expressions and bad regexes both exit 2 with a one-line diagnostic. Signed-off-by: Chris Mason <clm@meta.com>

Generating one INDEX-<subsystem>.md per distinct subsystem string sounded useful, but on a real tree it produced 79 files for 166 findings — the LLM writes subsystem strings freeform into metadata.yaml, so values like "AMD-CCP-PSP-SEV-SNP", "AMD-CCP-SEV-SNP", and "AMD-SEV-SNP-PSP-CCP-driver" each get their own file. Net effect: the export root went from clean to noisy, with no clear pivot benefit over `--search subsystem:<regex>`. Drop the auto-fanout. `--generate` writes INDEX.md and index.html, nothing else. Operators who want subsystem filtering use `--search subsystem:<regex>` from the shell, or the in-page filter dropdown in index.html. The `build_subsystem_indexes` and `sanitize_subsystem` helpers go with the auto-fanout — there's no remaining caller and no flag to invoke them manually. Signed-off-by: Chris Mason <clm@meta.com>

Once a finding has been triaged, summary.md is the document an operator wants to land on — it's the short triage write-up. FINDING.md is the long-form raw evidence; landing there from a spreadsheet-style index makes the operator scroll past code blocks and call chains they didn't ask for. `collect_rows` now sets `link_file` per row: `summary.md` when that file is present in the finding directory, `FINDING.md` otherwise. The markdown and html row-link emitters use that field, so a half-triaged tree gets a mix — newly-summarised findings link to the summary, untriaged ones still link to the underlying FINDING.md. Signed-off-by: Chris Mason <clm@meta.com>

`kres --export` and `--export-index` produce a tree of per-finding folders plus entry-point files (INDEX.md, index.html, findings-index.py). An operator landing in the directory has no in-tree pointer telling them what `metadata.yaml` / `FINDING.md` / `summary.md` mean or how to drive `findings-index.py --search`. Bundle a `scripts/export-README.md` into the kres binary and copy it into the export dir on first run, with the same don't-overwrite contract as `findings-index.py`: an operator-edited README survives re-runs of `--export` / `--export-index`. The README covers the directory layout, the meaning of the per-finding files, and the `--generate` / `--search` syntax (boolean expressions, regex clauses, examples, error behaviour). `install_export_readme` lives next to `run_index_script` in kres-repl/src/export.rs and is called from `run_export_index`, so both kres entry points install the README. Failures log to stderr but don't propagate — the README is documentation, not load-bearing for the export. New test `export_index_installs_readme_and_preserves_ local_edits` pins the bundled body on first install and the operator-edit survival on re-run. Bundled with three drive-by lint cleanups the pre-commit hook flagged in the same area: * rustfmt on an `assert!` block in kres-core/src/io.rs and a chained `Style::default()` call in kres-repl/src/tui.rs; * Rust 1.94's `clippy::question_mark` lint on `if dir.parent().is_none() { return None; }` in `ConsentStore::grant_from_mention` — replaced with the bare `dir.parent()?;` form. Path::parent returns None only at filesystem root, so the early-return semantics match exactly. Signed-off-by: Chris Mason <clm@meta.com>

The README's job is to orient an operator landing in an export tree. The single most useful link from there is to INDEX.md — that's the table of every finding, and what most readers will want before they read prose about file layout and search syntax. Add a one-line `→ [Browse the findings index](INDEX.md)` link between the heading and the first prose paragraph so a GitHub Pages render of the README puts the obvious entry point one click away. Signed-off-by: Chris Mason <clm@meta.com>

`INDEX.md` and `index.html` link rows to `summary.md` once a finding has been triaged. The summary itself, though, was a dead-end: a reader who wanted to drop into the long-form FINDING.md or check something in metadata.yaml had to leave the page and navigate the filesystem. Both files sit in the same directory, so a relative link costs nothing. Have the triage prompt emit a one-line cross-link header before the `# Subject:` heading: [FINDING.md](FINDING.md) | [metadata.yaml](metadata.yaml) The Rules section's "no heading above Subject" guidance is unchanged in spirit — only this verbatim cross-link line is allowed before `# Subject:`. Signed-off-by: Chris Mason <clm@meta.com>

Several wrapper scripts (triage-all.py, the bug-batch runners) want the same shape: read a file of inputs one per line, append each input to a command template, run them in parallel with a per-task timeout, and tear every subprocess group down cleanly on Ctrl-C. The pattern lives in ~/local/src/review-prompts/kernel/scripts/claude_xargs.py — same control flow, but with claude-specific naming (cmd template called "claude command", inputs called "SHAs", `--series` / `--append` flags for prompt construction) baked in. Add a generic equivalent at scripts/xargs-timeout.py. Same parallel ThreadPoolExecutor + SIGINT/SIGTERM teardown + SIGTERM → SIGKILL escalation, with the claude knowledge stripped: `--command` / `--input-file` / `--parallel` / `--timeout` / `--verbose`. Blank lines and `#` comment lines in the input file are skipped. Each line is appended to the template and the result runs through `/bin/sh -c`, so shell metacharacters work the same way claude_xargs treated them. Smoke-tested with both the success path (3-input echo list) and the timeout-escalation path (sleep 0.1 vs sleep 5 with --timeout 1). Signed-off-by: Chris Mason <clm@meta.com>

`--search QUERY` is great for one-shot pivots, but the operator- useful subsets (`severity:high`, `subsystem:work`, `regex:uaf -a since:...`) get re-run every triage cycle. Letting `--generate` materialise them into named markdown files keeps the saved set in the export tree and means a GitHub Pages publish picks them up. `--generate` now also reads `<export-dir>/index-config.yaml`, when present, and writes one markdown file per `{file, query}` entry. Format is a list of two-key mappings: - file: INDEX-high.md query: severity:high - file: INDEX-recent-uaf.md query: regex:uaf -a since:2026-04-01 Stdlib has no YAML reader and the script can't take a pip dependency (it bundles into the kres binary and copies into export dirs), so the parser is a ~30-line hand-roll for the limited shape. `query` reuses the existing `parse_query` so any expression that works under `--search` works in the config. Filenames are validated as plain basenames inside the export dir — `..`, `/`, absolute paths, and leading `.` are refused so a typo can't write outside the tree. A bad regex or malformed query reports a one-line stderr diagnostic, skips that file, and continues. The exit code is 1 if any entry errored. The umbrella INDEX.md and index.html are always written; the config-driven files land on top. Absent config → silent skip. Documented in scripts/export-README.md alongside the existing `--search` syntax. Verified: absent config (umbrella only), valid 3-entry config (row counts match), malformed query (skip + exit 1), and suspicious filenames (refused — no files outside the export dir). Signed-off-by: Chris Mason <clm@meta.com>

The five existing query keys (`severity` / `subsystem` / `status` / `since` / `regex`) match against the row's table columns, but operators triaging a tree by area routinely want to pivot on the file or function the finding cites. `regex:` over the joined row text doesn't catch those — file paths and symbol names live in metadata.yaml's `relevant_symbols` / `relevant_file_sections` lists, not in any column the table renders. Add two new clauses with the same case-insensitive-regex semantics the others have: * `file:<regex>` matches any filename in the metadata.yaml — the primary `filename:` plus every `relevant_symbols` / `relevant_file_sections` entry. Same set FINDING.md cites in its `Relevant symbols` / `Relevant file sections` blocks. * `function:<regex>` matches any symbol name listed under `relevant_symbols:` (functions, macros, types). Same set FINDING.md cites in its `Relevant symbols` block. `collect_rows` extracts both at parse time via `all_filenames()` and `all_function_names()` and stashes them on the row dict so the matchers don't re-walk the YAML per row. The function-name extractor runs a tiny state machine that only collects `name:` lines while inside the `relevant_symbols:` block — kres-emitted metadata.yaml has no other top-level `name:` key, so a fresh-top-level-key check reliably exits the block. Documentation: * Bundled `scripts/export-README.md` gets the two new keys in the search-clause table, four worked `--search` examples (anchored function match, prefix file match, combined `file: -a severity:`, `function: -a severity:`), and two new entries in the saved- index `index-config.yaml` sample (`INDEX-net-ipv4.md` driven by `file:^net/ipv4/`, `INDEX-pwq-symbols.md` driven by `function:^pwq_`). * Top-of-script docstring lists the new clauses alongside the others so `--help` / `head` of the file is current. Verified against ~/local/kernel-bugs/: * `function:^acpi_os_execute$` — exact-match anchor finds the single OSL_DEBUGGER finding citing that symbol. * `function:put_unbound_pool` — substring matches 3 findings. * `file:workqueue.c -a severity:high` — combo finds 2 high workqueue findings. * Malformed regex (`function:[`) exits 2 with the regex error. Signed-off-by: Chris Mason <clm@meta.com>

…todo agent Observed symptom — every fast/slow call returns 400 immediately: API status 400: max_tokens: 164000 > 128000, which is the maximum allowed number of output tokens for claude-sonnet-4-6 400 isn't in is_retryable_status, so the call returns Err right away and the task lands in Errored with empty analysis. The reaper then feeds that empty analysis to the todo agent, which reads "no analysis" as "task didn't run" and re-queues the same item forever with reason "Prior execution returned empty analysis; must re-run." Session 6f3f0daf-… spun like that for 50 min — 269 failed slow calls, 0 successes, output dir untouched, tasks #42 through #625 reshuffling the same 11 todo items. Add a consecutive-error watchdog to the reaper: after K=3 reaped Errored tasks in a row, print a loud banner with the last error verbatim, drain pending/blocked todos to /followup so /continue can resume after the operator fixes the root cause, and (under --one) cancel the root shutdown. Reset on any Done — a single success means the pipeline is alive. Also pass [task errored: <err>] (instead of "") to the todo agent when r.state == Errored, so even before the watchdog trips the agent has something concrete to react to instead of treating the empty string as "nothing happened, queue it again." Signed-off-by: Breno Leitao <leitao@debian.org>

masoncl added 30 commits April 21, 2026 16:27

README.md: add a NEWS section

e667878

Signed-off-by: Chris Mason <clm@meta.com>

README.md: news update

8422418

Signed-off-by: Chris Mason <clm@meta.com>

masoncl and others added 29 commits April 23, 2026 18:18

Merge remote-tracking branch 'origin/pr/1'

34c0bce

Improve error reporting for network errors

review-template.md: Adjust focus on the target area

16923a0

The focus was too narrow, missing some bugs. Signed-off-by: Chris Mason <clm@meta.com>

templates: cleanup language and formatting

7d15274

Signed-off-by: Chris Mason <clm@meta.com>

masoncl force-pushed the main branch from 3d6ef28 to 118504c Compare May 11, 2026 19:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kres-repl: halt on N consecutive task errors and surface them to the …#3

kres-repl: halt on N consecutive task errors and surface them to the …#3
leitao wants to merge 77 commits into
masoncl:mainfrom
leitao:main

leitao commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

leitao commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants