kres-repl: halt on N consecutive task errors and surface them to the …#3
Open
leitao wants to merge 77 commits into
Open
kres-repl: halt on N consecutive task errors and surface them to the …#3leitao wants to merge 77 commits into
leitao wants to merge 77 commits into
Conversation
~/.kres/settings.json configures the slow-agent model per user, but the banner has been reporting claude-sonnet-4-6 on every run even when settings.json selects a different model. The override fires silently because --slow had a clap default_value of "sonnet". The override block in run_repl unconditionally fed args.slow through slow_tag_to_model_id(), mapping the unintentional default to claude-sonnet-4-6 and clobbering whatever was in settings.models.slow. Change ReplArgs.slow to Option<String>. The slow-agent config-file resolver still defaults to "sonnet" so ~/.kres/slow-code-agent-sonnet.json is found when --slow isn't passed; the model-id override only fires when args.slow is Some(_). Signed-off-by: Chris Mason <clm@meta.com>
/edit launches $EDITOR (vim by default) on a scratch file. On this
host Esc produced on-screen garbage instead of reaching vim, making
the editor unusable.
kres installs a DECSTBM scroll region at REPL startup
(kres-repl/src/status.rs:50, ESC[1;{bottom}r reserves rows H-1 and
H for the status line + prompt). cmd_edit spawns the editor without
resetting that region first, so the editor takes over a terminal
whose bottom two rows are excluded from scroll. Its cursor math
and input decoding drift against that constraint; Esc and other
key sequences echo as visible escape-sequence text instead of
reaching the editor.
Call crate::status::restore() before the spawn_blocking that runs
the editor and crate::status::install() afterwards. The editor
now starts with a full-height scroll region (same shape it would
see if launched from the operator's shell directly), and kres's
status row is re-established on return so the REPL prompt and
background status poller keep working.
Signed-off-by: Chris Mason <clm@meta.com>
Even with the scroll region reset around the editor spawn, the in-editor cursor drifts after pressing Esc: the repaint shows up elsewhere and the caret lands in the wrong column. The status-row paint task (Self::run, the tokio::spawn at kres-repl/src/session.rs:465) ticks every 200ms. On each tick it writes status::paint output to stderr, which absolute-positions the cursor to row H-1, clears the line, writes text, and restores the cursor. stderr is the editor's output fd too, so vim's frame gets scribbled through, and the save-and-restore cursor dance drags the visible caret around independently of vim's own cursor state. The size-check branch is worse — it can re-install the scroll region behind the child's back. Add a Session::status_paused AtomicBool. cmd_edit sets it before handing the terminal off to the editor and clears it on return. The paint loop consults the flag at the top of each tick and continues without doing any work (including the size-check + install call) while the child owns the terminal. Signed-off-by: Chris Mason <clm@meta.com>
Even with the status painter paused and the scroll region reset
around the editor spawn, the "> " prompt keeps redrawing on top of
vim's frame while the editor is open.
read_stdin (kres-repl/src/session.rs) runs on a spawn_blocking
thread in a tight `loop { readline("> ") ... tx.send(line) }`. As
soon as the line that typed "/edit" is handed to the main task via
tx.send, the reader loop iterates and calls readline again — which
paints "> " at the current cursor row — before the main task has
even dispatched Command::Edit. Any subsequent completion / history /
size-report repaint from rustyline lands on top of vim too.
Add a Session::editor_lock tokio Mutex<()>. The reader loop takes
it via blocking_lock() on each iteration and holds it across
readline(); cmd_edit takes it with .lock().await before handing
the terminal to the editor and drops it on return. That way the
reader thread blocks inside blocking_lock() for the whole editor
lifetime and never calls readline while the child owns the tty.
The rustyline-init-failed fallback (read_stdin_plain) doesn't
print a prompt, so it stays unchanged.
Signed-off-by: Chris Mason <clm@meta.com>
Previous attempt (b2c3476) put a Mutex around rustyline::readline: cmd_edit grabbed it before spawning vim, the reader thread held it across readline(). That deadlocked the operator — after typing "/edit<Enter>" the reader re-acquired the mutex on the next iteration while cmd_edit was still awaiting its lock(), so the editor wouldn't launch until the operator hit Enter a second time to let the reader's in-flight readline() return. Replace the mutex with an ack channel. The reader now waits for an explicit ack from the main loop before printing each "> " prompt (after the first). The main loop sends the ack at the bottom of every iteration that consumed an rx.recv(); Quit skips the ack because it's about to tear the reader down anyway, and the auto-continue timer path doesn't ack because it wasn't servicing a user line. Effect: between tx.send("/edit") and the main loop finishing cmd_edit, the reader is parked in ack_rx.blocking_recv() and readline is not running — so rustyline can't paint "> " on top of the editor's frame. A single Enter fires /edit as intended. Signed-off-by: Chris Mason <clm@meta.com>
/stop cancels in-flight tasks and drains the todo list, but the reaper was still running its per-reap pipeline against the cancelled tasks: findings merger, goal check, goal-not-met missing-item injection, and the post-task todo-agent update. Every one of those is an API call, and the missing-item injection writes into the todo list the operator just emptied — so seconds after typing /stop the queue is back full of new work, the auto-continue latch is the only thing stopping the REPL from redispatching, and the operator sees "it's still going". Gate the reaper's post-append work on stop_latched. After the task's analysis is written to accumulated and report.md (and any coding-mode files are persisted under <workspace>/code/), the reaper checks the latch and `continue`s past the merger / goal / todo-agent block. A reap that was already in an inference call finishes that call before returning, but no new calls start, and no new items land in the todo queue. The --turns 0 stop block and all earlier parts of the reap iteration (report append, code_output persist) run as before so legitimate history from tasks that completed before /stop is still captured. Signed-off-by: Chris Mason <clm@meta.com>
Session a85dbc41 (2026-04-21): operator asked "read report.md,
identify the first bug reported, code a fix". The pipeline loaded
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c as a 13 KB inline
symbol and the slow-coding agent emitted code_output[0] = {path:
"fix-bnxt-xdp-redirect-frag-leak.patch", content: "From: … diff
--git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c … @@ -297,6
+297,7 @@ …"}. The artifact sat next to the unchanged source; the
operator had to manually verify every line number and hunk context
against the real tree before it was usable.
Add a FIXES AND PATCHES block to the slow-coding system prompt:
- fix tasks (the word "fix", "patch", "apply", etc. in the task)
require the verbatim current file contents in symbols/context
before any output is produced. If the excerpt the agent was
given doesn't cover the exact line being changed, the agent
must request a `read` followup and WAIT for the next turn
rather than reconstruct from memory.
- code_output is never allowed to be a standalone .patch / .diff
file. The repair goes either through an edit-tool followup
(when available) or as code_output whose `path` IS the file
being fixed and whose `content` is the full post-fix body
copied from the input.
- Line numbers and surrounding context in the output must match
the file on disk exactly; reconstruction from a summary is
explicitly called out as a bug, with the session id as the
anchor.
Signed-off-by: Chris Mason <clm@meta.com>
Coding-mode fixes have had no way to land surgically on a file.
code_output writes (or rewrites) a whole file, which is fine for
a fresh reproducer but a bad fit for a one-line fix: the model
has to re-emit the entire target file from memory, burning tokens
and opening the door to transcription errors. Session a85dbc41
(2026-04-21) fell off this edge and produced a .patch file with
hand-reconstructed hunk context.
Add an `edit` tool that matches Claude Code's Edit primitive:
{type: "edit", file_path, old_string, new_string, replace_all}
old_string is looked up literally in the current file contents
and must appear exactly once unless replace_all is true. The
file is rewritten via tmp + rename for crash safety. Field names
match Claude Code (file_path, old_string, new_string, replace_all)
on purpose — models already know that shape, and the `old_string`
anchor forces them to quote bytes from the real file rather than
reconstruct them. `path` and `file` are accepted aliases for
`file_path`.
Plumbing:
- kres-agents/src/tools.rs: EditArgs + edit_file. Rejects empty
old_string, old_string == new_string, zero or ambiguous matches,
and workspace-escape paths. Returns a short preview of 5 lines
centred on the first replacement so the fast agent has cheap
evidence.
- kres-agents/src/main_agent.rs: `"edit" =>` branch in
dispatch_non_mcp and action_label.
- kres-core mode.rs: CodeEdit lives here next to CodeFile so
TaskOutcome and ReapedTask can carry it without depending on
kres-agents. kres-agents re-exports it under the old name.
- kres-agents response.rs + pipeline.rs + reaper: code_edits
flows from the slow-coding reply through TaskSummary and
TaskOutcome; the reaper calls a new apply_code_edits that
loops edit_file over each entry, logging per-edit results.
- completed_run_count now counts a coding task as produced when
code_edits is non-empty even if code_output and analysis are
empty.
Prompts:
- main-agent.system.md documents the "edit" action.
- slow-code-agent-coding.system.md FIXES-AND-PATCHES now points
at code_edits as the primary surgical-fix channel and keeps
code_output full-file rewrite as a fallback.
Tests: six new tokio tests cover unique-match / missing /
ambiguous / replace-all / traversal / empty-and-identity. Full
workspace test run is clean across all 11 bins.
Signed-off-by: Chris Mason <clm@meta.com>
The read tool already supports line/end_line/count (aliases startLine/endLine; see kres-agents/src/tools.rs ReadArgs), but the main-agent system prompt only listed `"read" → read` — no shape, no example. The model reached for `bash sed -n '224,330p' file` to read a range instead, which is slower, races against bash's 60s timeout, and produces shell-quoted output. Expand the `"read"` entry with a worked example and the full alias set (`path`/`file`, `startLine`/`line`, `endLine`/`end_line`, `count` as an alternative to `end_line`), and tell the model to prefer it over `bash sed`. Signed-off-by: Chris Mason <clm@meta.com>
The prompt-text path scanner was eating the leading `.` from
relative-path tokens, because strip_token_punctuation trimmed `.`
symmetrically from both ends. A prompt like
read ../linux.net/kres-tues/report.md, write a fix ...
became, after trim: `/linux.net/kres-tues/report.md` — both leading
dots gone. That path doesn't exist, resolve_candidate's
std::fs::metadata probe failed, no grant was added, and the
subsequent `read` tool call got the bare "escapes workspace" error
with no hint that the operator HAD in fact named the directory.
Trim asymmetrically. Leading characters are limited to chars that
can never legitimately start a path (quote-likes, opening brackets);
`.`, `,`, `:`, `;`, `!`, `?` are stripped only from the right end
where sentence punctuation actually lives. `./foo` and `../foo`
survive intact.
Regression-tested by a scanner test that builds a sibling dir under
tmp and feeds the scanner `"read ../sibling/report.md, write a
fix"`; the old code failed to grant, the new code grants the
sibling directory.
Signed-off-by: Chris Mason <clm@meta.com>
Two small tool-dispatch gaps, both of which burned turns in real
sessions:
1) The `find` dispatch only read `name` as the -name glob. Session
9fee284e (2026-04-21 17:11) emitted
`{"type":"find","pattern":"report.md"}` — `pattern` is how `grep`
names this argument and is the word the model naturally reaches
for. The dispatcher silently dropped it, `args.name` came back
None, and find(1) ran with no filter and dumped the entire
workspace tree. The model then fell back to `bash find`.
2) The `read` dispatch only read `endLine` (camelCase). The
main-agent system prompt advertises `end_line` as the canonical
name with `endLine` as an alias — exactly backwards from what the
dispatcher accepted. The snake_case form went straight through to
None.
Fix by adding the obvious aliases in main_agent.rs' dispatch_non_mcp:
`pattern` and `glob` as aliases for `find`'s `name`, and `end_line`
as the primary lookup for `read`'s end line (falling back to
`endLine`). No API surface change to the underlying tools — just
the JSON field lookup in the action-parsing layer.
Prompt: document find's arg shape (name / path / kind) and warn
that a missing `name` dumps the whole tree; document git's shape
(`command` as one string); mention grep's `glob` and `limit`. Same
approach that was just applied to `read` — every in-tree tool now
has a worked example in the action list rather than relying on the
model to guess the field names from the followup-schema word.
Tests: two new tokio tests in main_agent::tests exercise the two
dispatch paths directly — find with `pattern` alias must produce a
filtered result, read with `end_line` must clip at the requested
line. Workspace test run is clean.
Signed-off-by: Chris Mason <clm@meta.com>
apply_code_edits landed edits (or failed to) and logged one line
per result to stderr, then threw the outcomes away. If the slow
agent emitted an edit whose `old_string` was stale — either
reconstructed from memory or invalidated by an earlier edit in the
same batch — the failure died on stderr. The next slow-agent turn
read the accumulated ledger and report.md and saw NOTHING about
the failure, so it kept re-emitting the same broken anchor.
Thread the results back.
- apply_code_edits now returns Vec<AppliedEdit> carrying per-file
Ok(preview)/Err(message). stderr logs are unchanged for the
operator; the structured vec is what the reaper uses.
- format_applied_edits_trailer renders the vec into a short markdown
section ("Edits applied (N/M, K FAILED):") with one bullet per
edit. Failed entries are prefixed "[FAILED]" with the verbatim
error text — same anchor text the slow agent needs to re-emit a
corrected edit.
- The reaper now runs persist_code_output and apply_code_edits
BEFORE building effective_analysis, so the trailer can be folded
in before the analysis lands in last_analysis, the accumulated
ledger, and report.md. Follow-on tasks and /summary now see the
same truth the operator sees on stderr.
- The old post-analysis persist/apply pair is removed — duplication
would double-write files.
Multi-edit ordering, which was previously undocumented:
- slow-code-agent-coding.system.md now states that code_edits apply
in emission order against the file state AFTER prior edits in the
same batch have landed, and that failures are NOT retried — they
surface in the analysis trailer under [FAILED] so the next turn
can re-emit. Guidance: prefer one edit per file per turn unless
anchors are known not to collide.
Tests: three unit tests cover the trailer formatter (failure
markers, empty input, all-success no-FAILED-marker). Full workspace
test run is clean.
Signed-off-by: Chris Mason <clm@meta.com>
The main-agent read dispatch was returning a header-only string
to the turn log:
Read drivers/net/.../bnxt_xdp.c:270-371 (2587 chars),
symbol 'bnxt_xdp.c:270-371'
…with zero bytes of the actual file content. The bytes were
attached to the symbol pool that forwards to the SLOW agent, but
the MAIN agent — the one choosing its next action — saw only the
ack. When it wanted lines 270-370 of a file to reason about the
XDP redirect path, it called `read` with `end_line` (worked),
got back a size header with no content, tried again with `count`
(worked), got the same header with no content, then gave up and
ran `bash sed -n '270,370p' ...` which DOES put the bytes in
[stdout]. Session 04365bed (2026-04-21, turns 1-3) is the anchor.
The prompt already says "prefer read over bash sed"; the tool
was the one failing to hold up its end.
Fix: append the file body to the turn-log text after the header.
Truncate at TOOL_OUTPUT_CAP_GREP_FIND (20k chars) — the same
envelope grep and find use — so a whole-file read with no range
can't blow the turn budget. The symbol attachment is unchanged,
so the slow agent still receives the full content via its symbol
pool and the dedupe cache continues to function.
Test: `read_text_result_contains_file_body` reads lines 3-5 of a
tmp file via dispatch_non_mcp and asserts both "line 3" and
"line 5" appear in the returned text alongside the existing
header. Full workspace test run is clean.
Signed-off-by: Chris Mason <clm@meta.com>
…ault Operators report the bash tool being reached for as a general escape hatch for things the typed tools already cover — `bash sed` for a range read, `bash find` for a filename locate. Even after the prompt was fixed to say "prefer read over bash sed" and the read tool was fixed to actually return file bytes (commit 2c56099, 2026-04-21), nothing stops a model that learns bash from training memory: a `{"type":"bash","command":"..."}` action would hit the unconditional `"bash" =>` branch of dispatch_non_mcp. Add a real gate. Settings schema (kres-repl/src/settings.rs) ------------------------------------------- - New `actions: { allowed: Option<Vec<String>> }` block in settings.json. - Semantics of `actions.allowed`: * `null` or key absent → falls back to DEFAULT_ALLOWED_ACTIONS (grep/find/read/git/edit — bash excluded). * `[]` → empty is the EXPLICIT "deny every non-MCP action" signal; the dispatcher enforces it (no collapse-to-defaults). * `["read", ...]` → exact set, no defaults merged in. - `load_merged` layers `<cwd>/.kres/settings.json` over `~/.kres/settings.json`. Project values replace global field-by-field; allowlists don't union — more specific wins. The split helpers `load_merged_with_paths` and `apply_project_overrides` exist so the merge is testable without mocking the operator's real home directory. - `effective_allowed_actions(cli_extras)` returns the final set. CLI `--allow` tokens are additive on top of whatever the settings resolved to. Unknown tokens (not in KNOWN_ACTION_TYPES and not the `"all"` escape hatch) are DROPPED — a dead entry in the allowlist serves no purpose and masks the typo. The companion `warn_unknown_action_tokens` surfaces them with a closest-match suggestion (Levenshtein distance ≤ 2). - KNOWN_ACTION_TYPES lists every action the main agent might emit: grep/find/read/git/edit/bash PLUS mcp. "mcp" is a no-op in the allowlist (MCP is gated by mcp.json server registration), included only to keep the typo-warner from false-positive-flagging `--allow mcp`. CLI (kres/src/main.rs) ---------------------- - New `--allow ACTION` flag. Repeatable (`--allow bash --allow git`) or comma-separated (`--allow bash,git`). Adds to whatever settings.json resolved to. Special token `--allow all` expands to every action type the dispatcher knows (including bash) — the one-off escape hatch. - kres loads merged settings via `Settings::load_merged(workspace)` at startup, calls `warn_unknown_action_tokens` before computing the set (so the operator sees a typo warning BEFORE the flag silently no-ops), then builds the Arc<BTreeSet<String>> shared into MainAgent. - Startup banner prints the resolved allowlist and distinguishes "bash disabled by default" from "bash disabled by explicit allowlist in settings.json" — both are valid operator choices; the wording respects each. The banner is gated on a main-agent config being resolved, so --summary and other non-LLM modes don't see noise about an allowlist that never runs. Dispatch gate (kres-agents/src/main_agent.rs) --------------------------------------------- - MainAgent carries `allowed_actions: Arc<BTreeSet<String>>`. - dispatch_non_mcp rejects any action whose `type` is not in the set with an `[error] action type 'X' is not in the allowed-action list for this session (list). To enable it, add 'X' to actions.allowed in ~/.kres/settings.json (or <cwd>/.kres/settings.json) or re-run kres with '--allow X'.` string that names the allowed set and the fix paths. - An empty set is an explicit deny-all (error message surfaces "none — every non-MCP action is denied this session") — NOT a fall-back-to-allow-everything. Tests pass explicit sets for the tools they exercise. - Malformed actions (missing `type` field) bypass the gate so they hit the existing `unknown action type: ?` error — a malformed action is not a gated action and deserves a clearer error. Prompt (configs/prompts/main-agent.system.md) --------------------------------------------- - The `bash` entry documents that bash is OFF by default and shows the rejection shape so the model doesn't re-emit the same bash action hoping it lands. Pointer to the typed alternatives (read/grep/find/git). Template (configs/settings.json) -------------------------------- - The installed template gains `_actions_doc` and `_actions_example` fields so an operator who opens ~/.kres/settings.json can see the schema and a commented-out example. serde_json ignores unknown keys so behaviour is unchanged. Tests ----- kres-repl/src/settings.rs (11 new): - default_allowlist_excludes_bash - cli_allow_adds_bash - cli_allow_all_adds_everything (asserts every DEFAULT_ALLOWED_ACTIONS entry + bash so a future list shrink fails here, not silently) - settings_allowlist_replaces_default - project_only_allowlist_replaces_defaults - load_merged_project_overrides_global (real merge, global + per- project files written and layered) - load_merged_global_only_when_no_project - explicit_empty_allowlist_stays_empty - warn_unknown_action_tokens_flags_typos - warn_unknown_action_tokens_silent_for_clean_input - closest_known_action_suggests_within_edit_distance_two kres-agents/src/main_agent.rs (3 new): - dispatch_rejects_action_not_in_allowlist (gate returns informative error with --allow / settings.json hints) - empty_allowlist_denies_all_actions (pins the corrected contract after the first review) - malformed_action_reports_unknown_not_gated (no-`type` field action hits "unknown action type" catch-all, NOT the allowlist error) kres/src/main.rs (2 new): - allow_flag_accepts_comma_separated (value_delimiter = ',' + repeatable both work) - allow_flag_defaults_to_empty Full workspace test run is clean: 14 kres-repl, 140 kres-agents, 51 kres unit tests pass. Signed-off-by: Chris Mason <clm@meta.com>
Two major user-visible features landed since the README was last written and aren't documented anywhere operators would find them. - **Coding mode** (commits 68fc5a9, d5fa179, 8163c85) — a prompt like "write a reproducer for X" or "fix the bug in Y" is classified by the goal agent as coding mode instead of analysis. The slow agent emits a single call whose output is source code via two channels: `code_output` (full files written under `<workspace>/code/<path>`) and `code_edits` (surgical Edit- primitive-style `{file_path, old_string, new_string, replace_all}` records applied in-place atomically). Failed edits fold into the analysis trailer under `[FAILED]` so the next turn can correct them. Coding mode can also emit `bash` followups for compile/run verification — mentioning the `--allow bash` requirement makes the dependency explicit. - **Action allowlist** (commit 86aca4f) — the main agent's non- MCP tools are gated behind a session-wide allowlist. Document the three precedence levels (CLI `--allow`, per-project `<cwd>/.kres/settings.json`, global `~/.kres/settings.json`), the defaults (grep/find/read/git/edit, bash OFF), the explicit empty-list "lock it down" signal, and the typo-detection behaviour. Include worked-example JSON blocks for both the permanent-enable and the tight-lockdown configs. CLI synopsis block at the bottom gains `--allow ACTION`. Signed-off-by: Chris Mason <clm@meta.com>
Signed-off-by: Chris Mason <clm@meta.com>
…prompt
Builds on 3e9c3b4 (embed agent *.system.md files). That commit
still read operator overrides from ~/.kres/prompts/<basename>, so
a fresh install after an upgrade would keep reading stale files
setup.sh wrote on the prior install — the exact shadow-by-stale-
file problem the embed was meant to eliminate. This commit:
1. Moves the override directory to ~/.kres/system-prompts/ so
leftover ~/.kres/prompts/*.system.md files from older installs
are never consulted.
2. Extends the embed + override treatment to the bug-summary
templates so /summary and `kres --summary` get the same
"rebuild refreshes, optional override under system-prompts/"
behaviour the agent prompts have.
The split is consistent: `~/.kres/system-prompts/` holds
operator overrides for compiled-in LLM system prompts (agent
roles + bug-summary). `~/.kres/prompts/` continues to hold
user-authored prompt TEMPLATES — review-template.md and any
`<word>-template.md` — which have nothing to do with LLM system
prompts.
Code changes
------------
- configs/*.json: the five shipped agent configs now reference
`system_file: "system-prompts/<name>.system.md"`. Path is
resolved relative to the config file's directory, so at
runtime it becomes ~/.kres/system-prompts/<name>.system.md.
- kres-agents/src/embedded_prompts.rs: the TABLE gains two
entries — `bug-summary.md` and `bug-summary-markdown.md` —
alongside the six existing agent prompts. New unit test
`bug_summary_templates_are_present` pins both keys. Module
docstring rewritten to make clear the module covers every
LLM system prompt (not just agent prompts) and to document
that only the review/`<word>`-template files stay on disk.
- kres-repl/src/summary.rs: the previous direct
`include_str!` for BUG_SUMMARY_TEMPLATE and
BUG_SUMMARY_MARKDOWN_TEMPLATE is replaced by
`bug_summary_template()` / `bug_summary_markdown_template()`
functions that read from `kres_agents::embedded_prompts`
(unwrap-expect because the table's test guarantees the
keys). `default_template_path()` and
`default_markdown_template_path()` now point at
~/.kres/system-prompts/ instead of ~/.kres/prompts/.
The module and SummaryInputs docstrings updated to match.
- kres-repl/src/session.rs: load_prompt_disk_then_embedded (the
loader used for coding/generic slow-mode prompts) joins
~/.kres/system-prompts/<basename>. ReplConfig.template_path
and the REPL's /summary helper both reference the new path.
- setup.sh: prompts-install step now skips both `*.system.md`
AND `bug-summary{,-markdown}.md` — everything with an
embedded copy. Comment block spells out the split between
embedded LLM system prompts and disk-based prompt-templates,
and explains the leftover-file rationale for the new
directory name.
- kres/src/main.rs: --template docstring mentions the
~/.kres/system-prompts/bug-summary.md override path and the
embedded fallback.
Docs
----
- README NEWS entry notes the move and the leftover-file-ignored
behaviour for both `*.system.md` and `bug-summary*.md`.
- README "System prompts" section: load order now covers both
prompt classes (agent configs and /summary), lists the
embedded basenames explicitly, and explains that stale files
under ~/.kres/prompts/ are safe to delete.
- README "Summary output" bullet updated — bug-summary is no
longer installed under ~/.kres/prompts/.
- CLAUDE.md config-directory table renamed the override entry
to `system-prompts/*.system.md` with a clearer description
that the directory is empty by default.
Tests
-----
- Existing config.rs tests continue to build tmp configs with
explicit sibling paths and do not depend on the directory-
name rule.
- New embedded_prompts test
(`bug_summary_templates_are_present`) pins that the new
entries land in the shared table.
- Full workspace test run clean: 14 / 146 / 58 / 63 / 16 / 51
unit tests across the six crates.
Signed-off-by: Chris Mason <clm@meta.com>
…summary-markdown)
Turns the review template into a first-class slash command and
introduces `~/.kres/commands/` as the override surface for every
slash-command template. Three shipped commands — `review`,
`summary`, `summary-markdown` — are embedded in the kres binary
and can be replaced or added to by dropping a file at
`~/.kres/commands/<name>.md`.
The review template was previously reached only via
`--prompt "review: target"` through a path-based lookup under
`~/.kres/prompts/review-template.md`. That path is kept as a
back-compat fallback for operators with custom
`<word>-template.md` files, but the primary lookup is now
`~/.kres/commands/<name>.md` → embedded body.
CLI: `--prompt "word: extra"` and `--prompt "/word extra"`
-----------------------------------------------------------
`resolve_prompt_arg` now recognises both forms and resolves the
command name through the new `user_commands::lookup`. Both
examples are equivalent and produce the same composed prompt:
kres --prompt 'review: fs/btrfs/ctree.c'
kres --prompt '/review fs/btrfs/ctree.c'
The splitter accepts `alphanum / - / _` for the command word;
anything else falls through to inline-prompt. The `/word`
variant uses the first whitespace run as the separator; the
`word:` variant uses the first colon.
New module: kres-agents::user_commands
---------------------------------------
- TABLE keyed by command name (not filename) populated via
include_str! for review-template.md, bug-summary.md, and
bug-summary-markdown.md.
- `lookup(name) -> Option<String>` checks
`~/.kres/commands/<name>.md` on disk, falls back to the
embedded table, else None.
- Four unit tests: non-empty body for every entry, all three
expected names present, unknown-name returns None, review
body contains the `[investigate]` lens markers (catches a
silent include_str! miswire).
summary.rs now reads through user_commands
------------------------------------------
- `bug_summary_template()` and `bug_summary_markdown_template()`
now return `String` and go through `user_commands::lookup`
instead of `embedded_prompts::lookup`. Panic-message updated.
- `default_template_path()` and `default_markdown_template_path()`
return paths under `~/.kres/commands/` (was
`~/.kres/system-prompts/` from the previous commit — the
slash-command-templates override surface is distinct from the
agent-system-prompts one).
- Module docstring rewritten to describe the
commands/summary[.md,-markdown.md] resolution path.
- The run_summary caller now types `fallback_text` as `String`
to match the new return type of the template fns.
embedded_prompts table trimmed
------------------------------
`bug-summary.md` and `bug-summary-markdown.md` are removed from
`kres-agents::embedded_prompts`; user_commands owns them now.
The test that pinned their presence is dropped. embedded_prompts
is back to its original scope (agent *.system.md only), which
the module docstring now states.
setup.sh
--------
The prompts-install loop now skips `review-template.md` on top of
the existing skips for `*.system.md` / `bug-summary{,-markdown}.md`.
Comment rewritten to describe BOTH override directories
(`system-prompts/` for agent prompts, `commands/` for slash
commands) and the `<word>-template.md` back-compat lane.
Dep: `dirs` added to kres-agents' runtime deps so user_commands
can resolve $HOME. Matches the workspace pin.
Docs
----
- README NEWS entry rewritten for the `~/.kres/commands/` path
and the two override directories.
- README `--prompt 'review: ...'` section rewritten to show both
invocation forms as equivalent and explain the
user_commands::lookup resolution order. New section
`## Slash-command templates` added under "System prompts"
with an invocation table and a list of the three shipped
commands.
- README "Parallel lenses" section updated: customising the
review template now means dropping `~/.kres/commands/review.md`,
not editing `~/.kres/prompts/review-template.md`.
- README "Summary output" bullet updated to point at
`~/.kres/commands/summary.md` as the override.
- CLAUDE.md config-directory table gains a `commands/<name>.md`
entry alongside the existing `system-prompts/` row.
Full workspace test run is clean: 14 / 146 / 58 / 63 / 16 / 51
unit tests (+ new user_commands tests already counted in the
kres-agents bucket).
Signed-off-by: Chris Mason <clm@meta.com>
All issues identified in the follow-up review of 6a7ba38 addressed in one commit — behaviour fixes, test coverage, and doc sweeps. Behavior fixes -------------- - `/review <target>` and `/summary-markdown [FILE]` REPL commands added. The original 6a7ba38 landed the CLI equivalence (`--prompt "/review ..."` == `--prompt "review: ..."`) but left the REPL slash-command half of the user's ask unimplemented; typing `/review fs/btrfs/ctree.c` at the prompt hit `Command::Unknown("review")`. Now: * `/review <target>` composes the `review` template via `user_commands::compose` (same path the CLI uses) and submits the result as a new task. * `/summary-markdown [FILE]` behaves like `/summary` but selects the `summary-markdown` template and defaults the output filename to `bug-report.md`. * `cmd_summary` now takes a `markdown: bool` so both /summary and /summary-markdown share one code path. - `user_commands::lookup` rejects names that are not `[a-zA-Z0-9_-]+`. The CLI entry point already filtered that character set, but lookup is a public API and a future caller could hand it a path-traversal string like `"../etc/passwd"` which would otherwise resolve outside the commands directory. The guard lives in `is_valid_name` and applies to both `lookup` and the new `lookup_with_root` helper. - `user_commands::lookup_with_root(commands_dir, name)` split out so disk-override behaviour is testable without mocking `$HOME`. Mirrors the pattern used for `Settings::load_merged`. - `user_commands::compose(name, extra)` consolidates the "lookup + prepend extra" logic. Used by both `resolve_prompt_arg` in main.rs and the new `cmd_review` in session.rs so CLI and REPL flow through one implementation. Stale docs swept ---------------- Four docstrings pointed at the previous override path (`~/.kres/system-prompts/bug-summary{,-markdown}.md`); the code moved to `~/.kres/commands/<name>.md` in 6a7ba38 but the comments were missed. Updated: - `kres/src/main.rs:~170` — `--template` docstring - `kres-repl/src/summary.rs:~76` — SummaryInputs.template_path - `kres-repl/src/session.rs:~51` — ReplConfig.template_path - `kres-repl/src/session.rs:~2230` — `/summary --template` hint in the REPL cmd_summary comment README + CLAUDE.md sweeps ------------------------- - README invocation table rebuilt. The previous three-row table had a header mismatch (titled "How to invoke review" but the third row was `/summary`). Replaced with a per-command matrix covering review / summary / summary-markdown across CLI and REPL. - "Parallel lenses" note reworded from "prompt starts with `review:`" to the precise "colon-terminated leading word" form, with an explicit example of what doesn't trigger. - REPL command list at the bottom of the README adds `/summary-markdown [FILE]` and `/review <target>`. - CLAUDE.md REPL-commands table: `/summary` row expanded into three rows covering /summary, /summary-markdown, /review. Tests ----- - `user_commands` gains five tests (10 total in the module): * `disk_override_wins_over_embedded` — write a tmp file, assert lookup_with_root reads it instead of the embedded body. * `disk_override_empty_falls_through_to_embedded` — whitespace-only file doesn't brick the command. * `traversal_name_is_rejected` — path-traversal strings, empty / `.` / `..` all return None. * `compose_prepends_extra_to_body` — `compose("review", "X")` puts `X\n\n` at the front of the template body. * `compose_empty_extra_returns_bare_body` — empty extra text yields the body with no leading blank. * `compose_unknown_name_returns_none`. - `kres-repl::commands` gains two tests: * `parses_summary_markdown` — covers `/summary-markdown` with and without a filename arg. * `parses_review_with_target` and `review_without_target_is_unknown` — the target is required; missing it surfaces a helpful Unknown message. - `kres` main.rs gains four tests around `resolve_prompt_arg`: * `resolve_prompt_arg_word_colon_form_hits_user_commands` * `resolve_prompt_arg_slash_form_equivalent_to_colon_form` (pins the user's explicit CLI equivalence ask) * `resolve_prompt_arg_slash_unknown_command_falls_to_inline` * `resolve_prompt_arg_inline_colon_not_misparsed` Full workspace test run is clean: 156 kres-core, 58 kres-repl, 63 kres-llm, 16 kres-mcp, 54 kres unit tests pass. Signed-off-by: Chris Mason <clm@meta.com>
Signed-off-by: Chris Mason <clm@meta.com>
`user_commands::compose()` builds the review prompt as
`format!("{extra}\n\n{body}")` (kres-agents/src/user_commands.rs:112),
so the operator-supplied target lands *above* the template body.
The opening line said "the target below", so when an operator
invoked `/review` with no target but an extra paragraph that itself
ended "... the target below", the fast agent found nothing below,
emitted only a type:question asking for the target, the main agent
replied with prose (no `<actions>`), gather returned empty, and
every lens slow agent emitted "SCOPE CHECK FAILED — no source code
was provided" (linux.net session b686e702-f6af).
Fix by rewriting the opening paragraph to describe what actually
precedes it (file path, function name, commit ref, diff, or snippet)
so the fast agent latches onto the prepended target.
Signed-off-by: Chris Mason <clm@meta.com>
Two parity gaps between `run_once_with_ctx` and the lens-mode `gather` let the fast loop burn rounds without making progress: 1. `gather` ignored `skill_reads`. `run_once_with_ctx` has `apply_skill_reads(&mut live_skills, &parsed.skill_reads)` at the top of the parse-handling block so any skill file the fast agent requests mid-loop reaches the slow agent. `gather` instead threaded `self.skills` straight through, so a lens-mode fast agent that emitted `skill_reads` got its request silently dropped and the lens slow agents saw the pre-gather skills payload. 2. Neither loop bailed when every followup had `kind == "question"`. Question-type followups are clarifications for the operator, not fetchable data; `MainAgent::fetch` turns them into prose with no `<actions>` (kres-agents/src/main_agent.rs:260-269), so `fetched.symbols` / `fetched.context` come back empty and the loop re-asks. Session b686e702-f6af on linux.net spent every fast round and every lens call on a prompt with no target. Fix by cloning `self.skills` into a `live_skills` binding inside `gather`, calling `apply_skill_reads` on it, returning it from `gather`, and passing it into the lens `CodePrompt`; and by breaking out of both the `run_once_with_ctx` and `gather` loops when `parsed.followups.iter().all(|f| f.kind == "question")`. Signed-off-by: Chris Mason <clm@meta.com>
semcode-mcp emits `notifications/message` log lines on stdout during a `tools/call`, so a single call produces two lines: the notification and the real response. kres-mcp's read loop tried to deserialize every line as a `Response`, whose `#[serde(flatten)] result: ResponseResult` demands either a `result` or an `error` top-level key. Notifications have neither, so serde rejected them and the caller surfaced "mcp server `semcode` produced malformed JSON: data did not match any variant of untagged enum ResponseResult at line 1 column 131" — the real response on the next line was then dropped, and callers like `find_function(mark_stack_read)` saw an error even though the tool had succeeded. Reproduced by piping a synthetic `initialize` + `tools/call` at semcode-mcp: line 2 of stdout is exactly 131 chars long and starts with `"jsonrpc":"2.0","method":"notifications/message"` — the same length serde reported as column 131. Fix by parsing each line as a raw `Value` first; if it has a `method` field and no `result`/`error`, treat it as a notification and continue the read loop. Only non-notification lines are deserialized into `Response`. Signed-off-by: Chris Mason <clm@meta.com>
Add a paragraph directing the slow agents to report every bug they find in the target area, not just the worst one. The prior wording (focus + chains-of-events) encouraged depth per finding but said nothing about breadth, so a lens that surfaced one use-after-free sometimes stopped there even when adjacent call sites had the same shape. The merger already deduplicates and ranks severity, so the cost of over-reporting is low and the cost of missing distinct bugs of the same class is high. Wording references downstream merger behaviour so the agent understands why duplication is cheap relative to omission. Signed-off-by: Chris Mason <clm@meta.com>
Add `kres_core::Plan { prompt, goal, mode, steps[], created_at }`
produced by a new `define_plan` goal-agent task that runs right
after `define_goal` on every operator-typed prompt. Each
`TodoItem` grows a `step_id` pointing at the plan step it
executes; `Plan::sync_from_todo` unions the step_id up-link with
the legacy todo_ids down-link and rolls step status up from the
linked todos' statuses.
Every downstream agent sees the plan every turn:
- fast + slow via `CodePrompt.plan` — serialised into the user
JSON, but NOT in `CACHED_PREFIX_FIELDS` so plan rewrites do
not bust the big prefix cache.
- main via `DataFetcher::fetch(&followups, plan)` — passed
per-call so concurrent tasks with different plans cannot
clobber each other through a shared slot.
- goal judge via `check_goal(..., plan)`.
- todo agent via `update_todo_via_agent_with_logger(..., plan)`.
Three writers can reshape the plan after the initial
`define_plan`: the first slow call per top-level prompt (gated
on `RunContext.allow_plan_rewrite`), the todo agent on every
completed task, and the goal-not-met todo-agent injection.
Rewrites emit `{steps: [...]}` on the wire via
`kres_core::PlanRewrite` — the plan's identifying metadata
(`prompt`, `goal`, `mode`, `created_at`) is inherited from the
prior plan via `PlanRewrite::apply_to`, so a forgotten metadata
field in an LLM reply cannot silently drop the rewrite.
Step ids are kebab-case slugs (`audit-ring-buffer-init`) rather
than positional tags (`s1`/`s2`). Slugs survive reorder and
make reassignment to a different meaning obvious. The
synthesis path uses `slugify_step_id(title)`; collisions walk
`-2`, `-3`, … . The four planner/rewriter prompts (goal.txt,
todo.txt + the three slow-agent system prompts) all teach the
LLM the slug convention and forbid positional ids.
`PlanStep.id` and `PlanStep.title` have serde defaults, and
`PlanRewrite::apply_to` pipes the rewrite's steps through
`normalize_steps` — missing-id / blank-title / collided rows
get repaired rather than producing a corrupt plan.
`TaskManager::set_plan` reconciles orphan step_ids: when a
rewrite drops an id, any todo whose `step_id` referenced it
gets the field cleared so `sync_from_todo` does not keep
pointing at dead weight.
The session persistence layer writes Plan into
`<results>/session.json` alongside `todo`, `deferred`,
`completed_run_count`, and `last_prompt`. Atomic save (tmp +
fsync + rename + parent-dir fsync); a content-hash latch in
the reaper throttles idle rewrites. `InProgress` normalisation
on load plus `reset_in_progress_to_pending` on drain paths
(ctrl-c, --turns cap, goal-met) so no todo gets orphaned at
exit.
Operator-facing surface:
- `/plan` — shows current plan with per-step status and linked
todos via both linkage directions.
- `log_plan_change` at the four rewrite sites shows `+`/`-`/`~`
diffs; `log_plan_status_transitions` logs `pending → done`
etc. every reaper tick that changed a status.
Tests: 88 kres-core, 168 kres-agents, 63 kres-repl. All green.
Signed-off-by: Chris Mason <clm@meta.com>
Session persistence shipped as "point at an existing --results
dir and kres picks up where it left off". That default bled
prior sessions' plans, todos, and counters into a new run
whenever the operator re-used a results dir.
Gate resume behind an explicit opt-in:
- `--resume` CLI flag. Without it, kres starts clean even when
`--results` points at a directory with a `session.json`.
When `session.json` is missing but `session.json.prev` exists
(from the backup path below), `--resume` falls back to the
backup and logs the swap.
- On startup without `--resume`, any existing `session.json`
in the results dir is renamed to `session.json.prev` BEFORE
the reaper starts writing — the prior snapshot survives the
first 250 ms reaper tick that would otherwise overwrite it.
- `/resume [PATH]` REPL command. With PATH: load that file.
Without: prefer `<results>/session.json.prev`, else the live
`session.json`. Calls the same `resume_state_from` helper
`--resume` uses, so the `InProgress → Pending` normalisation
and the deferred-list seeding are identical.
Operators now see an explicit choice at startup ("note: prior
session snapshot moved to DIR/session.json.prev; starting
clean") and can recover mid-session with `/resume` when they
realise they wanted the prior state after all.
Signed-off-by: Chris Mason <clm@meta.com>
When the kernel skill directs the fast agent to read subsystem.md and load matching subsystem guides, the paths inside that file are relative to @REVIEW_PROMPTS@/kernel/subsystem/ — not the top-level @REVIEW_PROMPTS@/kernel/ directory. Spell it out so the agent adjusts the paths when following the guide chain. Signed-off-by: Chris Mason <clm@meta.com>
README.md had grown to 736 lines covering the agent flow, parallel-lens review, `--turns` semantics, coding mode, action allowlist, review-prompts repo integration, semcode MCP, configuration, system-prompt overrides, slash-command templates, CLI reference, workspace layout, and pre-commit hook. Readers hit a wall when all they wanted was how to get a review running. Split into one file per topic: - NEWS.md (news section). - docs/agents.md (fast / main / slow / todo / merger flow + "Building up a larger review"). - docs/review-template.md (the `/review` slash-command and its five-lens fan-out). - docs/coding-tasks.md (reproducers, `code_output`, `code_edits`, `bash` verify). - docs/summary.md (`/summary`, `--summary`, bug-report format). - docs/turns-and-follow.md (`--turns`, `--follow`, stop modes). - docs/action-allowlist.md (`--allow`, precedence, typo detect). - docs/configuration.md (`~/.kres/`, model precedence, system prompt overrides). - docs/commands.md (slash-command templates + disk overrides). - docs/review-prompts.md (external review-prompts repo). - docs/semcode.md (semcode-mcp integration). - docs/cli.md (every CLI flag + REPL command). - docs/development.md (workspace layout, build/test/lint, pre-commit hook, wire-format references). README.md drops to 105 lines: title, a "Why kres exists" paragraph that names each role and what it's for, three-step quick start, and a link hub into the docs. Content is not rewritten — each section keeps its existing prose so the git history stays traceable. Signed-off-by: Chris Mason <clm@meta.com>
The topic guides split out of README.md carried over padding from the original long-form README, and several file:line citations had drifted against the current source (setup.sh line numbers in review-prompts.md and semcode.md, a task.rs range in turns-and-follow.md, a symbol.rs range in semcode.md). Prose across action-allowlist, agents, coding-tasks, commands, configuration, review-prompts, review-template, semcode, summary, and turns-and-follow compressed to fact-per-paragraph; stale numeric ranges either updated or dropped in favour of function/symbol names that do not rot. No content dropped - every fact reachable from the longer form is still there. Signed-off-by: Chris Mason <clm@meta.com>
agents.md referenced a TBD docs/planning.md that was never written; the only file at that path on disk is an untracked stray git-log dump. Drop the parenthetical so readers don't chase a broken link. commands.md claimed all three shipped slash-commands compose a template body with a trailing target via user_commands::compose. That only describes /review — /summary and /summary-markdown call user_commands::lookup (kres-repl/src/summary.rs:141) to fetch the body and pass it as the system prompt for a fast-agent call over report.md + findings.json, with no target composition. Split the bullet list so the two roles are distinct. Signed-off-by: Chris Mason <clm@meta.com>
Improve error reporting for network errors
The focus was too narrow, missing some bugs. Signed-off-by: Chris Mason <clm@meta.com>
A successful git query that matches nothing (e.g. `log <range> --
<paths>` over a window with no commits touching those paths) returns
exit 0 with empty stdout and stderr. The main agent wraps each tool
result in a `--- git ... ---\n<body>` envelope, so an empty body is
indistinguishable from "the tool never ran" and the fast/slow agents
loop, re-requesting the same query.
Observed in kres session 81d6b079: the fast agent emitted the same
`git log <window> -- include/net/ip_tunnels.h ...` five times and
each empty reply was read as MISSING by the goal-check agent
("The git log returned no output (MISSING) ... the required operation
was not completed"). Running that query on the tree by hand
confirmed the range genuinely has zero matching commits; the tool
was not blocked.
Fix by appending "(no output)\n" when both stdout and stderr come
back empty so callers can tell "ran, produced nothing" apart from
"never ran."
Signed-off-by: Chris Mason <clm@meta.com>
Operators who want to share or review individual findings currently have to grep through findings.json or carve up report.md by hand. Add a --export DIR flag that iterates findings.json (honouring --results / --findings like --summary does) and writes one subfolder per finding. Each entry is DIR/<tag>/, where <tag> is the finding id sanitized for directory use. Collisions after sanitizing get a numeric suffix so ids that only differ in punctuation don't silently overwrite. Each folder carries a meta.yaml (id, title, severity, status, workspace git HEAD sha + subject, cross-refs, symbol and file-section locations, open_questions) and a FINDING.md with the full body — summary, mechanism, reproducer, impact, fix sketch, open questions, relevant symbol definitions, file section contents, and the per-task analysis details captured in finding.details. Signed-off-by: Chris Mason <clm@meta.com>
Migrate the REPL's UI from the rustyline prompt + DECSTBM status bar to ratatui. The rustyline path remains reachable via --no-tui as an escape hatch, and --stdio stays unchanged for redirected output. Signed-off-by: Chris Mason <clm@meta.com>
The TUI pane printed the slow-agent analysis body as one big wall
of text — fenced code blocks, prose, and indented listings all ran
together. Carry the block through a dedicated sink so the render
path can style it, and keep the non-TUI paths unchanged:
* kres_core::io grows a markdown-block sink slot alongside the
existing printer slot. install_markdown_sink installs a TUI
sink that brackets the body with MD_BLOCK_START / MD_BLOCK_END
sentinel lines before pushing it into the scrollback. The
non-TUI path never installs a sink, so async_println_markdown
folds back to async_println and --stdio output is byte-
identical.
* report_reaped now emits the analysis body via
async_println_markdown so the TUI sees a marked block while
--stdio and rustyline paths keep their old plaintext.
* tui::render_markdown_block is a tiny in-house renderer that
emits ratatui Line/Span with fenced code in Cyan, fence
markers in dim Cyan, 4-space-indented code in Cyan, and inline
\`backtick\` spans in Cyan. Everything else is a plain Span.
No new deps — tui-markdown 0.3 requires ratatui-core::Line and
0.2 requires ratatui 0.28, neither of which extend into our
Vec<ratatui::prelude::Line> without a cross-version shim.
The render loop walks the windowed scrollback, pipes bracketed
runs through render_markdown_block (drops sentinel lines), and
renders everything else verbatim. A window slice that cuts through
a bracketed region falls back to plain rendering for the visible
half — acceptable for a first pass.
Signed-off-by: Chris Mason <clm@meta.com>
Four fixes from the review of c861ed8: 1. Stale tui_markdown references in two comments (install_tui printer at kres-repl/src/tui.rs:189 and the render loop around :1036) now name render_markdown_block, matching reality after the dep was removed. 2. install_markdown_sink in kres-core/src/io.rs is now install- if-absent (returns Err on collision) and the replace variant lives separately as replace_markdown_sink, mirroring the install_printer / replace_printer split in the same module. install_tui_printer switched to replace_markdown_sink deliberately — the TUI wants the replace semantics for the same reason it uses replace_printer a line earlier. 3. Pin the --stdio byte-identical promise with a test: async_println_markdown with no sink installed produces the same captured output as async_println, and never emits sentinels. Plus a test for the install_if_absent contract. 4. Drop the unused `bytes` binding in split_inline_code; only bytes.len() was read, which is identical to line.len(). Signed-off-by: Chris Mason <clm@meta.com>
The --turns 0 stop condition (goal met, no-progress streak, or no-goal-agent batch finished) leaves the REPL open waiting for further operator input. A redirected/piped invocation has no operator on the other end, so the process sits forever after the work is done. Add ReplConfig::exit_on_idle, set it from main.rs to !stdout.is_terminal(), and have the reaper's --turns 0 stop branch set turns_exhausted and cancel root_shutdown when the flag is on. Mirrors the existing --turns N exit path so summary.txt is auto-rendered before teardown. Signed-off-by: Chris Mason <clm@meta.com>
metadata.yaml's only file pointers were buried inside relevant_symbols and relevant_file_sections list items, so a reader scanning a tree of finding folders had no top-level signal for which file each finding lived under. Emit a canonical filename: at the top of metadata.yaml, populated from the first relevant_symbol (falling back to the first relevant_file_section, then empty). Reserve a subsystem: slot in the same block — the Finding schema has no subsystem field today, so it renders empty for now and a follow-up todo will derive it from filename via a path-prefix rule. Signed-off-by: Chris Mason <clm@meta.com>
persist_code_output rejected every absolute path outright, so a coding task asked to write $DIR/summary.md (where $DIR is an absolute folder named in the prompt) had no way to put the file there: emitting an absolute path got dropped, and emitting a relative path landed at <workspace>/summary.md instead of the bug folder. Operators were forced to either re-launch kres with --workspace pointed at the target dir or work around it with a bash followup. Match the rule edit_file already uses (kres-agents/src/tools.rs:resolve_workspace): accept an absolute path when it resolves under the workspace OR under a directory the operator named in a prompt this session — consent::grant_paths_from_text already canonicalises and stores those mentions. Keep rejecting any path that contains '..' regardless of rooting; that's how a malformed reply would try to slip past both gates. Signed-off-by: Chris Mason <clm@meta.com>
On a real TTY, kres stays open after \`--turns N\` exhausts or \`--turns 0\` hits goal-met / no-progress / no-goal-batch-finished, waiting for the next operator prompt. Batch-style invocations want the process to end at that point without dropping into the redirected-stdout escape hatch. Add a \`--one\` bool flag (default off) that ORs into the existing exit_on_idle gate alongside the \`!stdout.is_terminal()\` check, so the reaper takes the same exit + auto-summary path either way. Signed-off-by: Chris Mason <clm@meta.com>
Two issues showed up while wiring a triage prompt to write summary.md
back into the per-finding export folder. First, the template lived as an
operator-installed file at \`~/.kres/prompts/triage-template.md\`, which
made distribution and discoverability worse than the existing
\`review\`/\`summary\` slash commands. Second, prose in the template
like "\`relevant_symbols\` / \`relevant_file_sections\`" tokenised under
\`split_whitespace\` into a bare \`/\`, which \`grant_paths_from_text\`
resolved to filesystem root and added to the consent store — defeating
the consent gate that \`persist_code_output\` and \`edit_file\` rely on,
since \`is_allowed\` then returned true for every absolute path.
Embed \`triage-template.md\` as a fourth slash-command via
\`user_commands::TABLE\`, mirroring
\`review\`/\`summary\`/\`summary-markdown\` — overridable by dropping a
file at \`~/.kres/commands/triage.md\`. Skip the file in \`setup.sh\` so
the legacy \`~/.kres/prompts/\` location can no longer shadow it. In
\`consent::looks_like_path\`, reject bare \`/\`, \`.\`, and \`..\`
tokens that come from prose conjunctions ("\`yes\` / \`no\`", "foo /
bar"); \`grant_from_mention\` separately refuses any path whose parent
is None as belt-and-braces for future callers. Replace the two
bare-\`/\` separators in the template prose so the embedded copy can
never trigger this even if a future caller bypasses the new guard.
Signed-off-by: Chris Mason <clm@meta.com>
An earlier in-flight edit converted the top of bug-summary.md from
question phrasing ("Does this code corrupt memory?") to plain statements
("This code corrupts memory."), but the rest of the file still demanded
question form: the hard-rule banner, the ask-short-questions paragraph,
the AVOID/USE INSTEAD examples, the structure section's step 5/6, and
the worked sample at the bottom all kept the prior style, so a model
following the template would still emit questions and contradict the new
framing.
Finish the conversion through the rest of the file: rename the banner
enumeration, drop "ask short questions" in favour of stating the bug
plainly, rewrite the widget_claim/widget_destroy worked examples in
declarative form, replace "a series of statements followed by a
question" with the punchline framing the rule was reaching for, and swap
step 5's "phrased as a question where possible" for a flat "concise
plain statement of the bug" with step 6 updated to match.
Signed-off-by: Chris Mason <clm@meta.com>
The plain-text bug-summary template moved from question phrasing to flat statements (commit f1e6653); the markdown variant still demanded question form in the same spots — banner enumeration, ask-short-questions paragraph, AVOID/USE INSTEAD examples, structure step 5/6, and the worked sample at the bottom. Without this follow-up, --summary-markdown would emit questions while --summary emits statements. Apply the same conversion to bug-summary-markdown.md, preserving the markdown-only bits (\`backtick\`-wrapped identifiers, \`\`\`c fenced code blocks). The user-preserved "don't say: Did you corrupt memory here?" negative example stays as the one remaining \`?\`. Signed-off-by: Chris Mason <clm@meta.com>
Signed-off-by: Chris Mason <clm@meta.com>
On a clean \`--turns N\` run the REPL was rendering summary.txt right before teardown via cmd_summary, on top of the deferred-list banner and the "exiting REPL" line. The render is unwanted noise during batch invocations: the operator runs /summary themselves before quitting, or \`kres --summary\` against the results dir afterwards. Remove the post-loop auto-summary block. The companion \`turns_exhausted\` and \`any_coding_task\` AtomicBool fields had no other readers, so drop the field definitions, the two constructor inits, the reaper-side store sites, the now-stale comments mentioning the gate, and the submit_prompt latch that fed any_coding_task. The exit_on_idle goal-met branch keeps cancelling root_shutdown — same exit, just no auto-render. Signed-off-by: Chris Mason <clm@meta.com>
The default session-id was
\`chrono::Utc::now().format("%Y%m%dT%H%M%SZ")\` — seconds resolution. A
bulk-launch wrapper (the in-progress triage-all.py runs 20 kres
processes at a time) had every parallel process collide on the same
timestamp, multiple kres instances all sharing the same
~/.kres/sessions/<ts>/ dir, which then races on session.json /
findings.json / report.md / prompt.md and crashes the Rust side with
exit 101.
Append \`-<pid>\` so the session id becomes e.g. 20260425T180000Z-12345.
Two concurrent kres processes get distinct dirs; the timestamp prefix
still keeps \`ls -lt ~/.kres/sessions/\` ordered.
Signed-off-by: Chris Mason <clm@meta.com>
INDEX.md showed Severity / Date / Status / ID / Title; the new triage flow fills in metadata.yaml's subsystem field per finding, but the index table didn't surface it, so an operator scanning INDEX.md still had to drill into each FINDING.md to see where the bug lived. Adding the column lets the same overview answer "which subsystems do my findings cluster in?" without leaving the index page. Read `subsystem:` via the existing top_level_scalar helper, store it as Option<String> on IndexRow with empty values folded to None, and render it between Severity and Date so the bug-attribute facets sit together. Empty / absent values render as the em-dash placeholder already used for missing dates. Signed-off-by: Chris Mason <clm@meta.com>
Several recent commits in this branch shipped 200+ char single-line paragraphs through the `-m` flag, ignoring the 72-char wrap rule documented in the project's commit-message style guide. A hook makes the requirement enforceable instead of advisory. The hook lives at `.githooks/commit-msg` (so it ships with the repo; opt-in via `git config core.hooksPath .githooks`). It scans each non-comment line of the message file with awk and exits 1 if any line exceeds 100 characters, printing the offending line number, length, and content. 100 is deliberately looser than the 72-char wrap rule so quoted code / shas / URLs that legitimately exceed 72 pass without a fight, while wall-of-text `-m` paragraphs do not. Signed-off-by: Chris Mason <clm@meta.com>
`--search` only handled implicit-AND lists of `key:value` clauses
with exact-string matching. Operators couldn't pivot a 100+ finding
tree by area without scanning INDEX.md by hand, couldn't OR clauses
together, and couldn't approximate-match subsystem strings the LLM
wrote freeform into metadata.yaml.
Three changes:
* `--generate` now also emits `INDEX-<subsystem>.md` per distinct
non-empty subsystem value alongside INDEX.md / index.html. Rows
with a blank subsystem stay in the umbrella INDEX.md only. Stale
`INDEX-*.md` files (from a previous run whose subsystems no
longer exist) are removed so a renamed subsystem doesn't leave a
dangling file. The filename slug strips non-alphanumeric chars
to `-`.
* `--search QUERY` parses a boolean expression — `clause`,
`clause -a clause` (explicit AND, same as adjacency), `clause -o
clause` (OR), and `( … )` for grouping. Stdlib only: no
expression-parser library fit the bundled-into-kres / no-pip
constraint, so a ~70-line recursive-descent parser does the job.
Parens must be whitespace-separated; a literal paren in a regex
value goes through `[(]` / `[)]`.
* Every clause value except `since:` is now a case-insensitive
regex over the matching column (severity / subsystem / status /
regex). `since:` keeps its YYYY-MM-DD date-bound semantics.
`collect_rows` also picks up a flat-layout fallback: if the export
dir has no `findings/` subtree, scan the top level. The row dict
gains a `tag_path` field so the markdown / html row-link form
emits the right relative path in either layout.
Tested against ~/local/kernel-bugs/findings/ (166 rows, 79 distinct
subsystem strings — heterogeneity is a metadata.yaml data-quality
issue, not a tool bug). All boolean / regex / paren forms exercised;
malformed expressions and bad regexes both exit 2 with a one-line
diagnostic.
Signed-off-by: Chris Mason <clm@meta.com>
Generating one INDEX-<subsystem>.md per distinct subsystem string sounded useful, but on a real tree it produced 79 files for 166 findings — the LLM writes subsystem strings freeform into metadata.yaml, so values like "AMD-CCP-PSP-SEV-SNP", "AMD-CCP-SEV-SNP", and "AMD-SEV-SNP-PSP-CCP-driver" each get their own file. Net effect: the export root went from clean to noisy, with no clear pivot benefit over `--search subsystem:<regex>`. Drop the auto-fanout. `--generate` writes INDEX.md and index.html, nothing else. Operators who want subsystem filtering use `--search subsystem:<regex>` from the shell, or the in-page filter dropdown in index.html. The `build_subsystem_indexes` and `sanitize_subsystem` helpers go with the auto-fanout — there's no remaining caller and no flag to invoke them manually. Signed-off-by: Chris Mason <clm@meta.com>
Once a finding has been triaged, summary.md is the document an operator wants to land on — it's the short triage write-up. FINDING.md is the long-form raw evidence; landing there from a spreadsheet-style index makes the operator scroll past code blocks and call chains they didn't ask for. `collect_rows` now sets `link_file` per row: `summary.md` when that file is present in the finding directory, `FINDING.md` otherwise. The markdown and html row-link emitters use that field, so a half-triaged tree gets a mix — newly-summarised findings link to the summary, untriaged ones still link to the underlying FINDING.md. Signed-off-by: Chris Mason <clm@meta.com>
`kres --export` and `--export-index` produce a tree of per-finding
folders plus entry-point files (INDEX.md, index.html,
findings-index.py). An operator landing in the directory has no
in-tree pointer telling them what `metadata.yaml` / `FINDING.md` /
`summary.md` mean or how to drive `findings-index.py --search`.
Bundle a `scripts/export-README.md` into the kres binary and copy
it into the export dir on first run, with the same don't-overwrite
contract as `findings-index.py`: an operator-edited README survives
re-runs of `--export` / `--export-index`. The README covers the
directory layout, the meaning of the per-finding files, and the
`--generate` / `--search` syntax (boolean expressions, regex
clauses, examples, error behaviour).
`install_export_readme` lives next to `run_index_script` in
kres-repl/src/export.rs and is called from `run_export_index`, so
both kres entry points install the README. Failures log to stderr
but don't propagate — the README is documentation, not load-bearing
for the export. New test `export_index_installs_readme_and_preserves_
local_edits` pins the bundled body on first install and the
operator-edit survival on re-run.
Bundled with three drive-by lint cleanups the pre-commit hook
flagged in the same area:
* rustfmt on an `assert!` block in kres-core/src/io.rs and a
chained `Style::default()` call in kres-repl/src/tui.rs;
* Rust 1.94's `clippy::question_mark` lint on
`if dir.parent().is_none() { return None; }` in
`ConsentStore::grant_from_mention` — replaced with the bare
`dir.parent()?;` form. Path::parent returns None only at
filesystem root, so the early-return semantics match exactly.
Signed-off-by: Chris Mason <clm@meta.com>
The README's job is to orient an operator landing in an export tree. The single most useful link from there is to INDEX.md — that's the table of every finding, and what most readers will want before they read prose about file layout and search syntax. Add a one-line `→ [Browse the findings index](INDEX.md)` link between the heading and the first prose paragraph so a GitHub Pages render of the README puts the obvious entry point one click away. Signed-off-by: Chris Mason <clm@meta.com>
`INDEX.md` and `index.html` link rows to `summary.md` once a finding
has been triaged. The summary itself, though, was a dead-end: a
reader who wanted to drop into the long-form FINDING.md or check
something in metadata.yaml had to leave the page and navigate the
filesystem. Both files sit in the same directory, so a relative link
costs nothing.
Have the triage prompt emit a one-line cross-link header before the
`# Subject:` heading:
[FINDING.md](FINDING.md) | [metadata.yaml](metadata.yaml)
The Rules section's "no heading above Subject" guidance is unchanged
in spirit — only this verbatim cross-link line is allowed before
`# Subject:`.
Signed-off-by: Chris Mason <clm@meta.com>
Several wrapper scripts (triage-all.py, the bug-batch runners) want the same shape: read a file of inputs one per line, append each input to a command template, run them in parallel with a per-task timeout, and tear every subprocess group down cleanly on Ctrl-C. The pattern lives in ~/local/src/review-prompts/kernel/scripts/claude_xargs.py — same control flow, but with claude-specific naming (cmd template called "claude command", inputs called "SHAs", `--series` / `--append` flags for prompt construction) baked in. Add a generic equivalent at scripts/xargs-timeout.py. Same parallel ThreadPoolExecutor + SIGINT/SIGTERM teardown + SIGTERM → SIGKILL escalation, with the claude knowledge stripped: `--command` / `--input-file` / `--parallel` / `--timeout` / `--verbose`. Blank lines and `#` comment lines in the input file are skipped. Each line is appended to the template and the result runs through `/bin/sh -c`, so shell metacharacters work the same way claude_xargs treated them. Smoke-tested with both the success path (3-input echo list) and the timeout-escalation path (sleep 0.1 vs sleep 5 with --timeout 1). Signed-off-by: Chris Mason <clm@meta.com>
`--search QUERY` is great for one-shot pivots, but the operator-
useful subsets (`severity:high`, `subsystem:work`, `regex:uaf -a
since:...`) get re-run every triage cycle. Letting `--generate`
materialise them into named markdown files keeps the saved set in
the export tree and means a GitHub Pages publish picks them up.
`--generate` now also reads `<export-dir>/index-config.yaml`, when
present, and writes one markdown file per `{file, query}` entry.
Format is a list of two-key mappings:
- file: INDEX-high.md
query: severity:high
- file: INDEX-recent-uaf.md
query: regex:uaf -a since:2026-04-01
Stdlib has no YAML reader and the script can't take a pip
dependency (it bundles into the kres binary and copies into export
dirs), so the parser is a ~30-line hand-roll for the limited
shape. `query` reuses the existing `parse_query` so any expression
that works under `--search` works in the config.
Filenames are validated as plain basenames inside the export dir
— `..`, `/`, absolute paths, and leading `.` are refused so a
typo can't write outside the tree. A bad regex or malformed query
reports a one-line stderr diagnostic, skips that file, and
continues. The exit code is 1 if any entry errored.
The umbrella INDEX.md and index.html are always written; the
config-driven files land on top. Absent config → silent skip.
Documented in scripts/export-README.md alongside the existing
`--search` syntax.
Verified: absent config (umbrella only), valid 3-entry config
(row counts match), malformed query (skip + exit 1), and
suspicious filenames (refused — no files outside the export dir).
Signed-off-by: Chris Mason <clm@meta.com>
The five existing query keys (`severity` / `subsystem` / `status` /
`since` / `regex`) match against the row's table columns, but
operators triaging a tree by area routinely want to pivot on the
file or function the finding cites. `regex:` over the joined row
text doesn't catch those — file paths and symbol names live in
metadata.yaml's `relevant_symbols` / `relevant_file_sections` lists,
not in any column the table renders.
Add two new clauses with the same case-insensitive-regex semantics
the others have:
* `file:<regex>` matches any filename in the metadata.yaml — the
primary `filename:` plus every `relevant_symbols` /
`relevant_file_sections` entry. Same set FINDING.md cites in its
`Relevant symbols` / `Relevant file sections` blocks.
* `function:<regex>` matches any symbol name listed under
`relevant_symbols:` (functions, macros, types). Same set
FINDING.md cites in its `Relevant symbols` block.
`collect_rows` extracts both at parse time via `all_filenames()` and
`all_function_names()` and stashes them on the row dict so the
matchers don't re-walk the YAML per row. The function-name extractor
runs a tiny state machine that only collects `name:` lines while
inside the `relevant_symbols:` block — kres-emitted metadata.yaml
has no other top-level `name:` key, so a fresh-top-level-key check
reliably exits the block.
Documentation:
* Bundled `scripts/export-README.md` gets the two new keys in the
search-clause table, four worked `--search` examples (anchored
function match, prefix file match, combined `file: -a severity:`,
`function: -a severity:`), and two new entries in the saved-
index `index-config.yaml` sample (`INDEX-net-ipv4.md` driven by
`file:^net/ipv4/`, `INDEX-pwq-symbols.md` driven by
`function:^pwq_`).
* Top-of-script docstring lists the new clauses alongside the
others so `--help` / `head` of the file is current.
Verified against ~/local/kernel-bugs/:
* `function:^acpi_os_execute$` — exact-match anchor finds the
single OSL_DEBUGGER finding citing that symbol.
* `function:put_unbound_pool` — substring matches 3 findings.
* `file:workqueue.c -a severity:high` — combo finds 2 high
workqueue findings.
* Malformed regex (`function:[`) exits 2 with the regex error.
Signed-off-by: Chris Mason <clm@meta.com>
…todo agent Observed symptom — every fast/slow call returns 400 immediately: API status 400: max_tokens: 164000 > 128000, which is the maximum allowed number of output tokens for claude-sonnet-4-6 400 isn't in is_retryable_status, so the call returns Err right away and the task lands in Errored with empty analysis. The reaper then feeds that empty analysis to the todo agent, which reads "no analysis" as "task didn't run" and re-queues the same item forever with reason "Prior execution returned empty analysis; must re-run." Session 6f3f0daf-… spun like that for 50 min — 269 failed slow calls, 0 successes, output dir untouched, tasks #42 through #625 reshuffling the same 11 todo items. Add a consecutive-error watchdog to the reaper: after K=3 reaped Errored tasks in a row, print a loud banner with the last error verbatim, drain pending/blocked todos to /followup so /continue can resume after the operator fixes the root cause, and (under --one) cancel the root shutdown. Reset on any Done — a single success means the pipeline is alive. Also pass [task errored: <err>] (instead of "") to the todo agent when r.state == Errored, so even before the watchdog trips the agent has something concrete to react to instead of treating the empty string as "nothing happened, queue it again." Signed-off-by: Breno Leitao <leitao@debian.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Observed symptom — every fast/slow call returns 400 immediately:
API status 400: max_tokens: 164000 > 128000, which is the maximum
allowed number of output tokens for claude-sonnet-4-6
400 isn't in is_retryable_status, so the call returns Err right away and the task lands in Errored with empty analysis. The reaper then feeds that empty analysis to the todo agent, which reads "no analysis" as "task didn't run" and re-queues the same item forever with reason "Prior execution returned empty analysis; must re-run." Session 6f3f0daf-… spun like that for 50 min — 269 failed slow calls, 0 successes, output dir untouched, tasks #42 through #625 reshuffling the same 11 todo items.
Add a consecutive-error watchdog to the reaper: after K=3 reaped Errored tasks in a row, print a loud banner with the last error verbatim, drain pending/blocked todos to /followup so /continue can resume after the operator fixes the root cause, and (under --one) cancel the root shutdown. Reset on any Done — a single success means the pipeline is alive.
Also pass [task errored: ] (instead of "") to the todo agent when r.state == Errored, so even before the watchdog trips the agent has something concrete to react to instead of treating the empty string as "nothing happened, queue it again."