Skip to content

Layered defense against TUI pump-blocking hangs#84

Open
tony wants to merge 10 commits into
masterfrom
stream-static-analyzer
Open

Layered defense against TUI pump-blocking hangs#84
tony wants to merge 10 commits into
masterfrom
stream-static-analyzer

Conversation

@tony

@tony tony commented Jun 27, 2026

Copy link
Copy Markdown
Owner

Summary

  • Fix the two live CPU stalls that froze the Textual TUI on real stores: the find-in-detail re-highlight re-tokenized the whole JSON body on every keystroke/step, and a filter-widen re-rendered the entire results list on the pump.
  • Add a runtime detection/prevention layer in ui/_runtime.py: a pump-thread sys.addaudithook that flags blocking-I/O initiation denylist-free, plus a log-only heartbeat watchdog that now defaults on for an interactive terminal.
  • Widen the static non-blocking guard so it follows same-class helper calls, classifies @on/scheduler/call_from_thread/subscribe callees, and resolves import aliases against an expanded denylist.
  • Add a reusable textual-non-blocking-pump skill plus AGENTS.md review rules and an ADR 0011 "coverage limits" record, so the methodology survives beyond this branch.
  • Motivation: TUI hangs kept recurring. 100% static prevention is provably impossible (Rice's theorem — "blocks the pump" is a semantic, not syntactic, property), so this ships a two-gate defense — static analysis prevents the decidable subset at merge; a cause-agnostic runtime oracle detects the undecidable residue — and fixes the two hangs that were actually occurring.

Changes by area

The two live hangs

  • src/agentgrep/ui/app_screen.py — Find-in-detail caches the syntax+search+filter base and copies it, overlaying only the find-match spans, so stepping matches (n/N) never re-tokenizes the body; the cache is invalidated on every detail re-render.
  • src/agentgrep/ui/widgets/results.py — Results rows are memoized by record id, so a filter rebuild reuses renders instead of rebuilding every Text; invalidated on theme switch and cleared with the widget on a new search.

Runtime detection / prevention

  • src/agentgrep/ui/_runtime.pysys.addaudithook scoped to the pump thread flags blocking-I/O initiation (socket.connect, subprocess.Popen, sqlite3.connect, time.sleep, …) regardless of how the call is spelled; raises under pytest, logs interactively (opt-in via AGENTGREP_TUI_WATCHDOG). The heartbeat watchdog now defaults on for an interactive TTY (log-only), decoupled from the @pump_only/@offload asserts so the default can never crash a user on a latent violation.

Static guard

  • tests/test_tui_non_blocking.py — Same-class interprocedural closure (follows self.helper()); @on and scheduler/call_from_thread/subscribe callee classification; import-aware denylist normalization plus conservative family expansion (network, fs-walk, json.load, input). The NB-9 JSON-confinement check stays intraprocedural by design.

Methodology

  • .agents/skills/textual-non-blocking-pump/ — Pump-entrypoint catalog, the static-prevent vs runtime-detect split, per-change review rules, and a hang-audit procedure (.claude/skills symlinks to it).
  • AGENTS.md, docs/dev/adr/0011-non-blocking-tui-invariants.md — Non-blocking review rules and the "Coverage limits and the runtime complement" record.

Design decisions

  • Two-gate defense, not a "perfect" lint. Static analysis can only ever flag a syntactic proxy for a semantic property, so it prevents the enumerable, decidable subset at merge while a wall-clock watchdog catches the undecidable residue (CPU spin, native, data-dependent loops). The audit hook adds denylist-free prevention of I/O initiation specifically.
  • Render cache, not an async restructure, for the filter fix. Profiling the rebuild (synthetic records, timings only) showed _render_record × N dominated (~75%) and was redundant — the rows were already rendered during streaming — so a synchronous id-keyed cache fixes it with zero concurrency risk. on_filter_completed is a pump-critical message handler, so awaiting a chunked apply would have blocked queued keystrokes anyway.
  • Watchdog default-on, asserts opt-in. The watchdog is log-only and safe to default on; the @pump_only/@offload asserts can raise, so they stay gated on AGENTGREP_TUI_WATCHDOG and are deliberately decoupled from the watchdog predicate.
  • NB-9 kept intraprocedural. The interprocedural closure runs only for the NB-1 blocking-call scan, so a pump method that calls the one sanctioned JSON builder is not falsely flagged.

Test plan

  • test_pump_audit_hook_behavior (parametrized) — the hook flags blocking-I/O initiation only when armed and on the pump; a non-I/O event and a pure CPU spin never trip it
  • test_runtime_enable_predicates (parametrized truth-table) — watchdog_enabled / guard-asserts / audit_hook_enabled decouple over env, pytest marker, and TTY
  • test_forbidden_call_detector_resolves_aliases_and_families, test_classifier_sees_scheduled_callables_and_on_handlers, test_closure_follows_self_helpers — the guard catches alias/family evasion, @on/scheduler callees, and watch_x → _do_io → open()
  • test_pump_methods_have_no_blocking_calls — the upgraded interprocedural scan passes clean on the real UI code
  • test_detail_find_step_reuses_syntax_base — an n/N step reuses the cached base (no Syntax rebuild)
  • test_filter_rebuild_reuses_cached_row_renders, test_new_search_clears_results_render_cache — a rebuild reuses cached rows; a fresh search releases them
  • Full completion gate: ruff check, ruff format, ty check, py.test --reruns 0, just build-docs

why: TUI hangs recur because "blocks the pump" is a semantic
property no static lint can fully catch (Rice's theorem), and the
ADR 0011 guard is intraprocedural and name-classified — so it
misses helper-extracted I/O, @on/timer/call_from_thread callees,
and unbounded CPU. A reusable skill encodes the audit method and
the layered defense so future TUI work applies it consistently.

what:
- Add .agents/skills/textual-non-blocking-pump/SKILL.md: the full
  pump-entrypoint catalog, the static-prevent vs runtime-detect
  defense split, the I/O-vs-CPU asymmetry (audit hook + watchdog),
  per-change review rules, and the hang-audit procedure.
- Live in .agents/skills (cross-runtime canonical home); the
  gitignored .claude/skills symlinks here for Claude Code.
tony added 8 commits June 27, 2026 17:17
why: The ADR 0011 static guard prevents only a decidable subset —
a denylisted call written directly in a name-classified pump
method. "Blocks the pump" is a Rice-undecidable property, so no
static gate reaches 100%; recording the gap keeps future TUI work
from over-trusting it.

what:
- ADR 0011: add "Coverage limits and the runtime complement" —
  enumerate-don't-prefix-guess pump entrypoints, the
  interprocedural and CPU blind spots, and the prevention
  (audit-hook over I/O initiation) vs detection (wall-clock
  watchdog) split; name the two uncovered live sites.
- AGENTS.md: add concise non-blocking review rules to the TUI
  module entry, pointing at ADR 0011 and the
  textual-non-blocking-pump skill.
why: Real users got no non-blocking enforcement — the heartbeat
watchdog and the @pump_only/@offload asserts were opt-in and off
by default. The static guard also cannot see blocking I/O reached
through aliasing or dynamic dispatch. A pump-thread audit hook
catches blocking-I/O initiation denylist-free, and defaulting the
log-only watchdog on for interactive terminals makes a wedged pump
observable to the user it actually hurts.

what:
- _runtime: add BlockingOnPumpError plus install/arm/disarm helpers
  and _pump_audit_hook — a sys.addaudithook that, on the bound pump
  thread, flags blocking-I/O initiation (socket.connect, subprocess,
  sqlite3.connect, time.sleep, ...) and raises under pytest / logs
  otherwise. Excludes open/import and byte-transfer events.
- _runtime: default watchdog_enabled() on for an interactive TTY
  (AGENTGREP_TUI_WATCHDOG overrides either way; never under pytest),
  and decouple guard-assert enabling from the watchdog so the TTY
  default cannot arm the raising asserts for users.
- app_screen: arm the audit hook when opted in on mount; disarm on
  teardown.
- Cover the hook (raise/log/off-pump/disarm) and the watchdog vs
  guard decoupling.
why: Making the heartbeat watchdog default-on for an interactive
TTY left the _runtime module docstring and the skill describing it
as opt-in / off in production — now wrong for the watchdog half
(the asserts and audit hook remain opt-in).

what:
- _runtime: scope the "no-ops unless enabled" claim to the
  asserts; note the log-only watchdog defaults on for a TTY.
- skill: rewrite the two-gate closing paragraph to the shipped
  state (asserts + audit hook opt-in; watchdog TTY-default).
why: _present_detail_find re-ran rich Syntax over the whole JSON
body on every find keystroke and n/N step — an unbounded pump
re-tokenization (NB-9) on large records. Only the find-match
overlay changes between presents; the syntax+search+filter base
does not.

what:
- Cache the highlighted base (_detail_find_base) keyed by source;
  _present_detail_find copies it and layers only the find spans so
  stepping/typing never re-tokenizes the body.
- Invalidate the cache in _present_detail (every body re-render:
  record switch, filter change, theme re-render, offloaded build).
- Cover that an n/N step reuses the base (no Syntax rebuild).
why: A filter widen rebuilt the whole results list on the pump,
and _render_record per record dominated — ~75% of a ~200ms freeze
at 10k matches (profiled). The rows were already rendered during
streaming, so re-rendering them is wasted work.

what:
- Memoize rendered rows by record id in SearchResultsList
  (_render_cache); _rebuild_options/append_records reuse them so a
  filter rebuild pays only the OptionList add cost.
- Invalidate on clear() and rerender_records() (theme switch); a
  fresh search now clears the results widget so released record
  ids cannot return a stale row.
- Cover row reuse on rebuild and cache release on a new search.
why: The static guard missed alias/from-import evasion of its
denylist, whole blocking families (network, fs-walk, json.load,
input), and the pump callables Textual invokes via @on, a
scheduler, call_from_thread, or a signal — the gaps the ADR 0011
coverage-limits section names. Recon confirmed no existing pump
code uses these, so this is forward-going coverage.

what:
- _forbidden_calls is import-aware (resolves `import x as y` /
  `from x import y` to canonical names); the denylist gains
  conservative families: input, json.load/dump, os.walk/listdir/
  scandir/stat/read, .iterdir/.rglob, and network roots
  (socket/urllib/requests/httpx/ftplib). Generic attrs
  (.get/.join/.wait/.result) stay out — they cannot be typed.
- _is_pump_method also classifies @on handlers and the callables
  seeded from set_timer/set_interval/call_from_thread/subscribe
  (_SCHEDULED_PUMP_NAMES).
- Detector self-tests cover alias/family resolution and the new
  classification.
why: The static guard scanned only a pump method's own body, so a
blocking call extracted one hop into a helper was invisible — the
dominant evasion (the natural refactor that shipped the os.write
hang). The real code is clean through its helper chains today;
this catches the next regression.

what:
- Add same-class call-graph closure: a pump method's scan follows
  self.<helper>() into sibling methods (visited-set bounded).
  @offload workers are passed to run_worker, not called as
  self.x(), so the closure does not follow into them; NB-9
  json-confinement stays intraprocedural, untouched.
- test_pump_methods_have_no_blocking_calls uses the closure; a
  detector test pins that it follows watch_x -> _do_io -> open().
why: Fixing the two CPU stalls and widening the static guard
(same-class closure, @on/scheduler seeding, import-aware denylist)
left the skill, AGENTS.md, and ADR 0011 describing the old weaker
guard and listing already-fixed sites as live.

what:
- skill: reframe the "trap" holes to what still slips
  (cross-module/dynamic dispatch, lambda/partial scheduler targets,
  generic-attr/CPU); mark the @on/scheduler/signal catalog rows
  seeded; note the two former "live gaps" are now bounded.
- AGENTS.md + ADR 0011: the guard now follows same-class helpers
  and seeds @on/scheduler callees; residual is cross-module or
  dynamic dispatch and CPU spin.
@tony tony force-pushed the stream-static-analyzer branch from cd3994c to 8a493e8 Compare June 27, 2026 22:17
@tony

tony commented Jun 27, 2026

Copy link
Copy Markdown
Owner Author

Code review

Found 1 issue:

  1. _detail_find_base_for keys its find-overlay cache only on source, but bakes the live results-filter terms into the cached Text via _apply_filter_highlight (which reads self._filter_terms). The sole invalidation site, _present_detail, is skipped by on_filter_completed when the top filtered record is unchanged — so changing the results filter while find-in-detail is active returns stale filter highlights on every subsequent find keystroke/step. This is a regression from the pre-cache behavior that rebuilt the overlay each call. The sibling body cache _detail_cache_key already folds self._filter_terms into its key; this cache should too (or invalidate when the filter changes).

cached = self._detail_find_base
if cached is not None and self._detail_find_base_source == source:
return cached
if detect_content_format(source) == "json":
text = _RichSyntax(source, "json", theme="ansi_dark", word_wrap=True).highlight(source)
text.no_wrap = False
self._apply_search_highlight(text)
else:
text = highlight_matches(
source,
list(self.query.terms),
case_sensitive=self.query.case_sensitive,
regex=self.query.regex,
style=self._match_style("search"),
)
self._apply_filter_highlight(text)
self._detail_find_base = text
self._detail_find_base_source = source

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

why: The find-detail base cache was keyed only by rendered body text. When a filter changed while the same record stayed selected, detail rendering was skipped and find navigation could reuse filter-highlight spans from the previous filter state.

what:
- Key the cached find-detail base by rendered body plus search and filter highlight state.
- Clear the cached key with the cached body on fresh detail renders.
- Add a same-record filter-change regression covering find navigation after skipped detail rendering.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant