Layered defense against TUI pump-blocking hangs by tony · Pull Request #84 · tony/agentgrep

tony · 2026-06-27T21:46:51Z

Summary

Fix the two live CPU stalls that froze the Textual TUI on real stores: the find-in-detail re-highlight re-tokenized the whole JSON body on every keystroke/step, and a filter-widen re-rendered the entire results list on the pump.
Add a runtime detection/prevention layer in ui/_runtime.py: a pump-thread sys.addaudithook that flags blocking-I/O initiation denylist-free, plus a log-only heartbeat watchdog that now defaults on for an interactive terminal.
Widen the static non-blocking guard so it follows same-class helper calls, classifies @on/scheduler/call_from_thread/subscribe callees, and resolves import aliases against an expanded denylist.
Add a reusable textual-non-blocking-pump skill plus AGENTS.md review rules and an ADR 0011 "coverage limits" record, so the methodology survives beyond this branch.
Motivation: TUI hangs kept recurring. 100% static prevention is provably impossible (Rice's theorem — "blocks the pump" is a semantic, not syntactic, property), so this ships a two-gate defense — static analysis prevents the decidable subset at merge; a cause-agnostic runtime oracle detects the undecidable residue — and fixes the two hangs that were actually occurring.

Changes by area

The two live hangs

src/agentgrep/ui/app_screen.py — Find-in-detail caches the syntax+search+filter base and copies it, overlaying only the find-match spans, so stepping matches (n/N) never re-tokenizes the body; the cache is invalidated on every detail re-render.
src/agentgrep/ui/widgets/results.py — Results rows are memoized by record id, so a filter rebuild reuses renders instead of rebuilding every Text; invalidated on theme switch and cleared with the widget on a new search.

Runtime detection / prevention

src/agentgrep/ui/_runtime.py — sys.addaudithook scoped to the pump thread flags blocking-I/O initiation (socket.connect, subprocess.Popen, sqlite3.connect, time.sleep, …) regardless of how the call is spelled; raises under pytest, logs interactively (opt-in via AGENTGREP_TUI_WATCHDOG). The heartbeat watchdog now defaults on for an interactive TTY (log-only), decoupled from the @pump_only/@offload asserts so the default can never crash a user on a latent violation.

Static guard

tests/test_tui_non_blocking.py — Same-class interprocedural closure (follows self.helper()); @on and scheduler/call_from_thread/subscribe callee classification; import-aware denylist normalization plus conservative family expansion (network, fs-walk, json.load, input). The NB-9 JSON-confinement check stays intraprocedural by design.

Methodology

.agents/skills/textual-non-blocking-pump/ — Pump-entrypoint catalog, the static-prevent vs runtime-detect split, per-change review rules, and a hang-audit procedure (.claude/skills symlinks to it).
AGENTS.md, docs/dev/adr/0011-non-blocking-tui-invariants.md — Non-blocking review rules and the "Coverage limits and the runtime complement" record.

Design decisions

Two-gate defense, not a "perfect" lint. Static analysis can only ever flag a syntactic proxy for a semantic property, so it prevents the enumerable, decidable subset at merge while a wall-clock watchdog catches the undecidable residue (CPU spin, native, data-dependent loops). The audit hook adds denylist-free prevention of I/O initiation specifically.
Render cache, not an async restructure, for the filter fix. Profiling the rebuild (synthetic records, timings only) showed _render_record × N dominated (~75%) and was redundant — the rows were already rendered during streaming — so a synchronous id-keyed cache fixes it with zero concurrency risk. on_filter_completed is a pump-critical message handler, so awaiting a chunked apply would have blocked queued keystrokes anyway.
Watchdog default-on, asserts opt-in. The watchdog is log-only and safe to default on; the @pump_only/@offload asserts can raise, so they stay gated on AGENTGREP_TUI_WATCHDOG and are deliberately decoupled from the watchdog predicate.
NB-9 kept intraprocedural. The interprocedural closure runs only for the NB-1 blocking-call scan, so a pump method that calls the one sanctioned JSON builder is not falsely flagged.

Test plan

test_pump_audit_hook_behavior (parametrized) — the hook flags blocking-I/O initiation only when armed and on the pump; a non-I/O event and a pure CPU spin never trip it
test_runtime_enable_predicates (parametrized truth-table) — watchdog_enabled / guard-asserts / audit_hook_enabled decouple over env, pytest marker, and TTY
test_forbidden_call_detector_resolves_aliases_and_families, test_classifier_sees_scheduled_callables_and_on_handlers, test_closure_follows_self_helpers — the guard catches alias/family evasion, @on/scheduler callees, and watch_x → _do_io → open()
test_pump_methods_have_no_blocking_calls — the upgraded interprocedural scan passes clean on the real UI code
test_detail_find_step_reuses_syntax_base — an n/N step reuses the cached base (no Syntax rebuild)
test_filter_rebuild_reuses_cached_row_renders, test_new_search_clears_results_render_cache — a rebuild reuses cached rows; a fresh search releases them
Full completion gate: ruff check, ruff format, ty check, py.test --reruns 0, just build-docs

why: TUI hangs recur because "blocks the pump" is a semantic property no static lint can fully catch (Rice's theorem), and the ADR 0011 guard is intraprocedural and name-classified — so it misses helper-extracted I/O, @on/timer/call_from_thread callees, and unbounded CPU. A reusable skill encodes the audit method and the layered defense so future TUI work applies it consistently. what: - Add .agents/skills/textual-non-blocking-pump/SKILL.md: the full pump-entrypoint catalog, the static-prevent vs runtime-detect defense split, the I/O-vs-CPU asymmetry (audit hook + watchdog), per-change review rules, and the hang-audit procedure. - Live in .agents/skills (cross-runtime canonical home); the gitignored .claude/skills symlinks here for Claude Code.

why: The ADR 0011 static guard prevents only a decidable subset — a denylisted call written directly in a name-classified pump method. "Blocks the pump" is a Rice-undecidable property, so no static gate reaches 100%; recording the gap keeps future TUI work from over-trusting it. what: - ADR 0011: add "Coverage limits and the runtime complement" — enumerate-don't-prefix-guess pump entrypoints, the interprocedural and CPU blind spots, and the prevention (audit-hook over I/O initiation) vs detection (wall-clock watchdog) split; name the two uncovered live sites. - AGENTS.md: add concise non-blocking review rules to the TUI module entry, pointing at ADR 0011 and the textual-non-blocking-pump skill.

why: Real users got no non-blocking enforcement — the heartbeat watchdog and the @pump_only/@offload asserts were opt-in and off by default. The static guard also cannot see blocking I/O reached through aliasing or dynamic dispatch. A pump-thread audit hook catches blocking-I/O initiation denylist-free, and defaulting the log-only watchdog on for interactive terminals makes a wedged pump observable to the user it actually hurts. what: - _runtime: add BlockingOnPumpError plus install/arm/disarm helpers and _pump_audit_hook — a sys.addaudithook that, on the bound pump thread, flags blocking-I/O initiation (socket.connect, subprocess, sqlite3.connect, time.sleep, ...) and raises under pytest / logs otherwise. Excludes open/import and byte-transfer events. - _runtime: default watchdog_enabled() on for an interactive TTY (AGENTGREP_TUI_WATCHDOG overrides either way; never under pytest), and decouple guard-assert enabling from the watchdog so the TTY default cannot arm the raising asserts for users. - app_screen: arm the audit hook when opted in on mount; disarm on teardown. - Cover the hook (raise/log/off-pump/disarm) and the watchdog vs guard decoupling.

why: Making the heartbeat watchdog default-on for an interactive TTY left the _runtime module docstring and the skill describing it as opt-in / off in production — now wrong for the watchdog half (the asserts and audit hook remain opt-in). what: - _runtime: scope the "no-ops unless enabled" claim to the asserts; note the log-only watchdog defaults on for a TTY. - skill: rewrite the two-gate closing paragraph to the shipped state (asserts + audit hook opt-in; watchdog TTY-default).

why: _present_detail_find re-ran rich Syntax over the whole JSON body on every find keystroke and n/N step — an unbounded pump re-tokenization (NB-9) on large records. Only the find-match overlay changes between presents; the syntax+search+filter base does not. what: - Cache the highlighted base (_detail_find_base) keyed by source; _present_detail_find copies it and layers only the find spans so stepping/typing never re-tokenizes the body. - Invalidate the cache in _present_detail (every body re-render: record switch, filter change, theme re-render, offloaded build). - Cover that an n/N step reuses the base (no Syntax rebuild).

why: A filter widen rebuilt the whole results list on the pump, and _render_record per record dominated — ~75% of a ~200ms freeze at 10k matches (profiled). The rows were already rendered during streaming, so re-rendering them is wasted work. what: - Memoize rendered rows by record id in SearchResultsList (_render_cache); _rebuild_options/append_records reuse them so a filter rebuild pays only the OptionList add cost. - Invalidate on clear() and rerender_records() (theme switch); a fresh search now clears the results widget so released record ids cannot return a stale row. - Cover row reuse on rebuild and cache release on a new search.

@on

why: The static guard missed alias/from-import evasion of its denylist, whole blocking families (network, fs-walk, json.load, input), and the pump callables Textual invokes via @on, a scheduler, call_from_thread, or a signal — the gaps the ADR 0011 coverage-limits section names. Recon confirmed no existing pump code uses these, so this is forward-going coverage. what: - _forbidden_calls is import-aware (resolves `import x as y` / `from x import y` to canonical names); the denylist gains conservative families: input, json.load/dump, os.walk/listdir/ scandir/stat/read, .iterdir/.rglob, and network roots (socket/urllib/requests/httpx/ftplib). Generic attrs (.get/.join/.wait/.result) stay out — they cannot be typed. - _is_pump_method also classifies @on handlers and the callables seeded from set_timer/set_interval/call_from_thread/subscribe (_SCHEDULED_PUMP_NAMES). - Detector self-tests cover alias/family resolution and the new classification.

why: The static guard scanned only a pump method's own body, so a blocking call extracted one hop into a helper was invisible — the dominant evasion (the natural refactor that shipped the os.write hang). The real code is clean through its helper chains today; this catches the next regression. what: - Add same-class call-graph closure: a pump method's scan follows self.<helper>() into sibling methods (visited-set bounded). @offload workers are passed to run_worker, not called as self.x(), so the closure does not follow into them; NB-9 json-confinement stays intraprocedural, untouched. - test_pump_methods_have_no_blocking_calls uses the closure; a detector test pins that it follows watch_x -> _do_io -> open().

why: Fixing the two CPU stalls and widening the static guard (same-class closure, @on/scheduler seeding, import-aware denylist) left the skill, AGENTS.md, and ADR 0011 describing the old weaker guard and listing already-fixed sites as live. what: - skill: reframe the "trap" holes to what still slips (cross-module/dynamic dispatch, lambda/partial scheduler targets, generic-attr/CPU); mark the @on/scheduler/signal catalog rows seeded; note the two former "live gaps" are now bounded. - AGENTS.md + ADR 0011: the guard now follows same-class helpers and seeds @on/scheduler callees; residual is cross-module or dynamic dispatch and CPU spin.

tony · 2026-06-27T23:36:02Z

Code review

Found 1 issue:

_detail_find_base_for keys its find-overlay cache only on source, but bakes the live results-filter terms into the cached Text via _apply_filter_highlight (which reads self._filter_terms). The sole invalidation site, _present_detail, is skipped by on_filter_completed when the top filtered record is unchanged — so changing the results filter while find-in-detail is active returns stale filter highlights on every subsequent find keystroke/step. This is a regression from the pre-cache behavior that rebuilt the overlay each call. The sibling body cache _detail_cache_key already folds self._filter_terms into its key; this cache should too (or invalidate when the filter changes).

agentgrep/src/agentgrep/ui/app_screen.py

Lines 1996 to 2013 in 8a493e8

    
           cached = self._detail_find_base 
        
           if cached is not None and self._detail_find_base_source == source: 
        
               return cached 
        
           if detect_content_format(source) == "json": 
        
               text = _RichSyntax(source, "json", theme="ansi_dark", word_wrap=True).highlight(source) 
        
               text.no_wrap = False 
        
               self._apply_search_highlight(text) 
        
           else: 
        
               text = highlight_matches( 
        
                   source, 
        
                   list(self.query.terms), 
        
                   case_sensitive=self.query.case_sensitive, 
        
                   regex=self.query.regex, 
        
                   style=self._match_style("search"), 
        
               ) 
        
           self._apply_filter_highlight(text) 
        
           self._detail_find_base = text 
        
           self._detail_find_base_source = source

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

why: The find-detail base cache was keyed only by rendered body text. When a filter changed while the same record stayed selected, detail rendering was skipped and find navigation could reuse filter-highlight spans from the previous filter state. what: - Key the cached find-detail base by rendered body plus search and filter highlight state. - Clear the cached key with the cached body on fresh detail renders. - Add a same-record filter-change regression covering find navigation after skipped detail rendering.

tony temporarily deployed to docs June 27, 2026 21:46 — with GitHub Actions Inactive

tony added 8 commits June 27, 2026 17:17

tony force-pushed the stream-static-analyzer branch from cd3994c to 8a493e8 Compare June 27, 2026 22:17

tony temporarily deployed to docs June 27, 2026 22:17 — with GitHub Actions Inactive

tony deployed to docs June 28, 2026 01:03 — with GitHub Actions Active

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Layered defense against TUI pump-blocking hangs#84

Layered defense against TUI pump-blocking hangs#84
tony wants to merge 10 commits into
masterfrom
stream-static-analyzer

tony commented Jun 27, 2026

Uh oh!

tony commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tony commented Jun 27, 2026

Summary

Changes by area

The two live hangs

Runtime detection / prevention

Static guard

Methodology

Design decisions

Test plan

Uh oh!

tony commented Jun 27, 2026

Code review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant