Layered defense against TUI pump-blocking hangs#84
Open
tony wants to merge 10 commits into
Open
Conversation
why: TUI hangs recur because "blocks the pump" is a semantic property no static lint can fully catch (Rice's theorem), and the ADR 0011 guard is intraprocedural and name-classified — so it misses helper-extracted I/O, @on/timer/call_from_thread callees, and unbounded CPU. A reusable skill encodes the audit method and the layered defense so future TUI work applies it consistently. what: - Add .agents/skills/textual-non-blocking-pump/SKILL.md: the full pump-entrypoint catalog, the static-prevent vs runtime-detect defense split, the I/O-vs-CPU asymmetry (audit hook + watchdog), per-change review rules, and the hang-audit procedure. - Live in .agents/skills (cross-runtime canonical home); the gitignored .claude/skills symlinks here for Claude Code.
why: The ADR 0011 static guard prevents only a decidable subset — a denylisted call written directly in a name-classified pump method. "Blocks the pump" is a Rice-undecidable property, so no static gate reaches 100%; recording the gap keeps future TUI work from over-trusting it. what: - ADR 0011: add "Coverage limits and the runtime complement" — enumerate-don't-prefix-guess pump entrypoints, the interprocedural and CPU blind spots, and the prevention (audit-hook over I/O initiation) vs detection (wall-clock watchdog) split; name the two uncovered live sites. - AGENTS.md: add concise non-blocking review rules to the TUI module entry, pointing at ADR 0011 and the textual-non-blocking-pump skill.
why: Real users got no non-blocking enforcement — the heartbeat watchdog and the @pump_only/@offload asserts were opt-in and off by default. The static guard also cannot see blocking I/O reached through aliasing or dynamic dispatch. A pump-thread audit hook catches blocking-I/O initiation denylist-free, and defaulting the log-only watchdog on for interactive terminals makes a wedged pump observable to the user it actually hurts. what: - _runtime: add BlockingOnPumpError plus install/arm/disarm helpers and _pump_audit_hook — a sys.addaudithook that, on the bound pump thread, flags blocking-I/O initiation (socket.connect, subprocess, sqlite3.connect, time.sleep, ...) and raises under pytest / logs otherwise. Excludes open/import and byte-transfer events. - _runtime: default watchdog_enabled() on for an interactive TTY (AGENTGREP_TUI_WATCHDOG overrides either way; never under pytest), and decouple guard-assert enabling from the watchdog so the TTY default cannot arm the raising asserts for users. - app_screen: arm the audit hook when opted in on mount; disarm on teardown. - Cover the hook (raise/log/off-pump/disarm) and the watchdog vs guard decoupling.
why: Making the heartbeat watchdog default-on for an interactive TTY left the _runtime module docstring and the skill describing it as opt-in / off in production — now wrong for the watchdog half (the asserts and audit hook remain opt-in). what: - _runtime: scope the "no-ops unless enabled" claim to the asserts; note the log-only watchdog defaults on for a TTY. - skill: rewrite the two-gate closing paragraph to the shipped state (asserts + audit hook opt-in; watchdog TTY-default).
why: _present_detail_find re-ran rich Syntax over the whole JSON body on every find keystroke and n/N step — an unbounded pump re-tokenization (NB-9) on large records. Only the find-match overlay changes between presents; the syntax+search+filter base does not. what: - Cache the highlighted base (_detail_find_base) keyed by source; _present_detail_find copies it and layers only the find spans so stepping/typing never re-tokenizes the body. - Invalidate the cache in _present_detail (every body re-render: record switch, filter change, theme re-render, offloaded build). - Cover that an n/N step reuses the base (no Syntax rebuild).
why: A filter widen rebuilt the whole results list on the pump, and _render_record per record dominated — ~75% of a ~200ms freeze at 10k matches (profiled). The rows were already rendered during streaming, so re-rendering them is wasted work. what: - Memoize rendered rows by record id in SearchResultsList (_render_cache); _rebuild_options/append_records reuse them so a filter rebuild pays only the OptionList add cost. - Invalidate on clear() and rerender_records() (theme switch); a fresh search now clears the results widget so released record ids cannot return a stale row. - Cover row reuse on rebuild and cache release on a new search.
why: The static guard missed alias/from-import evasion of its denylist, whole blocking families (network, fs-walk, json.load, input), and the pump callables Textual invokes via @on, a scheduler, call_from_thread, or a signal — the gaps the ADR 0011 coverage-limits section names. Recon confirmed no existing pump code uses these, so this is forward-going coverage. what: - _forbidden_calls is import-aware (resolves `import x as y` / `from x import y` to canonical names); the denylist gains conservative families: input, json.load/dump, os.walk/listdir/ scandir/stat/read, .iterdir/.rglob, and network roots (socket/urllib/requests/httpx/ftplib). Generic attrs (.get/.join/.wait/.result) stay out — they cannot be typed. - _is_pump_method also classifies @on handlers and the callables seeded from set_timer/set_interval/call_from_thread/subscribe (_SCHEDULED_PUMP_NAMES). - Detector self-tests cover alias/family resolution and the new classification.
why: The static guard scanned only a pump method's own body, so a blocking call extracted one hop into a helper was invisible — the dominant evasion (the natural refactor that shipped the os.write hang). The real code is clean through its helper chains today; this catches the next regression. what: - Add same-class call-graph closure: a pump method's scan follows self.<helper>() into sibling methods (visited-set bounded). @offload workers are passed to run_worker, not called as self.x(), so the closure does not follow into them; NB-9 json-confinement stays intraprocedural, untouched. - test_pump_methods_have_no_blocking_calls uses the closure; a detector test pins that it follows watch_x -> _do_io -> open().
why: Fixing the two CPU stalls and widening the static guard (same-class closure, @on/scheduler seeding, import-aware denylist) left the skill, AGENTS.md, and ADR 0011 describing the old weaker guard and listing already-fixed sites as live. what: - skill: reframe the "trap" holes to what still slips (cross-module/dynamic dispatch, lambda/partial scheduler targets, generic-attr/CPU); mark the @on/scheduler/signal catalog rows seeded; note the two former "live gaps" are now bounded. - AGENTS.md + ADR 0011: the guard now follows same-class helpers and seeds @on/scheduler callees; residual is cross-module or dynamic dispatch and CPU spin.
cd3994c to
8a493e8
Compare
Owner
Author
Code reviewFound 1 issue:
agentgrep/src/agentgrep/ui/app_screen.py Lines 1996 to 2013 in 8a493e8 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
why: The find-detail base cache was keyed only by rendered body text. When a filter changed while the same record stayed selected, detail rendering was skipped and find navigation could reuse filter-highlight spans from the previous filter state. what: - Key the cached find-detail base by rendered body plus search and filter highlight state. - Clear the cached key with the cached body on fresh detail renders. - Add a same-record filter-change regression covering find navigation after skipped detail rendering.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ui/_runtime.py: a pump-threadsys.addaudithookthat flags blocking-I/O initiation denylist-free, plus a log-only heartbeat watchdog that now defaults on for an interactive terminal.@on/scheduler/call_from_thread/subscribecallees, and resolves import aliases against an expanded denylist.textual-non-blocking-pumpskill plus AGENTS.md review rules and an ADR 0011 "coverage limits" record, so the methodology survives beyond this branch.Changes by area
The two live hangs
src/agentgrep/ui/app_screen.py— Find-in-detail caches the syntax+search+filter base and copies it, overlaying only the find-match spans, so stepping matches (n/N) never re-tokenizes the body; the cache is invalidated on every detail re-render.src/agentgrep/ui/widgets/results.py— Results rows are memoized by record id, so a filter rebuild reuses renders instead of rebuilding everyText; invalidated on theme switch and cleared with the widget on a new search.Runtime detection / prevention
src/agentgrep/ui/_runtime.py—sys.addaudithookscoped to the pump thread flags blocking-I/O initiation (socket.connect,subprocess.Popen,sqlite3.connect,time.sleep, …) regardless of how the call is spelled; raises under pytest, logs interactively (opt-in viaAGENTGREP_TUI_WATCHDOG). The heartbeat watchdog now defaults on for an interactive TTY (log-only), decoupled from the@pump_only/@offloadasserts so the default can never crash a user on a latent violation.Static guard
tests/test_tui_non_blocking.py— Same-class interprocedural closure (followsself.helper());@onand scheduler/call_from_thread/subscribecallee classification; import-aware denylist normalization plus conservative family expansion (network, fs-walk,json.load,input). The NB-9 JSON-confinement check stays intraprocedural by design.Methodology
.agents/skills/textual-non-blocking-pump/— Pump-entrypoint catalog, the static-prevent vs runtime-detect split, per-change review rules, and a hang-audit procedure (.claude/skillssymlinks to it).AGENTS.md,docs/dev/adr/0011-non-blocking-tui-invariants.md— Non-blocking review rules and the "Coverage limits and the runtime complement" record.Design decisions
_render_record × Ndominated (~75%) and was redundant — the rows were already rendered during streaming — so a synchronous id-keyed cache fixes it with zero concurrency risk.on_filter_completedis a pump-critical message handler, so awaiting a chunked apply would have blocked queued keystrokes anyway.@pump_only/@offloadasserts can raise, so they stay gated onAGENTGREP_TUI_WATCHDOGand are deliberately decoupled from the watchdog predicate.Test plan
test_pump_audit_hook_behavior(parametrized) — the hook flags blocking-I/O initiation only when armed and on the pump; a non-I/O event and a pure CPU spin never trip ittest_runtime_enable_predicates(parametrized truth-table) —watchdog_enabled/ guard-asserts /audit_hook_enableddecouple over env, pytest marker, and TTYtest_forbidden_call_detector_resolves_aliases_and_families,test_classifier_sees_scheduled_callables_and_on_handlers,test_closure_follows_self_helpers— the guard catches alias/family evasion,@on/scheduler callees, andwatch_x → _do_io → open()test_pump_methods_have_no_blocking_calls— the upgraded interprocedural scan passes clean on the real UI codetest_detail_find_step_reuses_syntax_base— ann/Nstep reuses the cached base (noSyntaxrebuild)test_filter_rebuild_reuses_cached_row_renders,test_new_search_clears_results_render_cache— a rebuild reuses cached rows; a fresh search releases themruff check,ruff format,ty check,py.test --reruns 0,just build-docs