fix(webview/cdp): replace wall-clock load timeout with idle-watchdog (#1213)#1312
Conversation
📝 WalkthroughWalkthroughThe PR replaces a hardcoded 15-second load watchdog with an idle-based system that resets on page-progress events. A progress channel coordinates between the CDP session and a spawned idle-watchdog task, which emits timeout signals if the page stays idle beyond IDLE_BUDGET or exceeds HARD_CEILING. The tokio dev-dependency gains the test-util feature to support the new test suite. ChangesIdle Watchdog System
Sequence DiagramsequenceDiagram
participant Session as CDP Session Task
participant ProgressCh as Progress Channel
participant Watchdog as Idle Watchdog Task
participant Emit as emit_load_finished
Session->>ProgressCh: Create channel
Session->>Watchdog: Spawn watchdog with receiver
activate Watchdog
Watchdog->>Watchdog: Start idle timer (IDLE_BUDGET) and total timer (HARD_CEILING)
Session->>Session: Enable Page lifecycle events
Session->>Session: Receive CDP Page.* event
alt is_progress_signal(event)
Session->>ProgressCh: try_send(progress ping)
ProgressCh->>Watchdog: Receive ping
Watchdog->>Watchdog: Reset idle timer
else terminal loadEventFired
Session->>ProgressCh: Drop sender
end
alt idle expired OR hard ceiling exceeded
Watchdog->>Emit: emit_load_finished("timeout")
end
deactivate Watchdog
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
app/src-tauri/src/cdp/session.rs (1)
163-175: ⚡ Quick winAdd watchdog lifecycle logs.
This new flow emits the frontend timeout, but it never logs the watchdog's own start, clean sender-drop exit, or whether the timeout came from idle silence vs hard ceiling. That makes future false-positive investigations much harder than they need to be.
As per coding guidelines "Default to verbose diagnostics on new/changed flows with entry/exit logging, branches, external calls, retries/timeouts, state transitions, and errors; use stable grep-friendly prefixes and correlation fields; never log secrets or full PII".
Also applies to: 222-258
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src-tauri/src/cdp/session.rs` around lines 163 - 175, Add explicit, grep-friendly lifecycle logs around the watchdog spawned in the watchdog block: log when the watchdog starts (include account_id and real_url), log on normal exit vs sender-drop exit, and log the reason for any timeout (distinguish "idle_silence" vs "hard_ceiling") when run_idle_watchdog completes; instrument the run_spawned task that wraps run_idle_watchdog (the tokio::spawn closure using progress_rx, IDLE_BUDGET, HARD_CEILING) and the emit_load_finished call so logs include stable prefixes and correlation fields (account_id, real_url) and do not leak PII or secrets. Ensure logs use the existing project logging facility consistent with other code paths (same level and formatting) and add similar start/exit/branch logs for the related block at the second occurrence (lines around the other run_idle_watchdog usage).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src-tauri/src/cdp/session.rs`:
- Around line 242-255: The idle-watchdog tokio::select! should be made biased so
progress pings are prioritized: modify the select! in session.rs (the block that
uses progress_rx.recv(), wake_after, and on_timeout) by adding `biased;` at the
top of the select and reorder the branches so the recv = progress_rx.recv()
branch is listed before the sleep(wake_after) branch; this ensures the
progress_rx.recv() branch is always polled first and prevents false-positive
timeouts when both are ready.
- Line 17: The watchdog currently uses std::time::Instant so Tokio's mocked
clock advances (tokio::time::advance) are ignored and the test
idle_watchdog_hard_ceiling_caps_runaway_progress hangs; replace
std::time::Instant with tokio::time::Instant in the imports and update any
Instant::now() usages in the watchdog code (e.g., inside the idle watchdog
functions/structs and any place referenced by
idle_watchdog_hard_ceiling_caps_runaway_progress) to call
tokio::time::Instant::now() so the mocked Tokio clock affects elapsed time
checks while keeping std::time::Duration for duration arithmetic.
---
Nitpick comments:
In `@app/src-tauri/src/cdp/session.rs`:
- Around line 163-175: Add explicit, grep-friendly lifecycle logs around the
watchdog spawned in the watchdog block: log when the watchdog starts (include
account_id and real_url), log on normal exit vs sender-drop exit, and log the
reason for any timeout (distinguish "idle_silence" vs "hard_ceiling") when
run_idle_watchdog completes; instrument the run_spawned task that wraps
run_idle_watchdog (the tokio::spawn closure using progress_rx, IDLE_BUDGET,
HARD_CEILING) and the emit_load_finished call so logs include stable prefixes
and correlation fields (account_id, real_url) and do not leak PII or secrets.
Ensure logs use the existing project logging facility consistent with other code
paths (same level and formatting) and add similar start/exit/branch logs for the
related block at the second occurrence (lines around the other run_idle_watchdog
usage).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c66220f9-8ba6-4987-8c09-71de88043ace
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.lockapp/src-tauri/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
app/src-tauri/Cargo.tomlapp/src-tauri/src/cdp/session.rs
senamakel
left a comment
There was a problem hiding this comment.
resolve merge conflicts and coderabbit comments
…inyhumansai#1213) The previous design fired a single 15s wall-clock timer after spawn and emitted `webview-account:load{state:"timeout"}` if it elapsed before either the native `on_page_load` handler or the CDP `Page.loadEventFired` event arrived. That was the wrong primitive: providers like Google Meet take 20-30s to fully hydrate on a cold open, so the watchdog kept spuriously firing the timeout overlay even though the page was making steady progress. Bumping the constant to 25s only papered over the symptom — the real fault is using elapsed wall time as a proxy for "page is stuck". The fix replaces the constant with an idle-watchdog driven by CDP progress signals. `spawn_session` now also creates a small `mpsc::channel<()>` between the session task and the watchdog. The `pump_events` callback inside `run_session_cycle` sends a `()` ping on every progress-relevant Page-domain method: * `Page.frameStartedLoading` * `Page.frameStoppedLoading` * `Page.frameNavigated` * `Page.lifecycleEvent` (after `Page.setLifecycleEventsEnabled(true)`) * `Page.loadEventFired` * `Page.domContentEventFired` The watchdog runs a `tokio::select!` between an idle-budget sleep (`IDLE_BUDGET = 8s`) and `progress_rx.recv()`. Each ping resets the sleep; `IDLE_BUDGET` of true silence fires the timeout. A separate `HARD_CEILING = 60s` caps total runtime so an infinite-redirect loop or chatty long-poll can't hold the spinner up forever. When the session task ends cleanly the sender drops, `recv()` returns `None`, and the watchdog exits without emitting a stale timeout. The dedup guard in `emit_load_finished` (`loaded_accounts`) still suppresses any late timeout that races a real `finished` event. Page-domain only — we do NOT enable `Network.enable` here so existing provider scanners that rely on toggling that domain themselves don't get clobbered. Tests: 6 new unit tests in `cdp::session::tests`: * `is_progress_signal_recognises_known_page_methods` * `is_progress_signal_rejects_unrelated_methods` * `idle_watchdog_fires_after_idle_budget_with_no_progress` — virtual time * `idle_watchdog_resets_on_each_progress_ping` — drip pings <8s, no fire * `idle_watchdog_exits_clean_when_sender_dropped_before_idle` * `idle_watchdog_hard_ceiling_caps_runaway_progress` — pings every 1s for 65s still fires the ceiling Adds `tokio` `test-util` dev-feature so `#[tokio::test(start_paused = true)]` compiles. `run_idle_watchdog` and `is_progress_signal` are `pub(crate)` so the tests can drive them with a mock channel without standing up a real CDP connection. Closes tinyhumansai#1213. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e9da826 to
bbf586c
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src-tauri/src/cdp/session.rs`:
- Around line 184-199: Logs in the watchdog code are currently emitting the full
provider URL (real_url) which may contain PII/secrets; update the logging to
redact the URL by computing a safe string (e.g., use the URL origin or a
redacted_url that strips path and query) before calling log::debug / log::info /
log::warn in the watchdog block (refer to run_idle_watchdog invocation,
WatchdogOutcome handling, and account_id usage); replace real_url in those log
calls with the redacted value and apply the same change to the other watchdog
log statements (the ones around the WatchdogOutcome exit path) to ensure no full
URLs are logged.
- Around line 299-333: run_idle_watchdog currently only listens to progress_rx
and timeouts, so it can outlive a terminal load event; change it to also listen
for an explicit completion signal and return immediately when that fires.
Concretely: update run_idle_watchdog signature to accept an additional
completion receiver (e.g. a oneshot::Receiver<()> or a watch::Receiver<bool>
named finished_rx) and add a branch in the tokio::select! to handle
finished_rx.recv() (or finished_rx.closed()/changed()) that returns a new
WatchdogOutcome::Finished (or reuse SenderDropped) so the watchdog stops
immediately on first terminal load event; then update the caller that currently
holds progress_tx to send/close that completion channel when Page.loadEventFired
(the first terminal load) — alternatively, re-arm/cancel the watchdog per load
cycle by creating a new progress/finished pair per navigation — apply the same
change at the other occurrence noted in the review (lines 649-657).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: af063972-12ac-48e3-a91b-19c5ab812560
📒 Files selected for processing (2)
app/src-tauri/Cargo.tomlapp/src-tauri/src/cdp/session.rs
…l load (tinyhumansai#1213) Two CodeRabbit nits from the post-rebase re-review. 1. The new watchdog start/timeout/exit log lines persisted the full provider URL (Meet/Slack URLs include account fragments and OAuth bounce params that count as PII). Pipe `real_url` through the existing `webview_accounts::redact_url_for_log` helper so query + fragment are stripped before the URL hits the log. Made the helper `pub(crate)` to share it across modules. 2. `run_idle_watchdog` previously only exited on timeout or sender drop. After a successful load the `progress_tx` Sender stayed alive inside `pump_events`, so the watchdog kept running through the full idle budget after `Page.loadEventFired` before finally exiting (the resulting `timeout` event was suppressed by `loaded_accounts` dedup, but the task itself lingered ~8 s longer than necessary). Replace the bare `mpsc::Sender<()>` parameter with an `Arc<Mutex<Option<Sender>>>` slot threaded through `run_session_forever` → `run_session_cycle` → pump_events callback. The original Sender lives ONLY inside this slot (no extra clones), so the callback's `take()` on `Page.loadEventFired` drops the sole sender and the watchdog `recv()` returns `None` → `WatchdogOutcome::SenderDropped` exit. `take()` is idempotent so late SPA route-change `Page.loadEventFired`s after the first load are no-ops. Tests: existing 19 `cdp::session` + 73 `webview_accounts` cases all green. The sender-drop/biased-recv/idle-budget/hard-ceiling unit tests already cover the watchdog state machine; the new slot indirection is exercised transitively. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/src-tauri/src/webview_accounts/mod.rs (1)
1437-1447:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRedaction still leaks URL userinfo credentials
redact_url_for_logremoves query/fragment, buthttps://user:pass@host/...still logsuser[:pass]. Please redact userinfo too before formatting.🔧 Suggested fix
pub(crate) fn redact_url_for_log(raw: &str) -> String { match Url::parse(raw) { Ok(mut u) => { + // Strip potential basic-auth credentials from authority. + let _ = u.set_username(""); + let _ = u.set_password(None); u.set_query(None); u.set_fragment(None); u.to_string() } Err(_) => { // Fallback: drop everything from the first '?' or '#'. raw.split(['?', '#']).next().unwrap_or(raw).to_string() } } }As per coding guidelines:
**/*.rs: “never log secrets or full PII”.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src-tauri/src/webview_accounts/mod.rs` around lines 1437 - 1447, redact_url_for_log currently clears query and fragment but leaves URL userinfo; update it to remove userinfo before returning: in the Url::parse(…) Ok(mut u) branch call u.set_username("") and u.set_password(None) (or equivalent) before u.set_query(None)/u.set_fragment(None) so the produced string has no user or password; in the Err(_) fallback also strip any leading "userinfo@" (e.g. take the substring after the last '@') and then drop query/fragment as currently done so malformed URLs do not leak credentials.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@app/src-tauri/src/webview_accounts/mod.rs`:
- Around line 1437-1447: redact_url_for_log currently clears query and fragment
but leaves URL userinfo; update it to remove userinfo before returning: in the
Url::parse(…) Ok(mut u) branch call u.set_username("") and u.set_password(None)
(or equivalent) before u.set_query(None)/u.set_fragment(None) so the produced
string has no user or password; in the Err(_) fallback also strip any leading
"userinfo@" (e.g. take the substring after the last '@') and then drop
query/fragment as currently done so malformed URLs do not leak credentials.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 771714da-8808-4141-b155-e70575353605
📒 Files selected for processing (2)
app/src-tauri/src/cdp/session.rsapp/src-tauri/src/webview_accounts/mod.rs
|
@senamakel — addressed in
19 |
Summary
LOAD_TIMEOUT = 15swatchdog with a CDP-progress-driven idle watchdog. The watchdog only fireswebview-account:load{state:"timeout"}afterIDLE_BUDGET = 8sof true silence (no Page-domain progress signal) OR after aHARD_CEILING = 60stotal runtime guard.mpsc::channel<()>between the session task and the watchdog so the existingpump_eventscallback incdp/session.rscan ping the watchdog on every progress-relevant CDP method.Page.setLifecycleEventsEnabled(true)so finer-grained signals (init,firstPaint,DOMContentLoaded,load,networkAlmostIdle,networkIdle) reset the idle timer.Problem
#1213reported "Google Meet account is taking longer than expected" UI firing on cold-add even though the page was making steady progress. Root cause: the previous design fired a single 15s wall-clock timer afterspawn_sessionand emitted the timeout overlay if either the nativeon_page_loadhandler orPage.loadEventFireddid not arrive in time.That was the wrong primitive for the actual failure mode. Providers like Google Meet take 20–30s to fully hydrate on a cold open, so the watchdog kept firing on healthy pages while a genuinely stuck page got no more than 15s of runway. Bumping the constant to 25s only papered over the symptom — the real fault is using elapsed wall-clock time as a proxy for "page is stuck".
Smoke-test confirmed the symptom on
mainand the prior 25s attempt: gmeet still timed out at the wall-clock budget despite ongoing CDP progress.Solution
Replace the wall-clock with an idle watchdog driven by CDP progress signals:
spawn_sessioncreatesmpsc::channel::<()>(64)and hands the receiver to the watchdog task and the sender to the session task.tokio::select!betweentokio::time::sleep(IDLE_BUDGET)andprogress_rx.recv(). Each()ping resets the loop.IDLE_BUDGETof true silence firesemit_load_finished("timeout")once and returns.HARD_CEILING = 60stotal-runtime guard ensures an infinite-redirect loop or chatty long-poll cannot hold the spinner up forever.recv()returnsNone, and the watchdog exits without emitting a stale timeout.pump_eventscallback insiderun_session_cyclecallstry_sendon every progress-relevant CDP method (Page.frameStartedLoading,Page.frameStoppedLoading,Page.frameNavigated,Page.lifecycleEvent,Page.loadEventFired,Page.domContentEventFired).try_sendso a hypothetical full channel never blocks the CDP event loop.Page.setLifecycleEventsEnabled(true)is best-effort — if it fails, the watchdog still has frame-start/stop + loadEventFired to work with.emit_load_finished(loaded_accounts) still suppresses any late timeout that races a realfinishedevent, so a watchdog timeout that happens to fire after a quiet load completion is correctly dropped.Page-domain only —
Network.enableis not invoked here so existing per-provider scanners that toggle that domain themselves don't get clobbered.run_idle_watchdogandis_progress_signalarepub(crate)so the unit tests can drive them with a mock channel andtokio::time::pause()without standing up a real CDP connection.Manual smoke (this branch): three Google Meet accounts loaded back-to-back. Cold-add #1 took 32 s to reach
recipe-runtime ready— the previous 15 s wall-clock and the 25 s bump would both have fired a spurious timeout. Cold-add #2 reachedstate=finishedin 3 s. Cold-add #3 (post-logout) reachedstate=finishedin 5 s. Zeroload timeout eventlines emitted across all three cycles.Submission Checklist
docs/TESTING-STRATEGY.md— 6 new unit tests incdp::session::testscover the idle/progress/sender-drop/hard-ceiling paths plus theis_progress_signalwhitelistrun_idle_watchdogandis_progress_signalare exercised by 6 new unit tests;spawn_sessionandrun_session_cycleplumbing changes are covered transitively by the existing 59 webview_accounts tests + 16 cdp::session tests (all passing locally)docs/TEST-COVERAGE-MATRIX.md## Related— same reason; no matrix rows touchedCloses #1213in## RelatedImpact
Page.setLifecycleEventsEnabledcall per session cycle. The progress-ping path usestry_sendso a backpressured channel can never block the CDP event loop.spawn_sessionsignature unchanged. Behavioural change only inside the watchdog. Latestate=timeoutevents that race a realfinishedevent continue to be deduped byloaded_accountsinwebview_accounts/mod.rs.Page.setLifecycleEventsEnabledis local CDP only.Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/1213-gmeet-idle-watchdoge9da8264(idle-watchdog),0ef4838e(Cargo.lock bump)Validation Run
pnpm --filter openhuman-app format:checkpnpm typecheck(no frontend changes — implicit pass)cargo test --manifest-path app/src-tauri/Cargo.toml --lib cdp::session(16/16 passing including 6 new),cargo test --lib webview_accounts(59/59 passing)cargo fmt --manifest-path app/src-tauri/Cargo.toml --all --checkclean;cargo check --manifest-path app/src-tauri/Cargo.tomlcleanValidation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
Parity Contract
emit_load_finisheddedup of late timeouts vs earlyfinishedevents unchanged;spawn_sessionsignature unchanged; reveal/webview-account:loadsemantics unchanged for finished-state events.Page.setLifecycleEventsEnabled(true)is best-effort — if it fails, the watchdog still receives frame-start/stop +Page.loadEventFiredsignals.Duplicate / Superseded PR Handling
fix/gmeet-load-timeout-11659484110531404059on the fork (no PR opened from it; branch will be deleted post-merge).🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests
Chores