diff --git a/docs/cdp/INDEX.md b/docs/cdp/INDEX.md new file mode 100644 index 0000000..a552ff6 --- /dev/null +++ b/docs/cdp/INDEX.md @@ -0,0 +1,63 @@ +# CDP zettel for superpowers-chrome + +Atomic notes on Chrome DevTools Protocol topics most relevant to a library that drives Chrome over CDP for automation and agent use. Each card makes one claim, in its own words, with sources and links to related cards. + +Written 2026-05-11 by Yarrow (Bob). Form follows the `taking-smart-notes` skill; cards live here (per Matt's request) rather than in a `notes/zettel/` slipbox. + +## Reading order for a newcomer + +If you're new to CDP and want to learn it in order: + +1. [flatten-mode-and-sessionid-envelope](flatten-mode-and-sessionid-envelope.md) — the protocol convention everything else builds on +2. [one-ws-many-sessions-architecture](one-ws-many-sessions-architecture.md) — the client architecture that uses it +3. [per-session-message-id-counters](per-session-message-id-counters.md) — the correctness invariant you must preserve +4. [target-domain-target-types](target-domain-target-types.md) — what's out there to attach to +5. [target-autoattach-vs-discovertargets](target-autoattach-vs-discovertargets.md) — how to capture children safely +6. [runtime-evaluate-three-modes](runtime-evaluate-three-modes.md) — the workhorse command and its trap modes +7. [navigation-listener-ordering-race](navigation-listener-ordering-race.md) — the event-ordering invariant +8. [page-loadeventfired-is-not-ready](page-loadeventfired-is-not-ready.md) — what "page loaded" really means + +## Index — all cards + +### Session protocol & architecture + +- **[flatten-mode-and-sessionid-envelope](flatten-mode-and-sessionid-envelope.md)** — Flatten mode makes sessionId a message-envelope field, not a connection property; it's the modern default. 
+- **[one-ws-many-sessions-architecture](one-ws-many-sessions-architecture.md)** — One browser-level WebSocket multiplexing N flatten-mode sessions is the contemporary transport pattern. +- **[per-session-message-id-counters](per-session-message-id-counters.md)** — Each CDP session has its own id counter; collapsing id space silently breaks correlation. + +### Targets & lifecycle + +- **[target-domain-target-types](target-domain-target-types.md)** — CDP targets are not just pages: workers, iframes, browser, "tab" are all distinct with their own session shape. +- **[target-autoattach-vs-discovertargets](target-autoattach-vs-discovertargets.md)** — `setAutoAttach` captures children safely; `setDiscoverTargets` is observation only. +- **[browser-context-for-test-isolation](browser-context-for-test-isolation.md)** — `disposeBrowserContext` is atomic; per-test cookie-scrubbing is incomplete by construction. +- **[target-attached-without-detach-leaks](target-attached-without-detach-leaks.md)** — Forgetting `detachFromTarget` leaks sessionIds and subscriptions until process exit. + +### Page interaction primitives + +- **[runtime-evaluate-three-modes](runtime-evaluate-three-modes.md)** — `Runtime.evaluate`'s three orthogonal modes (returnByValue, awaitPromise, exceptionDetails) and when each matters. +- **[isolated-worlds-and-execution-contexts](isolated-worlds-and-execution-contexts.md)** — Isolated worlds share the DOM but not the JS heap; the defence against hostile-page monkey-patching. +- **[navigation-listener-ordering-race](navigation-listener-ordering-race.md)** — Register the load-event listener before issuing `Page.navigate`, or fast pages will fire it before you're listening. +- **[page-loadeventfired-is-not-ready](page-loadeventfired-is-not-ready.md)** — `Page.loadEventFired` is `window.onload`, not "page is ready to interact with." 
+ +### Network instrumentation + +- **[network-vs-fetch-domains](network-vs-fetch-domains.md)** — Network observes; Fetch intercepts. `Network.requestIntercepted` is deprecated. + +### Process, transport, and modes + +- **[chrome-process-lifecycle-traps](chrome-process-lifecycle-traps.md)** — Four lifecycle traps any Chrome-spawning library spends complexity on. +- **[cdp-pipe-vs-websocket-transport](cdp-pipe-vs-websocket-transport.md)** — `--remote-debugging-pipe` is structurally safer than `--remote-debugging-port`; Chrome 136 added a `--user-data-dir` requirement. +- **[headless-new-vs-shell](headless-new-vs-shell.md)** — Chrome 132 removed `--headless=old`; `chrome-headless-shell` is now a separate download. + +### Ecosystem context + +- **[puppeteer-as-cdp-reference-implementation](puppeteer-as-cdp-reference-implementation.md)** — When the docs are ambiguous, Puppeteer is the canonical implementation to read. +- **[webdriver-bidi-vs-cdp-trajectory](webdriver-bidi-vs-cdp-trajectory.md)** — CDP stays Chrome-specific debugging; BiDi is the cross-browser standard, with non-trivial overlap during the transition. + +## Suggested cluster reads + +- **"I'm extending the browser-WS bridge"**: flatten-mode + one-ws-many-sessions + per-session-message-id-counters + target-attached-without-detach-leaks + puppeteer-as-cdp-reference-implementation. +- **"I'm adding request interception"**: network-vs-fetch + target-autoattach + navigation-listener-ordering-race. +- **"I'm hardening for hostile pages"**: isolated-worlds + runtime-evaluate-three-modes + browser-context-for-test-isolation. +- **"I'm shipping in a container or sandboxed env"**: headless-new-vs-shell + cdp-pipe-vs-websocket-transport + chrome-process-lifecycle-traps. +- **"I'm thinking about cross-browser someday"**: webdriver-bidi-vs-cdp-trajectory + puppeteer-as-cdp-reference-implementation. 
diff --git a/docs/cdp/browser-context-for-test-isolation.md b/docs/cdp/browser-context-for-test-isolation.md new file mode 100644 index 0000000..9675cf5 --- /dev/null +++ b/docs/cdp/browser-context-for-test-isolation.md @@ -0,0 +1,17 @@ +# Target.createBrowserContext is the right unit of test isolation; per-test cookie scrubbing is the wrong one + +A Chrome BrowserContext is like an incognito profile but scoped to a programmatic lifetime: created via `Target.createBrowserContext`, scoped to receive new pages via `Target.createTarget({browserContextId})`, and disposed atomically via `Target.disposeBrowserContext`. Disposing tears down cookies, localStorage, sessionStorage, IndexedDB, cache storage, service worker registrations, and any pages still open inside the context — in one call, with no race between deletions. + +The contrast that justifies preferring it: a hand-rolled "reset state between tests" routine that calls `Network.clearBrowserCookies`, `Storage.clearDataForOrigin`, and friends is always incomplete. It misses storage types added after the routine was written (e.g. service worker registrations from a feature added later), it has ordering problems (clearing cookies after a redirect already fired), and it cannot atomically guarantee that no in-flight network call from the previous test mutates state for the next one. `disposeBrowserContext` makes that impossible by construction — the renderer process is torn down with its storage. + +The cost is real: BrowserContexts are not free. Each one is roughly a fresh incognito session — new disk allocations, new HTTP connection pools, new service-worker registration scope. For high-volume parallel testing, the right pattern is "one context per worker, recycle every N tests" not "one context per test." For agent-driven flows where isolation is the *whole point* (a fresh session for each agent run), one context per run is correct and the cost is amortized. 
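The create / use / dispose lifecycle described above can be sketched as a small wrapper. This is a minimal illustration, not the library's bridge API: `send(method, params)` is an assumed root-level command function that resolves with the CDP result, and `withBrowserContext` is a hypothetical name.

```javascript
// Hypothetical helper. `send(method, params)` is assumed to issue a
// browser-level CDP command and resolve with its result object.
async function withBrowserContext(send, url, body) {
  const { browserContextId } = await send('Target.createBrowserContext', {});
  try {
    // Pages join the context only when browserContextId is passed explicitly.
    const { targetId } = await send('Target.createTarget', { url, browserContextId });
    return await body({ browserContextId, targetId });
  } finally {
    // One call removes the context's pages and storage together,
    // even if `body` threw.
    await send('Target.disposeBrowserContext', { browserContextId });
  }
}
```

Putting the dispose call in `finally` is the design choice that matters here: a failing run still releases its context, so a later run cannot inherit leftover state.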
+ +## For superpowers-chrome +The library exposes `createBrowserContext({proxyServer?})` via the bridge, returning `{browserContextId, createPage, dispose}`. An advanced consumer building per-run isolation should create a context at session start, build all pages inside it, and call `dispose` at teardown. The library does not currently force this — pages created via `newTab()` go into Chrome's default context. A consumer that wants strict isolation needs to use the bridge's `createBrowserContext` API directly and skip the convenience `newTab`. + +See also: [target-domain-target-types](target-domain-target-types.md), [chrome-process-lifecycle-traps](chrome-process-lifecycle-traps.md) + +Sources: +- CDP Target domain (createBrowserContext/disposeBrowserContext): https://chromedevtools.github.io/devtools-protocol/tot/Target/ +- `superpowers-chrome/skills/browsing/lib/browser-bridge.js` (createBrowserContext implementation) +- gauntlet commit cda4f03 (BrowserContext-based per-test isolation rationale) diff --git a/docs/cdp/cdp-pipe-vs-websocket-transport.md b/docs/cdp/cdp-pipe-vs-websocket-transport.md new file mode 100644 index 0000000..e654f8c --- /dev/null +++ b/docs/cdp/cdp-pipe-vs-websocket-transport.md @@ -0,0 +1,21 @@ +# --remote-debugging-pipe carries CDP over inherited file descriptors; --remote-debugging-port exposes it on the network + +CDP is a JSON-RPC protocol; how the bytes move is a transport detail with security and operational consequences. Chrome supports two transports for it. + +**`--remote-debugging-port=N`** is the familiar one. Chrome binds a TCP socket on localhost:N (or a configured interface), exposes the HTTP discovery endpoints (`/json/version`, `/json/list`), and accepts WebSocket upgrades for `/devtools/browser/` and `/devtools/page/`. Anyone with network access to that port — including, historically, malicious local processes — can drive the browser. 
This is the transport every "remote debugging in Chrome" article and library defaults to. + +**`--remote-debugging-pipe`** does the same JSON-RPC over inherited file descriptors: Chrome reads CDP messages from FD 3, writes responses and events to FD 4. There is no network socket at all. Only the parent process that launched Chrome (with those FDs set up via `spawn`/`posix_spawn` options) can talk to it. The benefits: no port to leak or collide, no localhost-attack surface, works in sandboxed environments (gVisor, Firecracker) that block runtime TCP bind, lower per-message overhead (no TCP/WS framing). + +A 2026-relevant change: from Chrome 136, both `--remote-debugging-port` and `--remote-debugging-pipe` are *refused* if you're targeting the default Chrome user-data-dir. You must pass `--user-data-dir=/somewhere/else`. This is a response to cookie-theft malware that was reading from the default profile via the debugging interface; legitimate automation always passed its own profile anyway, so the user impact should be small, but the diagnostic "Chrome silently exits when I add `--remote-debugging-port=9222`" gets a new explanation. + +For a CDP library: pipe transport is structurally safer, requires a different I/O loop (NUL-delimited JSON messages over FDs instead of WebSocket framing), and is available in Puppeteer as the opt-in `pipe: true` launch option when it spawns Chrome itself (its default remains the WebSocket transport). Most libraries that connect to a separately-launched Chrome are stuck on the port transport because pipe requires inheriting FDs from the launch. + +## For superpowers-chrome +The library uses the port transport (WebSocket) and always launches Chrome with its own `--user-data-dir`, so the Chrome 136 change is already handled. Adding pipe-transport support would be a substantial change: the WebSocket plumbing in `lib/websocket-client.js` would need a pipe analogue, and the library would lose the ability to attach to an already-running Chrome (the typical workflow for some consumers).
A useful intermediate: document the security-posture difference so consumers running in untrusted environments know to consider pipe-launched alternatives. + +See also: [chrome-process-lifecycle-traps](chrome-process-lifecycle-traps.md), [headless-new-vs-shell](headless-new-vs-shell.md) + +Sources: +- Chrome blog, "Changes to remote debugging switches to improve security" (Chrome 136 user-data-dir requirement): https://developer.chrome.com/blog/remote-debugging-port +- chromedp issue on pipe transport: https://github.com/chromedp/chromedp/issues/1607 +- Puppeteer launcher source (pipe transport for spawned Chrome): https://github.com/puppeteer/puppeteer/tree/main/packages/puppeteer-core/src/node diff --git a/docs/cdp/chrome-process-lifecycle-traps.md b/docs/cdp/chrome-process-lifecycle-traps.md new file mode 100644 index 0000000..16ea057 --- /dev/null +++ b/docs/cdp/chrome-process-lifecycle-traps.md @@ -0,0 +1,21 @@ +# Spawning and reconnecting to Chrome from a library is mostly fighting four lifecycle traps + +A CDP automation library that owns the Chrome process spends a surprising fraction of its complexity on lifecycle, not on protocol. Four traps recur: + +**1. User-data-dir collisions.** Two Chrome processes pointed at the same `--user-data-dir` will fight over the profile lock and one will exit immediately. If your library reuses a profile across runs (to keep cookies/extensions), you must either serialize launches or use one profile per concurrent run. The fix-by-construction is per-session profile directories. The fix-by-coordination is a meta-file written into the profile dir that records the active PID and port, checked on launch. + +**2. Port-binding race.** If you ask for a specific `--remote-debugging-port`, two parallel launches will collide. If you ask Chrome to pick (`--remote-debugging-port=0`), Chrome writes the chosen port to `/DevToolsActivePort`, but you have to poll that file because Chrome creates it asynchronously.
The cleaner fix is to find a free port *before* spawn (bind, get the port, close, pass it) and accept the small race window where another process can steal it. + +**3. Zombie processes.** Chrome forks itself extensively — one browser process, one renderer per site instance, GPU process, network service, utility processes. If you kill only the parent, children survive on some platforms (macOS especially) until the OS reaps them, often hanging onto the user-data-dir lock. Either kill the process group (`-pid` on Unix), use `Browser.close` via CDP first (graceful shutdown), or both. + +**4. Reconnecting to a Chrome that died.** Across restarts of your library/MCP server, the Chrome you launched may still be alive (graceful) or may have crashed (you have a stale port number in your meta file). The typical pattern is: read the meta file, probe the port (HTTP GET to `/json/version`), if it responds reconnect, else clear the meta and launch fresh. Probing must be fast and tolerant; a slow probe blocks library startup, and a strict probe (e.g. requiring a specific version field) breaks across Chrome updates. + +## For superpowers-chrome +`lib/chrome-process.js` and `lib/chrome-launcher-helpers.js` handle all four: per-profile meta.json with `{port, pid}`, `findAvailablePort` for dynamic allocation, `isPortAlive` probe with PID matching for reconnection, graceful shutdown via `/json/close` then SIGTERM, port-based PID fallback for `killChrome` when the library doesn't own the process. The shape is right; the parts most worth review periodically are the probe timeouts (currently 15s startup poll) and the killing strategy on Linux/Windows where process-group handling diverges. 
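The polling step in trap 2 needs a parser that tolerates a file which exists but is not yet fully written. The sketch below assumes the usual layout of the `DevToolsActivePort` file that Chrome writes into the user data directory (port on the first line, browser WebSocket path on the second); `parseDevToolsActivePort` is a hypothetical helper name, not something from the library.

```javascript
// Parse the text of a DevToolsActivePort file.
// Assumed layout: line 1 = port, line 2 = "/devtools/browser/<id>".
// Returns null for empty or partial content so the caller can keep polling.
function parseDevToolsActivePort(text) {
  const [portLine, wsPath] = text.split('\n').map((line) => line.trim());
  const port = Number(portLine);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) return null;
  if (!wsPath || !wsPath.startsWith('/devtools/browser/')) return null;
  return { port, wsPath, wsUrl: `ws://127.0.0.1:${port}${wsPath}` };
}
```

A caller would read the file on a short interval and stop at the first non-null result, bounded by the same startup timeout the launcher already uses.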
+ +See also: [cdp-pipe-vs-websocket-transport](cdp-pipe-vs-websocket-transport.md), [headless-new-vs-shell](headless-new-vs-shell.md), [browser-context-for-test-isolation](browser-context-for-test-isolation.md) + +Sources: +- `superpowers-chrome/skills/browsing/lib/chrome-process.js` (the in-tree implementation) +- Chrome's `DevToolsActivePort` file behavior: https://chromium.googlesource.com/chromium/src/+/main/content/browser/devtools/ +- Chrome blog on `--user-data-dir` requirement from Chrome 136: https://developer.chrome.com/blog/remote-debugging-port diff --git a/docs/cdp/flatten-mode-and-sessionid-envelope.md b/docs/cdp/flatten-mode-and-sessionid-envelope.md new file mode 100644 index 0000000..17e38f3 --- /dev/null +++ b/docs/cdp/flatten-mode-and-sessionid-envelope.md @@ -0,0 +1,17 @@ +# Flatten mode makes sessionId a message-envelope field, not a connection property + +In CDP's legacy ("non-flat") session protocol, attaching to a child target produced a *nested* session: you sent `Target.sendMessageToTarget` containing a serialized inner message, and Chrome replied with `Target.receivedMessageFromTarget` containing a serialized inner reply. Each layer of attachment added a wrapper. Flatten mode — turned on by passing `flatten: true` to `Target.attachToTarget` (and `Target.setAutoAttach`) — collapses that. The same WebSocket carries top-level messages tagged with a `sessionId` field alongside the usual `id` / `method` / `params`. The official docs say: *"We plan to make this the default, deprecate non-flattened mode, and eventually retire it."* Puppeteer, Playwright, and every modern CDP client default to `flatten: true`. + +The practical shape: one outbound message looks like `{"id":7,"sessionId":"","method":"Page.navigate","params":{...}}`. A reply or event with `sessionId` set belongs to that page session; one without `sessionId` is a root (browser-level) message. The router on your side keys on `sessionId` to dispatch. 
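A router that keys on `sessionId` can be quite small. The following is a minimal sketch under assumed data shapes (each session, and the root, holds a `pending` map and a `listeners` set); it is not the library's `cdp-router.js`, and `dispatch` is a hypothetical name.

```javascript
// Hypothetical shapes: `root` and every value in `sessions` look like
// { pending: Map<id, {resolve, reject}>, listeners: Set<function> }.
function dispatch(raw, root, sessions) {
  const msg = JSON.parse(raw);
  // No sessionId means browser-level traffic; otherwise look up the session.
  const target = msg.sessionId ? sessions.get(msg.sessionId) : root;
  if (!target) return 'dropped'; // session was detached before this arrived

  if (typeof msg.id === 'number') {
    // A reply: settle the matching pending command on that session only.
    const waiter = target.pending.get(msg.id);
    if (!waiter) return 'dropped';
    target.pending.delete(msg.id);
    if (msg.error) waiter.reject(new Error(msg.error.message));
    else waiter.resolve(msg.result);
    return 'reply';
  }

  // An event: fan out to that session's listeners.
  for (const listener of target.listeners) listener(msg.method, msg.params);
  return 'event';
}
```

Because the pending map is looked up per session, the same numeric `id` can be in flight on the root and on a page session at once without the replies being confused.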
+ +The reason this matters more than it sounds: flatten mode is what makes a *single* browser-level WebSocket viable as the transport for an arbitrary number of pages, workers, and out-of-process iframes. Without it, every page attachment doubled the wire envelope and demanded a custom unwrap on every message. With it, sessionId becomes a routing label on otherwise normal CDP traffic. + +## For superpowers-chrome +The library opens exactly one CDP WebSocket per Chrome process (against `/devtools/browser/`) and obtains a `sessionId` for each page via `Target.attachToTarget({targetId, flatten: true})`. Page action commands ride that envelope. An advanced consumer wanting to attach to additional targets (OOPIFs, service workers, popup windows) should attach with `flatten: true` for the same reason — there is no good argument to opt into the legacy nested protocol in 2026. + +See also: [one-ws-many-sessions-architecture](one-ws-many-sessions-architecture.md), [per-session-message-id-counters](per-session-message-id-counters.md), [target-autoattach-vs-discovertargets](target-autoattach-vs-discovertargets.md), [puppeteer-as-cdp-reference-implementation](puppeteer-as-cdp-reference-implementation.md) + +Sources: +- Chrome DevTools Protocol — Target domain: https://chromedevtools.github.io/devtools-protocol/tot/Target/ +- Andrey Lushnikov, "Getting Started With Chrome DevTools Protocol": https://github.com/aslushnikov/getting-started-with-cdp +- Puppeteer `Connection.ts` (uses `flatten: true`): https://github.com/puppeteer/puppeteer/blob/main/packages/puppeteer-core/src/cdp/Connection.ts diff --git a/docs/cdp/headless-new-vs-shell.md b/docs/cdp/headless-new-vs-shell.md new file mode 100644 index 0000000..451bea1 --- /dev/null +++ b/docs/cdp/headless-new-vs-shell.md @@ -0,0 +1,19 @@ +# Chrome 132 removed --headless=old; "new headless" is now the only thing in the Chrome binary, chrome-headless-shell is a separate download + +Chrome had two headless implementations for years. 
The original ("old headless") was a separate code path — a lightweight wrapper around Chromium's `//content` module with substantially fewer dependencies (no X11/Wayland/D-Bus required), but with a different surface from real Chrome: missing extensions, missing print preview, divergent network stack behavior in subtle places. The "new headless" introduced in Chrome 112 is the same Chrome binary as the headful build, just running without UI surfaces. Same code path, same features, same behavior. + +As of Chrome 132 (early 2025), the old binary is gone from the standard Chrome distribution. Passing `--headless=old` fails; `--headless` and `--headless=new` both run the unified mode. The old implementation lives on as a separately-downloadable binary called `chrome-headless-shell`, distributed via the Chrome for Testing infrastructure (one build per Chrome release, available from Chrome 120 onward). + +The trade-off to pick from in 2026: use the regular Chrome binary in `--headless` mode when you need feature parity with what a user sees (real extensions, real print pipeline, real Web APIs); use `chrome-headless-shell` when you need the lightweight, lower-dependency, lower-RAM footprint and accept the reduced feature set. Most automation testing wants the unified mode now. Bots that scrape at scale, run inside minimal containers, or care about startup latency may still prefer the shell. + +A CDP-relevant note: both modes speak the same CDP. The protocol commands are identical. Where behavior can differ is in features the shell doesn't ship (e.g., headless-shell can't run extensions, so extension-related Target types won't appear). For most automation libraries, the CDP code path is identical; the user-facing difference is the binary path and the Chrome version compatibility matrix. + +## For superpowers-chrome +`chrome-process.js`'s binary search list (`/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`, `/usr/bin/google-chrome`, etc.) 
targets the unified Chrome binary, which is correct for 2026. The library passes `--headless=new` (or `--headless` for old-Chrome compatibility — verify in `buildChromeArgs`). An advanced consumer running in a container who wants the smaller surface should be able to point at a `chrome-headless-shell` binary; today this requires setting the binary path via env or args. The library doesn't currently surface that override. + +See also: [cdp-pipe-vs-websocket-transport](cdp-pipe-vs-websocket-transport.md), [chrome-process-lifecycle-traps](chrome-process-lifecycle-traps.md) + +Sources: +- Chrome blog, "Chrome Headless mode": https://developer.chrome.com/docs/chromium/headless +- Chrome blog, "Removing --headless=old from Chrome": https://developer.chrome.com/blog/removing-headless-old-from-chrome +- Chrome blog, "Download old Headless Chrome as chrome-headless-shell": https://developer.chrome.com/blog/chrome-headless-shell diff --git a/docs/cdp/isolated-worlds-and-execution-contexts.md b/docs/cdp/isolated-worlds-and-execution-contexts.md new file mode 100644 index 0000000..1edd87f --- /dev/null +++ b/docs/cdp/isolated-worlds-and-execution-contexts.md @@ -0,0 +1,19 @@ +# Isolated worlds give you a script sandbox that shares the DOM but not the JS heap + +A frame in a Chromium renderer has one "main world" execution context (the page's normal JavaScript environment) and zero or more isolated worlds. Isolated worlds share the same DOM tree — `document.querySelector` works the same — but have *separate* global objects, separate prototype chains, separate variable bindings. The page can't see your variables; you can't see the page's. Extensions use this to inject content scripts without colliding with site code; CDP automation uses it for the same reason. + +You create one via `Page.createIsolatedWorld({frameId, worldName, grantUniveralAccess?})`, which returns an `executionContextId`. 
You then pass that id (or its `uniqueContextId` equivalent from `Runtime.executionContextCreated`) to `Runtime.evaluate({contextId, ...})` and your code runs in the sandbox. + +The motivation for caring even when you're not writing an extension: the page can redefine globals. A site that does `window.fetch = sketchyFetch` or `Element.prototype.click = noop` breaks any automation script that uses those names in the main world. Running in an isolated world means you get the pristine prototypes, immune to site monkey-patching. Puppeteer's "utility world" pattern uses this — its internal helpers run in an isolated world so they're robust against hostile pages. + +The events you need to track context lifecycle: `Runtime.executionContextCreated` (one per world per frame), `Runtime.executionContextDestroyed`, `Runtime.executionContextsCleared` (on navigation). If you cache a `contextId` across a navigation, you will eval into a dead context and Chrome will reject with "Cannot find context with specified id." Resubscribe on navigation or, better, look up the current contextId at call time. + +## For superpowers-chrome +The library currently evals exclusively in the main world (no `contextId` passed). For driving cooperative pages this is fine and simpler. An advanced consumer needing to defend against hostile pages, build a stable injection layer that doesn't conflict with site libraries, or implement Puppeteer-style utility helpers should add isolated-world support: call `Page.createIsolatedWorld` once per frame on attach, cache the contextId keyed by frameId, refresh on `Runtime.executionContextsCleared`, pass it to `Runtime.evaluate`. 
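The cache-and-refresh approach in the paragraph above might look like the sketch below. It is illustrative only: `send(method, params)` is an assumed page-session command function, `IsolatedWorldCache` is a hypothetical class, and `'automation_utility'` is an arbitrary world name. It is not safe against two concurrent first calls for the same frame, which a real implementation would want to deduplicate.

```javascript
// Sketch only: `send(method, params)` is an assumed page-session command
// function, and 'automation_utility' is an arbitrary illustrative world name.
class IsolatedWorldCache {
  constructor(send, worldName = 'automation_utility') {
    this.send = send;
    this.worldName = worldName;
    this.contextByFrame = new Map(); // frameId -> executionContextId
  }

  // Feed every session event through here so stale ids are dropped.
  handleEvent(method) {
    if (method === 'Runtime.executionContextsCleared') this.contextByFrame.clear();
  }

  // Look up the context at call time, creating the world on first use.
  async contextFor(frameId) {
    const cached = this.contextByFrame.get(frameId);
    if (cached !== undefined) return cached;
    const { executionContextId } = await this.send('Page.createIsolatedWorld', {
      frameId,
      worldName: this.worldName,
    });
    this.contextByFrame.set(frameId, executionContextId);
    return executionContextId;
  }

  // Evaluate in the isolated world rather than the page's main world.
  async evaluate(frameId, expression) {
    const contextId = await this.contextFor(frameId);
    return this.send('Runtime.evaluate', { contextId, expression, returnByValue: true });
  }
}
```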
+ +See also: [runtime-evaluate-three-modes](runtime-evaluate-three-modes.md), [target-domain-target-types](target-domain-target-types.md), [navigation-listener-ordering-race](navigation-listener-ordering-race.md) + +Sources: +- CDP Page domain (createIsolatedWorld): https://chromedevtools.github.io/devtools-protocol/tot/Page/ +- CDP Runtime domain (executionContext events): https://chromedevtools.github.io/devtools-protocol/tot/Runtime/ +- Puppeteer utility world pattern (in `IsolatedWorld.ts`): https://github.com/puppeteer/puppeteer/tree/main/packages/puppeteer-core/src/cdp diff --git a/docs/cdp/navigation-listener-ordering-race.md b/docs/cdp/navigation-listener-ordering-race.md new file mode 100644 index 0000000..55ca8e4 --- /dev/null +++ b/docs/cdp/navigation-listener-ordering-race.md @@ -0,0 +1,19 @@ +# Register the load-event listener before issuing Page.navigate, or fast pages will fire it before you're listening + +The naive shape for "navigate and wait for load" — `await send("Page.navigate", {...}); await waitForEvent("Page.loadEventFired")` — has a race. For fast-loading URLs (especially `data:` URLs, cached pages, or local files), `loadEventFired` can arrive between the resolution of `Page.navigate` and the start of your event subscription. You then wait forever for an event Chrome already sent. + +The fix is to register the listener *synchronously, before* sending the navigation. The shape becomes: enable the Page domain (idempotent), set up the waitForEvent promise (which adds the listener immediately), *then* send `Page.navigate`, then await both. Chrome's event delivery is ordered relative to the message that triggered it, so as long as the listener is attached before the trigger message goes on the wire, the event won't be lost. + +This pattern generalizes: any CDP flow of the form "do X, then wait for the event X causes" needs the listener attached first. 
Examples: `Target.attachToTarget` then wait for `Runtime.executionContextCreated`; `Page.captureScreenshot` with `fromSurface` then wait for the frame; `Network.emulateNetworkConditions` then wait for the next request to confirm timing. The race is invisible on slow operations and bites instantly on fast ones, which is why it survives most test suites until it doesn't. + +A second, related trap: `Page.loadEventFired` is the `window.onload` equivalent, not "the page is ready." It fires when subresources have loaded, not when the SPA has hydrated, the React tree has mounted, or the user-visible content has settled. For SPAs you usually want one of `Page.frameStoppedLoading`, `Page.domContentEventFired`, or an explicit `Runtime.evaluate` that polls for an app-specific readiness marker. + +## For superpowers-chrome +The library's `lib/navigation.js` does this correctly: `const loadP = ps.waitForEvent('Page.loadEventFired', ...)` is set up before `await ps.send('Page.navigate', ...)`. The comment in the code even names the failure mode it prevents ("fast loading pages (data: URLs) can't fire loadEventFired before we're listening"). Any new "do-then-wait" helper a consumer adds should preserve this ordering; the library observes a small set of helper events (loadEventFired, executionContextCreated, frameNavigated), and getting the ordering wrong for any of them reintroduces this class of race.
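For a consumer writing a new do-then-wait helper, the ordering can be sketched as follows. The session shape is an assumption made for this example (`on(method, fn)` returns an unsubscribe function, `send(method, params)` resolves with the command result); it is not the signature of the library's `waitForEvent`, and both function names here are hypothetical.

```javascript
// Assumed session shape: on(method, fn) -> unsubscribe, send(method, params) -> Promise.
// The listener is attached synchronously, before this function returns.
function waitForEvent(session, method, timeoutMs = 30000) {
  let timer;
  let off = () => {};
  const promise = new Promise((resolve, reject) => {
    off = session.on(method, (params) => {
      clearTimeout(timer);
      off();
      resolve(params);
    });
    timer = setTimeout(() => {
      off();
      reject(new Error(`Timed out waiting for ${method}`));
    }, timeoutMs);
  });
  promise.catch(() => {}); // avoid an unhandled rejection if the caller bails out early
  return { promise, cancel: () => { clearTimeout(timer); off(); } };
}

async function navigateAndWait(session, url) {
  await session.send('Page.enable', {});
  const load = waitForEvent(session, 'Page.loadEventFired'); // listen first
  try {
    const nav = await session.send('Page.navigate', { url }); // then trigger
    if (nav.errorText) throw new Error(`Navigation failed: ${nav.errorText}`);
    await load.promise;
    return nav.frameId;
  } finally {
    load.cancel(); // always release the listener and timer
  }
}
```

Swapping the two marked lines reintroduces the race: a fake session that emits the event inside `send` would then hang until the timeout.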
+ +See also: [runtime-evaluate-three-modes](runtime-evaluate-three-modes.md), [page-loadeventfired-is-not-ready](page-loadeventfired-is-not-ready.md), [target-autoattach-vs-discovertargets](target-autoattach-vs-discovertargets.md) + +Sources: +- CDP Page domain: https://chromedevtools.github.io/devtools-protocol/tot/Page/ +- `superpowers-chrome/skills/browsing/lib/navigation.js` (the in-tree implementation of the pattern) +- gauntlet commit 183cd60 (navigate rejects on Page.navigate error, listener WS error/close, and timeout — closely related defensive coverage) diff --git a/docs/cdp/network-vs-fetch-domains.md b/docs/cdp/network-vs-fetch-domains.md new file mode 100644 index 0000000..62fdf5d --- /dev/null +++ b/docs/cdp/network-vs-fetch-domains.md @@ -0,0 +1,21 @@ +# Network domain observes requests; Fetch domain intercepts them — Network.requestIntercepted is deprecated, use Fetch.requestPaused + +CDP has two overlapping HTTP-flavored domains. Telling them apart is the first thing to get right when adding traffic instrumentation. + +**`Network`** is for observation. `Network.enable` turns on a stream of events: `Network.requestWillBeSent`, `Network.responseReceived`, `Network.loadingFinished`, `Network.loadingFailed`. You can read headers, see body sizes, get the response body via `Network.getResponseBody(requestId)` after the response arrives. You *cannot* modify, block, or substitute the request through this domain in the modern protocol. `Network.setRequestInterception` and the related `Network.requestIntercepted` event are deprecated; new code should not call them. + +**`Fetch`** is for interception. `Fetch.enable({patterns: [...]})` causes Chrome to pause matching requests and emit `Fetch.requestPaused`. The request hangs until you respond with `Fetch.continueRequest` (let it proceed, optionally with mutations), `Fetch.fulfillRequest` (synthesize a response, never hit the network), or `Fetch.failRequest` (abort with a reason). 
Patterns filter by URL, resource type, and the request stage (request vs. response — you can pause again after the response headers arrive for response-stage mutations). + +The split exists because observation and interception have different cost profiles. Always-on observation of every request is cheap; pausing every request to give your code a chance to mutate it adds round-trips to every page load. Fetch patterns let you opt in narrowly — e.g. "only pause requests to api.example.com" — instead of paying the latency tax universally. + +A trap that shows up in practice: enabling `Fetch` without responding to every `requestPaused` event will hang the page. Tests pass with one slow request; production with hundreds of subresources stalls indefinitely. Always wire a "continue everything we don't care about" handler before enabling Fetch in production. + +## For superpowers-chrome +The library doesn't currently use either domain in the orchestrator surface; consumers wanting traffic capture or mutation must reach for the page session's raw `send`/`onEvent`. Recommended extension shape: a `pageSession.network` namespace that wraps `Network.enable` and exposes a request observer; a separate `pageSession.intercept(patterns, handler)` that wraps `Fetch.enable` with a default-continue handler to prevent hangs. Keep them separate — bundling them risks consumers paying interception cost when they only wanted observation. 
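The default-continue handler recommended above can be sketched like this. The `intercept` function and the session shape (`on` returning an unsubscribe function, `send` returning a promise) are assumptions for illustration, not an existing library API; the CDP method and event names are the real ones from the Fetch domain.

```javascript
// Hypothetical wrapper. `handler(event)` should resolve to true only when it
// has already answered the request (continue, fulfill, or fail).
async function intercept(session, patterns, handler) {
  // Attach the listener before enabling Fetch so no paused request is missed.
  const off = session.on('Fetch.requestPaused', async (event) => {
    let handled = false;
    try {
      handled = (await handler(event)) === true;
    } catch (err) {
      handled = false; // fall through to the default so the page cannot hang
    }
    if (!handled) {
      try {
        await session.send('Fetch.continueRequest', { requestId: event.requestId });
      } catch (err) {
        // The request may already be gone (navigation, detach); nothing to do.
      }
    }
  });
  await session.send('Fetch.enable', { patterns });
  return async function stop() {
    off();
    await session.send('Fetch.disable', {});
  };
}
```

Every paused request gets exactly one answer: either the handler's, or the fallback `Fetch.continueRequest`. That invariant is what prevents the stalled-page failure mode.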
+ +See also: [target-autoattach-vs-discovertargets](target-autoattach-vs-discovertargets.md), [runtime-evaluate-three-modes](runtime-evaluate-three-modes.md) + +Sources: +- CDP Network domain: https://chromedevtools.github.io/devtools-protocol/tot/Network/ +- CDP Fetch domain: https://chromedevtools.github.io/devtools-protocol/tot/Fetch/ +- Deprecation note on `Network.requestIntercepted`: https://chromedevtools.github.io/devtools-protocol/tot/Network/#event-requestIntercepted diff --git a/docs/cdp/one-ws-many-sessions-architecture.md b/docs/cdp/one-ws-many-sessions-architecture.md new file mode 100644 index 0000000..5cd970b --- /dev/null +++ b/docs/cdp/one-ws-many-sessions-architecture.md @@ -0,0 +1,17 @@ +# One browser-level WebSocket multiplexing N flatten-mode sessions is the modern CDP transport + +CDP exposes two WebSocket entry points per Chrome process: `/devtools/browser/` (root/browser-level) and `/devtools/page/` (one per page target). Historically, automation libraries opened a fresh per-page WebSocket for each tab they drove. The contemporary practice — Puppeteer's `Connection`, Playwright's CDP client, chrome-devtools-mcp via Puppeteer, and modern hand-rolled clients — opens *only* the browser-level WS, then multiplexes every per-page conversation over it as flatten-mode sessions. + +The architecture has three moving pieces: (1) a single WebSocket to the browser endpoint; (2) a session map keyed by `sessionId`, populated when `Target.attachToTarget` returns; (3) a dispatcher that reads each incoming frame and routes by the `sessionId` field — messages without one go to root, with one go to the matching session's pending-request map or event listeners. Page sessions never own a socket; they own an id-counter, a pending-requests map, an event-listener set, and the right to push messages onto the shared WS with their sessionId in the envelope. + +The reason to prefer this over per-page sockets is failure-mode reduction. 
A per-page WebSocket can drop independently of Chrome (renderer crash, navigation, OOPIF process churn), and library code then has to either reconnect-and-resubscribe or surface a transport error. A single browser-WS dies only when Chrome itself dies, at which point every session is already gone. The downside is that you must implement sessionId-aware dispatch correctly; the upside is one transport-lifecycle bug instead of N. + +## For superpowers-chrome +The library has already adopted this shape: `lib/browser-session.js` owns the one WS, `lib/cdp-router.js` dispatches by sessionId, `lib/page-session.js` is the per-session handle that ships commands via `sendRaw`. An advanced consumer extending the library should not introduce per-page sockets; new capabilities (OOPIF inspection, service-worker debugging, popups) should reuse the bridge and acquire additional page sessions from it. + +See also: [flatten-mode-and-sessionid-envelope](flatten-mode-and-sessionid-envelope.md), [per-session-message-id-counters](per-session-message-id-counters.md), [target-attached-without-detach-leaks](target-attached-without-detach-leaks.md), [puppeteer-as-cdp-reference-implementation](puppeteer-as-cdp-reference-implementation.md) + +Sources: +- Lushnikov, "Getting Started With CDP", `sessions.js` example: https://github.com/aslushnikov/getting-started-with-cdp +- Puppeteer `Connection.ts` (single WS, sessionId-based dispatch): https://github.com/puppeteer/puppeteer/blob/main/packages/puppeteer-core/src/cdp/Connection.ts +- `superpowers-chrome/skills/browsing/lib/browser-session.js`, `cdp-router.js`, `page-session.js` diff --git a/docs/cdp/page-loadeventfired-is-not-ready.md b/docs/cdp/page-loadeventfired-is-not-ready.md new file mode 100644 index 0000000..925126d --- /dev/null +++ b/docs/cdp/page-loadeventfired-is-not-ready.md @@ -0,0 +1,29 @@ +# Page.loadEventFired is window.onload, not "the page is ready to interact with" + +`Page.loadEventFired` is the CDP event corresponding to 
the browser's `window.onload`. It fires after the document and all of its synchronously declared subresources (images, stylesheets, scripts) have finished loading. It is *not* a signal that: + +- the page is interactive (try `Page.domContentEventFired` for DOMContentLoaded, or `Page.lifecycleEvent` with name `InteractiveTime`); +- an SPA framework has hydrated; +- React/Vue/Svelte/etc. has mounted the application; +- network activity has gone quiet (modern pages keep fetching long after onload); +- user-visible content has settled (LCP / Largest Contentful Paint, available via `Performance.metrics` or `PerformanceTimeline`). + +For a server-rendered page with mostly-static content, `loadEventFired` is a reasonable "navigation done" signal. For an SPA, it usually fires while the page is still showing a loading spinner. Tests that wait for `loadEventFired` and then immediately interact with the SPA will flake. + +The patterns to know: + +- **App-specific marker**: poll `Runtime.evaluate` for a known DOM signal, such as a stable selector or a `window.__appReady` flag the app sets. Most reliable, but requires cooperation from the page. +- **`Page.lifecycleEvent`**: a stream of named events (`firstPaint`, `firstContentfulPaint`, `firstMeaningfulPaint`, `largestContentfulPaint`, `networkAlmostIdle`, `networkIdle`, `load`, `DOMContentLoaded`, etc.). Subscribe via `Page.enable` + `Page.setLifecycleEventsEnabled({enabled: true})`. `networkIdle` (roughly 500 ms with no in-flight requests) is the closest thing to "page has settled" without page cooperation. +- **`Network.loadingFinished` counter**: track in-flight requests, wait for the count to hit zero. Brittle (long-polling or websockets keep the count above zero forever). + +Puppeteer's `page.goto(url, {waitUntil: 'networkidle0'|'networkidle2'|'domcontentloaded'|'load'})` exposes this trade-off as named modes. Picking one badly is a major source of flake in test suites that started simple and grew.
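
The app-specific marker pattern can be sketched as a small polling loop. This is a sketch under stated assumptions, not the library's implementation: `waitForAppReady` is a hypothetical name, `session.send` is assumed to resolve with the CDP result object (`{result, exceptionDetails?}`), and the timeout and interval defaults are arbitrary.

```javascript
// Hypothetical helper: polls a readiness expression in the page.
// Assumes session.send('Runtime.evaluate', ...) resolves with {result, exceptionDetails?}.
async function waitForAppReady(session, expression, { timeoutMs = 10000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const reply = await session.send('Runtime.evaluate', {
      // Coerce to a boolean so returnByValue always has something serializable.
      expression: `Boolean(${expression})`,
      returnByValue: true,
    });
    if (reply.exceptionDetails) {
      throw new Error(`readiness probe threw: ${reply.exceptionDetails.text}`);
    }
    if (reply.result.value === true) return;
    if (Date.now() >= deadline) {
      throw new Error(`timed out waiting for: ${expression}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller would typically run this after the navigation's load event, for example `await waitForAppReady(session, 'window.__appReady')`, where `__appReady` is a flag the application itself sets.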
+ +## For superpowers-chrome +The library's `navigate()` waits on `Page.loadEventFired` and that's it. For most automation against well-behaved pages this is fine; for SPAs, consumers should follow `navigate()` with an explicit `waitForElement` or `waitForText` to land on a stable post-load marker. An advanced extension would be a `navigate({waitUntil})` option mirroring Puppeteer's, with `networkIdle` available via `Page.lifecycleEvent` — but only if the library is willing to take the complexity, since most flake bugs in this area come from libraries' opinionated defaults masking what's actually happening. + +See also: [navigation-listener-ordering-race](navigation-listener-ordering-race.md), [runtime-evaluate-three-modes](runtime-evaluate-three-modes.md), [network-vs-fetch-domains](network-vs-fetch-domains.md) + +Sources: +- CDP Page domain: https://chromedevtools.github.io/devtools-protocol/tot/Page/ +- Puppeteer `page.goto` waitUntil options: https://pptr.dev/api/puppeteer.page.goto +- `superpowers-chrome/skills/browsing/lib/navigation.js` (current loadEventFired wait) diff --git a/docs/cdp/per-session-message-id-counters.md b/docs/cdp/per-session-message-id-counters.md new file mode 100644 index 0000000..8fe37a3 --- /dev/null +++ b/docs/cdp/per-session-message-id-counters.md @@ -0,0 +1,17 @@ +# Each CDP session has its own message-id counter; collapsing id space silently breaks correlation + +When you multiplex many sessions on one WebSocket via flatten mode, you have to decide whether `id` numbers are scoped per WS or per session. The protocol's rule, stated in the Lushnikov getting-started doc: *"clients must provide unique 'id' for commands inside the session, but different sessions might use the same ids."* Each session counts independently; `{id:1, sessionId:"A"}` and `{id:1, sessionId:"B"}` are two unrelated commands. 
+ +The failure mode if you collapse id space across sessions: a response arriving for one session's command can resolve another session's pending request, because the dispatcher (or worse, the lookup) keyed on `id` alone. The bug is invisible until two sessions happen to have an in-flight request with the same id at the same time — at which point one promise resolves with the wrong result and the other hangs forever. Tests pass; production breaks at concurrency. + +There is also a subtler version of the same bug at the root/page boundary: the root browser session has its own pending-request map (for sessionless commands like `Target.attachToTarget`), and page sessions have theirs (for sessionId-tagged commands). The router needs to see *both* `id` and `sessionId` before deciding which map to look in — `data.id !== undefined && data.sessionId === undefined` for root, `id !== undefined && sessionId` for a page session. + +## For superpowers-chrome +The library handles this correctly today: `lib/page-session.js` declares `let messageIdCounter = 1` per session, `lib/browser-session.js` has its own counter for root commands, and `lib/cdp-router.js` reads the `sessionId` field before doing any id lookup. An extension that adds a new transport feature (request batching, response replay) must preserve this invariant — the per-session counter is local-to-the-handle, not local-to-the-transport. 
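
The invariant can be illustrated with a toy router that keeps one id counter and one pending-request map per scope (root plus each `sessionId`). This is an illustrative sketch only, not the code in `lib/cdp-router.js`; `createRouter` and `sendFrame` are hypothetical names, and event routing is omitted.

```javascript
// Illustrative only: a toy sessionId-aware router with per-scope id counters.
// `sendFrame` is assumed to write one JSON string onto the shared WebSocket.
function createRouter(sendFrame) {
  const root = { nextId: 1, pending: new Map() };
  const sessions = new Map(); // sessionId -> { nextId, pending }

  function scopeFor(sessionId) {
    if (sessionId === undefined) return root;
    if (!sessions.has(sessionId)) {
      sessions.set(sessionId, { nextId: 1, pending: new Map() });
    }
    return sessions.get(sessionId);
  }

  function send(method, params = {}, sessionId) {
    const scope = scopeFor(sessionId);
    const id = scope.nextId++; // unique within this scope only
    return new Promise((resolve, reject) => {
      scope.pending.set(id, { resolve, reject });
      const frame = { id, method, params };
      if (sessionId !== undefined) frame.sessionId = sessionId;
      sendFrame(JSON.stringify(frame));
    });
  }

  function dispatch(raw) {
    const msg = JSON.parse(raw);
    if (msg.id === undefined) return; // an event; listener routing omitted here
    // Pick the pending map from sessionId FIRST, then look up the id in it.
    const scope = msg.sessionId === undefined ? root : sessions.get(msg.sessionId);
    const entry = scope && scope.pending.get(msg.id);
    if (!entry) return;
    scope.pending.delete(msg.id);
    if (msg.error) entry.reject(new Error(msg.error.message));
    else entry.resolve(msg.result);
  }

  return { send, dispatch };
}
```

Three commands with `id: 1` (one root, one on session A, one on session B) can be in flight at once, and each response resolves only the promise from its own scope.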
+ +See also: [flatten-mode-and-sessionid-envelope](flatten-mode-and-sessionid-envelope.md), [one-ws-many-sessions-architecture](one-ws-many-sessions-architecture.md) + +Sources: +- Lushnikov, "Getting Started With CDP": https://github.com/aslushnikov/getting-started-with-cdp +- `superpowers-chrome/skills/browsing/lib/page-session.js` (per-session `messageIdCounter`) +- `superpowers-chrome/skills/browsing/lib/cdp-router.js` (router's sessionId-aware dispatch) diff --git a/docs/cdp/puppeteer-as-cdp-reference-implementation.md b/docs/cdp/puppeteer-as-cdp-reference-implementation.md new file mode 100644 index 0000000..d3e0857 --- /dev/null +++ b/docs/cdp/puppeteer-as-cdp-reference-implementation.md @@ -0,0 +1,22 @@ +# Puppeteer's Connection/CDPSession code is the canonical reference for modern CDP client design + +When a CDP protocol question is ambiguous in the docs — does this event fire before that one, should I enable Runtime before Page, do I need to handle this race — the most reliable answer is "what does Puppeteer do." Puppeteer is maintained by Chrome team members, ships changes in lockstep with Chrome releases, and has a much larger production deployment than any spec author or doc reader can match. Its handling of CDP is, in practice, the reference implementation. + +Two specific files reward reading by any library author working at the same level: + +- `packages/puppeteer-core/src/cdp/Connection.ts` — the WebSocket connection, sessionId-based dispatch, root vs. per-session callback registries, message-id management. This is the canonical implementation of the "one browser-WS, N flatten-mode sessions" architecture. +- `packages/puppeteer-core/src/cdp/CDPSession.ts` — the per-session handle. Shows exactly which methods are exposed (`send`, `on`, `off`, `connection`, `detach`), how session lifecycle interacts with target lifecycle, and where the abstraction boundary sits. 
+ +Beyond those two, the `TargetManager.ts`, `IsolatedWorld.ts`, `FrameManager.ts`, and `LifecycleWatcher.ts` files together show how Puppeteer's high-level surface (`page.goto`, `page.click`, `page.waitForNavigation`) decomposes into CDP commands and event-stream subscriptions. The decomposition is opinionated but it represents lessons absorbed from years of edge cases. + +The two patterns from Puppeteer most worth borrowing in any new CDP library: (1) sessions are first-class objects with their own send/event surface, not just sessionId strings passed around; (2) every CDP call is wrapped in code that knows the failure modes of *that specific call* (e.g. `Page.navigate` can return a `frameId` but also fail with `errorText` you have to read from the result), not generic error handling that loses the protocol-specific signal. + +## For superpowers-chrome +The library's `lib/browser-session.js` + `lib/page-session.js` shape closely mirrors Puppeteer's `Connection` + `CDPSession`, which is appropriate given they're solving the same problem. When adding new capabilities (auto-attach, isolated worlds, Fetch interception), checking Puppeteer's handling of the same surface is faster and more reliable than re-deriving it from the protocol docs. 
+ +See also: [flatten-mode-and-sessionid-envelope](flatten-mode-and-sessionid-envelope.md), [one-ws-many-sessions-architecture](one-ws-many-sessions-architecture.md), [webdriver-bidi-vs-cdp-trajectory](webdriver-bidi-vs-cdp-trajectory.md) + +Sources: +- Puppeteer `Connection.ts`: https://github.com/puppeteer/puppeteer/blob/main/packages/puppeteer-core/src/cdp/Connection.ts +- Puppeteer `CDPSession.ts`: https://github.com/puppeteer/puppeteer/blob/main/packages/puppeteer-core/src/cdp/CdpSession.ts +- Puppeteer `cdp/` directory (full implementation): https://github.com/puppeteer/puppeteer/tree/main/packages/puppeteer-core/src/cdp diff --git a/docs/cdp/runtime-evaluate-three-modes.md b/docs/cdp/runtime-evaluate-three-modes.md new file mode 100644 index 0000000..875bbcf --- /dev/null +++ b/docs/cdp/runtime-evaluate-three-modes.md @@ -0,0 +1,25 @@ +# Runtime.evaluate has three orthogonal return modes; choosing wrong loses type information or hangs on Promises + +`Runtime.evaluate` is the workhorse for "run this JS in the page and get the answer back." Three parameters fundamentally change its semantics: + +1. **`returnByValue: true`** — Chrome JSON-serializes the result and ships the value back. The reply's `result.result.value` is what you want. DOM nodes and other host objects become opaque descriptions (`{type: "object", subtype: "node"}`) because JSON-serializing them loses identity. Use this when the expression returns a primitive, plain object, or array of primitives. + +2. **`returnByValue: false`** (the default) — Chrome returns a `RemoteObject` handle: `{type, subtype?, objectId?, description?}`. You can hold the `objectId` and pass it to `Runtime.callFunctionOn` for further operations against the same object identity. This is the path you take when you need to manipulate DOM nodes from outside, hold a reference across calls, or inspect a complex shape without losing it to JSON. + +3. 
**`awaitPromise: true`** — if the expression evaluates to a Promise, Chrome waits and returns the resolved value (subject to `returnByValue` rules). Without this, your `result.value` is the Promise object itself, which serializes to `{type: "object", subtype: "promise"}` and is useless. *This is the foot-gun for async code* — forgetting it and getting back "Promise" descriptors instead of the awaited value is the most common cause of "why is my eval returning nothing?" + +Two more details that aren't optional: + +- **`exceptionDetails`**: if the expression threw (including an unhandled rejection when `awaitPromise: true`), the reply contains a top-level `exceptionDetails` object alongside `result`. The `result` field is *not* an error — it's the thrown value's RemoteObject (often a string description). Library code must check `exceptionDetails` and surface it, otherwise the caller silently sees a stringified `Error: ...` masquerading as a return value. + +- **`contextId` / `uniqueContextId`**: if omitted, the eval runs in the page's default world. To eval in an isolated world (e.g. one you created via `Page.createIsolatedWorld`), pass the world's context id. `uniqueContextId` is the cross-process-safe variant — recommended for any new code. + +## For superpowers-chrome +The library wraps `Runtime.evaluate` in `lib/evaluation.js` and provides three variants: `evaluate` (returnByValue + awaitPromise), `evaluateJson` (wraps the expression to tag DOM nodes), and `evaluateRaw` (returnByValue: false). All three call `throwIfExceptionDetails` to surface thrown errors. An advanced consumer who needs to hold an objectId across calls should reach for `evaluateRaw`; one who wants typed handling of DOM-vs-primitive returns should reach for `evaluateJson`. 
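
The `exceptionDetails` check described above can be sketched as a thin wrapper. This is a hedged illustration, not the contents of `lib/evaluation.js`: `evaluateValue` is a hypothetical name, and `session.send` is assumed to resolve with the CDP result object (`{result, exceptionDetails?}`).

```javascript
// Hypothetical wrapper; the library's own evaluate variants may differ.
// Assumes session.send('Runtime.evaluate', ...) resolves with {result, exceptionDetails?}.
async function evaluateValue(session, expression) {
  const reply = await session.send('Runtime.evaluate', {
    expression,
    returnByValue: true, // ship a JSON value, not a RemoteObject handle
    awaitPromise: true,  // resolve Promises before replying
  });
  if (reply.exceptionDetails) {
    // `exception` is the thrown value's RemoteObject; `text` is a short summary.
    const { exception, text } = reply.exceptionDetails;
    throw new Error((exception && exception.description) || text || 'Runtime.evaluate threw');
  }
  return reply.result.value;
}
```

Without the `exceptionDetails` branch, a caller would receive `reply.result` for a thrown error and could mistake it for a normal return value.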
+ +See also: [isolated-worlds-and-execution-contexts](isolated-worlds-and-execution-contexts.md), [navigation-listener-ordering-race](navigation-listener-ordering-race.md) + +Sources: +- CDP Runtime domain: https://chromedevtools.github.io/devtools-protocol/tot/Runtime/ +- `superpowers-chrome/skills/browsing/lib/evaluation.js` +- gauntlet commit 95910fe (throw on Runtime.evaluate exceptionDetails — the bug this prevents) diff --git a/docs/cdp/target-attached-without-detach-leaks.md b/docs/cdp/target-attached-without-detach-leaks.md new file mode 100644 index 0000000..8114287 --- /dev/null +++ b/docs/cdp/target-attached-without-detach-leaks.md @@ -0,0 +1,19 @@ +# Forgetting Target.detachFromTarget leaks sessionIds and event subscriptions in Chrome until process exit + +Each `Target.attachToTarget` call returns a sessionId and registers state inside Chrome: domain-enabled flags, event subscriptions, the session object itself. `Target.detachFromTarget({sessionId})` releases that state. If you skip the detach call when you're done with a session — either through optimism ("Chrome will clean up when the page closes") or through error handling that drops the session reference without detaching — you leak. The leak is per-Chrome-process and persists until Chrome exits. + +The leak is mostly invisible because individual session state is small. It becomes visible in three situations: (a) long-running browser sessions (an MCP server holding a single Chrome for hours, attaching to popups that come and go); (b) page churn (a flow that opens-and-closes many tabs, each attached); (c) auto-attached children where the parent session's `setAutoAttach` produced child sessions you never explicitly tracked. In all three, you accumulate sessions whose targets are gone but whose subscriptions still match events that, while never reaching any handler, still cost Chrome time to filter. 
+ +The right shape is to treat sessions like file descriptors: every attach is paired with a detach in a try/finally or equivalent. For sessions tied to targets that disappear independently (popup closes, OOPIF navigates away), subscribe to `Target.targetDestroyed` for the targetId and detach the session in the handler — Chrome will not error if you detach a session whose target is already gone. + +There's also a subtler version: even with detach, *event listeners on your side of the wire* may retain references to closed-over state (page handles, captured DOM nodes, console message buffers keyed by sessionId), so your library's session-cleanup must clear its own data structures too, not just call detach. Memory leaks in CDP libraries are usually about client-side maps that grew, not about Chrome holding state. + +## For superpowers-chrome +`lib/page-session.js#detach` calls `Target.detachFromTarget` and then `router.unregisterSession(sessionId)`, which clears the router's per-session state and rejects in-flight requests. `lib/cdp-router.js#unregisterSession` does the same. An advanced consumer attaching to arbitrary targets via `bridge.attachPageSession(targetId)` is responsible for calling `pageSession.detach()` when done. A linting-style sanity check during development: after a flow runs, `bridge.router` should have no sessions for targets that no longer exist. 
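
The file-descriptor discipline described above can be sketched as a scoped helper. This is a sketch under assumptions: `withAttachedSession` is a hypothetical name, and `sendRoot(method, params)` is assumed to send a sessionless command on the browser-level connection and resolve with its result. Swallowing errors from the detach call is a defensive choice made here, not documented library behaviour.

```javascript
// Hypothetical helper: pairs every attach with a detach, like open/close on a file.
// Assumes sendRoot(method, params) sends a root (sessionless) CDP command.
async function withAttachedSession(sendRoot, targetId, fn) {
  const { sessionId } = await sendRoot('Target.attachToTarget', { targetId, flatten: true });
  try {
    return await fn(sessionId);
  } finally {
    try {
      await sendRoot('Target.detachFromTarget', { sessionId });
    } catch {
      // Defensive assumption: the target or session may already be gone.
    }
  }
}
```

The `finally` block runs whether `fn` returns or throws, so an error path cannot drop the session reference without detaching. Client-side cleanup (listener sets, buffers keyed by `sessionId`) would belong in the same block.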
+ +See also: [target-domain-target-types](target-domain-target-types.md), [target-autoattach-vs-discovertargets](target-autoattach-vs-discovertargets.md), [one-ws-many-sessions-architecture](one-ws-many-sessions-architecture.md) + +Sources: +- CDP Target domain (attachToTarget/detachFromTarget): https://chromedevtools.github.io/devtools-protocol/tot/Target/ +- `superpowers-chrome/skills/browsing/lib/page-session.js` (detach implementation) +- `superpowers-chrome/skills/browsing/lib/cdp-router.js` (unregisterSession) diff --git a/docs/cdp/target-autoattach-vs-discovertargets.md b/docs/cdp/target-autoattach-vs-discovertargets.md new file mode 100644 index 0000000..38627f7 --- /dev/null +++ b/docs/cdp/target-autoattach-vs-discovertargets.md @@ -0,0 +1,19 @@ +# Target.setAutoAttach is for capturing child targets safely; setDiscoverTargets is for observation only + +These two methods look similar in the protocol reference and are easy to conflate. They are not interchangeable. + +`Target.setDiscoverTargets({discover: true})` says "tell me about all targets and emit `Target.targetCreated/Changed/Destroyed` events." That's a *notification* subscription. You learn that a target exists; you can then choose to call `Target.attachToTarget` for it. Two timing problems: (1) the target may have already started executing scripts by the time `targetCreated` reaches you and you respond with `attachToTarget`; (2) for short-lived targets (e.g. a redirect popup), it may already be gone. + +`Target.setAutoAttach({autoAttach: true, waitForDebuggerOnStart: true, flatten: true})` says "for every new related target, attach automatically and (if `waitForDebuggerOnStart`) pause the target's main script until I send `Runtime.runIfWaitingForDebugger`." This is the only reliable way to configure a popup/OOPIF/service-worker *before* it runs any user code. Without `waitForDebuggerOnStart`, you race the renderer. 
With it, you can set up `Network.enable`, `Fetch.enable`, request interception patterns, etc., and only then release the target. + +The "auto" in auto-attach is also recursive when paired with `flatten: true`: the parent session's `setAutoAttach` applies to *its* children, so iframes get attached via the page session that owns them, and you get a session tree rooted at the browser. Puppeteer relies on this for OOPIF support — see the 2022 commit that switched Puppeteer to CDP auto-attach for OOPIFs. + +## For superpowers-chrome +The library currently uses `setDiscoverTargets` (in `lib/browser-bridge.js`) for top-level target visibility and does not enable auto-attach. That's adequate for the present scope (user-driven page-level navigation, no request interception). An advanced consumer doing request interception on OAuth popups or wanting to mutate service-worker requests *must* add `setAutoAttach` with `waitForDebuggerOnStart`, configure the child session, then `Runtime.runIfWaitingForDebugger`. Adding this without `waitForDebuggerOnStart` will pass tests and silently miss requests in production. 
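
The attach, configure, release sequence can be sketched as follows. This is illustrative only: `autoAttachAndConfigure` is a hypothetical name, and `parent` is assumed to expose `send(method, params)`, `onEvent(name, handler)`, and a hypothetical `sendToSession(sessionId, method, params)` for addressing a child session over the same WebSocket.

```javascript
// Hypothetical helper: configure each auto-attached child before it runs any script.
// Assumes parent.send, parent.onEvent, and parent.sendToSession (all illustrative).
async function autoAttachAndConfigure(parent, configureChild) {
  // Listen first, so a child that attaches immediately is not missed.
  parent.onEvent('Target.attachedToTarget', async ({ sessionId, targetInfo }) => {
    const sendChild = (method, params = {}) => parent.sendToSession(sessionId, method, params);
    try {
      // e.g. Network.enable, Fetch.enable with patterns, and so on.
      await configureChild(sendChild, targetInfo);
    } finally {
      // Always release the child, even if configuration failed,
      // otherwise the paused target stays frozen.
      await sendChild('Runtime.runIfWaitingForDebugger');
    }
  });
  await parent.send('Target.setAutoAttach', {
    autoAttach: true,
    waitForDebuggerOnStart: true,
    flatten: true,
  });
}
```

The ordering is the point: the child's configuration commands are sent while the target is still paused, and `Runtime.runIfWaitingForDebugger` goes out last.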
+ +See also: [target-domain-target-types](target-domain-target-types.md), [network-vs-fetch-domains](network-vs-fetch-domains.md), [navigation-listener-ordering-race](navigation-listener-ordering-race.md) + +Sources: +- CDP Target domain: https://chromedevtools.github.io/devtools-protocol/tot/Target/ +- Puppeteer commit switching to CDP auto-attach for OOPIFs: https://github.com/puppeteer/puppeteer/commit/2cbfdeb0ca388a45cedfae865266230e1291bd29 +- chrome-devtools-mcp OOPIF issue (still tracking this surface): https://github.com/ChromeDevTools/chrome-devtools-mcp/issues/703 diff --git a/docs/cdp/target-domain-target-types.md b/docs/cdp/target-domain-target-types.md new file mode 100644 index 0000000..72b3dfc --- /dev/null +++ b/docs/cdp/target-domain-target-types.md @@ -0,0 +1,16 @@ +# CDP targets are not just pages: workers, iframes, browser, and "tab" are all distinct target types with their own session shape + +Chrome exposes the world of debuggable things through the Target domain. Every target has a `type` field. The values seen in practice: `page` (a top-level frame), `iframe` (a cross-origin or OOPIF subframe), `worker` (dedicated Web Worker), `service_worker`, `shared_worker`, `browser` (the root target your `/devtools/browser/` connection is implicitly attached to), `webview`, `background_page` (extension background pages), `other` (a catch-all), and `tab` (a relatively recent addition that groups pages with their prerender/back-forward-cache siblings). + +This taxonomy matters because each type has different capabilities exposed via CDP. A `service_worker` target lets you debug a worker's script, but it has no DOM, so `Page.*` commands fail. An `iframe` OOPIF lives in a different renderer process from its parent page, so you can't reach it from the parent's page session; you have to attach to it as its own target.
The `tab` type sits *above* `page` and is what Puppeteer prefers for navigation control because it survives back-forward-cache restores that destroy the old page target. + +The mistake to avoid: treating `/json/list`'s output as a list of pages and filtering to `type === 'page'` is fine for simple automation but loses real workloads. A page that spawns a popup, embeds an OOPIF, or registers a service worker has 3+ targets that are part of the user-visible behavior. If your library only models pages, you're blind to the other halves. + +## For superpowers-chrome +Today the library focuses on `type === 'page'` targets (see the `getPageSession` resolver filter). An advanced consumer who needs to drive OAuth flows (popups), inspect service-worker state, or instrument cross-origin iframes will need to extend the bridge surface to expose attached page sessions for non-page targets. The bridge architecture already supports this — `attachPageSession(targetId)` takes any targetId, not just page ones — but the orchestrator's resolver currently won't return them via the index-based shape. 
+ +See also: [target-autoattach-vs-discovertargets](target-autoattach-vs-discovertargets.md), [browser-context-for-test-isolation](browser-context-for-test-isolation.md), [isolated-worlds-and-execution-contexts](isolated-worlds-and-execution-contexts.md) + +Sources: +- CDP Target domain: https://chromedevtools.github.io/devtools-protocol/tot/Target/ +- Puppeteer's `TargetType` enum and target manager: https://github.com/puppeteer/puppeteer/tree/main/packages/puppeteer-core/src/cdp diff --git a/docs/cdp/webdriver-bidi-vs-cdp-trajectory.md b/docs/cdp/webdriver-bidi-vs-cdp-trajectory.md new file mode 100644 index 0000000..462c69d --- /dev/null +++ b/docs/cdp/webdriver-bidi-vs-cdp-trajectory.md @@ -0,0 +1,20 @@ +# CDP stays the debugging protocol for Chromium; WebDriver BiDi is the cross-browser standard for automation, with non-trivial overlap during the transition + +CDP started as Chrome's internal debugging protocol and accidentally became the de-facto API for browser automation when Puppeteer exposed it. It is Chrome-specific by design — Firefox briefly implemented a subset and stopped; Safari has never. WebDriver BiDi is the W3C-track replacement: a bidirectional, event-driven protocol that aims to give automation tools CDP-level capabilities (network events, console capture, isolated worlds) over a standardized wire that all browsers implement. + +Current state (2026): BiDi is implemented in Chromium, Firefox, and is shipping in WebKit. Puppeteer enables BiDi by default when launching Firefox but still uses CDP when launching Chrome — BiDi doesn't yet cover all of CDP's surface (notably some performance tracing, heap profiling, and protocol-only events). Playwright continues to use a CDP-flavored implementation against Chromium; Selenium is actively migrating to BiDi as its default. The Chrome team has stated they will keep CDP for *debugging* indefinitely; they recommend BiDi for *automation* going forward. 
+ +The practical implication for a CDP automation library in 2026: building purely on CDP is fine for Chrome-only workloads, especially anything that touches debugging primitives (heap snapshots, performance traces, isolated worlds, fine-grained network instrumentation) where BiDi doesn't have parity yet. If cross-browser support ever becomes a goal, expect a substantial rewrite — BiDi's protocol shape is similar but the message types, capability boundaries, and event taxonomies are different. There is no clean adapter layer; libraries that want both maintain two backends. + +The deeper question for any agent-driving library is whether BiDi's *agent-friendly* features (built-in screenshot, accessibility tree access, navigation primitives) make it the right substrate even for Chrome-only work in a few years. The bet is open. Puppeteer hedges by maintaining both; chrome-devtools-mcp commits to Chrome+CDP via Puppeteer; Playwright commits to its own CDP layer with BiDi planned. + +## For superpowers-chrome +The library is correctly positioned as a Chrome+CDP tool. Migrating to BiDi would be a rewrite, not a refactor — the WebSocket transport stays, but the message vocabulary changes. The right time to consider it is when (a) cross-browser support becomes a goal, or (b) BiDi reaches feature parity for the library's actual use cases (most of which are page-driving, navigation, and DOM interaction — all areas BiDi covers today). Tracking BiDi's spec without committing is the prudent stance. 
+ +See also: [flatten-mode-and-sessionid-envelope](flatten-mode-and-sessionid-envelope.md), [puppeteer-as-cdp-reference-implementation](puppeteer-as-cdp-reference-implementation.md) + +Sources: +- Chrome blog, "WebDriver BiDi - The future of cross-browser automation": https://developer.chrome.com/blog/webdriver-bidi +- W3C WebDriver BiDi spec: https://w3c.github.io/webdriver-bidi/ +- Puppeteer guide, "Experimental WebDriver BiDi support": https://pptr.dev/webdriver-bidi +- Selenium WebDriver BiDi overview (Medium): https://medium.com/womenintechnology/selenium-webdriver-bidi-kismet-child-of-webdriver-classic-and-chrome-devtools-protocol-7922f07cded5 diff --git a/skills/browsing/chrome-ws-lib.js b/skills/browsing/chrome-ws-lib.js index 7d7e1aa..d9b56d8 100644 --- a/skills/browsing/chrome-ws-lib.js +++ b/skills/browsing/chrome-ws-lib.js @@ -1,9 +1,26 @@ /** - * Chrome WebSocket Library - Core CDP automation functions - * Used by both CLI and MCP server + * Chrome WebSocket Library — Core CDP automation functions + * + * The orchestrator: a thin wiring layer over `lib/*.js` modules. + * + * Page-action commands ride a single browser-level CDP WebSocket + * (lib/browser-session.js) via `Target.attachToTarget({flatten:true})` + * sessions. Per-page WebSockets are no longer used as the transport for + * actions — the page session (lib/page-session.js) does the work, with + * sessionId routing handled by lib/cdp-router.js. That subsystem + * (browser-session + cdp-router + page-session + browser-bridge) replaces + * the per-page connection pool that previously lived in lib/cdp-connection.js. + * + * The bridge is lazy: the browser-WS is opened on first targets/context/ + * page-session access, not at createSession() time. That way the + * remote-Chrome path (where the caller passes `{host, port}` of an + * already-running Chrome and skips startChrome) works through the same + * code path as the local-launched case. 
* * Fixes implemented: - * - JRV-130: Connection pooling for persistent focus + * - JRV-130: focus survives across Runtime.evaluate calls (now via the + * persistent page session — same property as the old pool, simpler + * substrate) * - JRV-127: keyboard_press action for special keys * - JRV-123: React-compatible input via Input.insertText * - JRV-124: React-compatible click via Input.dispatchMouseEvent @@ -30,9 +47,10 @@ const { attachExtraction } = require('./lib/extraction'); const { attachScreenshot } = require('./lib/screenshot'); const { attachTabs } = require('./lib/tabs'); const { attachFileUpload } = require('./lib/file-upload'); -const { attachCdpConnection } = require('./lib/cdp-connection'); const { attachConsoleLogging } = require('./lib/console-logging'); const { attachSelectOption } = require('./lib/select-option'); +const { createBrowserSession } = require('./lib/browser-session'); +const { attachBrowserBridge } = require('./lib/browser-bridge'); const { getXdgCacheHome, getChromeProfileDir, @@ -48,60 +66,155 @@ const { * Build a fresh Chrome session — a state-bag scoped to a single Chrome target. * * Pre-factory, every consumer that required this file shared module-level - * state: the connection pool, console-message buffers, the chosen profile - * name, the launched Chrome process handle, the active CDP port, and the - * host-override config. Two consumers in the same process therefore drove a - * single Chrome — fine for the CLI and the MCP server (each owns its - * process), but a hazard for any caller that wants to drive multiple Chromes - * concurrently from one Node process (different ports, different profiles). + * state: console-message buffers, the chosen profile name, the launched + * Chrome process handle, the active CDP port, and the host-override config. + * Two consumers in the same process therefore drove a single Chrome. 
* * `createSession({ host, port })` returns a fresh instance with private state - * and methods bound to that state. Two instances do not share a connection - * pool, console-message map, profile, Chrome process, or host-override — - * mutating one (e.g. setProfileName, startChrome) has no effect on the other. + * and methods bound to that state. Two instances do not share a console-message + * map, profile, Chrome process, host-override, or browser-WS bridge. * Pass `host`/`port` to seed the host-override; omit them to seed from the - * `CHROME_WS_HOST` / `CHROME_WS_PORT` env vars exactly as before. - * - * The returned object preserves the legacy module-level export shape — the - * one-line consumer migration is `require(...)` becomes - * `require(...).createSession()`. + * `CHROME_WS_HOST` / `CHROME_WS_PORT` env vars. */ function createSession({ host, port } = {}) { const state = createState({ host, port }); - // ============================================================================= - const { - sendCdpCommand, - closePooledConnection, - closeAllConnections, - } = attachCdpConnection({ state }); - - const { chromeHttp, resolveWsUrl, getTabs, newTab, closeTab } = attachTabs({ state }); + // ===== Tabs / chromeHttp / resolveWsUrl ===== + const tabsApi = attachTabs({ state }); + const { chromeHttp, getTabs, newTab, closeTab } = tabsApi; + + // ===== Browser-WS bridge (lazy) ===== + // The browser-WS is opened the first time a consumer reaches for the + // bridge surface (targets, createBrowserContext, attachPageSession, or + // any action lib via getPageSession). It is NOT opened in startChrome — + // the remote-Chrome path bypasses startChrome entirely, and lazy-open + // serves both modes with one code path. 
+ let _browser = null; + let _bridge = null; + + async function _ensureBridge() { + if (_bridge) return _bridge; + if (!_browser) { + _browser = createBrowserSession({ + host: state.hostOverride.getHost(), + port: state.activePort, + rewriteWsUrl: state.rewriteWsUrl, + chromeHttp, + }); + } + _bridge = await attachBrowserBridge({ + browser: _browser, + host: state.hostOverride.getHost(), + port: state.activePort, + rewriteWsUrl: state.rewriteWsUrl, + }); + return _bridge; + } + + async function _closeBridge() { + if (_browser) { + try { await _browser.close(); } catch { /* best-effort */ } + _browser = null; + _bridge = null; + } + } + + // Public bridge wrappers — each lazy-opens the browser-WS on first use. + const targets = { + async list() { return (await _ensureBridge()).targets.list(); }, + async onCreated(fn) { return (await _ensureBridge()).targets.onCreated(fn); }, + async onDestroyed(fn) { return (await _ensureBridge()).targets.onDestroyed(fn); }, + async waitForNew(predicate, opts) { return (await _ensureBridge()).targets.waitForNew(predicate, opts); }, + }; + async function createBrowserContext(opts) { + return (await _ensureBridge()).createBrowserContext(opts); + } + + /** + * Attach a page session to an existing target. Returns + * `{sessionId, targetId, send, onEvent, waitForEvent, enableDomain, detach}`. + * Page sessions ride the browser-WS via `Target.attachToTarget({flatten:true})` + * — no per-page WebSocket, no per-page WS-drop race. + */ + async function attachPageSession(targetId) { + return (await _ensureBridge()).attachPageSession(targetId); + } + + // Wire the lazy attacher into tabs.js so tab handles returned by + // getTabs() / newTab() carry a `getPageSession()` thunk. The thunk goes + // through _ensureBridge → bridge.attachPageSession at call time, so + // there's no construction-order dependency between tabs and the bridge. 
+ tabsApi.setPageSessionAttacher((targetId) => attachPageSession(targetId)); + + /** + * Action-lib argument resolver. + * + * Accepts the legacy shapes that tools/tests use today (numeric tab index, + * `ws://...` URL, numeric string) AND the new shape (an existing + * pageSession object) and returns the corresponding pageSession. + */ + async function getPageSession(arg) { + // Already a pageSession? Pass through. + if (arg && typeof arg.send === 'function' && arg.sessionId) { + return arg; + } + + // Numeric or numeric-string index — index into the current tabs list. + if (typeof arg === 'number' || (typeof arg === 'string' && /^\d+$/.test(arg))) { + const idx = typeof arg === 'number' ? arg : parseInt(arg, 10); + const allTabs = await getTabs(); + const pageTabs = allTabs.filter((t) => t.type === 'page'); + + // Auto-create a tab if none exist (matches the legacy auto-start + // behaviour of resolveWsUrl — tools shouldn't have to special-case + // fresh Chrome). + if (pageTabs.length === 0) { + const newTabInfo = await newTab(); + if (!newTabInfo || !newTabInfo.getPageSession) { + throw new Error('getPageSession: newTab failed to return a tab handle'); + } + return newTabInfo.getPageSession(); + } + + if (!pageTabs[idx]) throw new Error(`getPageSession: no tab at index ${idx} (have ${pageTabs.length})`); + return pageTabs[idx].getPageSession(); + } + + // ws:// URL — find the matching tab. 
+ if (typeof arg === 'string' && arg.startsWith('ws://')) { + const allTabs = await getTabs(); + const rewritten = state.rewriteWsUrl(arg, state.hostOverride.getHost(), state.activePort); + const tab = allTabs.find((t) => t.webSocketDebuggerUrl === rewritten || t.webSocketDebuggerUrl === arg); + if (!tab) throw new Error(`getPageSession: no tab found for ${arg}`); + return tab.getPageSession(); + } + + throw new Error(`getPageSession: unsupported arg type: ${typeof arg}`); + } + + // ===== Action libs ===== const { click, hover, drag, mouseMove, scroll, doubleClick, rightClick } = - attachMouse({ resolveWsUrl, sendCdpCommand }); + attachMouse({ getPageSession }); const { keyboardPress, fill, humanType } = - attachKeyboardInput({ state, resolveWsUrl, sendCdpCommand, click }); - - const { fileUpload } = attachFileUpload({ resolveWsUrl, sendCdpCommand }); - - const { selectOption } = attachSelectOption({ resolveWsUrl, sendCdpCommand }); + attachKeyboardInput({ state, getPageSession, click }); - const { evaluate } = attachEvaluation({ resolveWsUrl, sendCdpCommand }); + const { fileUpload } = attachFileUpload({ getPageSession }); - // ============================================================================= + const { selectOption } = attachSelectOption({ getPageSession }); - const { extractText, getHtml, getAttribute } = attachExtraction({ resolveWsUrl, sendCdpCommand }); + const { evaluate } = attachEvaluation({ getPageSession }); + const { extractText, getHtml, getAttribute } = attachExtraction({ getPageSession }); - const { screenshot } = attachScreenshot({ resolveWsUrl, sendCdpCommand }); + const { screenshot } = attachScreenshot({ getPageSession }); const { startChrome, killChrome, showBrowser, hideBrowser, getBrowserMode, getChromePid, getActivePort, getProfileName, setProfileName } = - attachChromeProcess({ state, chromeHttp, getTabs, newTab }); + attachChromeProcess({ state, chromeHttp, getTabs, newTab, closeBridge: _closeBridge }); const { 
enableConsoleLogging, getConsoleMessages, clearConsoleMessages } = - attachConsoleLogging({ state, resolveWsUrl }); + attachConsoleLogging({ state, getPageSession }); const { initializeSession, @@ -118,24 +231,23 @@ function createSession({ host, port } = {}) { evaluateWithCapture, } = attachCapture({ state, - resolveWsUrl, - sendCdpCommand, + getPageSession, getHtml, screenshot, actions: { click, fill, selectOption, evaluate }, }); const { navigate, waitForElement, waitForText } = - attachNavigation({ state, resolveWsUrl, sendCdpCommand, capturePageArtifacts, evaluate }); + attachNavigation({ state, getPageSession, capturePageArtifacts, evaluate }); - const { setViewport, clearViewport, getViewport } = attachViewport({ resolveWsUrl, sendCdpCommand }); - const { clearCookies } = attachCookies({ resolveWsUrl, sendCdpCommand }); + const { setViewport, clearViewport, getViewport } = attachViewport({ getPageSession }); + const { clearCookies } = attachCookies({ getPageSession }); return { // Internal helpers (exported for testing) getElementSelector, - // Core browser actions (click/fill now use CDP events by default for React compatibility) + // Core browser actions (click/fill use CDP events by default for React compatibility) getTabs, newTab, closeTab, @@ -208,9 +320,11 @@ function createSession({ host, port } = {}) { generateHtmlDiff, captureActionWithDiff, - // Connection management (JRV-130) - closePooledConnection, - closeAllConnections, + // Browser-WS bridge — Target.* events, BrowserContext create/dispose, + // and per-page CDP sessions over the shared browser-WS. 
+ targets, + createBrowserContext, + attachPageSession, // Dynamic port allocation and per-profile meta.json getActivePort, diff --git a/skills/browsing/lib/browser-bridge.js b/skills/browsing/lib/browser-bridge.js new file mode 100644 index 0000000..3dce9b7 --- /dev/null +++ b/skills/browsing/lib/browser-bridge.js @@ -0,0 +1,163 @@ +// Target.* events + BrowserContext create/dispose, plus the +// cdp-router and page-session attach point — all the browser-WS bridge's +// consumer-facing surface in one module. + +const { createCdpRouter } = require('./cdp-router'); +const { attachPageSession } = require('./page-session'); + +/** + * attachBrowserBridge({browser, host, port, rewriteWsUrl}) — attaches + * Target.setDiscoverTargets to the browser session, tracks the live target + * set, and exposes: + * - targets.list() — synchronous snapshot + * - targets.onCreated(handler) — register listener; returns unsub fn + * - targets.onDestroyed(handler) + * - targets.waitForNew(predicate, {timeoutMs}) + * - createBrowserContext({proxyServer?}) + * - attachPageSession(targetId) — page session over the browser-WS + * + * host/port/rewriteWsUrl are needed by createBrowserContext.createPage to + * construct per-page WS URLs for callers that still want one (the bridge + * itself never uses them — page sessions ride the browser-WS). + */ +async function attachBrowserBridge({ browser, host, port, rewriteWsUrl }) { + const ctxHost = host; + const ctxPort = port; + const ctxRewriteWsUrl = rewriteWsUrl; + + // The cdp-router sits between browser-session and bridge consumers. + // Page-session-tagged messages dispatch to the right session; root-session + // events (Target.*, etc.) fire root listeners. Command responses without + // sessionId stay correlated by browser-session.js's pendingRequests + // (single source of truth for root-session correlation). 
+ const router = createCdpRouter({ browser }); + + const targetMap = new Map(); // targetId -> targetInfo + const onCreatedFns = new Set(); + const onDestroyedFns = new Set(); + + router.getRootListeners().add((msg) => { + if (msg.method === 'Target.targetCreated') { + const t = msg.params.targetInfo; + targetMap.set(t.targetId, t); + for (const fn of onCreatedFns) { + try { fn(t); } catch (e) { console.error('targets onCreated handler threw:', e); } + } + } else if (msg.method === 'Target.targetInfoChanged') { + const t = msg.params.targetInfo; + targetMap.set(t.targetId, t); + } else if (msg.method === 'Target.targetDestroyed') { + const t = targetMap.get(msg.params.targetId); + targetMap.delete(msg.params.targetId); + if (t) { + for (const fn of onDestroyedFns) { + try { fn(t); } catch (e) { console.error('targets onDestroyed handler threw:', e); } + } + } + } + }); + + // Subscribe — replays existing targets as targetCreated events. + await browser.send('Target.setDiscoverTargets', { discover: true }); + + function list() { return Array.from(targetMap.values()); } + function onCreated(fn) { onCreatedFns.add(fn); return () => onCreatedFns.delete(fn); } + function onDestroyed(fn) { onDestroyedFns.add(fn); return () => onDestroyedFns.delete(fn); } + + function waitForNew(predicate, { timeoutMs = 15000 } = {}) { + return new Promise((resolve, reject) => { + let unsub = null; + const timeout = setTimeout(() => { + if (unsub) unsub(); + reject(new Error(`waitForNew: timed out after ${timeoutMs}ms`)); + }, timeoutMs); + unsub = onCreated((t) => { + let match; + try { match = predicate(t); } + catch (e) { + clearTimeout(timeout); + if (unsub) unsub(); + reject(e); + return; + } + if (match) { + clearTimeout(timeout); + if (unsub) unsub(); + resolve(t); + } + }); + }); + } + + /** + * createBrowserContext({proxyServer?}) — creates a Chrome BrowserContext. + * Returns {browserContextId, createPage, dispose}. 
+ * + * createPage(url) calls Target.createTarget({url, browserContextId}) and + * constructs a tab-shape-compatible page handle whose webSocketDebuggerUrl + * is run through rewriteWsUrl. + * + * dispose() is atomic — Chrome tears down cookies/storage/IDB/SW for the + * context in one call. + */ + async function createBrowserContext(opts = {}) { + const params = {}; + if (opts.proxyServer) params.proxyServer = opts.proxyServer; + const { browserContextId } = await browser.send('Target.createBrowserContext', params); + + let disposed = false; + + async function createPage(url = 'about:blank') { + if (disposed) throw new Error('BrowserContext disposed'); + const { targetId } = await browser.send('Target.createTarget', { + url, + browserContextId, + }); + // Construct the per-page WS URL — same shape Chrome's /json/list returns. + const rawWsUrl = `ws://${ctxHost}:${ctxPort}/devtools/page/${targetId}`; + const webSocketDebuggerUrl = ctxRewriteWsUrl(rawWsUrl, ctxHost, ctxPort); + return { + id: targetId, + targetId, + webSocketDebuggerUrl, + type: 'page', + url, + browserContextId, + }; + } + + async function dispose() { + if (disposed) return; + disposed = true; + try { + await browser.send('Target.disposeBrowserContext', { browserContextId }); + } catch (e) { + // best-effort: log but don't throw — dispose is meant to be safe. + console.warn('BrowserContext.dispose() failed:', e && e.message); + } + } + + return { browserContextId, createPage, dispose }; + } + + /** + * Attach a CDP page session to an existing target. Returns a pageSession + * with `{sessionId, targetId, send, onEvent, waitForEvent, enableDomain, detach}`. + * + * The page session rides the browser-WS via `Target.attachToTarget({flatten:true})`. + * No new WebSocket per page; no per-page WS death race. 
+ */ + async function attachPage(targetId) { + return attachPageSession({ browser, router }, targetId); + } + + return { + targets: { list, onCreated, onDestroyed, waitForNew }, + createBrowserContext, + attachPageSession: attachPage, + // Exposed for tests + advanced callers; not part of the public session API. + router, + }; +} + +module.exports = { attachBrowserBridge }; diff --git a/skills/browsing/lib/browser-session.js b/skills/browsing/lib/browser-session.js new file mode 100644 index 0000000..baf00dc --- /dev/null +++ b/skills/browsing/lib/browser-session.js @@ -0,0 +1,165 @@ +// One CDP WebSocket per Chrome process, talking to /devtools/browser/. +// +// This is the transport for Target.* events, BrowserContext create/dispose, +// and every page session attached via Target.attachToTarget({flatten:true}). +// Page-action commands ride the per-page sessions (lib/page-session.js), +// which envelope into this socket via sendRaw and correlate responses +// through the cdp-router. +// +// Lazy: connect happens on the first send() call (onEvent() only registers a +// listener). Both local and remote-Chrome callers exercise the same code path +// — for the remote case startChrome() is skipped entirely, and lazy-connect on +// first bridge use is what serves both. + +const { WebSocketClient } = require('./websocket-client'); + +/** + * createBrowserSession({host, port, rewriteWsUrl, chromeHttp}) -> browser-session handle. + * + * Discovers the WS URL via chromeHttp('/json/version').webSocketDebuggerUrl + * and pipes it through rewriteWsUrl (same pattern getTabs/newTab use for + * per-page URLs). 
+ * + * Returned API: + * send(method, params?, {timeoutMs?}) -> Promise // root-session command + * onEvent(handler) -> unsub fn + * close() -> Promise + * isConnected() -> boolean + * sendRaw(json) -> void // pre-built envelope + * + * `sendRaw` is the page-session escape hatch: page sessions need to send + * messages with a `sessionId` envelope, but `send()` builds its own + * envelope (and would clash on the id-counter). The page session pre-builds + * the JSON, manages its own id-counter via the cdp-router, and pushes it + * onto the WS through `sendRaw`. Caller is responsible for matching + * responses (the cdp-router does this). + */ +function createBrowserSession({ host, port, rewriteWsUrl, chromeHttp }) { + let ws = null; + const pendingRequests = new Map(); // id -> {resolve, reject, timeout} + let messageIdCounter = 1; + const eventListeners = new Set(); // (msg) => void + let connectPromise = null; // memoized in-flight connect + let closed = false; + + async function ensureConnected() { + if (ws && ws.isConnected()) return; + if (connectPromise) { await connectPromise; return; } + connectPromise = (async () => { + const versionInfo = await chromeHttp('/json/version'); + if (!versionInfo || !versionInfo.webSocketDebuggerUrl) { + throw new Error('chromeHttp(/json/version) returned no webSocketDebuggerUrl'); + } + const url = rewriteWsUrl(versionInfo.webSocketDebuggerUrl, host, port); + const next = new WebSocketClient(url); + next.on('message', (raw) => { + let data; + try { data = JSON.parse(raw); } catch (e) { + console.error('browser-session: bad JSON from CDP:', e); + return; + } + // Id correlation here is for ROOT-session command responses + // only. Page-session responses arrive with {id, result, sessionId} + // — they go to event listeners (the cdp-router dispatches by + // sessionId). Without this guard, a root id=1 would incorrectly + // resolve when a page-session id=1 response arrives, since each + // session has its own id-counter. 
+ if (data.id !== undefined && data.sessionId === undefined) { + const pending = pendingRequests.get(data.id); + if (pending) { + clearTimeout(pending.timeout); + pendingRequests.delete(data.id); + if (data.error) { + pending.reject(new Error(data.error.message || JSON.stringify(data.error))); + } else { + pending.resolve(data.result); + } + return; // handled — don't deliver to event listeners + } + } + // Everything else (events with method, or page-session command + // responses with sessionId) goes to event listeners. The router + // dispatches by sessionId from there. + for (const fn of eventListeners) { + try { fn(data); } catch (e) { console.error('browser-session listener threw:', e); } + } + }); + next.on('close', () => { + for (const [, p] of pendingRequests) { + clearTimeout(p.timeout); + p.reject(new Error('Browser session WS closed')); + } + pendingRequests.clear(); + }); + await next.connect(); + // Only assign `ws` after a successful connect so concurrent callers + // that fall through the `ws && ws.isConnected()` early-return don't + // observe a partially-initialized socket. Don't null connectPromise — + // leaving the resolved promise in place makes subsequent awaits a no-op. + ws = next; + })(); + await connectPromise; + } + + async function send(method, params = {}, { timeoutMs = 10000 } = {}) { + if (closed) throw new Error('Browser session closed'); + await ensureConnected(); + // Re-check after the await — close() may have run during ensureConnected(). 
+ if (closed) throw new Error('Browser session closed'); + const id = messageIdCounter++; + return new Promise((resolve, reject) => { + const timeout = setTimeout(() => { + pendingRequests.delete(id); + reject(new Error(`Browser session timeout: ${method}`)); + }, timeoutMs); + pendingRequests.set(id, { resolve, reject, timeout }); + try { + ws.send(JSON.stringify({ id, method, params })); + } catch (e) { + clearTimeout(timeout); + pendingRequests.delete(id); + reject(e); + } + }); + } + + function onEvent(handler) { + eventListeners.add(handler); + return () => eventListeners.delete(handler); + } + + async function close() { + closed = true; + if (ws) { ws.close(); ws = null; } + for (const [, p] of pendingRequests) { + clearTimeout(p.timeout); + p.reject(new Error('Browser session closed')); + } + pendingRequests.clear(); + eventListeners.clear(); + } + + function isConnected() { return ws !== null && ws.isConnected(); } + + /** + * Send a pre-formed JSON payload. Used by page-session.js to send messages + * with a sessionId envelope without browser-session needing to know about + * sessionIds. Caller is responsible for response correlation (the + * cdp-router does this for page sessions). + * + * Caller must ensure the browser-WS is open. Page sessions reach this + * path only after `attachPageSession` has already issued + * `Target.attachToTarget` via the regular `send`, which lazy-opens the WS. 
+ */ + function sendRaw(json) { + if (closed) throw new Error('Browser session closed'); + if (!ws || !ws.isConnected()) { + throw new Error('Browser WS not connected (call send() first to lazy-open)'); + } + ws.send(json); + } + + return { send, onEvent, close, isConnected, sendRaw }; +} + +module.exports = { createBrowserSession }; diff --git a/skills/browsing/lib/capture.js b/skills/browsing/lib/capture.js index 760e7c5..2b6fda9 100644 --- a/skills/browsing/lib/capture.js +++ b/skills/browsing/lib/capture.js @@ -41,11 +41,10 @@ function ensureProcessHandlersRegistered() { * - WithCapture wrappers: thin adapters that pair an action with a * post-action capturePageArtifacts. * - * `attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, - * screenshot, actions: { click, fill, selectOption, evaluate } })` - * returns the bound API. + * Helpers accept `tabIndexOrPageSession` and route through + * `pageSession.send`. */ -function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screenshot, actions }) { +function attachCapture({ state, getPageSession, getHtml, screenshot, actions }) { function initializeSession() { if (!state.sessionDir) { // ~/.cache/superpowers/browser/YYYY-MM-DD/session-{timestamp} @@ -87,18 +86,18 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho // Token-efficient page summary: heading list, interactive-element counts, // main/nav landmark detection. Used in the auto-capture artifact bundle so // the model can decide whether to read the .md or .html file. 
- async function generateDomSummary(tabIndexOrWsUrl) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { + async function generateDomSummary(tabIndexOrPageSession) { + const ps = await getPageSession(tabIndexOrPageSession); + const result = await ps.send('Runtime.evaluate', { expression: domSummaryScript, - returnByValue: true + returnByValue: true, }); throwIfExceptionDetails(result); return result.result.value; } - async function getPageSize(tabIndexOrWsUrl) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); + async function getPageSize(tabIndexOrPageSession) { + const ps = await getPageSession(tabIndexOrPageSession); const js = `({ width: window.innerWidth, @@ -107,9 +106,9 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho documentHeight: document.documentElement.scrollHeight })`; - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { + const result = await ps.send('Runtime.evaluate', { expression: js, - returnByValue: true + returnByValue: true, }); throwIfExceptionDetails(result); return result.result.value; @@ -118,11 +117,11 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho // Render the page to markdown for token-efficient consumption. Includes // images >= 100x100 in a header summary; inlines image references >= 50x50 // with size info; skips smaller icons. 
- async function generateMarkdown(tabIndexOrWsUrl) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { + async function generateMarkdown(tabIndexOrPageSession) { + const ps = await getPageSession(tabIndexOrPageSession); + const result = await ps.send('Runtime.evaluate', { expression: markdownScript, - returnByValue: true + returnByValue: true, }); throwIfExceptionDetails(result); return result.result.value; @@ -131,15 +130,15 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho // Single post-action snapshot: html + markdown + screenshot + console-log // placeholder, all parallelised. Filenames share a numbered prefix so the // session dir reads like a flat timeline. - async function capturePageArtifacts(tabIndexOrWsUrl, actionType = 'navigate') { + async function capturePageArtifacts(tabIndexOrPageSession, actionType = 'navigate') { const prefix = createCapturePrefix(actionType); const dir = initializeSession(); const [html, markdown, pageSize, domSummary] = await Promise.all([ - getHtml(tabIndexOrWsUrl), - generateMarkdown(tabIndexOrWsUrl), - getPageSize(tabIndexOrWsUrl), - generateDomSummary(tabIndexOrWsUrl) + getHtml(tabIndexOrPageSession), + generateMarkdown(tabIndexOrPageSession), + getPageSize(tabIndexOrPageSession), + generateDomSummary(tabIndexOrPageSession), ]); const htmlPath = path.join(dir, `${prefix}.html`); @@ -151,7 +150,7 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho fs.writeFileSync(markdownPath, markdown || ''); fs.writeFileSync(consoleLogPath, '# Console Log\n# TODO: Console logging not yet implemented\n'); - await screenshot(tabIndexOrWsUrl, screenshotPath); + await screenshot(tabIndexOrPageSession, screenshotPath); return { capturePrefix: prefix, @@ -160,10 +159,10 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho html: htmlPath, markdown: markdownPath, screenshot: 
screenshotPath, - consoleLog: consoleLogPath + consoleLog: consoleLogPath, }, pageSize, - domSummary + domSummary, }; } @@ -171,13 +170,13 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho // get the action result alongside the diff and screenshots. Saves and // restores focus around the BEFORE screenshot — taking a screenshot can // shift focus, which then breaks any focus-dependent action that follows. - async function captureActionWithDiff(tabIndexOrWsUrl, actionType, actionFn, settleTime = 3000) { + async function captureActionWithDiff(tabIndexOrPageSession, actionType, actionFn, settleTime = 3000) { const prefix = createCapturePrefix(actionType); const dir = initializeSession(); - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); + const ps = await getPageSession(tabIndexOrPageSession); async function saveFocus() { - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { + const result = await ps.send('Runtime.evaluate', { expression: ` (() => { const el = document.activeElement; @@ -199,7 +198,7 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho return { type: 'path', value: focusPath }; })() `, - returnByValue: true + returnByValue: true, }); throwIfExceptionDetails(result); return result.result?.value; @@ -225,31 +224,31 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho })()`; } if (selector) { - const restoreResult = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { - expression: `(() => { const el = ${selector}; if (el) el.focus(); })()` + const restoreResult = await ps.send('Runtime.evaluate', { + expression: `(() => { const el = ${selector}; if (el) el.focus(); })()`, }); throwIfExceptionDetails(restoreResult); } } // BEFORE: html + screenshot, with focus saved/restored around the screenshot. 
- const beforeHtml = await getHtml(tabIndexOrWsUrl); + const beforeHtml = await getHtml(ps); const focusInfo = await saveFocus(); const beforeScreenshotPath = path.join(dir, `${prefix}-before.png`); - await screenshot(tabIndexOrWsUrl, beforeScreenshotPath); + await screenshot(ps, beforeScreenshotPath); await restoreFocus(focusInfo); const actionResult = await actionFn(); // Settle: lets React re-renders, animations, and post-action XHRs complete // before the AFTER snapshot. - await new Promise(resolve => setTimeout(resolve, settleTime)); + await new Promise((resolve) => setTimeout(resolve, settleTime)); const [afterHtml, markdown, pageSize, domSummary] = await Promise.all([ - getHtml(tabIndexOrWsUrl), - generateMarkdown(tabIndexOrWsUrl), - getPageSize(tabIndexOrWsUrl), - generateDomSummary(tabIndexOrWsUrl) + getHtml(ps), + generateMarkdown(ps), + getPageSize(ps), + generateDomSummary(ps), ]); const diff = generateHtmlDiff(beforeHtml, afterHtml); @@ -264,7 +263,7 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho fs.writeFileSync(afterHtmlPath, afterHtml || ''); fs.writeFileSync(diffPath, diff); fs.writeFileSync(markdownPath, markdown || ''); - await screenshot(tabIndexOrWsUrl, afterScreenshotPath); + await screenshot(ps, afterScreenshotPath); return { actionResult, @@ -277,21 +276,21 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho diff: diffPath, markdown: markdownPath, beforeScreenshot: beforeScreenshotPath, - afterScreenshot: afterScreenshotPath + afterScreenshot: afterScreenshotPath, }, pageSize, domSummary, - diffSummary: diff.split('\n').slice(0, 5).join('\n') + (diff.split('\n').length > 5 ? '\n...' : '') - } + diffSummary: diff.split('\n').slice(0, 5).join('\n') + (diff.split('\n').length > 5 ? '\n...' : ''), + }, }; } // *WithCapture wrappers — perform an action, then capturePageArtifacts. 
// The MCP server consumes these directly; the bare action variants stay // exported for callers (and tests) that don't want auto-capture. - async function clickWithCapture(tabIndexOrWsUrl, selector) { - await actions.click(tabIndexOrWsUrl, selector); - const artifacts = await capturePageArtifacts(tabIndexOrWsUrl, 'click'); + async function clickWithCapture(tabIndexOrPageSession, selector) { + await actions.click(tabIndexOrPageSession, selector); + const artifacts = await capturePageArtifacts(tabIndexOrPageSession, 'click'); return { action: 'click', selector, @@ -300,13 +299,13 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho sessionDir: artifacts.sessionDir, files: artifacts.files, domSummary: artifacts.domSummary, - consoleLog: [] // Placeholder + consoleLog: [], // Placeholder }; } - async function fillWithCapture(tabIndexOrWsUrl, selector, value) { - await actions.fill(tabIndexOrWsUrl, selector, value); - const artifacts = await capturePageArtifacts(tabIndexOrWsUrl, 'type'); + async function fillWithCapture(tabIndexOrPageSession, selector, value) { + await actions.fill(tabIndexOrPageSession, selector, value); + const artifacts = await capturePageArtifacts(tabIndexOrPageSession, 'type'); return { action: 'type', selector, @@ -316,13 +315,13 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho sessionDir: artifacts.sessionDir, files: artifacts.files, domSummary: artifacts.domSummary, - consoleLog: [] // Placeholder + consoleLog: [], // Placeholder }; } - async function selectOptionWithCapture(tabIndexOrWsUrl, selector, value) { - await actions.selectOption(tabIndexOrWsUrl, selector, value); - const artifacts = await capturePageArtifacts(tabIndexOrWsUrl, 'select'); + async function selectOptionWithCapture(tabIndexOrPageSession, selector, value) { + await actions.selectOption(tabIndexOrPageSession, selector, value); + const artifacts = await capturePageArtifacts(tabIndexOrPageSession, 'select'); 
return { action: 'select', selector, @@ -332,13 +331,13 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho sessionDir: artifacts.sessionDir, files: artifacts.files, domSummary: artifacts.domSummary, - consoleLog: [] // Placeholder + consoleLog: [], // Placeholder }; } - async function evaluateWithCapture(tabIndexOrWsUrl, expression) { - const result = await actions.evaluate(tabIndexOrWsUrl, expression); - const artifacts = await capturePageArtifacts(tabIndexOrWsUrl, 'eval'); + async function evaluateWithCapture(tabIndexOrPageSession, expression) { + const result = await actions.evaluate(tabIndexOrPageSession, expression); + const artifacts = await capturePageArtifacts(tabIndexOrPageSession, 'eval'); return { action: 'eval', expression, @@ -348,7 +347,7 @@ function attachCapture({ state, resolveWsUrl, sendCdpCommand, getHtml, screensho sessionDir: artifacts.sessionDir, files: artifacts.files, domSummary: artifacts.domSummary, - consoleLog: [] // Placeholder + consoleLog: [], // Placeholder }; } diff --git a/skills/browsing/lib/cdp-connection.js b/skills/browsing/lib/cdp-connection.js deleted file mode 100644 index 73606dd..0000000 --- a/skills/browsing/lib/cdp-connection.js +++ /dev/null @@ -1,188 +0,0 @@ -const { WebSocketClient } = require('./websocket-client'); - -// Default per-CDP-call timeout. Caller can override via the `timeout` -// parameter on sendCdpCommand. -const DEFAULT_CDP_TIMEOUT_MS = 30000; - -/** - * CDP transport — pooled WebSocket connections to Chrome's debugger. - * - * Why pooling matters (JRV-130): the original single-use connection per - * `Runtime.evaluate` call lost focus between calls because each new - * connection re-attached to the page as a fresh debugger client. The - * pool keeps one persistent ws per tab, so focus/state survives across - * commands. - * - * `sendCdpCommand` is the public entry point. 
It tries the pool first - * and falls back to a single-use connection if the pooled call throws — - * the fallback is a safety net for the rare case where the pooled - * connection is wedged (broken socket, frame parse error) but a fresh - * connection would still work. - * - * Per-connection request ids start at 1 in each pooled `conn`. The - * single-use path always uses id=1 because each fresh ws has nothing to - * collide with. - * - * The pool eventHandler hook (`conn.eventHandler`) is consumed by - * `enableConsoleLogging` (and any future caller that wants to listen on - * the persistent socket without spinning up a second one). - * - * `attachCdpConnection({ state })` returns the bound API. - */ -function attachCdpConnection({ state }) { - async function getPooledConnection(wsUrl) { - let conn = state.connectionPool.get(wsUrl); - - if (conn && conn.ws.isConnected()) { - return conn; - } - - const ws = new WebSocketClient(wsUrl); - conn = { - ws, - pendingRequests: new Map(), // id -> { resolve, reject, timeout } - messageIdCounter: 1 - }; - - ws.on('message', (msg) => { - try { - const data = JSON.parse(msg); - if (data.id !== undefined) { - const pending = conn.pendingRequests.get(data.id); - if (pending) { - clearTimeout(pending.timeout); - conn.pendingRequests.delete(data.id); - if (data.error) { - pending.reject(new Error(data.error.message || JSON.stringify(data.error))); - } else { - pending.resolve(data.result); - } - } - } - // Forward events (e.g. Runtime.consoleAPICalled) to the per-connection - // eventHandler if one was attached — used by enableConsoleLogging. - if (data.method && conn.eventHandler) { - conn.eventHandler(data); - } - } catch (e) { - console.error('Error processing CDP message:', e); - } - }); - - ws.on('close', () => { - state.connectionPool.delete(wsUrl); - // Reject any in-flight requests so callers don't hang forever. 
- for (const [_id, pending] of conn.pendingRequests) { - clearTimeout(pending.timeout); - pending.reject(new Error('Connection closed')); - } - conn.pendingRequests.clear(); - }); - - ws.on('error', (err) => { - console.error('WebSocket error:', err); - }); - - await ws.connect(); - state.connectionPool.set(wsUrl, conn); - - return conn; - } - - async function sendCdpCommandPooled(wsUrl, method, params = {}, timeout = DEFAULT_CDP_TIMEOUT_MS) { - const conn = await getPooledConnection(wsUrl); - const id = conn.messageIdCounter++; - - return new Promise((resolve, reject) => { - const timeoutHandle = setTimeout(() => { - conn.pendingRequests.delete(id); - reject(new Error(`CDP command timeout: ${method}`)); - }, timeout); - - conn.pendingRequests.set(id, { resolve, reject, timeout: timeoutHandle }); - conn.ws.send(JSON.stringify({ id, method, params })); - }); - } - - // Single-use ws — fallback when the pool is wedged. Each call opens a - // fresh connection, sends one request, waits for the matching id, - // closes. Less efficient (re-handshakes per call) but recovers from - // broken pooled connections without wedging the rest of the session. - async function sendCdpCommandSingle(wsUrl, method, params = {}, timeout = DEFAULT_CDP_TIMEOUT_MS) { - const ws = new WebSocketClient(wsUrl); - - return new Promise((resolve, reject) => { - // Single-use ws sends exactly one request — id=1 is fine because the - // connection is fresh and there's nothing to collide with. 
- const id = 1; - let resolved = false; - - ws.on('message', (msg) => { - const data = JSON.parse(msg); - if (data.id === id) { - resolved = true; - ws.close(); - if (data.error) { - reject(new Error(data.error.message || JSON.stringify(data.error))); - } else { - resolve(data.result); - } - } - }); - - ws.on('error', (err) => { - if (!resolved) { - reject(err); - } - }); - - ws.connect() - .then(() => { - ws.send(JSON.stringify({ id, method, params })); - }) - .catch(reject); - - setTimeout(() => { - if (!resolved) { - ws.close(); - reject(new Error('CDP command timeout')); - } - }, timeout); - }); - } - - async function sendCdpCommand(wsUrl, method, params = {}, timeout = DEFAULT_CDP_TIMEOUT_MS) { - try { - return await sendCdpCommandPooled(wsUrl, method, params, timeout); - } catch (e) { - console.error('Pooled connection failed, using single-use:', e.message); - return await sendCdpCommandSingle(wsUrl, method, params, timeout); - } - } - - function closePooledConnection(wsUrl) { - const conn = state.connectionPool.get(wsUrl); - if (conn) { - conn.ws.close(); - state.connectionPool.delete(wsUrl); - } - } - - function closeAllConnections() { - for (const [_wsUrl, conn] of state.connectionPool) { - conn.ws.close(); - } - state.connectionPool.clear(); - } - - return { - getPooledConnection, - sendCdpCommand, - sendCdpCommandPooled, - sendCdpCommandSingle, - closePooledConnection, - closeAllConnections, - }; -} - -module.exports = { attachCdpConnection }; diff --git a/skills/browsing/lib/cdp-router.js b/skills/browsing/lib/cdp-router.js new file mode 100644 index 0000000..418dd49 --- /dev/null +++ b/skills/browsing/lib/cdp-router.js @@ -0,0 +1,86 @@ +// sessionId-aware dispatcher for browser-WS messages. +// +// Routes incoming browser-WS messages by sessionId: +// - msg.sessionId set → page session's pendingRequests / event listeners +// - msg.method, no sessionId → root listeners (target events, etc.) +// - msg.id, no sessionId → falls through. 
browser-session.js's existing +// pendingRequests Map correlates root command +// responses; the router does NOT also try to. +// (Avoids duplicate correlation paths.) +// +// Per-session message id counters are independent. {id:1, sessionId:"A"} +// and {id:1, sessionId:"B"} correlate independently on one WS — collapsing +// id space across sessions would silently break correlation. + +function createCdpRouter({ browser }) { + // sessionId -> { pendingRequests: Map, + // eventListeners: Set<(msg) => void> } + const sessions = new Map(); + + // Root browser-session event listeners (events only — command responses are + // correlated in browser-session.js's pendingRequests, not here). + const rootListeners = new Set(); + + browser.onEvent((msg) => { + const sid = msg.sessionId; + if (sid) { + const sess = sessions.get(sid); + if (!sess) return; // detached or never registered + if (msg.id !== undefined) { + const pending = sess.pendingRequests.get(msg.id); + if (pending) { + clearTimeout(pending.timeout); + sess.pendingRequests.delete(msg.id); + if (msg.error) { + pending.reject(new Error(msg.error.message || JSON.stringify(msg.error))); + } else { + pending.resolve(msg.result); + } + } + } else if (msg.method) { + for (const fn of sess.eventListeners) { + try { fn(msg); } catch (e) { console.error('cdp-router page listener threw:', e); } + } + } + } else if (msg.method) { + // Root session events (e.g. Target.targetCreated). Command responses + // (msg.id without sessionId) intentionally fall through — + // browser-session.js handles them. 
+ for (const fn of rootListeners) { + try { fn(msg); } catch (e) { console.error('cdp-router root listener threw:', e); } + } + } + }); + + function registerSession(sessionId) { + const sess = { + pendingRequests: new Map(), + eventListeners: new Set(), + }; + sessions.set(sessionId, sess); + return sess; + } + + function unregisterSession(sessionId) { + const sess = sessions.get(sessionId); + if (!sess) return; + // Reject any in-flight requests so awaiting callers don't hang. + for (const [, p] of sess.pendingRequests) { + clearTimeout(p.timeout); + p.reject(new Error('Page session detached')); + } + sess.pendingRequests.clear(); + sess.eventListeners.clear(); + sessions.delete(sessionId); + } + + function getRootListeners() { return rootListeners; } + + return { + registerSession, + unregisterSession, + getRootListeners, + }; +} + +module.exports = { createCdpRouter }; diff --git a/skills/browsing/lib/chrome-process.js b/skills/browsing/lib/chrome-process.js index 4a4aea3..555ec03 100644 --- a/skills/browsing/lib/chrome-process.js +++ b/skills/browsing/lib/chrome-process.js @@ -19,10 +19,15 @@ const os = require('os'); * needs — chromeHttp for graceful shutdown, getTabs/newTab for the * show/hide tab-restoration flow. * - * `attachChromeProcess({ state, chromeHttp, getTabs, newTab })` returns - * the bound methods. + * `closeBridge` (optional) is invoked at the start of `killChrome` so the + * browser-level CDP WebSocket and any attached page sessions are torn down + * before Chrome itself is killed. If absent (older callers, tests), the + * bridge close is skipped. + * + * `attachChromeProcess({ state, chromeHttp, getTabs, newTab, closeBridge })` + * returns the bound methods. */ -function attachChromeProcess({ state, chromeHttp, getTabs, newTab }) { +function attachChromeProcess({ state, chromeHttp, getTabs, newTab, closeBridge }) { // Read-once derived constants from the per-session host-override. 
const CHROME_DEBUG_HOST = state.hostOverride.getHost(); const CHROME_DEBUG_PORT = state.hostOverride.getPort(); @@ -143,6 +148,13 @@ function attachChromeProcess({ state, chromeHttp, getTabs, newTab }) { } async function killChrome() { + // Close the browser-level WS bridge before tearing down Chrome. Best-effort + // — if the bridge never opened (no consumer touched targets / pageSession), + // this is a no-op. + if (closeBridge) { + try { await closeBridge(); } catch { /* best-effort */ } + } + let pidToKill = null; if (state.chromeProcess && state.chromeProcess.pid) { diff --git a/skills/browsing/lib/console-logging.js b/skills/browsing/lib/console-logging.js index 79fdb10..3d0e532 100644 --- a/skills/browsing/lib/console-logging.js +++ b/skills/browsing/lib/console-logging.js @@ -1,115 +1,75 @@ -const { WebSocketClient } = require('./websocket-client'); - -// Fixed CDP request id used to mark the Runtime.enable response so the -// message handler can distinguish setup-acknowledged from runtime-event -// without tracking ids generally. -const RUNTIME_ENABLE_REQUEST_ID = 999999; - -// How long to wait for Runtime.enable to acknowledge before failing the -// console-logging setup. -const ENABLE_TIMEOUT_MS = 5000; - /** * Page console-message capture. * - * `enableConsoleLogging` opens a persistent WebSocket alongside the - * pooled CDP connection (kept separate so the request/response flow - * isn't polluted with `Runtime.consoleAPICalled` events) and streams - * console output into `state.consoleMessages` keyed by tab ws URL. - * `getConsoleMessages` reads them out — optionally filtered by - * timestamp — and `clearConsoleMessages` resets the buffer for a tab. + * `enableConsoleLogging` enables Runtime on the page session and registers a + * page-session event listener for `Runtime.consoleAPICalled`, streaming + * console output into `state.consoleMessages` keyed by `ps.sessionId`. 
+ * `getConsoleMessages` reads them out — optionally filtered by timestamp — + * and `clearConsoleMessages` resets the buffer for a tab. * - * The fixed id `999999` is used for the `Runtime.enable` request/response - * pair so the message handler can tell setup-acknowledged from - * runtime-event without tracking ids generally. + * `enableDomain('Runtime')` is idempotent so navigation's auto-capture + * and `enableConsoleLogging` can coexist on the same page session without + * stomping on each other. * - * `attachConsoleLogging({ state, resolveWsUrl })` returns the bound API. + * Keying: `state.consoleMessages` is keyed by `ps.sessionId`. The public + * adapter API (`getConsoleMessages(arg, sinceTimestamp)`, + * `clearConsoleMessages(arg)`) accepts tabIndex / wsUrl / pageSession — + * all resolve through `getPageSession` to `ps.sessionId`. */ -function attachConsoleLogging({ state, resolveWsUrl }) { - async function enableConsoleLogging(tabIndexOrWsUrl) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); +function attachConsoleLogging({ state, getPageSession }) { + async function enableConsoleLogging(tabIndexOrPageSession) { + const ps = await getPageSession(tabIndexOrPageSession); - if (!state.consoleMessages.has(wsUrl)) { - state.consoleMessages.set(wsUrl, []); + if (!state.consoleMessages.has(ps.sessionId)) { + state.consoleMessages.set(ps.sessionId, []); } - // Persistent ws — kept open after Runtime.enable so we keep receiving - // Runtime.consoleAPICalled events. Separate from the pooled CDP - // connection so RPC traffic doesn't fight event traffic. - const ws = new WebSocketClient(wsUrl); - - return new Promise((resolve, reject) => { - let enabledRuntime = false; - - ws.on('message', (msg) => { - const data = JSON.parse(msg); - - // Fixed id marks the Runtime.enable response; everything - // after that is event traffic. 
- if (data.id === RUNTIME_ENABLE_REQUEST_ID && !enabledRuntime) { - enabledRuntime = true; - resolve(); - return; - } - - if (data.method === 'Runtime.consoleAPICalled') { - const entry = data.params; - const timestamp = new Date().toISOString(); - const level = entry.type || 'log'; - const args = entry.args || []; - - const text = args.map(arg => { - if (arg.type === 'string') return arg.value; - if (arg.type === 'number') return String(arg.value); - if (arg.type === 'boolean') return String(arg.value); - if (arg.type === 'object') return arg.description || '[Object]'; - return String(arg.value || arg.description || arg.type); - }).join(' '); - - const messages = state.consoleMessages.get(wsUrl) || []; - messages.push({ timestamp, level, text }); - state.consoleMessages.set(wsUrl, messages); - } - }); - - ws.on('error', (err) => { - if (!enabledRuntime) { - reject(err); - } - }); - - ws.connect() - .then(() => { - ws.send(JSON.stringify({ - id: RUNTIME_ENABLE_REQUEST_ID, - method: 'Runtime.enable' - })); - }) - .catch(reject); - - setTimeout(() => { - if (!enabledRuntime) { - ws.close(); - reject(new Error('Console logging enable timeout')); - } - }, ENABLE_TIMEOUT_MS); + await ps.enableDomain('Runtime'); // idempotent + + const unsub = ps.onEvent((data) => { + if (data.method !== 'Runtime.consoleAPICalled') return; + const entry = data.params || {}; + const timestamp = new Date().toISOString(); + const level = entry.type || 'log'; + const args = entry.args || []; + + const text = args.map((arg) => { + if (arg.type === 'string') return arg.value; + if (arg.type === 'number') return String(arg.value); + if (arg.type === 'boolean') return String(arg.value); + if (arg.type === 'object') return arg.description || '[Object]'; + return String(arg.value || arg.description || arg.type); + }).join(' '); + + const messages = state.consoleMessages.get(ps.sessionId) || []; + messages.push({ timestamp, level, text }); + state.consoleMessages.set(ps.sessionId, messages); }); + 
+ // The page session detach handles event-listener cleanup via router + // unregisterSession; close() here just unsubscribes this specific + // listener so a caller that wants to stop capturing without detaching + // the whole page session can do so. + return { + close: () => { + try { unsub(); } catch { /* best-effort */ } + }, + }; } - async function getConsoleMessages(tabIndexOrWsUrl, sinceTime = null) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); - const messages = state.consoleMessages.get(wsUrl) || []; + async function getConsoleMessages(tabIndexOrPageSession, sinceTime = null) { + const ps = await getPageSession(tabIndexOrPageSession); + const messages = state.consoleMessages.get(ps.sessionId) || []; if (!sinceTime) { return messages; } - - return messages.filter(msg => new Date(msg.timestamp) > sinceTime); + return messages.filter((msg) => new Date(msg.timestamp) > sinceTime); } - async function clearConsoleMessages(tabIndexOrWsUrl) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); - state.consoleMessages.set(wsUrl, []); + async function clearConsoleMessages(tabIndexOrPageSession) { + const ps = await getPageSession(tabIndexOrPageSession); + state.consoleMessages.set(ps.sessionId, []); } return { enableConsoleLogging, getConsoleMessages, clearConsoleMessages }; diff --git a/skills/browsing/lib/cookies.js b/skills/browsing/lib/cookies.js index 1b53562..0a9a10a 100644 --- a/skills/browsing/lib/cookies.js +++ b/skills/browsing/lib/cookies.js @@ -1,15 +1,13 @@ /** * Cookie management — currently just a single "clear everything" action. * - * Builds against any session-scoped pair of `resolveWsUrl` (tab index → ws URL) - * and `sendCdpCommand` (CDP request) — see `attachCookies({ resolveWsUrl, - * sendCdpCommand })`. The returned methods carry the session binding through - * closure capture of those helpers. + * Helpers accept `tabIndexOrPageSession` and route through + * `pageSession.send`. 
*/ -function attachCookies({ resolveWsUrl, sendCdpCommand }) { - async function clearCookies(tabIndexOrWsUrl) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); - await sendCdpCommand(wsUrl, 'Network.clearBrowserCookies', {}); +function attachCookies({ getPageSession }) { + async function clearCookies(tabIndexOrPageSession) { + const ps = await getPageSession(tabIndexOrPageSession); + await ps.send('Network.clearBrowserCookies', {}); } return { clearCookies }; diff --git a/skills/browsing/lib/evaluation.js b/skills/browsing/lib/evaluation.js index 07ef879..c5d74dd 100644 --- a/skills/browsing/lib/evaluation.js +++ b/skills/browsing/lib/evaluation.js @@ -15,25 +15,26 @@ * full RemoteObject including `objectId`). For callers that need the * raw CDP shape. * - * `attachEvaluation({ resolveWsUrl, sendCdpCommand })` binds them to a - * session via the helpers' closure capture. + * Helpers accept `tabIndexOrPageSession` (the orchestrator's + * `getPageSession` resolver handles all shapes) and route through + * `pageSession.send`. 
*/ const { throwIfExceptionDetails } = require('./cdp-utils'); -function attachEvaluation({ resolveWsUrl, sendCdpCommand }) { - async function evaluate(tabIndexOrWsUrl, expression) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { +function attachEvaluation({ getPageSession }) { + async function evaluate(tabIndexOrPageSession, expression) { + const ps = await getPageSession(tabIndexOrPageSession); + const result = await ps.send('Runtime.evaluate', { expression, returnByValue: true, - awaitPromise: true + awaitPromise: true, }); throwIfExceptionDetails(result); return result.result.value; } - async function evaluateJson(tabIndexOrWsUrl, expression) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); + async function evaluateJson(tabIndexOrPageSession, expression) { + const ps = await getPageSession(tabIndexOrPageSession); const wrappedExpression = ` (() => { @@ -60,20 +61,20 @@ function attachEvaluation({ resolveWsUrl, sendCdpCommand }) { })() `; - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { + const result = await ps.send('Runtime.evaluate', { expression: wrappedExpression, returnByValue: true, - awaitPromise: true + awaitPromise: true, }); throwIfExceptionDetails(result); return result.result.value; } - async function evaluateRaw(tabIndexOrWsUrl, expression) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { + async function evaluateRaw(tabIndexOrPageSession, expression) { + const ps = await getPageSession(tabIndexOrPageSession); + const result = await ps.send('Runtime.evaluate', { expression, - returnByValue: false + returnByValue: false, }); throwIfExceptionDetails(result); return result.result; diff --git a/skills/browsing/lib/extraction.js b/skills/browsing/lib/extraction.js index ef7d8b3..c47183b 100644 --- a/skills/browsing/lib/extraction.js +++ b/skills/browsing/lib/extraction.js @@ -10,40 
+10,40 @@ const { throwIfExceptionDetails } = require('./cdp-utils'); * found but empty." The page-content / DOM-summary / markdown extractors * (the heavyweight ones used by auto-capture) live in `lib/capture.js`. * - * `attachExtraction({ resolveWsUrl, sendCdpCommand })` returns the bound - * methods — no session state needed. + * Helpers accept `tabIndexOrPageSession` and route through + * `pageSession.send`. */ -function attachExtraction({ resolveWsUrl, sendCdpCommand }) { - async function extractText(tabIndexOrWsUrl, selector) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); +function attachExtraction({ getPageSession }) { + async function extractText(tabIndexOrPageSession, selector) { + const ps = await getPageSession(tabIndexOrPageSession); const js = `${getElementSelector(selector)}?.textContent`; - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { + const result = await ps.send('Runtime.evaluate', { expression: js, - returnByValue: true + returnByValue: true, }); throwIfExceptionDetails(result); return result.result.value; } - async function getHtml(tabIndexOrWsUrl, selector = null) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); + async function getHtml(tabIndexOrPageSession, selector = null) { + const ps = await getPageSession(tabIndexOrPageSession); const js = selector ? 
`${getElementSelector(selector)}?.innerHTML` : 'document.documentElement.outerHTML'; - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { + const result = await ps.send('Runtime.evaluate', { expression: js, - returnByValue: true + returnByValue: true, }); throwIfExceptionDetails(result); return result.result.value; } - async function getAttribute(tabIndexOrWsUrl, selector, attrName) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); + async function getAttribute(tabIndexOrPageSession, selector, attrName) { + const ps = await getPageSession(tabIndexOrPageSession); const js = `${getElementSelector(selector)}?.getAttribute(${JSON.stringify(attrName)})`; - const result = await sendCdpCommand(wsUrl, 'Runtime.evaluate', { + const result = await ps.send('Runtime.evaluate', { expression: js, - returnByValue: true + returnByValue: true, }); throwIfExceptionDetails(result); return result.result.value; diff --git a/skills/browsing/lib/file-upload.js b/skills/browsing/lib/file-upload.js index 507eddd..5cdd2d4 100644 --- a/skills/browsing/lib/file-upload.js +++ b/skills/browsing/lib/file-upload.js @@ -7,34 +7,34 @@ * or `DOM.performSearch` + `DOM.getSearchResults` for XPath, then attaches * the absolute file paths to the input. * - * `attachFileUpload({ resolveWsUrl, sendCdpCommand })` returns the bound - * action. + * Helpers accept `tabIndexOrPageSession` and route through + * `pageSession.send`. 
*/ -function attachFileUpload({ resolveWsUrl, sendCdpCommand }) { - async function fileUpload(tabIndexOrWsUrl, selector, filePaths) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); +function attachFileUpload({ getPageSession }) { + async function fileUpload(tabIndexOrPageSession, selector, filePaths) { + const ps = await getPageSession(tabIndexOrPageSession); - const docResult = await sendCdpCommand(wsUrl, 'DOM.getDocument', {}); + const docResult = await ps.send('DOM.getDocument', {}); const rootNodeId = docResult.root.nodeId; let nodeId; if (selector.startsWith('/') || selector.startsWith('//')) { - const searchResult = await sendCdpCommand(wsUrl, 'DOM.performSearch', { - query: selector + const searchResult = await ps.send('DOM.performSearch', { + query: selector, }); if (searchResult.resultCount === 0) { throw new Error(`File input not found: ${selector}`); } - const nodesResult = await sendCdpCommand(wsUrl, 'DOM.getSearchResults', { + const nodesResult = await ps.send('DOM.getSearchResults', { searchId: searchResult.searchId, fromIndex: 0, - toIndex: 1 + toIndex: 1, }); nodeId = nodesResult.nodeIds[0]; } else { - const queryResult = await sendCdpCommand(wsUrl, 'DOM.querySelector', { + const queryResult = await ps.send('DOM.querySelector', { nodeId: rootNodeId, - selector: selector + selector, }); nodeId = queryResult.nodeId; } @@ -43,9 +43,9 @@ function attachFileUpload({ resolveWsUrl, sendCdpCommand }) { throw new Error(`File input not found: ${selector}`); } - await sendCdpCommand(wsUrl, 'DOM.setFileInputFiles', { + await ps.send('DOM.setFileInputFiles', { files: filePaths, - nodeId: nodeId + nodeId, }); return { uploaded: true, files: filePaths.length }; diff --git a/skills/browsing/lib/keyboard-input.js b/skills/browsing/lib/keyboard-input.js index 6cbb52f..b0002a9 100644 --- a/skills/browsing/lib/keyboard-input.js +++ b/skills/browsing/lib/keyboard-input.js @@ -14,11 +14,12 @@ const { throwIfExceptionDetails } = require('./cdp-utils'); * so headless 
skips key events and relies on `Input.insertText` plus * per-character timing for whatever realism it can offer. * - * `attachKeyboardInput({ state, resolveWsUrl, sendCdpCommand, click })` - * returns the bound API. `click` is the mouse-side click — humanType - * uses it to focus a target before typing. + * Helpers accept `tabIndexOrPageSession` (the orchestrator's + * `getPageSession` resolver handles all shapes) and route through + * `pageSession.send`. `click` is the mouse-side click — humanType uses + * it to focus a target before typing. */ -function attachKeyboardInput({ state, resolveWsUrl, sendCdpCommand, click }) { +function attachKeyboardInput({ state, getPageSession, click }) { /** * Press a named key (Tab, Enter, F1-F12, arrows, etc.) with optional * modifiers. Sends both keyDown and keyUp; if the key has a `text` @@ -26,8 +27,8 @@ function attachKeyboardInput({ state, resolveWsUrl, sendCdpCommand, click }) { * browser fires the matching `input`/`keypress` events that form * submission depends on. 
*/ - async function keyboardPress(tabIndexOrWsUrl, keyName, modifiers = {}) { - const wsUrl = await resolveWsUrl(tabIndexOrWsUrl); + async function keyboardPress(tabIndexOrPageSession, keyName, modifiers = {}) { + const ps = await getPageSession(tabIndexOrPageSession); const keyDef = KEY_DEFINITIONS[keyName]; if (!keyDef) { @@ -40,23 +41,23 @@ function attachKeyboardInput({ state, resolveWsUrl, sendCdpCommand, click }) { if (modifiers.meta) modifierFlags |= 4; if (modifiers.shift) modifierFlags |= 8; - await sendCdpCommand(wsUrl, 'Input.dispatchKeyEvent', { + await ps.send('Input.dispatchKeyEvent', { type: 'keyDown', key: keyDef.key, code: keyDef.code, windowsVirtualKeyCode: keyDef.keyCode, nativeVirtualKeyCode: keyDef.keyCode, modifiers: modifierFlags, - ...(keyDef.text && { text: keyDef.text }) + ...(keyDef.text && { text: keyDef.text }), }); - await sendCdpCommand(wsUrl, 'Input.dispatchKeyEvent', { + await ps.send('Input.dispatchKeyEvent', { type: 'keyUp', key: keyDef.key, code: keyDef.code, windowsVirtualKeyCode: keyDef.keyCode, nativeVirtualKeyCode: keyDef.keyCode, - modifiers: modifierFlags + modifiers: modifierFlags, }); return { pressed: keyName, modifiers }; @@ -66,16 +67,15 @@ function attachKeyboardInput({ state, resolveWsUrl, sendCdpCommand, click }) { * Smart text input. If `selector` is supplied, focuses the element * (via JS focus to avoid mouse-click side effects). Then types the * value, treating \t as Tab, \n as Enter (unless current focus is a - *