From 70a95101bc51b3a10ad39dd4870674658db07810 Mon Sep 17 00:00:00 2001 From: Mark Backman Date: Fri, 24 Apr 2026 21:06:50 -0400 Subject: [PATCH 1/4] Add docs for UI Agents --- api-reference/client/js/a11y-snapshot.mdx | 235 +++++++++++++ api-reference/client/js/overview.mdx | 14 + api-reference/client/js/ui-agent-client.mdx | 139 ++++++++ api-reference/client/react/components.mdx | 39 +++ api-reference/client/react/hooks.mdx | 299 ++++++++++++++++ .../pipecat-subagents/decorators.mdx | 50 ++- api-reference/pipecat-subagents/messages.mdx | 89 +++++ api-reference/pipecat-subagents/overview.mdx | 29 ++ api-reference/pipecat-subagents/ui-agent.mdx | 318 ++++++++++++++++++ .../pipecat-subagents/ui-commands.mdx | 90 +++++ .../pipecat-subagents/ui-prompts.mdx | 47 +++ api-reference/pipecat-subagents/ui-tools.mdx | 100 ++++++ docs.json | 21 +- 13 files changed, 1465 insertions(+), 5 deletions(-) create mode 100644 api-reference/client/js/a11y-snapshot.mdx create mode 100644 api-reference/client/js/ui-agent-client.mdx create mode 100644 api-reference/pipecat-subagents/ui-agent.mdx create mode 100644 api-reference/pipecat-subagents/ui-commands.mdx create mode 100644 api-reference/pipecat-subagents/ui-prompts.mdx create mode 100644 api-reference/pipecat-subagents/ui-tools.mdx diff --git a/api-reference/client/js/a11y-snapshot.mdx b/api-reference/client/js/a11y-snapshot.mdx new file mode 100644 index 00000000..4d521b49 --- /dev/null +++ b/api-reference/client/js/a11y-snapshot.mdx @@ -0,0 +1,235 @@ +--- +title: "Accessibility Snapshots" +description: "Stream the document's accessibility tree to the server as " +--- + +## Overview + +The accessibility-snapshot module produces a structured tree of the document +that the server-side +[`UIAgent`](/api-reference/pipecat-subagents/ui-agent) renders into LLM +context as ``. 
Shape and filtering are inspired by Playwright's +accessibility snapshot and the Playwright MCP server's LLM-facing format: +the goal is a compact, semantically meaningful tree, not a raw DOM dump. + +```javascript +import { + A11ySnapshotStreamer, + snapshotDocument, + findElementByRef, +} from "@pipecat-ai/client-js"; +``` + +The streaming side is handled by `A11ySnapshotStreamer` (or +[`useA11ySnapshot`](/api-reference/client/react/hooks#usea11ysnapshot) in +React). The walker (`snapshotDocument`) is exposed for apps that need a +one-off snapshot, and `findElementByRef` resolves a server-supplied ref +back to a live DOM element for command handlers. + + + Apps can mark a subtree as PII / sensitive by adding `data-a11y-exclude` to + the element. The walker skips it and its descendants entirely, without also + hiding it from screen readers (unlike `aria-hidden`). Password inputs are + stripped automatically. + + +## A11ySnapshotStreamer + +```typescript +class A11ySnapshotStreamer { + constructor(client: UIAgentClient, options?: A11ySnapshotStreamerOptions); + start(): void; + stop(): void; +} +``` + +Framework-agnostic helper that drives accessibility-snapshot streaming. +Wraps the walker, a `MutationObserver`, and the other triggers +(`scrollend`, `resize`, focus, `visibilitychange`) into a single object +with a `start` / `stop` lifecycle. + +```javascript +import { A11ySnapshotStreamer, UIAgentClient } from "@pipecat-ai/client-js"; + +const ui = new UIAgentClient(pcClient); +ui.attach(); + +const streamer = new A11ySnapshotStreamer(ui); +streamer.start(); + +// Later +streamer.stop(); +``` + + + React apps should prefer + [`useA11ySnapshot`](/api-reference/client/react/hooks#usea11ysnapshot), which + is a thin lifecycle wrapper around this class. 
+ + +### Triggers + +A snapshot is scheduled (debounced via `debounceMs`) on: + +- **DOM mutations** observed by `MutationObserver` on `document.body` + (`childList`, `subtree`, and a curated list of attribute changes including + `role`, `aria-*`, `data-a11y-exclude`, `disabled`, `hidden`, `tabindex`, + `href`). +- **Focus changes** (`focusin`, `focusout`). +- **Scroll-end** (`scrollend`, captured at window level so any scrollable + ancestor triggers it). +- **Window resize** (debounced because the viewport rect shifts which nodes + are `[offscreen]`). +- **Tab visibility** transition to `visible`. + +An initial snapshot is scheduled immediately on `start()`. + +### Options + + + Minimum interval between snapshot emissions, in milliseconds. Multiple + triggers within the window coalesce into one snapshot. + + + + When `true`, annotate every emitted node with `"offscreen"` in its `state` + list if its bounding rect sits entirely outside the viewport. Set to `false` + to skip the per-node layout measurement (e.g. on very large pages where layout + cost outweighs the viewport signal). + + + + When `true`, log each emitted snapshot to the browser console (node count, + rough token estimate, raw tree). Mirrors the server's `log_snapshots` flag on + `UIAgent`. + + +### Methods + +#### start + +```typescript +start(): void +``` + +Begin streaming. Idempotent: subsequent calls before `stop()` are no-ops. + +#### stop + +```typescript +stop(): void +``` + +Stop streaming. Detaches all observers/listeners and cancels pending timers. +Safe to call before `start()` or multiple times. + +## snapshotDocument + +```typescript +function snapshotDocument( + root?: Element, + options?: SnapshotOptions, +): A11ySnapshot; +``` + +Produce a one-off accessibility snapshot of a DOM subtree. Useful for tests +or for apps that want to drive snapshot timing manually (e.g. snapshot only +on a specific app event). + + + Element to walk. Defaults to `document.body`. + + + + Snapshot options. 
+ + + When `true`, each emitted node gets `"offscreen"` in its state list if its + bounding rect sits entirely outside the viewport. + + + + +**Returns:** An `A11ySnapshot` with a `generic` root containing the walked +children, plus a client-side capture timestamp. + +## findElementByRef + +```typescript +function findElementByRef(ref: string): Element | null; +``` + +Resolve a ref string like `"e42"` back to a live DOM element. Returns `null` +if the ref was never assigned or the element has since been +garbage-collected. + +The walker assigns stable refs to every emitted DOM element via a `WeakMap` +(forward) plus a `Map>` (reverse). Refs persist as +long as the element stays mounted and survive across snapshots. Command +handlers use this to act on nodes the server referenced from ``. + +The +[`useStandard*Handler`](/api-reference/client/react/hooks#usestandardscrolltohandler) +hooks call this internally. + +## Types + +### A11yNode + +One node in the accessibility snapshot tree. + +```typescript +interface A11yNode { + ref: string; + role: string; + name?: string; + value?: string; + state?: string[]; + level?: number; + colcount?: number; + rowcount?: number; + children?: A11yNode[]; +} +``` + +| Field | Type | Description | +| ---------- | ------------ | ------------------------------------------------------------------------------------------------------------------ | +| `ref` | `string` | Stable reference id of the form `e{N}`. Persists across snapshots while the DOM node is mounted. | +| `role` | `string` | ARIA role (explicit or tag-derived). | +| `name` | `string` | Accessible name, truncated to 100 chars. | +| `value` | `string` | Current value for inputs (omitted for passwords), progress, etc. | +| `state` | `string[]` | Short state tags. Known values: `"focused"`, `"selected"`, `"expanded"`, `"checked"`, `"disabled"`, `"offscreen"`. | +| `level` | `number` | Heading level, 1-6. 
| +| `colcount` | `number` | Column count for grid-like containers, populated from `aria-colcount`. | +| `rowcount` | `number` | Row count for grid-like containers, populated from `aria-rowcount`. | +| `children` | `A11yNode[]` | Child nodes. | + +### A11ySnapshot + +```typescript +interface A11ySnapshot { + root: A11yNode; + captured_at: number; +} +``` + +Shape of the payload inside a `__ui_snapshot` UI event. A full tree is sent +on each update; the server keeps the latest and renders it into +`...` when an agent injects it. + +| Field | Type | Description | +| ------------- | ---------- | ---------------------------------------------------------------- | +| `root` | `A11yNode` | Root of the accessibility tree (usually `document.body`'s node). | +| `captured_at` | `number` | Client-side timestamp (ms since epoch) when captured. | + +### UI_SNAPSHOT_EVENT_NAME + +```typescript +const UI_SNAPSHOT_EVENT_NAME = "__ui_snapshot"; +``` + +Reserved UI event name carrying an accessibility snapshot. The server-side +`UIAgent` recognizes this name and stores the payload as the latest +snapshot without dispatching to `@on_ui_event` handlers or injecting a +`` developer message. Underscore-prefixed to signal SDK-internal +and avoid colliding with app-defined event names. diff --git a/api-reference/client/js/overview.mdx b/api-reference/client/js/overview.mdx index c4a817d0..772e46db 100644 --- a/api-reference/client/js/overview.mdx +++ b/api-reference/client/js/overview.mdx @@ -85,6 +85,20 @@ pcClient.connect({ > Daily, SmallWebRTC, WebSocket, and other transports + + Send UI events and dispatch UI commands with the v1 UI Agent protocol + + + Stream the document's accessibility tree to the server as `` + The Pipecat JavaScript SDK implements the [RTVI standard](/client/rtvi-standard) for real-time AI inference, ensuring compatibility with any RTVI-compatible server and transport layer. 
diff --git a/api-reference/client/js/ui-agent-client.mdx b/api-reference/client/js/ui-agent-client.mdx new file mode 100644 index 00000000..ff2a0016 --- /dev/null +++ b/api-reference/client/js/ui-agent-client.mdx @@ -0,0 +1,139 @@ +--- +title: "UIAgentClient" +description: "Client-side surface for the UI Agent pattern" +--- + +## Overview + +`UIAgentClient` wraps an existing `PipecatClient` with the v1 UI Agent +protocol: + +- **`sendEvent(name, payload)`** for client → server events. +- **`registerCommandHandler(name, handler)`** for server → client commands. + Handlers dispatch on the command name extracted from `RTVIEvent.ServerMessage` + payloads of type `"ui.command"`. + +Construction does not subscribe to anything. Call `attach()` to start +listening for `RTVIEvent.ServerMessage` and store the returned detach +function to stop. The React provider +([`UIAgentProvider`](/api-reference/client/react/components#uiagentprovider)) +does this automatically inside a `useEffect` so subscription lifecycle +follows mount/unmount. + +```javascript +import { PipecatClient, UIAgentClient } from "@pipecat-ai/client-js"; + +const pcClient = new PipecatClient({ /* ... */ }); +const uiClient = new UIAgentClient(pcClient); +const detach = uiClient.attach(); + +uiClient.registerCommandHandler("toast", (payload) => { + showToast(payload.title, payload.subtitle); +}); + +uiClient.sendEvent("nav_click", { view: "settings" }); + +// Later, on cleanup +detach(); +``` + +## Constructor + +```typescript +new UIAgentClient(client: PipecatClient) +``` + + + An existing Pipecat client. The UI agent client wraps this without taking + ownership; the host is still responsible for connecting and disconnecting + the underlying client. + + +## Properties + +### pipecatClient + +```typescript +get pipecatClient: PipecatClient +``` + +The underlying Pipecat client, exposed as an escape hatch when an app needs +to call `PipecatClient` methods directly. 
+ +## Methods + +### sendEvent + +```typescript +sendEvent(name: string, payload?: T): void +``` + +Send a named UI event to the server. The event is delivered as an RTVI +client message with type `"ui.event"`; the bridge installed by +`attach_ui_bridge` republishes it as `BusUIEventMessage` on the bus. + + + App-defined event name. Must match the `name` argument on a server-side + `@on_ui_event(name)` handler if the agent should react. + + + + App-defined payload. Schemaless. Optional. + + +### registerCommandHandler + +```typescript +registerCommandHandler( + name: string, + handler: UICommandHandler, +): void +``` + +Register a handler for a named UI command. Overwrites any existing handler +for the same name. + + + App-defined command name. Must match the `name` argument the server passes + to `UIAgent.send_command(name, payload)`. + + + + Callback invoked with the command payload. Sync or async; rejected promises + surface to the host's unhandled-rejection channel rather than being + swallowed. + + +### unregisterCommandHandler + +```typescript +unregisterCommandHandler(name: string): void +``` + +Remove the handler previously registered for `name`, if any. No-op when no +handler is registered. + +### unregisterAllCommandHandlers + +```typescript +unregisterAllCommandHandlers(): void +``` + +Remove all registered command handlers. + +### attach + +```typescript +attach(): () => void +``` + +Subscribe to `RTVIEvent.ServerMessage` on the underlying `PipecatClient`. +Returns a detach function that unsubscribes. + +Safe to call more than once: each call installs an independent listener. +Invoke the returned function to remove that listener. The React provider +calls this inside `useEffect` and uses the returned function as the cleanup, +so subscription lifecycle follows React's mount/unmount cycle (including +`StrictMode`'s double-invoke in development). + +**Returns:** A `() => void` detach function. 
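The handler and `attach()` semantics described above can be illustrated with a small stand-alone sketch. It does not use the real SDK: `FakeEventSource`, `CommandRegistrySketch`, and the `{ type, name, payload }` message shape are hypothetical stand-ins chosen for illustration, and the actual wire envelope may differ. The sketch only shows that a second registration overwrites the first and that each `attach()` call installs its own listener, removed by its own detach function.

```javascript
// Hypothetical stand-in for the event-emitting side of a client.
// This is NOT the real PipecatClient; it only offers on/off/emit.
class FakeEventSource {
  constructor() {
    this.listeners = new Set();
  }
  on(listener) {
    this.listeners.add(listener);
  }
  off(listener) {
    this.listeners.delete(listener);
  }
  emit(message) {
    // Copy first so listeners may detach while we iterate.
    for (const listener of [...this.listeners]) listener(message);
  }
}

// Sketch of the registry-plus-attach pattern. Method names mirror the
// documented ones; the internals are an assumption, not the SDK's code.
class CommandRegistrySketch {
  constructor(source) {
    this.source = source;
    this.handlers = new Map();
  }
  registerCommandHandler(name, handler) {
    // Map.set overwrites any existing handler for the same name.
    this.handlers.set(name, handler);
  }
  unregisterCommandHandler(name) {
    // No-op when nothing is registered under `name`.
    this.handlers.delete(name);
  }
  unregisterAllCommandHandlers() {
    this.handlers.clear();
  }
  attach() {
    // A fresh closure per call, so every attach is an independent listener.
    const listener = (message) => {
      if (!message || message.type !== "ui.command") return; // assumed envelope
      const handler = this.handlers.get(message.name);
      if (handler) handler(message.payload);
    };
    this.source.on(listener);
    return () => this.source.off(listener);
  }
}

const source = new FakeEventSource();
const registry = new CommandRegistrySketch(source);
const calls = [];

registry.registerCommandHandler("toast", (p) => calls.push(`first:${p.title}`));
// Same name again: replaces the handler above.
registry.registerCommandHandler("toast", (p) => calls.push(`second:${p.title}`));

const detachA = registry.attach();
const detachB = registry.attach();
const message = { type: "ui.command", name: "toast", payload: { title: "hi" } };

source.emit(message); // two listeners attached: handler runs twice
detachA();
source.emit(message); // one listener left: handler runs once
detachB();
source.emit(message); // no listeners: handler does not run
```

In this sketch, attaching twice means a command is dispatched twice until one of the detach functions is called, which is why pairing every `attach()` with its own cleanup matters under `StrictMode`'s double-invoke.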
diff --git a/api-reference/client/react/components.mdx b/api-reference/client/react/components.mdx index f1b18ae7..df10be47 100644 --- a/api-reference/client/react/components.mdx +++ b/api-reference/client/react/components.mdx @@ -21,6 +21,45 @@ The root component for providing Pipecat client context to your application. It A singleton instance of `PipecatClient` +## UIAgentProvider + +Wraps its children with a `UIAgentClient` bound to a Pipecat client. Mount +inside (or alongside) `PipecatClientProvider`. Descendants can then call +[`useUIAgentClient`](/api-reference/client/react/hooks#useuiagentclient), +[`useUICommandHandler`](/api-reference/client/react/hooks#useuicommandhandler), +[`useUIEventSender`](/api-reference/client/react/hooks#useuieventsender), +[`useA11ySnapshot`](/api-reference/client/react/hooks#usea11ysnapshot), and +the [`useStandard*Handler`](/api-reference/client/react/hooks#usestandardscrolltohandler) +hooks. + +The provider subscribes to `RTVIEvent.ServerMessage` on the underlying +Pipecat client to dispatch UI commands. On unmount (or when the underlying +Pipecat client changes), the subscription is torn down via the detach +function returned from `client.attach()`, including under `StrictMode`'s +double-invoke in development. + +```jsx + + + + + +``` + +**Props** + + + The Pipecat client to bind the UI agent to. When omitted, the provider + reads the client from the ambient `PipecatClientProvider` via + `usePipecatClient`. When set, the prop takes precedence and the context + is ignored. + + Pass this explicitly when the host provides the client via a render prop + (for example, `PipecatAppBase` from `@pipecat-ai/voice-ui-kit`) so the + UI agent binds to the exact same client instance without relying on + React-context identity. + + ## PipecatClientAudio Creates a new `