UIAgent by markbackman · Pull Request #18 · pipecat-ai/pipecat-subagents

markbackman · 2026-04-25T01:24:57Z

Summary

Adds UIAgent plus the orchestration helpers needed for AI agents that observe and drive a GUI app through a structured a11y-snapshot wire format. Six runnable demos exercise the patterns in isolation and in combination.

The wire format itself is now first-class in Pipecat (companion PR pipecat-ai/pipecat#4407): five new RTVI message types (ui-event, ui-command, ui-snapshot, ui-cancel-task, ui-task), paired pydantic envelope models, and the matching pipeline frames live in pipecat.processors.frameworks.rtvi.models. The matching client-side support lives in @pipecat-ai/client-js and @pipecat-ai/client-react (companion PR pipecat-ai/pipecat-client-web#203). This subagents PR builds the agent abstractions on top of that wire format. Single-LLM Pipecat apps that want UI Agent semantics without the subagents framework can target the wire format directly.

Bumps the minimum pipecat-ai dependency to >=1.2.0.

⚠️ Requires a Pipecat release (1.2.0) before this can be merged.

What's added

Core SDK (src/pipecat_subagents/agents/ui/):

UIAgent (subclass of LLMContextAgent) that:
- Stores the latest accessibility snapshot from the client and auto-injects it as <ui_state> at the start of every task.
- Routes inbound ui-event RTVI messages to @on_ui_event(name) handlers without running the LLM, for low latency.
- Provides respond_to_task(...) and a current_task property so tools don't have to thread task_id manually.
- Single-flight task semantics: on_task_request acquires a per-agent lock that is held until respond_to_task fires, so overlapping requests queue rather than interleaving their context mutations. The lock is also released on cancellation, so a cancelled task can't strand the agent.
- Has a keep_history flag for multi-turn UIs (defaults to False, the canonical stateless-delegate pattern that pairs with the voice/UI separation).
send_command(name, payload) for server-to-client UI commands, going out as first-class ui-command RTVI messages. Pairs with the standard payload models that ship in pipecat (Toast, Navigate, ScrollTo, Highlight, Focus, SelectText, SetInputValue, Click); apps publish their own command names freely.
Action helpers on UIAgent: scroll_to, highlight, select_text, click, set_input_value. Plain instance methods (not LLM tools) that wrap send_command with the standard payloads.
ReplyToolMixin: one bundled reply(answer, scroll_to=None, highlight=None, select_text=None, fills=None, click=None) LLM tool. Required answer argument keeps smaller models from omitting the spoken terminator (a real failure mode of the chainable-mixin shape we tried first). One tool call per turn, no chaining.
start_user_task_group(...): fire-and-forget counterpart to the user_task_group context manager. Dispatches a worker fan-out, returns the task_id, and lets workers run in a background asyncio task that the SDK manages.
attach_ui_bridge(root_agent) that wires the new first-class UI RTVI channels to the agent bus in both directions:
- Inbound: subscribes to RTVIProcessor.on_ui_message. ui-event and ui-snapshot from the client become BusUIEventMessage on the bus (the snapshot is routed to UIAgent for <ui_state> injection; events fan out to handlers).
- Outbound: BusUICommandMessage from any agent leaves the bus as an RTVIUICommandFrame (UI commands) or RTVIUITaskFrame (task lifecycle envelopes), which the RTVI observer wraps into the matching UICommandMessage / UITaskMessage envelopes on the wire.
<selection> block in <ui_state> for read-side deixis (text the user has highlighted in the client).
UI_STATE_PROMPT_GUIDE constant: canonical prompt fragment that documents the <ui_state> / <ui_event> context tags the LLM sees. Apps concatenate it into their system prompt.
New bus message types: BusUIEventMessage, BusUICommandMessage (in agents/ui/ui_messages.py).
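The action helpers are described as plain instance methods that wrap send_command with the standard payloads. A minimal sketch of that layering follows. Everything in it is hypothetical: `FakeUIAgent`, the payload dataclasses, their field names (`ref`, `value`), and the command name strings stand in for the real pipecat payload models. `send_command` only records what it would send.

```python
import asyncio
from dataclasses import asdict, dataclass


# Hypothetical stand-ins for the standard payload models; field names are assumptions.
@dataclass
class ScrollTo:
    ref: str


@dataclass
class Highlight:
    ref: str


@dataclass
class Click:
    ref: str


@dataclass
class SetInputValue:
    ref: str
    value: str


class FakeUIAgent:
    """Records outbound commands instead of publishing them on a bus."""

    def __init__(self) -> None:
        self.sent: list = []

    async def send_command(self, name: str, payload: object) -> None:
        # A real agent would emit a ui-command message here.
        self.sent.append((name, asdict(payload)))

    # Plain instance methods (not LLM tools) that wrap send_command.
    async def scroll_to(self, ref: str) -> None:
        await self.send_command("scroll_to", ScrollTo(ref=ref))

    async def highlight(self, ref: str) -> None:
        await self.send_command("highlight", Highlight(ref=ref))

    async def click(self, ref: str) -> None:
        await self.send_command("click", Click(ref=ref))

    async def set_input_value(self, ref: str, value: str) -> None:
        await self.send_command("set_input_value", SetInputValue(ref=ref, value=value))


async def demo() -> list:
    agent = FakeUIAgent()
    await agent.scroll_to("e12")
    await agent.set_input_value("e7", "Ada")
    return agent.sent
```

Keeping the helpers as thin wrappers means a tool body (or an event handler) can issue UI commands without knowing the envelope shape.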

Six demos (examples/local/ui-agent/), each isolating one concept:

Demo	Pattern
`hello-snapshot`	Foundational: a11y snapshot streaming + UIAgent task dispatch
`pointing`	`highlight` action grounded by `<ui_state>` refs
`deixis`	Read-side text-selection grounding via `<selection>` block
`form-fill`	Input fill + click actions, multi-field tools
`async-tasks`	Parallel fan-out via `start_user_task_group`, streaming task updates
`document-review`	Synthesis demo combining all of the above

For reviewers

⚠️ MERGE BLOCKER — revert the [tool.uv.sources] pin in pyproject.toml before merging. Commit 4aa3fbd adds a temporary [tool.uv.sources] block that resolves pipecat-ai>=1.2.0 from the open wire-format PR (feat(rtvi): add UI Agent Protocol as first-class RTVI message types pipecat#4407). It exists so reviewers and CI can resolve the dep before pipecat 1.2.0 ships on PyPI. Once 1.2.0 lands, drop that commit (or the block) so the published package and downstream installs resolve from PyPI. The override is install-time-only — uv strips [tool.uv.sources] from the published distribution — but leaving it in the repo would mask a regression where 1.2.0 fails to resolve cleanly.
Companion PRs land the wire format on each side: pipecat#4407, "feat(rtvi): add UI Agent Protocol as first-class RTVI message types" (canonical RTVI types and pipeline frames), and pipecat-client-web#203, "Client updates for UI agent" (UIAgentClient, React idioms, standard handlers). All three PRs are additive; no existing wire shapes change. The RTVI PROTOCOL_VERSION bumps from 1.2.0 to 1.3.0, a minor bump, so the major-version compatibility check still passes.
Reading order suggestion: the SDK foundations land first (commits up through the action commands), then the subpackage refactor and LLMContextAgent extension, then ergonomics iteration (chainable mixins to bundled ReplyToolMixin), then the orchestration primitives, then the wire-format migration to first-class RTVI types, then example/feature pairs (each demo paired with the SDK change it exercises). The top-level examples/local/ui-agent/README.md is a good entry point for the demo side.

Test plan

uv run pytest passes (281 tests)
All six demos run end-to-end against their React clients
Reviewer verifies one or two demos locally per their READMEs
Before merge: drop commit 4aa3fbd (or the [tool.uv.sources] block in pyproject.toml) once pipecat 1.2.0 is on PyPI

codecov-commenter · 2026-04-25T01:25:41Z

Codecov Report

❌ Patch coverage is 95.16129% with 21 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/pipecat_subagents/agents/ui/ui_agent.py	93.23%	18 Missing ⚠️
src/pipecat_subagents/agents/ui/ui_task_context.py	93.33%	2 Missing ⚠️
src/pipecat_subagents/agents/ui/ui_tools.py	97.22%	1 Missing ⚠️

Files with missing lines	Coverage Δ
src/pipecat_subagents/agents/__init__.py	`100.00% <100.00%> (ø)`
src/pipecat_subagents/agents/ui/__init__.py	`100.00% <100.00%> (ø)`
src/pipecat_subagents/agents/ui/ui_bridge.py	`100.00% <100.00%> (ø)`
.../pipecat_subagents/agents/ui/ui_event_decorator.py	`100.00% <100.00%> (ø)`
src/pipecat_subagents/agents/ui/ui_messages.py	`100.00% <100.00%> (ø)`
src/pipecat_subagents/agents/ui/ui_prompts.py	`100.00% <100.00%> (ø)`
src/pipecat_subagents/bus/__init__.py	`100.00% <ø> (ø)`
src/pipecat_subagents/agents/ui/ui_tools.py	`97.22% <97.22%> (ø)`
src/pipecat_subagents/agents/ui/ui_task_context.py	`93.33% <93.33%> (ø)`
src/pipecat_subagents/agents/ui/ui_agent.py	`93.23% <93.23%> (ø)`

... and 3 files with indirect coverage changes


Temporary [tool.uv.sources] overrides so reviewers and CI can resolve ``pipecat-ai>=1.1.0`` and ``pipecat-ai-subagents>=0.4.0`` from the open wire-format PRs before either is published to PyPI. Companion PRs: pipecat-ai/pipecat#4407 and pipecat-ai/pipecat-subagents#18. uv strips [tool.uv.sources] when building the distribution, so this is install-time-only and does not affect the published demo. Drop this commit (or just the [tool.uv.sources] block) before merging once both upstreams are on PyPI.

A model that emits a non-dict entry in ``fills`` (or a non-string ref in ``highlight`` / ``click``) would have crashed the tool body before ``respond_to_task`` ran. Because UIAgent acquires the single-flight task lock in ``on_task_request`` and only releases it via ``respond_to_task`` (or the cancellation path), an unhandled exception in the tool would have stranded the lock until the voice-agent's 30s task timeout fired ``on_task_cancelled`` — 30s of UI deadlock for what's almost always a transient model hiccup. Skip non-conforming entries instead. The fix is at the LLM-input boundary (the contents of the list arguments) rather than a broad try/finally so that real bugs in the helpers still surface as exceptions. Adds three regression tests covering the non-dict ``fills``, non-string ``highlight``, and non-string ``click`` cases. Each one asserts the critical invariant: ``respond_to_task`` and ``result_callback`` still run, so the lock is released. Reported by Codex review of #18.
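The fix validates the contents of list arguments at the LLM-input boundary so the tool always reaches respond_to_task. A minimal sketch of that shape, assuming hypothetical names: `reply`, its reduced parameter list, the `{"ref", "value"}` entry shape, and the command name strings are illustrations, not the SDK's actual signatures.

```python
import asyncio
from typing import Awaitable, Callable, Optional

SendCommand = Callable[[str, dict], Awaitable[None]]
Respond = Callable[[str], Awaitable[None]]


async def reply(
    answer: str,
    fills: Optional[list],
    highlight: Optional[list],
    send_command: SendCommand,
    respond_to_task: Respond,
) -> None:
    """Hypothetical reply-tool body that skips malformed LLM entries instead of raising."""
    for entry in fills or []:
        # Expect {"ref": str, "value": str}; anything else is a model hiccup, not a bug.
        if not isinstance(entry, dict):
            continue
        ref, value = entry.get("ref"), entry.get("value")
        if isinstance(ref, str) and isinstance(value, str):
            await send_command("set_input_value", {"ref": ref, "value": value})
    for ref in highlight or []:
        if isinstance(ref, str):
            await send_command("highlight", {"ref": ref})
    # Reached even for malformed input, so the single-flight lock gets released.
    await respond_to_task(answer)


async def demo() -> tuple:
    sent: list = []
    answers: list = []

    async def send_command(name: str, payload: dict) -> None:
        sent.append((name, payload))

    async def respond_to_task(answer: str) -> None:
        answers.append(answer)

    await reply(
        "Done.",
        fills=["oops", {"ref": "e1", "value": "Ada"}, {"ref": 3}],  # bare str, good dict, bad ref
        highlight=[None, "e2"],
        send_command=send_command,
        respond_to_task=respond_to_task,
    )
    return sent, answers
```

Note there is no broad try/finally: a genuine bug inside `send_command` would still raise, which matches the commit's intent.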

aconchillo · 2026-05-08T23:31:26Z

+            ),
+        )
+
+    async def on_task_request(self, message: BusTaskRequestMessage) -> None:


This could be:

@task(name="hello") async def on_hello(self, message: BusTaskRequestMessage) -> None:

The call to await super().on_task_request(message) will happen automatically.
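One way the suggested decorator could work, sketched under assumptions: `task`, `TaskRequest`, and `TaskAgent` are hypothetical names, not the subagents API. The base class collects tagged methods by task name, runs its shared setup, and then calls the matching handler, so subclasses never call `super()` themselves.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TaskRequest:
    """Hypothetical stand-in for BusTaskRequestMessage."""
    name: str
    payload: dict


def task(name: str):
    """Hypothetical decorator that tags a method as the handler for one task name."""
    def wrap(fn):
        fn._task_name = name
        return fn
    return wrap


class TaskAgent:
    def __init__(self) -> None:
        self.log: list = []
        self._task_handlers = {}
        # Walk the MRO base-first so subclass handlers win on name clashes.
        for klass in reversed(type(self).__mro__):
            for attr, fn in vars(klass).items():
                task_name = getattr(fn, "_task_name", None)
                if task_name is not None:
                    self._task_handlers[task_name] = getattr(self, attr)

    async def on_task_request(self, message: TaskRequest) -> None:
        # Shared setup (the work super().on_task_request does today) always runs first...
        self.log.append(f"setup:{message.name}")
        handler = self._task_handlers.get(message.name)
        # ...then the decorated handler for this task name, if any.
        if handler is not None:
            await handler(message)


class HelloAgent(TaskAgent):
    @task(name="hello")
    async def on_hello(self, message: TaskRequest) -> None:
        self.log.append(f"hello:{message.payload['query']}")


agent = HelloAgent()
asyncio.run(agent.on_task_request(TaskRequest("hello", {"query": "hi"})))
```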

aconchillo · 2026-05-08T23:55:44Z

+        await super().on_ready()
+        # Route inbound UI events (incl. the reserved snapshot event)
+        # at HelloAgent — the snapshot is what HelloAgent reasons over.
+        attach_ui_bridge(self, target="hello")


Calling this attach_ui_bridge is a bit weird. We already have the notion of a bridge, which is a frame processor that outputs and inputs from the bus. Maybe we can also do this as a decorator that gets automatically called when the pipeline is ready.

What about?

@ui_agent(agent="hello") class HelloRoot(BaseAgent)

Could there be multiple UI agents? I think it should be possible. If so, maybe:

@ui_agent(agents=["hello", "hello2"]) class HelloRoot(BaseAgent): ....
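A rough sketch of how such a class decorator could wire the bridge once the root agent is ready. All names here are hypothetical: `ui_agent` does not exist in the SDK, `BaseAgent` is a toy stand-in, and `attach_ui_bridge` is a method that only records its target.

```python
import asyncio


def ui_agent(agents: list):
    """Hypothetical class decorator: attach a UI bridge for each named agent on ready."""
    def decorate(cls):
        original_on_ready = cls.on_ready

        async def on_ready(self) -> None:
            await original_on_ready(self)
            for name in agents:
                # Stand-in for attach_ui_bridge(self, target=name).
                self.attach_ui_bridge(target=name)

        cls.on_ready = on_ready
        return cls
    return decorate


class BaseAgent:
    """Toy stand-in; the real BaseAgent lives in pipecat_subagents."""

    def __init__(self) -> None:
        self.bridged: list = []

    async def on_ready(self) -> None:
        pass

    def attach_ui_bridge(self, target: str) -> None:
        self.bridged.append(target)


@ui_agent(agents=["hello", "hello2"])
class HelloRoot(BaseAgent):
    pass


root = HelloRoot()
asyncio.run(root.on_ready())
```

Wrapping `on_ready` keeps the decorated class's own readiness logic intact and runs it before the bridge is attached.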

aconchillo · 2026-05-08T23:57:11Z

+                # auto_inject_ui_state is on) injects ``<ui_state>``.
+                # Then feed the user's query into the LLM context with
+                # ``run_llm=True`` so the LLM actually generates.
+                await super().on_task_request(message)


Maybe we document the @task(name="xxxx") instead?

aconchillo · 2026-05-09T00:03:03Z

+        ``run_llm=True`` the context contains exactly: current
+        ``<ui_state>`` + query.
+        """
+        await self._task_lock.acquire()


This assumes a UIAgent can only run one task at a time, which makes it hard for the user to run additional tasks on a UIAgent. Need to think about this, but it feels a bit limiting.

Maybe we can instead create a @ui_task(name="name") decorator that blocks... Or, even more generic, @task(name="hello", sync=True), meaning that this task blocks other tasks with the same name. I'll think about it.
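The `sync=True` idea amounts to one lock per task name rather than one per agent. A minimal sketch, assuming a hypothetical `TaskRunner`; it is not an SDK class, and the `sync` flag is the reviewer's proposal, not an existing parameter.

```python
import asyncio
from collections import defaultdict


class TaskRunner:
    """Hypothetical: sync=True tasks serialize per task name; others run freely."""

    def __init__(self) -> None:
        # One lock per task name, created on first use.
        self._locks = defaultdict(asyncio.Lock)
        self.events: list = []

    async def run(self, name: str, task_id: str, sync: bool, delay: float) -> None:
        if sync:
            async with self._locks[name]:
                await self._body(task_id, delay)
        else:
            await self._body(task_id, delay)

    async def _body(self, task_id: str, delay: float) -> None:
        self.events.append(f"start:{task_id}")
        await asyncio.sleep(delay)  # stand-in for the LLM turn
        self.events.append(f"end:{task_id}")


async def demo() -> list:
    runner = TaskRunner()
    await asyncio.gather(
        runner.run("ui", "a", sync=True, delay=0.02),
        runner.run("ui", "b", sync=True, delay=0.0),
        runner.run("search", "c", sync=False, delay=0.0),
    )
    return runner.events
```

Task `b` waits for `a` because they share a name, while `c` interleaves freely. A single shared LLM context would still need care, which is the concern behind the per-agent lock.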

UIAgent(LLMAgent) dispatches BusUIEventMessage to @on_ui_event handlers (each runs in its own asyncio task so the bus dispatcher is never held open) and exposes send_command for server-to-client messages. attach_ui_bridge forwards RTVI client messages onto the bus and translates BusUICommandMessage back to RTVIServerMessageFrame pushed into the root agent's pipeline. Bus gains BusUIEventMessage, BusUICommandMessage, and the matching type constants; standard command payloads (Toast, Navigate, ScrollTo, Highlight, Focus) ship alongside. Events inject as <ui_event name="...">payload</ui_event> developer messages into the LLM context by default; override render_ui_event or set inject_events=False to opt out.
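A minimal sketch of name-keyed event dispatch where each handler runs in its own asyncio task and the event is also rendered into a `<ui_event>` developer message. `EventDispatcher` and the JSON encoding of the payload are assumptions for illustration; the SDK's `render_ui_event` output may differ.

```python
import asyncio
import json


class EventDispatcher:
    """Hypothetical sketch: UI event handlers keyed by name, each run in its own task."""

    def __init__(self) -> None:
        self._handlers = {}
        self._tasks: set = set()
        self.context: list = []  # stand-in for the LLM context's developer messages

    def on_ui_event(self, name: str):
        def register(fn):
            self._handlers.setdefault(name, []).append(fn)
            return fn
        return register

    def render_ui_event(self, name: str, payload: dict) -> str:
        return f'<ui_event name="{name}">{json.dumps(payload)}</ui_event>'

    def dispatch(self, name: str, payload: dict) -> None:
        self.context.append(self.render_ui_event(name, payload))
        for handler in self._handlers.get(name, []):
            # Fire-and-forget so a slow handler never holds the bus dispatcher open.
            task = asyncio.get_running_loop().create_task(handler(payload))
            self._tasks.add(task)  # keep a reference until the task finishes
            task.add_done_callback(self._tasks.discard)


async def demo() -> tuple:
    ui = EventDispatcher()
    seen: list = []

    @ui.on_ui_event("note_click")
    async def jump(payload: dict) -> None:
        seen.append(payload["ref"])

    ui.dispatch("note_click", {"ref": "e5"})
    await asyncio.sleep(0)  # let the handler task run
    return ui.context, seen
```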

Adds snapshot storage and LLM-context injection for the __ui_snapshot reserved event. UIAgent.render_ui_state() produces Playwright-MCP-style indented text with stable refs, offscreen tags, and grid dimensions. inject_ui_state() queues a developer message. visible_nodes() returns the non-offscreen subset. ScrollTo/Highlight/Focus commands now accept ref alongside target_id so the server can reference nodes it saw in <ui_state>.
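To make "Playwright-MCP-style indented text" concrete, here is a sketch of a renderer over a hypothetical node shape. The `Node` fields and the exact tag syntax (`ref=`, `grid=RxC`, `offscreen`) are assumptions; the real snapshot schema is defined by the client SDK.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical snapshot node; the real schema comes from the client.
    role: str
    ref: str
    name: str = ""
    offscreen: bool = False
    grid: Optional[tuple] = None  # (rows, cols) for grid-like containers
    children: list = field(default_factory=list)


def render_ui_state(node: Node, depth: int = 0) -> str:
    """Indented text with stable refs; tag syntax is an assumption."""
    label = f'{node.role} "{node.name}"' if node.name else node.role
    tags = [f"ref={node.ref}"]
    if node.grid:
        tags.append(f"grid={node.grid[0]}x{node.grid[1]}")
    if node.offscreen:
        tags.append("offscreen")
    lines = ["  " * depth + f"- {label} [{' '.join(tags)}]"]
    for child in node.children:
        lines.append(render_ui_state(child, depth + 1))
    return "\n".join(lines)


def visible_nodes(node: Node) -> list:
    """Flatten the tree, keeping only nodes not tagged offscreen."""
    out = [] if node.offscreen else [node]
    for child in node.children:
        out.extend(visible_nodes(child))
    return out


page = Node("main", "e1", children=[
    Node("heading", "e2", "Albums"),
    Node("grid", "e3", grid=(2, 3)),
    Node("button", "e4", "Load more", offscreen=True),
])
```

Stable refs such as `e4` are what let a later ScrollTo or Highlight command point back at a node the LLM saw.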

UIAgent now injects the latest <ui_state> snapshot into the LLM context at the start of every task request, so the agent always reasons over the current screen. Controlled by the new auto_inject_ui_state constructor option (default True). Apps that override on_task_request pick up the behavior via super(). Adds UI_STATE_PROMPT_GUIDE, a canonical prompt fragment that documents the <ui_state> and <ui_event> wire format. Apps concatenate it into their system prompt so the LLM understands the SDK-managed developer messages it receives, and we can evolve the format in one place. Removes the spike-only log_snapshots option, _log_snapshot_receipt, _previous_snapshot_root_ref, and the now-unused _count_nodes helper.

Adds an opt-in ``log_snapshots`` constructor flag that emits a ``logger.debug`` line on every accessibility snapshot received (node count, char count, rough token estimate, and the full rendered ``<ui_state>``). Defaults to off. Useful in dev / staging for eyeballing what the LLM will see before the next inject. Ships ``ScrollToToolMixin`` alongside ``UIAgent`` in a new ``ui_tools`` module. Apps that want the LLM to be able to scroll offscreen elements into view inherit the mixin, which exposes a ``scroll_to(ref)`` tool. The tool dispatches a standard ``ScrollTo`` command by ref; keeping it as a mixin (not a base-class method) means single-screen apps don't get a ``scroll_to`` tool cluttering their LLM tool list.

Mirrors the ScrollTo mixin: apps that want the LLM to be able to point at visible elements ("which one is Radiohead?") inherit ``HighlightToolMixin`` alongside ``UIAgent``. The mixin exposes a ``highlight(ref)`` tool that dispatches the standard ``Highlight`` command by snapshot ref. The client's ``useStandardHighlightHandler`` (or a custom one) does the visual effect. Like the scroll mixin, keeping this as a separate mixin means agents that don't need visual highlighting don't pay the tool-list cost. ``test_ui_tools`` grows from 3 to 7 cases: per-mixin coverage (expose-tool, plain-agent-doesn't, dispatch-by-ref) plus a combined-mixin check.

Adds `current_task` tracking and a `respond_to_task(...)` helper on UIAgent so `@tool` methods can complete the in-flight task without threading the task id through every call. The shipped mixin tools (`scroll_to`, `highlight`) now dispatch the UI command, complete the task with no `speak` field, and exit silently. The visual change on the client is the user-facing feedback; apps that want spoken narration override the mixin tool and pass `speak`.


The user asks the assistant to research a topic. The UI agent spawns a background asyncio task that runs user_task_group(...) across three worker agents (Wikipedia, news, scholar). The SDK auto-forwards every task lifecycle event to the client as ui.task envelopes — group_started, task_update, task_completed, group_completed — and the client renders an in-flight card with per-worker status. The user can cancel any group via a Cancel button on the card; the SDK ships the __cancel_task event back to the agent which calls cancel_task() on the registered group. The custom @tool reply has a research_query field. When set, the tool spawns the task group via create_asyncio_task and returns immediately with the spoken acknowledgement ("Researching X now"). The voice agent isn't blocked; it can handle follow-up turns while workers run in the background. Workers are simulated BaseAgent subclasses with on_task_request that emit a few send_task_update progress messages followed by a send_task_response with a canned summary. asyncio.sleep with randomized intervals makes the streaming UI come alive without needing real data sources. This rounds out the demo arc: hello-snapshot (read), pointing (visual point), deixis (text point), form-fill (state-changing actions), async-tasks (fan-out + streaming progress + cancel). Each demo is a one- or two-line composition of the SDK primitives plus a focused page.

The canonical fire-and-forget pattern with user_task_group required ceremony: a separate method to host the async with, a unique asyncio task name string, and a pass body for the context manager. The async-tasks demo shows the pattern in full:

    self.create_asyncio_task(self._run_research(query), f"...")
    ...
    async def _run_research(self, query):
        async with self.user_task_group(...):
            pass

Replace with one call:

    await self.start_user_task_group(
        "wikipedia", "news", "scholar",
        payload={"query": query},
        label=f"Research: {query}",
    )

Returns the task_id once the group_started envelope has fired (so the client renders immediately) and runs the context to completion in a background asyncio task the SDK manages. The context-manager form stays available for callers that want to consume worker events inline (async for event in tg). Updates the async-tasks demo to use the new helper. Drops the _run_research method and the create_asyncio_task ceremony from the LLM tool body; the reply tool body is now ~10 lines shorter.
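The key ordering in that helper is "enter the group, then return the id, then finish in the background". A toy sketch of that ordering, assuming hypothetical `FakeTaskGroup` and `Agent` classes; it omits `payload` and `label` and uses timed sleeps in place of real workers.

```python
import asyncio
import itertools


class FakeTaskGroup:
    """Hypothetical stand-in for user_task_group(...): records lifecycle envelopes."""

    _ids = itertools.count(1)

    def __init__(self, workers: tuple, published: list) -> None:
        self.task_id = f"group-{next(self._ids)}"
        self.workers = workers
        self.published = published

    async def __aenter__(self) -> "FakeTaskGroup":
        self.published.append(("group_started", self.task_id))
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Wait for every (simulated) worker before announcing completion.
        await asyncio.gather(*(asyncio.sleep(0.01) for _ in self.workers))
        self.published.append(("group_completed", self.task_id))
        return False


class Agent:
    def __init__(self) -> None:
        self.published: list = []
        self._background: set = set()

    async def start_user_task_group(self, *workers: str) -> str:
        group = FakeTaskGroup(workers, self.published)
        # Enter first so group_started is already out when the id is returned.
        await group.__aenter__()

        async def finish() -> None:
            await group.__aexit__(None, None, None)

        task = asyncio.get_running_loop().create_task(finish())
        self._background.add(task)  # hold a reference so the task isn't garbage-collected
        task.add_done_callback(self._background.discard)
        return group.task_id


async def demo() -> tuple:
    agent = Agent()
    task_id = await agent.start_user_task_group("wikipedia", "news", "scholar")
    right_after = list(agent.published)  # only group_started so far
    await asyncio.sleep(0.05)  # the caller is free; the group completes in the background
    return task_id, right_after, agent.published
```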

…ancelled When the client cancels an in-flight user_task_group, the group's wait() raises TaskGroupError on __aexit__ ("user requested" or similar reason). For the context-manager form that bubbles to the caller as expected. For start_user_task_group's background runner it was bubbling to the asyncio task manager and getting logged as an unexpected exception. Cancellation is an expected exit for fire-and-forget groups: the client already knows because it received the group_completed envelope. Catch TaskGroupError around the __aexit__ call and log at debug. Other exceptions still log at warning. Also restructures the runner so iteration exceptions are forwarded into __aexit__ correctly (not swallowed by a finally that calls __aexit__(None, None, None) instead). Adds a regression test that triggers the exact path: start_user_task_group with a slow worker, cancel, and verify the group_completed envelope still publishes and the task group is cleaned up — without leaking TaskGroupError to the test runner.
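`async with` already forwards an exception raised during iteration into `__aexit__`, which is exactly what a manual `finally: __aexit__(None, None, None)` loses. A sketch of a runner built on that, assuming a hypothetical `TaskGroupError` and a toy `FakeGroup`; the string return values exist only to make the outcome observable.

```python
import asyncio
import logging

logger = logging.getLogger("ui_agent_sketch")


class TaskGroupError(Exception):
    """Hypothetical stand-in for the SDK's TaskGroupError."""


class FakeGroup:
    """Toy task group: yields events, and raises on exit if the client cancelled."""

    def __init__(self, events: list, cancel: bool) -> None:
        self._events = list(events)
        self._cancel = cancel
        self.exit_exc_type = None

    async def __aenter__(self) -> "FakeGroup":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.exit_exc_type = exc_type
        if exc_type is None and self._cancel:
            raise TaskGroupError("user requested")  # mirrors wait() on a cancelled group
        return False

    def __aiter__(self):
        return self._gen()

    async def _gen(self):
        for event in self._events:
            if isinstance(event, Exception):
                raise event
            yield event


async def run_group_in_background(group) -> str:
    try:
        # async with hands any iteration exception to __aexit__ unchanged.
        async with group:
            async for _event in group:
                pass  # events were already forwarded to the client
    except TaskGroupError as e:
        logger.debug("Task group ended early: %s", e)  # expected exit, not an error
        return "cancelled"
    except Exception:
        logger.warning("Task group failed", exc_info=True)
        return "failed"
    return "completed"


async def demo() -> tuple:
    cancelled = await run_group_in_background(FakeGroup(["task_update"], cancel=True))
    broken = FakeGroup(["task_update", ValueError("boom")], cancel=False)
    failed = await run_group_in_background(broken)
    return cancelled, failed, broken.exit_exc_type
```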

Combines every prior demo's pattern into one workspace where the user reviews a draft article by voice. The user can:
- Select a paragraph and ask for review. ReviewAgent calls start_review(answer, paragraph_ref, paragraph_text), which spawns two specialist worker agents (clarity, tone) via start_user_task_group. Workers stream progress to an in-flight card. As each completes, on_task_response intercepts and emits an add_note custom command, attaching the worker's feedback to the paragraph as a note.
- Dictate notes by voice. The reply tool's fills + click drive the textarea and Save button, same as form-fill.
- Ask "where does it talk about X", and the agent uses select_text + scroll_to to navigate, same as the write direction of deixis.
- Click any note in the panel. The client emits a note_click UI event; the agent's @on_ui_event("note_click") handler dispatches select_text to jump to the related paragraph. Round-trip event/command pattern.
Demonstrates two patterns no prior demo touched:
- Custom UI command (add_note) registered locally on the client. The server emits it via send_command; the client's handler renders a note card. Apps register their own command names freely.
- Custom client-emitted event (note_click). When the user clicks a note, the client calls ui.sendEvent("note_click", {ref}); the agent's @on_ui_event handler reacts.
Two LLM tools coexist: reply (from ReplyToolMixin, for normal turns) plus a custom start_review (for paragraph review kick-off). The prompt steers the model to pick the right one. Single tool call per turn, so no chainable coordination problems. Workers are simulated, like async-tasks: they compute simple text metrics (word count, sentence count, presence of absolutist or hedging words) and emit templated feedback that varies meaningfully per paragraph. The demo is about orchestration, not real NLP. A real app swaps the workers for LLMAgent subclasses without changing anything else.
The article is a 6-paragraph draft with deliberately uneven paragraphs: ¶2 too dense, ¶3 too vague, ¶4 absolutist tone, ¶5/¶6 balanced. So the workers actually have something to flag.

The form-submit handler was trying to find the ref by walking up selection.anchorNode looking for a dataset.ref attribute the walker never sets, so notes always showed up unattached. Fix:
- Track lastArticleRef on every selectionchange that lands inside the article column. Filtered to article ancestors so subsequent textarea selection (when the user or agent types) doesn't overwrite it.
- Use the new findRefForElement client SDK helper to resolve selection ancestors → ref. Walks up parentElement to find the closest snapshot-known container.
- Submit handler reads lastArticleRef instead of trying to derive the ref from the live selection at submit time. This works for both the manual flow (user types + clicks Save) and the voice flow (agent fills textarea + clicks Save) because the textarea focus doesn't clear lastArticleRef.
Result: select a paragraph, type a note (or dictate one), and the note attaches to that paragraph. Click the note in the panel and the page jumps back to the paragraph it's attached to. Closes the loop between dictation and deixis. Tightens the prompt's note-flow description to reflect the actual behavior: the client tracks selection across the textarea fill, so the agent doesn't need to thread the ref through the tool call.

Two related changes from real-session testing:
1. ReviewAgent now defaults to keep_history=True so the UI agent can resolve conversational deixis. The "can we have a note for that?" case fails when keep_history=False because the UI agent's context is fresh per turn, so "that" has no antecedent. With keep_history=True the UI agent sees its own prior replies and resolves the reference correctly. The constructor signature now accepts keep_history as an explicit kwarg so apps that want fresh-per-turn can override. This is the right default for multi-turn interactive apps where the user and agent work through something together (review, iterate, refine). The original keep_history=False default on UIAgent stays correct for stateless-delegate apps (pointing, form-fill, async-tasks), where each turn is "given the current screen, do X" with no carryover.
2. The UI prompt now starts with a "hard rule" requiring every turn to call exactly one tool (reply or start_review). The earlier prompt described both tools but never required calling one; for open questions like "how can we improve it?", gpt-4o-mini would sometimes produce 50 tokens of plain text and the voice agent's task() would time out. The hard rule plus an explicit "general questions go through reply" decision rule prevents that.
Caught in the same session: ReviewAgent's prior __init__ signature hardcoded keep_history=False with no kwarg passthrough, so call sites trying to flip it (or even pass it) would raise TypeError during add_agent inside on_client_ready and silently abort the rest of the handler. That left the registry without the ui/clarity/tone agents and surfaced much later as "agents not ready within timeout" when the voice agent first delegated. Worth knowing: exceptions in RTVI event handlers don't propagate noisily.

Indexes the six demos in difficulty order (hello-snapshot → pointing → deixis → form-fill → async-tasks → document-review) with a one-paragraph summary of what each shows. Defers per-demo specifics to each demo's own README. Also covers shared concerns once: how to run any demo (the npm + uv pattern is the same everywhere), the API keys all demos need, and a quick map back to the SDK's public surface for readers exploring the directory cold.

The project uses changelog/<PR>.<type>.md fragments per PR (see existing 19.changed.md and 20.added.md); CHANGELOG.md gets compiled at release time. The earlier direct edit to CHANGELOG.md's [Unreleased] section short-circuited that flow. Moves the content to changelog/18.added.md and reverts CHANGELOG.md to match main.

Two correctness issues raised in code review:
1. Single _current_task slot races under concurrent dispatch. Two overlapping on_task_request calls overwrite each other's task handle, so the first task's tool calls respond_to_task() with the wrong task_id. Even if the slot were per-request (e.g. ContextVar), the agent has only one LLM context and one running pipeline; concurrent processing would still interleave context mutations and corrupt the conversation.
2. The keep_history=False reset wipes any messages pre-seeded via context= on LLMContextAgent's constructor, contradicting the inherited contract.
Fixes (1) by acquiring an asyncio.Lock in on_task_request and holding it until respond_to_task fires. Concurrent submissions queue and process in arrival order. The lock release lives in respond_to_task; a tool that forgets to call it will hang the agent on the next task, which is the correct fast-surfacing signal that something is wrong (no watchdog: it would mask the bug). Fixes (2) via documentation in the keep_history docstring, calling out that persistent app instructions belong in the LLM's system_instruction setting (which lives outside the context message list and is unaffected by the reset). Adds test_concurrent_task_requests_serialize covering the overlap case.
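The essential point of fix (1) is that the lock is acquired in one method and released in another, so it spans the whole LLM turn rather than a single `async with` block. A minimal sketch with a hypothetical `SingleFlightAgent`; a timed sleep stands in for the LLM turn and tool call.

```python
import asyncio
from typing import Optional


class SingleFlightAgent:
    """Hypothetical sketch: lock taken on request, released on respond."""

    def __init__(self) -> None:
        self._task_lock = asyncio.Lock()
        self.current_task: Optional[str] = None
        self.order: list = []

    async def on_task_request(self, task_id: str) -> None:
        # Held across the whole turn, not just this method.
        await self._task_lock.acquire()
        self.current_task = task_id
        self.order.append(f"start:{task_id}")

    async def respond_to_task(self, answer: str) -> None:
        # Safe to read current_task: no other task can have overwritten it.
        self.order.append(f"done:{self.current_task}")
        self.current_task = None
        self._task_lock.release()


async def demo() -> list:
    agent = SingleFlightAgent()

    async def handle(task_id: str, delay: float) -> None:
        await agent.on_task_request(task_id)
        await asyncio.sleep(delay)  # stand-in for the LLM turn and tool call
        await agent.respond_to_task(f"answer for {task_id}")

    await asyncio.gather(handle("t1", 0.02), handle("t2", 0.0))
    return agent.order
```

Without the lock, `t2` would overwrite `current_task` while `t1` was still waiting on its model, which is the race described in (1).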

The constructor's context= arg doc previously said it was for seeding 'an initial system prompt or message history,' which conflicts with the default keep_history=False reset behavior. Updates to call out that seeded messages are part of mutable task history and get cleared, and points readers to system_instruction for anything durable. Same clarification in reset_context()'s docstring (seeded messages ARE affected by the reset). And keep_history's note now points at UI_STATE_PROMPT_GUIDE as a concrete example of what to put in system_instruction.

BaseAgent._handle_task_cancel sends the CANCELLED response directly via send_task_response and bypasses respond_to_task, which is where the lock release lives. Without an on_task_cancelled hook, the lock would stay held after a cancellation and every subsequent UI task request would block at on_task_request's acquire forever. Override on_task_cancelled to clear _current_task and release _task_lock when the cancelled task_id matches the in-flight one. Idempotent and race-safe: the current_task identity check makes it a no-op when respond_to_task fired first, and the locked() guard makes the release safe regardless of what cleared the slot. Adds two focused tests:
- cancellation_releases_lock_for_subsequent_tasks: the bug Codex flagged; a follow-up task must not block.
- cancellation_for_unrelated_task_id_leaves_lock_held: confirms we only react to cancels that match the current task.
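A sketch of the cancellation hook's two guards (task-id match, then `locked()`), using a hypothetical `CancellableAgent` that reduces the real agent to its lock and current-task slot.

```python
import asyncio
from typing import Optional


class CancellableAgent:
    """Hypothetical sketch of releasing the single-flight lock on cancellation."""

    def __init__(self) -> None:
        self._task_lock = asyncio.Lock()
        self.current_task: Optional[str] = None

    async def on_task_request(self, task_id: str) -> None:
        await self._task_lock.acquire()
        self.current_task = task_id

    async def respond_to_task(self) -> None:
        self.current_task = None
        self._task_lock.release()

    async def on_task_cancelled(self, task_id: str) -> None:
        # Only react to the in-flight task; a stale or unrelated cancel is a no-op.
        if task_id != self.current_task:
            return
        self.current_task = None
        # Guard so a release can never hit an already-unlocked lock.
        if self._task_lock.locked():
            self._task_lock.release()


async def demo() -> tuple:
    agent = CancellableAgent()
    await agent.on_task_request("t1")
    await agent.on_task_cancelled("other")  # unrelated id: lock stays held
    held_after_unrelated = agent._task_lock.locked()
    await agent.on_task_cancelled("t1")  # matching id: lock released
    # A follow-up task must not block.
    await asyncio.wait_for(agent.on_task_request("t2"), timeout=0.1)
    return held_after_unrelated, agent.current_task
```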

The UI Agent Protocol wire format (envelope-type strings, reserved event names, task-lifecycle kind discriminators, and the built-in command payload dataclasses) now lives in pipecat.processors.frameworks.rtvi.ui as of pipecat-ai 1.2.0. Single-LLM Pipecat apps and other frameworks can now target the same wire format without taking a subagents dependency. Subagents continues to re-export the same names from pipecat_subagents.bus and pipecat_subagents.agents so existing imports keep working; the canonical definitions just moved. Also replaces the inline 'group_started' / 'task_update' / etc. string literals in the UI bridge with the new UI_TASK_*_KIND constants from pipecat.

These bus messages are subagents-internal carriers exchanged only between UIAgent and the bridge installed by attach_ui_bridge. They have no use outside the UI subpackage. Co-locating them with the agent that consumes them removes a layering wart in bus/messages.py and matches the directory shape of the rest of the UI surface. Removes the BusUI* re-exports from pipecat_subagents.bus and adds them under pipecat_subagents.agents.ui (and at agents.ui.ui_messages for direct import). All internal callers and tests updated to the new path.

Pipecat-ai 1.2.0 promotes the UI Agent Protocol to first-class RTVI message types (ui-event, ui-command, ui-snapshot, ui-cancel-task, ui-task) instead of sub-types carried inside server-message / client-message. Update the bridge to match:
- attach_ui_bridge subscribes to on_ui_message on the RTVI processor (instead of on_client_message). UIEventMessage, UISnapshotMessage, and UICancelTaskMessage are translated onto the bus as BusUIEventMessage carriers; the snapshot and cancel-task carry subagents-internal event names so UIAgent's existing dispatch keeps working.
- Outbound: BusUICommandMessage and the four BusUITask* messages are emitted as RTVIServerTypedMessageFrame wrapping UICommandMessage / UITaskMessage envelopes (instead of RTVIServerMessageFrame with a dict).
Subagents-internal _UI_SNAPSHOT_BUS_EVENT_NAME and _UI_CANCEL_TASK_BUS_EVENT_NAME constants in agents/ui/ui_messages replace the wire-format reserved event names (which were public constants in pipecat). UIAgent dispatches on these internal names. Tests updated: bridge tests cover the new UI message inputs and the typed frame outputs; the cancel-task tests now construct UICancelTaskMessage instead of forging a ui-event with a reserved name. All 282 subagents tests pass against the local pipecat checkout.
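The bridge's job can be pictured as two small translation functions. This sketch flattens the frame/observer layering into plain dicts, and every concrete value in it is an assumption: the internal event-name strings, the client message field names (`event`, `payload`, `tree`, `task_id`), and the `BusUIEvent` dataclass standing in for BusUIEventMessage.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical values; the real constants live in agents/ui/ui_messages.py.
SNAPSHOT_EVENT = "__ui_snapshot"
CANCEL_TASK_EVENT = "__ui_cancel_task"


@dataclass
class BusUIEvent:
    """Stand-in for BusUIEventMessage."""
    name: str
    payload: Any


def inbound(message: dict) -> BusUIEvent:
    """Translate a client UI message (shape assumed) into a bus event."""
    kind, data = message["type"], message["data"]
    if kind == "ui-event":
        return BusUIEvent(data["event"], data.get("payload"))
    if kind == "ui-snapshot":
        # Routed to UIAgent, which stores it for <ui_state> injection.
        return BusUIEvent(SNAPSHOT_EVENT, data["tree"])
    if kind == "ui-cancel-task":
        return BusUIEvent(CANCEL_TASK_EVENT, {"task_id": data["task_id"]})
    raise ValueError(f"not a UI message: {kind}")


def outbound(kind: str, body: dict) -> dict:
    """Wrap an agent-side command or task update in a typed envelope."""
    envelope_type = {"command": "ui-command", "task": "ui-task"}[kind]
    return {"type": envelope_type, "data": body}
```

Mapping the snapshot and cancel-task messages onto internal event names is what lets one dispatch path in UIAgent handle all three inbound types.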

Pipecat's RTVI now ships RTVIUICommandFrame and RTVIUITaskFrame as domain-scoped pipeline frames, mirroring how RTVIServerMessageFrame and the LLM/TTS frames work: the frame carries domain data, the observer wraps it into the matching typed RTVI envelope before sending. Switch the bridge over to push these instead of the generic RTVIServerTypedMessageFrame. The generic typed-message frame is gone from pipecat. This is a better fit with the rest of the RTVI surface: a reader doesn't have to inspect what's inside the frame to know what's being sent, the frame name itself tells them. Symmetric with how LLM events flow (LLMFunctionCallStartedFrame produces an llm-function-call-started envelope inside the observer).

The constant is going away on the pipecat side (it was redundant with the Literal[...] field default on UICommandMessage). Drops the import and the corresponding test assertion, and trims the matching mention from the changelog fragment.

Temporary [tool.uv.sources] override so reviewers and CI can resolve ``pipecat-ai>=1.2.0`` from the open wire-format PR before pipecat 1.2.0 ships on PyPI. uv strips [tool.uv.sources] when building the distribution, so this is install-time-only and does not affect the published package. Companion PR: pipecat-ai/pipecat#4407. Drop this commit (or just the [tool.uv.sources] block) before merging once pipecat 1.2.0 is on PyPI.


A refresh of the v1 primer that covers what has moved since the original was written:
- Wire format moved to Pipecat as canonical (single-LLM apps don't need subagents). Layered "the pieces" treatment of all four packages plus the reference app, with an architecture diagram.
- Full action vocabulary: select_text, click, set_input_value alongside the original scroll_to / highlight / focus / toast / navigate. Tied to a "what it enables" table that maps user capabilities to wire-format pieces.
- Task lifecycle protocol (start_user_task_group / ui-task / useUITasks) treated as a first-class deployment dimension.
- Two orthogonal deployment knobs (history mode + task shape) called out so the right corner is easy to pick.
- "When (not) to use this" rewritten as app-shape fit followed by deployment-shape decision (single LLM / voice+UI / multi-agent).
v1 stays in place at UI_AGENT_DESIGN.md. v2 keeps the same audience (internal team) and tone (conversational primer); v1 is referenced for the longer "How information flows" treatment and the other two sequence diagrams.

markbackman force-pushed the mb/ui-agent branch 6 times, most recently from 74c6a39 to 8208302, May 1, 2026 21:43

markbackman requested a review from aconchillo May 1, 2026 21:45

markbackman marked this pull request as ready for review May 1, 2026 21:45

markbackman mentioned this pull request May 1, 2026

Use UIAgent markbackman/pipecat-music-player#1 (open, 6 tasks)

aconchillo reviewed May 1, 2026

Outdated comment thread on src/pipecat_subagents/bus/messages.py

markbackman force-pushed the mb/ui-agent branch 2 times, most recently from 5189cd9 to 1394ec4, May 2, 2026 13:17

markbackman changed the title from "UI agent POC" to "UIAgent" May 2, 2026

markbackman force-pushed the mb/ui-agent branch 2 times, most recently from bf4ced9 to ba9b84f, May 6, 2026 21:42

aconchillo reviewed May 8, 2026

aconchillo reviewed May 9, 2026

markbackman added 8 commits May 21, 2026 08:54

Add CHANGELOG with UIAgent SDK entries

ea04181

Lint fixes for UIAgent + ui_prompts + tests

cc8e2f0

markbackman added 25 commits May 21, 2026 08:54

Fix linting

719dd88

Update design doc

489aeaa

refactor(ui-agent): align with Pipecat RTVI helpers

9dbfa7a

Update UI agent client API references

c3030de

Update pipecat-ai to use main

9e38c5c

Update examples for latest client-js changes

8a8e819

markbackman force-pushed the mb/ui-agent branch 2 times, most recently from d67ffbf to 03ecd79, May 21, 2026 13:01

Use pipecat-ai 1.2.1

7e299e9

markbackman force-pushed the mb/ui-agent branch from 03ecd79 to 7e299e9, May 21, 2026 13:01

Update ui-agent examples to use the latest pipecat-ai client versions

7483460

UIAgent#18

markbackman wants to merge 57 commits into main from mb/ui-agent

markbackman commented Apr 25, 2026 (edited)

codecov-commenter commented Apr 25, 2026 (edited)

3 participants
