From 6a766e3fcd77205db7e62deeb54619b17aef8283 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Fri, 8 May 2026 21:01:19 +0000
Subject: [PATCH] docs: update Gemini Live for async tool support (PR #4448)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Update example file references (function-calling → async-tool / vertex)
- Document cancel_on_interruption=False support with NON_BLOCKING on Gemini 2.x
- Note Gemini 3.x warning for async tools
- Document Vertex AI limitation (no NON_BLOCKING support)
---
 api-reference/server/services/s2s/gemini-live-vertex.mdx | 3 ++-
 api-reference/server/services/s2s/gemini-live.mdx        | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/api-reference/server/services/s2s/gemini-live-vertex.mdx b/api-reference/server/services/s2s/gemini-live-vertex.mdx
index cc28d1c4..2cca2a0d 100644
--- a/api-reference/server/services/s2s/gemini-live-vertex.mdx
+++ b/api-reference/server/services/s2s/gemini-live-vertex.mdx
@@ -24,7 +24,7 @@ description: "A real-time, multimodal conversational AI service powered by Googl
     Complete Gemini Live Vertex AI function calling example
@@ -252,4 +252,5 @@ llm = GeminiLiveVertexLLMService(
 - **Authentication priority**: The service tries credentials in this order: (1) `credentials` JSON string, (2) `credentials_path` file, (3) Application Default Credentials (ADC).
 - **File API not supported**: The Gemini File API is not available through Vertex AI. Use Google Cloud Storage for file handling instead.
 - **Model naming**: Vertex AI uses different model identifiers (e.g., `"google/gemini-live-2.5-flash-native-audio"`) compared to the Google AI variant.
+- **Async tool limitation**: Vertex AI's Gemini Live endpoint does not currently support NON_BLOCKING tool calls. Functions registered with `cancel_on_interruption=False` will log a one-time warning and fall back to synchronous behavior (the conversation pauses while the tool runs). Use `cancel_on_interruption=True` (the default) or use a non-realtime LLM service if your tool requires async semantics.
 - **All other features** (VAD, context compression, thinking, function calling, etc.) work identically to the base [Gemini Live](/api-reference/server/services/s2s/gemini-live) service.
diff --git a/api-reference/server/services/s2s/gemini-live.mdx b/api-reference/server/services/s2s/gemini-live.mdx
index c69743b9..708af688 100644
--- a/api-reference/server/services/s2s/gemini-live.mdx
+++ b/api-reference/server/services/s2s/gemini-live.mdx
@@ -23,9 +23,9 @@ description: "A real-time, multimodal conversational AI service powered by Googl
-    Complete Gemini Live function calling example
+    Complete Gemini Live async tool calling example
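The behavioral difference this patch documents — NON_BLOCKING tools running concurrently with the conversation versus the synchronous fallback pausing it — can be sketched with plain `asyncio`. This is a hypothetical toy model, not Pipecat's or the Gemini Live API's actual interface; `slow_tool`, `converse`, and the `non_blocking` flag are all invented names for illustration.

```python
# Toy sketch (NOT the real Pipecat/Gemini Live API) of the semantics described
# in the patch: with NON_BLOCKING support the conversation keeps producing
# turns while the tool runs; with the synchronous fallback (Vertex AI) the
# conversation pauses until the tool returns.
import asyncio


async def slow_tool() -> str:
    await asyncio.sleep(0.05)  # stands in for a long-running tool call
    return "tool result"


async def converse(events: list) -> None:
    # Stands in for the conversation continuing for a few turns.
    for i in range(3):
        events.append(f"turn-{i}")
        await asyncio.sleep(0.01)


async def run(non_blocking: bool) -> list:
    events = []
    if non_blocking:
        # NON_BLOCKING: the tool runs concurrently with the conversation,
        # and its result is delivered afterward.
        tool = asyncio.create_task(slow_tool())
        await converse(events)
        events.append(await tool)
    else:
        # Synchronous fallback: the conversation waits for the tool.
        events.append(await slow_tool())
        await converse(events)
    return events


print(asyncio.run(run(non_blocking=True)))   # turns happen before the tool result
print(asyncio.run(run(non_blocking=False)))  # tool result arrives before any turn
```

In the non-blocking case the tool result is appended only after the conversation turns; in the fallback case everything waits on the tool first, which is the "conversation pauses while the tool runs" behavior the Vertex AI note warns about.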