api-reference/server/services/s2s/gemini-live-vertex.mdx (3 changes: 2 additions & 1 deletion)
@@ -24,7 +24,7 @@ description: "A real-time, multimodal conversational AI service powered by Googl
 <Card
   title="Example Implementation"
   icon="play"
-  href="https://github.com/pipecat-ai/pipecat/blob/main/examples/realtime/realtime-gemini-live-vertex-function-calling.py"
+  href="https://github.com/pipecat-ai/pipecat/blob/main/examples/realtime/realtime-gemini-live-vertex.py"
 >
   Complete Gemini Live Vertex AI function calling example
 </Card>
@@ -252,4 +252,5 @@ llm = GeminiLiveVertexLLMService(
 - **Authentication priority**: The service tries credentials in this order: (1) `credentials` JSON string, (2) `credentials_path` file, (3) Application Default Credentials (ADC).
 - **File API not supported**: The Gemini File API is not available through Vertex AI. Use Google Cloud Storage for file handling instead.
 - **Model naming**: Vertex AI uses different model identifiers (e.g., `"google/gemini-live-2.5-flash-native-audio"`) compared to the Google AI variant.
+- **Async tool limitation**: Vertex AI's Gemini Live endpoint does not currently support NON_BLOCKING tool calls. Functions registered with `cancel_on_interruption=False` will log a one-time warning and fall back to synchronous behavior (the conversation pauses while the tool runs). Use `cancel_on_interruption=True` (the default) or use a non-realtime LLM service if your tool requires async semantics.
 - **All other features** (VAD, context compression, thinking, function calling, etc.) work identically to the base [Gemini Live](/api-reference/server/services/s2s/gemini-live) service.
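
For orientation, here is a minimal sketch of how the credential-priority, model-naming, and sync-only tool notes above fit together for the Vertex variant. The import paths, the `FunctionCallParams` handler signature, and the `get_weather` tool are illustrative assumptions and are not taken from this diff; treat it as a sketch rather than a drop-in snippet.

```python
# Illustrative sketch only: import paths, handler signature, and the tool
# itself are assumptions, not part of this change.
from pipecat.services.gemini_live.vertex import GeminiLiveVertexLLMService  # assumed path
from pipecat.services.llm_service import FunctionCallParams  # assumed path

llm = GeminiLiveVertexLLMService(
    # Credential resolution order: `credentials` JSON string, then
    # `credentials_path`, then Application Default Credentials (ADC).
    credentials_path="/path/to/service-account.json",
    # Vertex AI model identifiers differ from the Google AI variant.
    model="google/gemini-live-2.5-flash-native-audio",
)

async def get_weather(params: FunctionCallParams):
    # Hypothetical tool body: return the result through the provided callback.
    await params.result_callback({"conditions": "sunny", "temperature_f": 72})

# Vertex AI has no NON_BLOCKING tool support, so keep the default
# cancel_on_interruption=True; passing False only logs a one-time warning
# and falls back to blocking (synchronous) behavior.
llm.register_function("get_weather", get_weather, cancel_on_interruption=True)
```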
api-reference/server/services/s2s/gemini-live.mdx (5 changes: 3 additions & 2 deletions)
@@ -23,9 +23,9 @@ description: "A real-time, multimodal conversational AI service powered by Googl
 <Card
   title="Example Implementation"
   icon="play"
-  href="https://github.com/pipecat-ai/pipecat/blob/main/examples/realtime/realtime-gemini-live-function-calling.py"
+  href="https://github.com/pipecat-ai/pipecat/blob/main/examples/realtime/realtime-gemini-live-async-tool.py"
 >
-  Complete Gemini Live function calling example
+  Complete Gemini Live async tool calling example
 </Card>
 <Card
   title="Gemini Documentation"
@@ -321,6 +321,7 @@ llm = GeminiLiveLLMService(
 ## Notes

 - **Model support**: The service supports both Gemini 2.5 and Gemini 3.x models. The service automatically detects and handles model-specific behavior.
+- **Async tool support**: Functions registered with `cancel_on_interruption=False` use Gemini's NON_BLOCKING tool mechanism on models that support it (currently Gemini 2.x), allowing the conversation to continue while the tool runs in the background. The result is delivered via the async-tool mechanism and integrated into the model's next turn. On models that don't support NON_BLOCKING (Gemini 3.x), the service logs a one-time warning explaining the limitation. Note: An intermittent 1008 error can occasionally occur on Gemini 2.5 during long-running tool calls; the service auto-reconnects when this happens.
 - **System instruction precedence**: The `system_instruction` from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set.
 - **VAD modes**: By default, Gemini Live uses server-side VAD for detecting when the user starts and stops speaking. To use local VAD (e.g., Silero), set `vad=GeminiVADParams(disabled=True)` and configure an external VAD analyzer in your `LLMUserAggregatorParams`. The service will automatically send activity signals to the Gemini API when local VAD detects speech.
 - **Tools precedence**: Similarly, tools provided in the context override tools provided at init time.
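
As a companion to the new async-tool note just above, here is a minimal sketch of registering a non-blocking tool with the base Gemini Live service. The import paths, constructor arguments, model identifier, and the `lookup_order` tool are illustrative assumptions, not taken from this diff.

```python
# Illustrative sketch only: import paths, constructor arguments, and the
# model identifier are assumptions, not part of this change.
import asyncio

from pipecat.services.gemini_live.gemini import GeminiLiveLLMService  # assumed path
from pipecat.services.llm_service import FunctionCallParams  # assumed path

llm = GeminiLiveLLMService(
    api_key="YOUR_GOOGLE_API_KEY",
    # NON_BLOCKING tools currently require a Gemini 2.x model; on Gemini 3.x
    # the service logs a one-time warning and the call blocks instead.
    model="gemini-live-2.5-flash-native-audio",  # assumed model id
)

async def lookup_order(params: FunctionCallParams):
    # Stand-in for a slow backend call. With cancel_on_interruption=False the
    # conversation keeps going while this runs, and the result is folded into
    # the model's next turn when it arrives.
    await asyncio.sleep(5)
    order_id = params.arguments.get("order_id")
    await params.result_callback({"order_id": order_id, "status": "shipped"})

# cancel_on_interruption=False opts this function into Gemini's NON_BLOCKING
# (async) tool mechanism on models that support it.
llm.register_function("lookup_order", lookup_order, cancel_on_interruption=False)
```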