From 727227746c468b56466bcd28cde8759f689dd6ba Mon Sep 17 00:00:00 2001
From: Cale Shapera <25466659+cshape@users.noreply.github.com>
Date: Mon, 4 May 2026 15:56:38 -0700
Subject: [PATCH] docs(inworld): default to inworld-tts-2; introduce Realtime TTS naming

Flip the documented default for both InworldTTSService /
InworldHttpTTSService and InworldRealtimeLLMService.tts_model from
inworld-tts-1.5-max to inworld-tts-2. Keep inworld-tts-1.5-max and
inworld-tts-1.5-mini as documented options.

Add display-naming guidance per partner request: Realtime TTS-2,
Realtime TTS-1.5-Max, Realtime TTS-1.5-Mini.
---
 api-reference/server/services/s2s/inworld.mdx | 17 ++++++++++-------
 api-reference/server/services/tts/inworld.mdx | 18 ++++++++++++++----
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/api-reference/server/services/s2s/inworld.mdx b/api-reference/server/services/s2s/inworld.mdx
index a89adec2..63a98e02 100644
--- a/api-reference/server/services/s2s/inworld.mdx
+++ b/api-reference/server/services/s2s/inworld.mdx
@@ -1,12 +1,14 @@
 ---
 title: "Inworld Realtime"
-description: "Real-time speech-to-speech service implementation using Inworld's Realtime API"
+description: "Real-time speech-to-speech service powered by Inworld's Realtime TTS-2"
 ---
 
 ## Overview
 
 `InworldRealtimeLLMService` provides real-time, multimodal conversation capabilities using Inworld's Realtime API. It operates as a cascade STT/LLM/TTS pipeline under the hood with built-in semantic voice activity detection (VAD) for turn management, offering low-latency speech-to-speech interactions with integrated LLM processing and function calling.
 
+Speech synthesis defaults to **Realtime TTS-2** (`inworld-tts-2`). Realtime TTS-1.5-Max (`inworld-tts-1.5-max`) and Realtime TTS-1.5-Mini (`inworld-tts-1.5-mini`) remain available via the `tts_model` parameter.
+
-
-  TTS model to use (e.g. "inworld-tts-1.5-max"). Shorthand for
-  `session_properties.audio.output.model`.
+
+  TTS model to use. Defaults to Realtime TTS-2 (`inworld-tts-2`). Other options:
+  Realtime TTS-1.5-Max (`inworld-tts-1.5-max`), Realtime TTS-1.5-Mini
+  (`inworld-tts-1.5-mini`). Shorthand for `session_properties.audio.output.model`.
@@ -172,7 +175,7 @@ The `audio` field in `SessionProperties` accepts an `AudioConfiguration` with `i
 | Parameter | Type | Default | Description |
 | --------- | ------------- | ------- | -------------------------------------------------- |
 | `format` | `AudioFormat` | `None` | Output audio format. Same format options as input. |
-| `model` | `str` | `None` | TTS model to use (e.g. "inworld-tts-1.5-max"). |
+| `model` | `str` | `None` | TTS model to use (e.g. "inworld-tts-2"). |
 | `voice` | `str` | `None` | Voice ID (e.g. "Sarah", "Clive"). |
 
 Inworld PCM audio supports sample rates: 8000, 16000, 24000, 32000, 44100, and 48000 Hz.
@@ -217,7 +220,7 @@ llm = InworldRealtimeLLMService(
     api_key=os.getenv("INWORLD_API_KEY"),
     llm_model="openai/gpt-4.1-nano",
     voice="Sarah",
-    tts_model="inworld-tts-1.5-max",
+    tts_model="inworld-tts-2",
     stt_model="assemblyai/universal-streaming-multilingual",
 )
 ```
@@ -255,7 +258,7 @@ session_properties = SessionProperties(
         ),
         output=AudioOutput(
             format=PCMAudioFormat(rate=24000),
-            model="inworld-tts-1.5-max",
+            model="inworld-tts-2",
             voice="Sarah",
         ),
     ),
diff --git a/api-reference/server/services/tts/inworld.mdx b/api-reference/server/services/tts/inworld.mdx
index 5d1e2c2c..e9269a82 100644
--- a/api-reference/server/services/tts/inworld.mdx
+++ b/api-reference/server/services/tts/inworld.mdx
@@ -1,12 +1,22 @@
 ---
 title: "Inworld"
-description: "Text-to-speech service using Inworld AI's TTS APIs"
+description: "Text-to-speech service using Inworld AI's Realtime TTS-2 (and TTS-1.5) models"
 ---
 
 ## Overview
 
 Inworld provides high-quality, low-latency speech synthesis via two implementation types: `InworldTTSService` for real-time, minimal-latency use-cases through websockets and `InworldHttpTTSService` for streaming and non-streaming use-cases over HTTP. Featuring support for 12+ languages, timestamps, custom pronunciation and instant voice cloning.
 
+## Models
+
+The default model is **Realtime TTS-2** (`inworld-tts-2`). Realtime TTS-1.5-Max (`inworld-tts-1.5-max`) and Realtime TTS-1.5-Mini (`inworld-tts-1.5-mini`) remain available.
+
+| Display name | Model ID |
+| ------------------------- | ----------------------- |
+| Realtime TTS-2 _(default)_ | `inworld-tts-2` |
+| Realtime TTS-1.5-Max | `inworld-tts-1.5-max` |
+| Realtime TTS-1.5-Mini | `inworld-tts-1.5-mini` |
+
-
+
   ID of the model to use for synthesis.
   _Deprecated in v0.0.105. Use
   `settings=InworldTTSService.Settings(model=...)` instead._
@@ -155,7 +165,7 @@ HTTP-based service supporting both streaming and non-streaming modes.
   `settings=InworldHttpTTSService.Settings(voice=...)` instead._
 
-
+
   ID of the model to use for synthesis.
   _Deprecated in v0.0.105. Use
   `settings=InworldHttpTTSService.Settings(model=...)` instead._
@@ -208,7 +218,7 @@ tts = InworldTTSService(
     api_key=os.getenv("INWORLD_API_KEY"),
     settings=InworldTTSService.Settings(
         voice="Ashley",
-        model="inworld-tts-1.5-max",
+        model="inworld-tts-2",
         temperature=0.8,
         speaking_rate=1.1,
     ),