17 changes: 10 additions & 7 deletions api-reference/server/services/s2s/inworld.mdx
@@ -1,12 +1,14 @@
---
title: "Inworld Realtime"
-description: "Real-time speech-to-speech service implementation using Inworld's Realtime API"
+description: "Real-time speech-to-speech service powered by Inworld's Realtime TTS-2"
---

## Overview

`InworldRealtimeLLMService` provides real-time, multimodal conversation capabilities using Inworld's Realtime API. It operates as a cascade STT/LLM/TTS pipeline under the hood with built-in semantic voice activity detection (VAD) for turn management, offering low-latency speech-to-speech interactions with integrated LLM processing and function calling.

+Speech synthesis defaults to **Realtime TTS-2** (`inworld-tts-2`). Realtime TTS-1.5-Max (`inworld-tts-1.5-max`) and Realtime TTS-1.5-Mini (`inworld-tts-1.5-mini`) remain available via the `tts_model` parameter.

<CardGroup cols={2}>
<Card
title="Inworld Realtime API Reference"
@@ -91,9 +93,10 @@ Before using Inworld Realtime services, you need:
`session_properties.audio.output.voice`.
</ParamField>

-<ParamField path="tts_model" type="str" default="inworld-tts-1.5-max">
-TTS model to use (e.g. "inworld-tts-1.5-max"). Shorthand for
-`session_properties.audio.output.model`.
+<ParamField path="tts_model" type="str" default="inworld-tts-2">
+TTS model to use. Defaults to Realtime TTS-2 (`inworld-tts-2`). Other options:
+Realtime TTS-1.5-Max (`inworld-tts-1.5-max`), Realtime TTS-1.5-Mini
+(`inworld-tts-1.5-mini`). Shorthand for `session_properties.audio.output.model`.
</ParamField>

<ParamField path="stt_model" type="str" default="assemblyai/u3-rt-pro">
@@ -172,7 +175,7 @@ The `audio` field in `SessionProperties` accepts an `AudioConfiguration` with `i
| Parameter | Type | Default | Description |
| --------- | ------------- | ------- | -------------------------------------------------- |
| `format` | `AudioFormat` | `None` | Output audio format. Same format options as input. |
-| `model` | `str` | `None` | TTS model to use (e.g. "inworld-tts-1.5-max"). |
+| `model` | `str` | `None` | TTS model to use (e.g. "inworld-tts-2"). |
| `voice` | `str` | `None` | Voice ID (e.g. "Sarah", "Clive"). |

Inworld PCM audio supports sample rates: 8000, 16000, 24000, 32000, 44100, and 48000 Hz.
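
The sample-rate list in the line above can be checked before a session is configured. The following is a minimal sketch, not part of this PR or of the library: `SUPPORTED_PCM_RATES_HZ` and `validate_pcm_rate` are hypothetical names, and the only fact taken from the docs is the list of rates itself.

```python
# Hypothetical helper, not part of the documented API.
# The rates are copied from the docs line: 8000, 16000, 24000, 32000, 44100, 48000 Hz.
SUPPORTED_PCM_RATES_HZ = (8000, 16000, 24000, 32000, 44100, 48000)


def validate_pcm_rate(rate: int) -> int:
    """Return `rate` unchanged if the docs list it as supported, else raise ValueError."""
    if rate not in SUPPORTED_PCM_RATES_HZ:
        supported = ", ".join(str(r) for r in SUPPORTED_PCM_RATES_HZ)
        raise ValueError(f"Unsupported PCM sample rate {rate} Hz; expected one of: {supported}")
    return rate
```

A caller could pass the validated value on, for example as the `rate` argument of `PCMAudioFormat` shown in the usage hunk of this diff, so that an unsupported rate fails early in application code rather than at session setup.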
@@ -217,7 +220,7 @@ llm = InworldRealtimeLLMService(
api_key=os.getenv("INWORLD_API_KEY"),
llm_model="openai/gpt-4.1-nano",
voice="Sarah",
-tts_model="inworld-tts-1.5-max",
+tts_model="inworld-tts-2",
stt_model="assemblyai/universal-streaming-multilingual",
)
```
@@ -255,7 +258,7 @@ session_properties = SessionProperties(
),
output=AudioOutput(
format=PCMAudioFormat(rate=24000),
-model="inworld-tts-1.5-max",
+model="inworld-tts-2",
voice="Sarah",
),
),
18 changes: 14 additions & 4 deletions api-reference/server/services/tts/inworld.mdx
@@ -1,12 +1,22 @@
---
title: "Inworld"
-description: "Text-to-speech service using Inworld AI's TTS APIs"
+description: "Text-to-speech service using Inworld AI's Realtime TTS-2 (and TTS-1.5) models"
---

## Overview

Inworld provides high-quality, low-latency speech synthesis via two implementation types: `InworldTTSService` for real-time, minimal-latency use-cases through websockets and `InworldHttpTTSService` for streaming and non-streaming use-cases over HTTP. Featuring support for 12+ languages, timestamps, custom pronunciation and instant voice cloning.

+## Models
+
+The default model is **Realtime TTS-2** (`inworld-tts-2`). Realtime TTS-1.5-Max (`inworld-tts-1.5-max`) and Realtime TTS-1.5-Mini (`inworld-tts-1.5-mini`) remain available.
+
+| Display name | Model ID |
+| ------------------------- | ----------------------- |
+| Realtime TTS-2 _(default)_ | `inworld-tts-2` |
+| Realtime TTS-1.5-Max | `inworld-tts-1.5-max` |
+| Realtime TTS-1.5-Mini | `inworld-tts-1.5-mini` |

<CardGroup cols={2}>
<Card
title="Inworld TTS API Reference"
@@ -75,7 +85,7 @@ WebSocket-based service for lowest latency streaming.
`settings=InworldTTSService.Settings(voice=...)` instead._
</ParamField>

-<ParamField path="model" type="str" default="inworld-tts-1.5-max" deprecated>
+<ParamField path="model" type="str" default="inworld-tts-2" deprecated>
ID of the model to use for synthesis. _Deprecated in v0.0.105. Use
`settings=InworldTTSService.Settings(model=...)` instead._
</ParamField>
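
Because this change moves the default from `inworld-tts-1.5-max` to `inworld-tts-2`, callers that rely on the default may want to pin a model explicitly. The sketch below is illustrative only: `TTS_MODELS`, `DEFAULT_TTS_MODEL`, and `resolve_tts_model` are hypothetical names, and the display-name-to-ID pairs are copied from the Models table added in this diff.

```python
from typing import Optional

# Hypothetical lookup, not part of the documented API.
# Display names and model IDs are copied from the Models table in this diff.
TTS_MODELS = {
    "Realtime TTS-2": "inworld-tts-2",
    "Realtime TTS-1.5-Max": "inworld-tts-1.5-max",
    "Realtime TTS-1.5-Mini": "inworld-tts-1.5-mini",
}

# Default model ID after this change.
DEFAULT_TTS_MODEL = "inworld-tts-2"


def resolve_tts_model(name: Optional[str] = None) -> str:
    """Map a display name or model ID to a model ID; None yields the default."""
    if name is None:
        return DEFAULT_TTS_MODEL
    if name in TTS_MODELS:
        return TTS_MODELS[name]
    if name in TTS_MODELS.values():
        return name
    known = ", ".join(sorted(TTS_MODELS.values()))
    raise ValueError(f"Unknown TTS model {name!r}; known model IDs: {known}")
```

The resolved ID could then be passed as the `model` value in `Settings(model=...)`, which the parameter description above names as the replacement for the deprecated `model` argument.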
@@ -155,7 +165,7 @@ HTTP-based service supporting both streaming and non-streaming modes.
`settings=InworldHttpTTSService.Settings(voice=...)` instead._
</ParamField>

-<ParamField path="model" type="str" default="inworld-tts-1.5-max" deprecated>
+<ParamField path="model" type="str" default="inworld-tts-2" deprecated>
ID of the model to use for synthesis. _Deprecated in v0.0.105. Use
`settings=InworldHttpTTSService.Settings(model=...)` instead._
</ParamField>
@@ -208,7 +218,7 @@ tts = InworldTTSService(
api_key=os.getenv("INWORLD_API_KEY"),
settings=InworldTTSService.Settings(
voice="Ashley",
-model="inworld-tts-1.5-max",
+model="inworld-tts-2",
temperature=0.8,
speaking_rate=1.1,
),