Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
121 changes: 121 additions & 0 deletions src/blog/tanstack-ai-audio-generation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
---
title: 'TanStack AI Just Learned to Compose Music'
published: 2026-04-24
excerpt: TanStack AI adds a new generateAudio activity with streaming, plus fal and Gemini Lyria adapters for music, sound effects, text-to-speech, and transcription. One typed API, any provider.
authors:
- Alem Tuzlak
---

![TanStack AI Just Learned to Compose Music](/blog-assets/tanstack-ai-audio-generation/header.png)

The AI audio ecosystem is a mess. Gemini's Lyria wants a natural-language prompt and returns raw PCM you have to wrap in a RIFF header yourself. Fal hosts dozens of audio models where one wants `music_length_ms` in milliseconds, the next wants `seconds_total`, and most want plain `duration`. ElevenLabs has its own shape. Whisper has another. Every provider disagrees on whether you get a URL, a base64 blob, or a raw buffer.

If you are shipping an AI product that needs music, sound effects, speech, or transcription, you end up writing the same boring glue code five times.

**TanStack AI just removed that glue.** The latest release lands a full audio stack: a new `generateAudio` activity, streaming support, fal and Gemini Lyria adapters, and framework hooks for React, Solid, Vue, and Svelte. One typed API, any provider.

Here is what shipped and why you should care.

## One activity, any audio model

The new `generateAudio()` activity sits alongside `generateImage`, `generateSpeech`, `generateVideo`, and `generateTranscription` in `@tanstack/ai`. It takes a text prompt, dispatches to whatever adapter you hand it, and returns a `GeneratedAudio` object with exactly one of `url` or `b64Json`.

```typescript
import { generateAudio } from '@tanstack/ai'
import { geminiAudio } from '@tanstack/ai-gemini/adapters'

const adapter = geminiAudio('lyria-3-pro-preview')

const result = await generateAudio({
adapter,
prompt: 'A cinematic orchestral piece with a rising string motif',
})

// result.audio is { url: string } | { b64Json: string } — exactly one, enforced by the type
```

Swap `geminiAudio` for `falAudio` and the exact same call generates music through MiniMax, DiffRhythm, Stable Audio 2.5, or any of the other models in fal's catalog. The adapter translates per-model details (like fal's `music_length_ms` vs `seconds_total` vs `duration` naming) so your app code never sees them.

## Streaming, because audio generation takes seconds

Music and SFX generation is slow. Lyria 3 Pro takes several seconds. Stable Audio takes longer. If you are building a UI, blocking the request the whole time is a bad experience.

`generateAudio` now supports `stream: true`, returning an `AsyncIterable<StreamChunk>` you can pipe straight through `toServerSentEventsResponse()`:

```typescript
export async function POST(req: Request) {
const { prompt } = await req.json()

const stream = await generateAudio({
adapter: falAudio('fal-ai/minimax-music/v2.6'),
prompt,
stream: true,
})

return toServerSentEventsResponse(stream)
}
```

The client receives progress events and the final audio over a single SSE connection, the same transport model already used by `generateImage` and `generateVideo`. No new infrastructure, no special-case code paths.

## Framework hooks that feel like the others

Every framework integration gets a new hook matching the existing media-hook shape:

- `@tanstack/ai-react`: `useGenerateAudio`
- `@tanstack/ai-solid`: `useGenerateAudio`
- `@tanstack/ai-vue`: `useGenerateAudio`
- `@tanstack/ai-svelte`: `createGenerateAudio`

The API is identical to `useGenerateImage` and friends:

```tsx
import { useGenerateAudio } from '@tanstack/ai-react'

function MusicGen() {
const { generate, result, isLoading, error, stop, reset } = useGenerateAudio({
connection,
})

return (
<>
<button onClick={() => generate({ prompt: 'Lo-fi hip-hop beat' })}>
Generate
</button>
{isLoading && <button onClick={stop}>Stop</button>}
{result?.audio.url && <audio src={result.audio.url} controls />}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In TypeScript, for a union type like { url: string } | { b64Json: string }, is result.audio.url valid without narrowing? Please provide canonical docs/examples.

💡 Result:

No, result.audio.url is not valid without narrowing on a union type like { url: string } | { b64Json: string } for result.audio. TypeScript only allows access to properties common to ALL members of a union type. Here, url exists only on the first member, while b64Json exists only on the second—there are no common properties beyond structural compatibility. Attempting audio.url produces the error: "Property 'url' does not exist on type '{ url: string; } | { b64Json: string; }". This prevents runtime errors if audio is the { b64Json: string } variant. To access url safely, narrow the type first using a type guard, such as the in operator: if ('url' in result.audio) { // TypeScript narrows to { url: string } console.log(result.audio.url); // Valid } else { // TypeScript narrows to { b64Json: string } console.log(result.audio.b64Json); } For better ergonomics with multiple variants, use discriminated unions by adding a literal discriminant property (e.g., kind: 'url' | 'b64'): type Audio = { kind: 'url'; url: string } | { kind: 'b64'; b64Json: string }; Then narrow with switch (result.audio.kind) { ... }. This behavior is unchanged as of TypeScript 5.x (current in 2026).

Citations:


🏁 Script executed:

# Read the file around the relevant lines
cat -n src/blog/tanstack-ai-audio-generation.md | sed -n '30,90p'

Repository: TanStack/tanstack.com

Length of output: 2685


Fix TypeScript type error in the React snippet

Line 86 accesses result.audio.url directly on the union type declared on line 34 ({ url: string } | { b64Json: string }). TypeScript does not allow this without narrowing—the code will fail type-checking with "Property 'url' does not exist on type '{ url: string; } | { b64Json: string; }'" because url is not a common property across all union members.

Narrow the type using the in operator before accessing the property:

Fix
-      {result?.audio.url && <audio src={result.audio.url} controls />}
+      {result?.audio && 'url' in result.audio && (
+        <audio src={result.audio.url} controls />
+      )}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{result?.audio.url && <audio src={result.audio.url} controls />}
{result?.audio && 'url' in result.audio && (
<audio src={result.audio.url} controls />
)}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/blog/tanstack-ai-audio-generation.md` at line 86, The snippet accesses
result.audio.url on a union type ({ url: string } | { b64Json: string }) without
narrowing; update the conditional to narrow the union before accessing url
(e.g., check that result and result.audio exist AND that "url" is in
result.audio using the in operator or a type guard) so you only render <audio>
when result.audio has a url property; refer to result, audio, url and b64Json to
locate the code and apply the in-based or guard-based type narrowing.

</>
)
}
```

Both `connection` (SSE) and `fetcher` (plain HTTP) transports are supported, so this works with TanStack Start, Next.js, Remix, or any backend you already have.

## Providers that shipped in this release

**Gemini** gets two new entry points:

- `geminiAudio()` for Lyria 3 Pro and Lyria 3 Clip music generation. Lyria Pro reads duration from the natural-language prompt; Clip is fixed at 30 seconds and returns MP3.
- A new `gemini-3.1-flash-tts-preview` TTS model with 70+ languages, 200+ audio tags, and multi-speaker dialogue via `multiSpeakerVoiceConfig`.

**Fal** gets three tree-shakeable adapters:

- `falSpeech()` for TTS via `fal-ai/gemini-3.1-flash-tts`, `fal-ai/minimax/speech-2.6-hd`, and the `fal-ai/kokoro/*` family.
- `falTranscription()` for STT via `fal-ai/whisper`, `fal-ai/wizper`, and `fal-ai/speech-to-text/turbo`.
- `falAudio()` for music, SFX, and the wider fal catalog: audio-to-audio, voice conversion and cloning, enhancement, separation, isolation, understanding, and merge.

All four follow the tree-shakeable subpath-import pattern, so your bundle only grows by the adapters you actually import.

## Try it

The new activity is live in `@tanstack/ai` and the two provider packages:

```bash
pnpm add @tanstack/ai @tanstack/ai-fal
# or
pnpm add @tanstack/ai @tanstack/ai-gemini
```

Then open the [audio generation guide](/ai/docs/media/audio-generation) for the full adapter matrix, or pull the `ts-react-chat` example to see working TTS and transcription tabs plus a `/generations/audio` route covering Lyria and fal side by side.

**Star [TanStack AI on GitHub](https://github.com/TanStack/ai)** if you want to see where this goes next.
Loading