diff --git a/public/blog-assets/tanstack-ai-audio-generation/header.png b/public/blog-assets/tanstack-ai-audio-generation/header.png
new file mode 100644
index 000000000..68aea3162
Binary files /dev/null and b/public/blog-assets/tanstack-ai-audio-generation/header.png differ
diff --git a/src/blog/tanstack-ai-audio-generation.md b/src/blog/tanstack-ai-audio-generation.md
new file mode 100644
index 000000000..e26270118
--- /dev/null
+++ b/src/blog/tanstack-ai-audio-generation.md
@@ -0,0 +1,121 @@
+---
+title: 'TanStack AI Just Learned to Compose Music'
+published: 2026-04-24
+excerpt: TanStack AI adds a new generateAudio activity with streaming, plus fal and Gemini Lyria adapters for music, sound effects, text-to-speech, and transcription. One typed API, any provider.
+authors:
+  - Alem Tuzlak
+---
+
+![TanStack AI Just Learned to Compose Music](/blog-assets/tanstack-ai-audio-generation/header.png)
+
+The AI audio ecosystem is a mess. Gemini's Lyria wants a natural-language prompt and returns raw PCM you have to wrap in a RIFF header yourself. Fal hosts dozens of audio models where one wants `music_length_ms` in milliseconds, the next wants `seconds_total`, and most want plain `duration`. ElevenLabs has its own shape. Whisper has another. Every provider disagrees on whether you get a URL, a base64 blob, or a raw buffer.
+
+If you are shipping an AI product that needs music, sound effects, speech, or transcription, you end up writing the same boring glue code five times.
+
+**TanStack AI just removed that glue.** The latest release lands a full audio stack: a new `generateAudio` activity, streaming support, fal and Gemini Lyria adapters, and framework hooks for React, Solid, Vue, and Svelte. One typed API, any provider.
+
+Here is what shipped and why you should care.
+
+## One activity, any audio model
+
+The new `generateAudio()` activity sits alongside `generateImage`, `generateSpeech`, `generateVideo`, and `generateTranscription` in `@tanstack/ai`.
It takes a text prompt, dispatches to whatever adapter you hand it, and returns a `GeneratedAudio` object with exactly one of `url` or `b64Json`.
+
+```typescript
+import { generateAudio } from '@tanstack/ai'
+import { geminiAudio } from '@tanstack/ai-gemini/adapters'
+
+const adapter = geminiAudio('lyria-3-pro-preview')
+
+const result = await generateAudio({
+  adapter,
+  prompt: 'A cinematic orchestral piece with a rising string motif',
+})
+
+// result.audio is { url: string } | { b64Json: string }; exactly one, enforced by the type
+```
+
+Swap `geminiAudio` for `falAudio` and the exact same call generates music through MiniMax, DiffRhythm, Stable Audio 2.5, or any of the other models in fal's catalog. The adapter translates per-model details (like fal's `music_length_ms` vs `seconds_total` vs `duration` naming) so your app code never sees them.
+
+## Streaming, because audio generation takes seconds
+
+Music and SFX generation is slow. Lyria 3 Pro takes several seconds. Stable Audio takes longer. If you are building a UI, blocking the request the whole time is a bad experience.
+
+`generateAudio` now supports `stream: true`, returning an `AsyncIterable` you can pipe straight through `toServerSentEventsResponse()`:
+
+```typescript
+export async function POST(req: Request) {
+  const { prompt } = await req.json()
+
+  const stream = await generateAudio({
+    adapter: falAudio('fal-ai/minimax-music/v2.6'),
+    prompt,
+    stream: true,
+  })
+
+  return toServerSentEventsResponse(stream)
+}
+```
+
+The client receives progress events and the final audio over a single SSE connection, the same transport model already used by `generateImage` and `generateVideo`. No new infrastructure, no special-case code paths.
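
To make the SSE transport concrete: on the wire, the stream is plain text in which each event is one or more `data:` lines followed by a blank line. The sketch below shows roughly what a client does with that text. It is an illustration under stated assumptions, not TanStack AI's wire format: the `AudioStreamEvent` shape, the `progress` and `result` event names, and the helper `parseSseData` are all hypothetical.

```typescript
// Hypothetical event shape, for illustration only; the real payloads may differ.
type AudioStreamEvent =
  | { type: 'progress'; percent: number }
  | { type: 'result'; audio: { url: string } | { b64Json: string } }

// Split already-buffered SSE text into parsed events.
// In the SSE format, events are separated by a blank line and payload lines
// start with "data:"; multiple data lines in one event are joined with "\n".
function parseSseData(raw: string): AudioStreamEvent[] {
  const events: AudioStreamEvent[] = []
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const data = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith('data:'))
      // Drop the "data:" prefix and the single optional space after it.
      .map((line) => line.slice(5).replace(/^ /, ''))
      .join('\n')
    if (data === '') continue // comment lines, keep-alives, or trailing blanks
    events.push(JSON.parse(data) as AudioStreamEvent)
  }
  return events
}

// Two made-up events in SSE framing.
const sample =
  'data: {"type":"progress","percent":50}\n\n' +
  'data: {"type":"result","audio":{"url":"https://example.com/a.wav"}}\n\n'

const events = parseSseData(sample)
```

A real client would read the response body incrementally and carry any partial event over between chunks; this sketch skips that and parses text that has already been buffered.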
+
+## Framework hooks that feel like the others
+
+Every framework integration gets a new hook matching the existing media-hook shape:
+
+- `@tanstack/ai-react`: `useGenerateAudio`
+- `@tanstack/ai-solid`: `useGenerateAudio`
+- `@tanstack/ai-vue`: `useGenerateAudio`
+- `@tanstack/ai-svelte`: `createGenerateAudio`
+
+The API is identical to `useGenerateImage` and friends:
+
+```tsx
+import { useGenerateAudio } from '@tanstack/ai-react'
+
+function MusicGen() {
+  const { generate, result, isLoading, error, stop, reset } = useGenerateAudio({
+    connection,
+  })
+
+  return (
+    <>
+
+      {isLoading && }
+      {result?.audio.url &&