docs(blog): TanStack AI Just Learned to Compose Music #854
Merged: LadyBluenotes merged 2 commits into TanStack:main from AlemTuzlak:blog/tanstack-ai-audio-generation on Apr 24, 2026
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| --- | ||
| title: 'TanStack AI Just Learned to Compose Music' | ||
| published: 2026-04-24 | ||
| excerpt: TanStack AI adds a new generateAudio activity with streaming, plus fal and Gemini Lyria adapters for music, sound effects, text-to-speech, and transcription. One typed API, any provider. | ||
| authors: | ||
| - Alem Tuzlak | ||
| --- | ||
|
|
||
|  | ||
|
|
||
| The AI audio ecosystem is a mess. Gemini's Lyria wants a natural-language prompt and returns raw PCM you have to wrap in a RIFF header yourself. Fal hosts dozens of audio models where one wants `music_length_ms` in milliseconds, the next wants `seconds_total`, and most want plain `duration`. ElevenLabs has its own shape. Whisper has another. Every provider disagrees on whether you get a URL, a base64 blob, or a raw buffer. | ||
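To make the RIFF point concrete, here is a minimal sketch of that kind of glue. It assumes 16-bit PCM at 48 kHz stereo, which is an assumption for illustration and not a documented Lyria format, and `pcmToWav` is a hypothetical helper that is not part of TanStack AI.

```typescript
// Hypothetical helper: wraps raw PCM samples in a 44-byte RIFF/WAVE header.
// The default sample rate, channel count, and bit depth are assumptions;
// check what your provider actually returns before relying on them.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 48_000,
  channels = 2,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8 // bytes per sample frame
  const byteRate = sampleRate * blockAlign
  const wav = new Uint8Array(44 + pcm.length)
  const view = new DataView(wav.buffer)

  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i))
    }
  }

  writeAscii(0, 'RIFF')
  view.setUint32(4, 36 + pcm.length, true) // file size minus the first 8 bytes
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  view.setUint32(16, 16, true) // fmt chunk size for plain PCM
  view.setUint16(20, 1, true) // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, byteRate, true)
  view.setUint16(32, blockAlign, true)
  view.setUint16(34, bitsPerSample, true)
  writeAscii(36, 'data')
  view.setUint32(40, pcm.length, true) // size of the sample data that follows
  wav.set(pcm, 44)
  return wav
}
```

Multiply this by every provider's output format and the "same boring glue code" adds up quickly.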
|
|
||
| If you are shipping an AI product that needs music, sound effects, speech, or transcription, you end up writing the same boring glue code five times. | ||
|
|
||
| **TanStack AI just removed that glue.** The latest release lands a full audio stack: a new `generateAudio` activity, streaming support, fal and Gemini Lyria adapters, and framework hooks for React, Solid, Vue, and Svelte. One typed API, any provider. | ||
|
|
||
| Here is what shipped and why you should care. | ||
|
|
||
| ## One activity, any audio model | ||
|
|
||
| The new `generateAudio()` activity sits alongside `generateImage`, `generateSpeech`, `generateVideo`, and `generateTranscription` in `@tanstack/ai`. It takes a text prompt, dispatches to whatever adapter you hand it, and returns a result whose `audio` field is a `GeneratedAudio` object with exactly one of `url` or `b64Json`. | ||
|
|
||
| ```typescript | ||
| import { generateAudio } from '@tanstack/ai' | ||
| import { geminiAudio } from '@tanstack/ai-gemini/adapters' | ||
|
|
||
| const adapter = geminiAudio('lyria-3-pro-preview') | ||
|
|
||
| const result = await generateAudio({ | ||
| adapter, | ||
| prompt: 'A cinematic orchestral piece with a rising string motif', | ||
| }) | ||
|
|
||
| // result.audio is { url: string } | { b64Json: string } — exactly one, enforced by the type | ||
| ``` | ||
|
|
||
| Swap `geminiAudio` for `falAudio` and the exact same call generates music through MiniMax, DiffRhythm, Stable Audio 2.5, or any of the other models in fal's catalog. The adapter translates per-model details (like fal's `music_length_ms` vs `seconds_total` vs `duration` naming) so your app code never sees them. | ||
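As a rough picture of what that translation involves, here is a sketch of a per-model duration mapping. Everything in it is hypothetical: the model keys are placeholders rather than real fal model IDs, and `toProviderInput` is not the adapter's actual implementation.

```typescript
// Hypothetical lookup for illustration; the keys are placeholders, not real fal model IDs.
type DurationParam = 'music_length_ms' | 'seconds_total' | 'duration'

const durationParamByModel: Record<string, DurationParam> = {
  'example/model-a': 'music_length_ms',
  'example/model-b': 'seconds_total',
}

// Builds a provider-specific payload from one provider-agnostic duration in seconds.
function toProviderInput(
  model: string,
  prompt: string,
  durationSeconds: number,
): Record<string, unknown> {
  // Models without a special case fall back to plain `duration`.
  const param = durationParamByModel[model] ?? 'duration'
  // Only the millisecond-based parameter needs a unit conversion.
  const value = param === 'music_length_ms' ? durationSeconds * 1000 : durationSeconds
  return { prompt, [param]: value }
}
```

The app passes a single duration in seconds, and the naming and unit differences stay inside the mapping.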
|
|
||
| ## Streaming, because audio generation takes seconds | ||
|
|
||
| Music and SFX generation is slow. Lyria 3 Pro takes several seconds. Stable Audio takes longer. If you are building a UI, blocking the request the whole time is a bad experience. | ||
|
|
||
| `generateAudio` now supports `stream: true`, returning an `AsyncIterable<StreamChunk>` you can pipe straight through `toServerSentEventsResponse()`: | ||
|
|
||
| ```typescript | ||
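| import { generateAudio, toServerSentEventsResponse } from '@tanstack/ai' | ||
| // Assumed import path, mirroring the Gemini adapter import above | ||
| import { falAudio } from '@tanstack/ai-fal/adapters' | ||
|
|
||
| export async function POST(req: Request) { | ||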
| const { prompt } = await req.json() | ||
|
|
||
| const stream = await generateAudio({ | ||
| adapter: falAudio('fal-ai/minimax-music/v2.6'), | ||
| prompt, | ||
| stream: true, | ||
| }) | ||
|
|
||
| return toServerSentEventsResponse(stream) | ||
| } | ||
| ``` | ||
|
|
||
| The client receives progress events and the final audio over a single SSE connection, the same transport model already used by `generateImage` and `generateVideo`. No new infrastructure, no special-case code paths. | ||
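The framework hooks consume this stream for you, but it can help to see what travels over the wire. The sketch below parses a raw SSE payload into events. The `AudioEvent` shape is invented for illustration and is not the library's real `StreamChunk` type.

```typescript
// Hypothetical event shape, for illustration only; not the real StreamChunk type.
type AudioEvent =
  | { type: 'progress'; percent: number }
  | { type: 'result'; url: string }

// Parses a complete SSE payload whose `data:` fields carry JSON.
function parseSseEvents(raw: string): AudioEvent[] {
  const events: AudioEvent[] = []
  // SSE events are separated by a blank line.
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const data = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith('data:'))
      // Drop the field name and any leading whitespace before the payload.
      .map((line) => line.slice(5).trimStart())
      .join('\n')
    if (data) events.push(JSON.parse(data) as AudioEvent)
  }
  return events
}
```

A real client would parse incrementally as bytes arrive; this version handles a fully buffered payload to keep the example short.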
|
|
||
| ## Framework hooks that feel like the others | ||
|
|
||
| Every framework integration gets a new hook matching the existing media-hook shape: | ||
|
|
||
| - `@tanstack/ai-react`: `useGenerateAudio` | ||
| - `@tanstack/ai-solid`: `useGenerateAudio` | ||
| - `@tanstack/ai-vue`: `useGenerateAudio` | ||
| - `@tanstack/ai-svelte`: `createGenerateAudio` | ||
|
|
||
| The API is identical to `useGenerateImage` and friends: | ||
|
|
||
| ```tsx | ||
| import { useGenerateAudio } from '@tanstack/ai-react' | ||
|
|
||
| function MusicGen() { | ||
| const { generate, result, isLoading, error, stop, reset } = useGenerateAudio({ | ||
| connection, | ||
| }) | ||
|
|
||
| return ( | ||
| <> | ||
| <button onClick={() => generate({ prompt: 'Lo-fi hip-hop beat' })}> | ||
| Generate | ||
| </button> | ||
| {isLoading && <button onClick={stop}>Stop</button>} | ||
| {result && 'url' in result.audio && <audio src={result.audio.url} controls />} | ||
| </> | ||
| ) | ||
| } | ||
| ``` | ||
|
|
||
| Both `connection` (SSE) and `fetcher` (plain HTTP) transports are supported, so this works with TanStack Start, Next.js, Remix, or any backend you already have. | ||
|
|
||
| ## Providers that shipped in this release | ||
|
|
||
| **Gemini** gets two new entry points: | ||
|
|
||
| - `geminiAudio()` for Lyria 3 Pro and Lyria 3 Clip music generation. Lyria Pro reads duration from the natural-language prompt; Clip is fixed at 30 seconds and returns MP3. | ||
| - A new `gemini-3.1-flash-tts-preview` TTS model with 70+ languages, 200+ audio tags, and multi-speaker dialogue via `multiSpeakerVoiceConfig`. | ||
|
|
||
| **Fal** gets three tree-shakeable adapters: | ||
|
|
||
| - `falSpeech()` for TTS via `fal-ai/gemini-3.1-flash-tts`, `fal-ai/minimax/speech-2.6-hd`, and the `fal-ai/kokoro/*` family. | ||
| - `falTranscription()` for STT via `fal-ai/whisper`, `fal-ai/wizper`, and `fal-ai/speech-to-text/turbo`. | ||
| - `falAudio()` for music, SFX, and the wider fal catalog: audio-to-audio, voice conversion and cloning, enhancement, separation, isolation, understanding, and merge. | ||
|
|
||
| All four adapters (`geminiAudio` plus the three fal adapters) follow the tree-shakeable subpath-import pattern, so your bundle only grows by the adapters you actually import. | ||
|
|
||
| ## Try it | ||
|
|
||
| The new activity is live in `@tanstack/ai` and the two provider packages: | ||
|
|
||
| ```bash | ||
| pnpm add @tanstack/ai @tanstack/ai-fal | ||
| # or | ||
| pnpm add @tanstack/ai @tanstack/ai-gemini | ||
| ``` | ||
|
|
||
| Then open the [audio generation guide](/ai/docs/media/audio-generation) for the full adapter matrix, or pull the `ts-react-chat` example to see working TTS and transcription tabs plus a `/generations/audio` route covering Lyria and fal side by side. | ||
|
|
||
| **Star [TanStack AI on GitHub](https://github.com/TanStack/ai)** if you want to see where this goes next. | ||
🧩 Analysis chain
🌐 Web query:
In TypeScript, for a union type like `{ url: string } | { b64Json: string }`, is `result.audio.url` valid without narrowing? Please provide canonical docs/examples.
💡 Result:
No, `result.audio.url` is not valid without narrowing when `result.audio` has a union type like `{ url: string } | { b64Json: string }`. TypeScript only allows access to properties common to all members of a union type. Here, `url` exists only on the first member and `b64Json` only on the second, so the two members share no properties. Attempting `audio.url` produces the error: "Property 'url' does not exist on type '{ url: string; } | { b64Json: string; }'". This prevents runtime errors if `audio` is the `{ b64Json: string }` variant.
To access `url` safely, narrow the type first using a type guard, such as the `in` operator: `if ('url' in result.audio) { console.log(result.audio.url) } else { console.log(result.audio.b64Json) }`. Inside the first branch TypeScript narrows to `{ url: string }`; inside the second it narrows to `{ b64Json: string }`.
For better ergonomics with multiple variants, use discriminated unions by adding a literal discriminant property (e.g., `kind: 'url' | 'b64'`): `type Audio = { kind: 'url'; url: string } | { kind: 'b64'; b64Json: string }`. Then narrow with `switch (result.audio.kind) { ... }`.
This behavior is unchanged as of TypeScript 5.x (current in 2026).
Citations:
🏁 Script executed:
Repository: TanStack/tanstack.com
Length of output: 2685
Fix TypeScript type error in the React snippet
Line 86 accesses `result.audio.url` directly on the union type declared on line 34 (`{ url: string } | { b64Json: string }`). TypeScript does not allow this without narrowing: the code will fail type-checking with "Property 'url' does not exist on type '{ url: string; } | { b64Json: string; }'" because `url` is not a common property across all union members.
Narrow the type using the `in` operator before accessing the property.
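A minimal sketch of that narrowing, assuming the union is exactly as written in the post's comment. `GeneratedAudio` is declared locally here as a stand-in instead of being imported from the library, and `toAudioSrc` and its default MIME type are hypothetical.

```typescript
// Local stand-in for the union described in the post; not imported from @tanstack/ai.
type GeneratedAudio = { url: string } | { b64Json: string }

// Returns something usable as an <audio src>, whichever variant arrives.
// The default MIME type is an assumption; use the format your adapter returns.
function toAudioSrc(audio: GeneratedAudio, mimeType = 'audio/mpeg'): string {
  if ('url' in audio) {
    // Narrowed to { url: string }
    return audio.url
  }
  // Narrowed to { b64Json: string }: build a data URI from the base64 payload.
  return `data:${mimeType};base64,${audio.b64Json}`
}
```

Handling both branches also means the snippet keeps working when an adapter returns base64 instead of a URL.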