Feature Request
Description
Add audio, speech, and transcription adapters to the @tanstack/ai-fal package. The fal adapter currently supports image and video generation, but fal's platform also offers hundreds of models spanning:
- Text-to-Speech (e.g., fal-ai/kokoro, a multi-language TTS model)
- Audio generation (music, sound effects, audio-to-audio, voice-change, voice-clone, audio enhancement, separation, isolation, merge, understanding; e.g., fal-ai/stable-audio-25, which generates both music and sound effects)
- Speech-to-Text (e.g., fal-ai/whisper and fal-ai/wizper for transcription)
Motivation
TanStack AI's fal adapter (@tanstack/ai-fal) currently implements falImage and falVideo adapters following the tree-shakeable adapter pattern. Adding audio/speech/transcription adapters would complete fal's media coverage and align with TanStack AI's goal of being a comprehensive, provider-agnostic AI SDK.
Proposed API
This proposal uses a single broad falAudio adapter rather than a music/SFX split, because fal's audio catalog doesn't cleave along those lines: dozens of audio-to-audio, voice, enhancement, and understanding models live alongside music/SFX generators, and individual models (e.g. stable-audio-25) span both music and sound effects.
import { falAudio, falSpeech, falTranscription } from '@tanstack/ai-fal/adapters'
// Audio generation (music, sound effects, audio-to-audio, etc.)
const audioAdapter = falAudio('fal-ai/stable-audio-25')
// Text-to-Speech
const speechAdapter = falSpeech('fal-ai/kokoro')
// Speech-to-Text
const transcriptionAdapter = falTranscription('fal-ai/whisper')
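One way the three factories could be tied to model metadata is sketched below. It is only an illustration: MODEL_META, createAdapterFactory, SketchAdapter, and the sketch* factory names are hypothetical and are not taken from @tanstack/ai-fal. The model IDs are the ones mentioned in this issue, and the assumption that each ID maps to exactly one category is a simplification.

```typescript
// Hypothetical metadata table: every name here is an assumption for
// illustration, not the contents of the package's model-meta.ts.
const MODEL_META = {
  'fal-ai/stable-audio-25': { kind: 'audio' },
  'fal-ai/kokoro': { kind: 'speech' },
  'fal-ai/whisper': { kind: 'transcription' },
  'fal-ai/wizper': { kind: 'transcription' },
} as const

type ModelId = keyof typeof MODEL_META
type Kind = (typeof MODEL_META)[ModelId]['kind']

// Picks only the model IDs whose metadata matches the requested kind.
type ModelsOfKind<K extends Kind> = {
  [M in ModelId]: (typeof MODEL_META)[M]['kind'] extends K ? M : never
}[ModelId]

interface SketchAdapter<K extends Kind> {
  kind: K
  model: ModelsOfKind<K>
}

// Builds a factory that accepts only model IDs of one kind, so passing
// a transcription model to the audio factory fails at compile time.
function createAdapterFactory<K extends Kind>(kind: K) {
  return (model: ModelsOfKind<K>): SketchAdapter<K> => ({ kind, model })
}

// Stand-ins for falAudio / falSpeech / falTranscription.
const sketchAudio = createAdapterFactory('audio')
const sketchSpeech = createAdapterFactory('speech')
const sketchTranscription = createAdapterFactory('transcription')
```

With this shape, sketchAudio('fal-ai/whisper') would be rejected by the type checker. A real adapter might also accept arbitrary string IDs, since fal's catalog is large and changes often.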
Additional Context
- Existing adapters use the fal.subscribe() (image) and fal.queue (video) patterns
- Audio generation may use either pattern depending on model latency
- The fal SDK (@fal-ai/client) already supports audio responses with File/Audio output types
- Model metadata types in model-meta.ts are extended for audio, speech, and transcription models
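The choice between the subscribe and queue patterns mentioned above could look roughly like the following. This is a minimal sketch, not the adapter's implementation: QueueLikeClient is a locally declared interface loosely modeled on a subscribe/queue-style client so that the example is self-contained, and the result shape ({ audio: { url } }), the 'COMPLETED' status string, and the runAudioModel helper are all assumptions.

```typescript
// Assumed result shape; real models may return different fields.
interface AudioFile { url: string; content_type?: string }
interface AudioResult { audio: AudioFile }

// Locally declared client surface (an assumption, not the SDK typings).
interface QueueLikeClient {
  subscribe(model: string, opts: { input: unknown }): Promise<{ data: AudioResult }>
  queue: {
    submit(model: string, opts: { input: unknown }): Promise<{ request_id: string }>
    status(model: string, opts: { requestId: string }): Promise<{ status: string }>
    result(model: string, opts: { requestId: string }): Promise<{ data: AudioResult }>
  }
}

type Strategy = 'subscribe' | 'queue'

// Runs a model with either pattern. 'subscribe' waits on a single call;
// 'queue' submits, polls status until completion, then fetches the result.
async function runAudioModel(
  client: QueueLikeClient,
  model: string,
  input: unknown,
  strategy: Strategy,
  pollMs = 1000,
  maxPolls = 600,
): Promise<AudioResult> {
  if (strategy === 'subscribe') {
    const { data } = await client.subscribe(model, { input })
    return data
  }

  const { request_id } = await client.queue.submit(model, { input })
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const { status } = await client.queue.status(model, { requestId: request_id })
    if (status === 'COMPLETED') {
      const { data } = await client.queue.result(model, { requestId: request_id })
      return data
    }
    // Wait before the next status check.
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs))
  }
  throw new Error(`Request ${request_id} did not complete after ${maxPolls} polls`)
}
```

Short TTS or sound-effect requests would likely suit 'subscribe', while long music generations may be better served by 'queue'. Which strategy each model gets could be recorded alongside its metadata.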
Note (2026-04-22)
An earlier iteration briefly split this into separate falMusic / falSoundEffects adapters and matching generateMusic / generateSoundEffects activities. That split was reverted once fal's full audio catalog (screenshot: dozens of audio-to-audio, voice-change, voice-clone, enhancement, separation, isolation, understanding, merge-audios models) made clear the music/SFX binary is too narrow. A single generateAudio activity and falAudio adapter better match the reality.