deepset-ai · julian-risch · Jun 5, 2026 · Jun 4, 2026 · Jun 5, 2026
@@ -11,5 +11,6 @@ Use these components to work with audio in Haystack by transcribing files or con
 
 | Name                                                       | Description                                                                                   |
 | --- | --- |
+| [FunASRTranscriber](audio/funasrtranscriber.mdx)               | Transcribe audio files using FunASR — a local, open-source speech recognition toolkit supporting 50+ languages. |
 | [LocalWhisperTranscriber](audio/localwhispertranscriber.mdx)   | Transcribe audio files using OpenAI's Whisper model using your local installation of Whisper. |
 | [RemoteWhisperTranscriber](audio/remotewhispertranscriber.mdx) | Transcribe audio files using OpenAI's Whisper model.                                          |
@@ -0,0 +1,67 @@
+---
+title: "FunASRTranscriber"
+id: funasrtranscriber
+slug: "/funasrtranscriber"
+description: "Transcribe audio files to Documents using FunASR — a local, open-source speech recognition toolkit supporting 50+ languages."
+---
+
+# FunASRTranscriber
+
+Transcribe audio files to Haystack Documents using FunASR — a local, open-source speech recognition toolkit supporting 50+ languages.
+
+<div className="key-value-table">
+
+|  |  |
+| --- | --- |
+| **Most common position in a pipeline** | As the first component in an indexing pipeline |
+| **Mandatory run variables** | `sources`: A list of audio file paths (`str` or `Path`) or `ByteStream` objects |
+| **Output variables** | `documents`: A list of Haystack Documents, one per source, with transcript text in `content` |
+| **API reference** | [FunASR integration](/reference/integrations-funasr) |
+| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/funasr/src/haystack_integrations/components/audio/funasr/transcriber.py |
+
+</div>
+
+## Overview
+
+`FunASRTranscriber` uses [FunASR](https://github.com/modelscope/FunASR), an open-source speech recognition toolkit from Alibaba DAMO Academy, to transcribe audio files into Haystack `Document` objects. It runs entirely locally — no API key required.
+
+The default model is `iic/SenseVoiceSmall`, a multilingual model supporting 50+ languages that is 5–10x faster than Whisper. Models are downloaded from ModelScope on first use and cached in `~/.cache/modelscope`.
+
+The component accepts audio file paths (`str` or `Path`) as well as `ByteStream` objects. When you run the component on its own, call `warm_up()` first to load the model into memory. In a pipeline, `Pipeline.run()` warms up its components for you.
+
+## Usage
+
+### On its own
+
+```python
+from haystack_integrations.components.audio.funasr import FunASRTranscriber
+
+transcriber = FunASRTranscriber()
+transcriber.warm_up()
+
+result = transcriber.run(sources=["speech.wav"])
+print(result["documents"][0].content)
+```
+
+### In a pipeline
+
+```python
+from haystack import Pipeline
+from haystack.components.fetchers import LinkContentFetcher
+from haystack_integrations.components.audio.funasr import FunASRTranscriber
+
+pipe = Pipeline()
+pipe.add_component("fetcher", LinkContentFetcher())
+pipe.add_component("transcriber", FunASRTranscriber())
+
+pipe.connect("fetcher.streams", "transcriber.sources")
+
+result = pipe.run(
+    data={
+        "fetcher": {
+            "urls": ["https://example.com/interview.wav"],
+        },
+    }
+)
+print(result["transcriber"]["documents"][0].content)
+```
@@ -166,6 +166,7 @@ export default {
             id: 'pipeline-components/audio'
           },
           items: [
+            'pipeline-components/audio/funasrtranscriber',
             'pipeline-components/audio/localwhispertranscriber',
             'pipeline-components/audio/remotewhispertranscriber',
             'pipeline-components/audio/external-integrations-audio',

@@ -11,5 +11,6 @@ Use these components to work with audio in Haystack by transcribing files or con
 
 | Name                                                       | Description                                                                                   |
 | --- | --- |
+| [FunASRTranscriber](audio/funasrtranscriber.mdx)               | Transcribe audio files using FunASR — a local, open-source speech recognition toolkit supporting 50+ languages. |
 | [LocalWhisperTranscriber](audio/localwhispertranscriber.mdx)   | Transcribe audio files using OpenAI's Whisper model using your local installation of Whisper. |
 | [RemoteWhisperTranscriber](audio/remotewhispertranscriber.mdx) | Transcribe audio files using OpenAI's Whisper model.                                          |
@@ -0,0 +1,67 @@
+---
+title: "FunASRTranscriber"
+id: funasrtranscriber
+slug: "/funasrtranscriber"
+description: "Transcribe audio files to Documents using FunASR — a local, open-source speech recognition toolkit supporting 50+ languages."
+---
+
+# FunASRTranscriber
+
+Transcribe audio files to Haystack Documents using FunASR — a local, open-source speech recognition toolkit supporting 50+ languages.
+
+<div className="key-value-table">
+
+|  |  |
+| --- | --- |
+| **Most common position in a pipeline** | As the first component in an indexing pipeline |
+| **Mandatory run variables** | `sources`: A list of audio file paths (`str` or `Path`) or `ByteStream` objects |
+| **Output variables** | `documents`: A list of Haystack Documents, one per source, with transcript text in `content` |
+| **API reference** | [FunASR integration](/reference/integrations-funasr) |
+| **GitHub link** | https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/funasr/src/haystack_integrations/components/audio/funasr/transcriber.py |
+
+</div>
+
+## Overview
+
+`FunASRTranscriber` uses [FunASR](https://github.com/modelscope/FunASR), an open-source speech recognition toolkit from Alibaba DAMO Academy, to transcribe audio files into Haystack `Document` objects. It runs entirely locally — no API key required.
+
+The default model is `iic/SenseVoiceSmall`, a multilingual model supporting 50+ languages that is 5–10x faster than Whisper. Models are downloaded from ModelScope on first use and cached in `~/.cache/modelscope`.
+
+The component accepts audio file paths (`str` or `Path`) as well as `ByteStream` objects. When you run the component on its own, call `warm_up()` first to load the model into memory. In a pipeline, `Pipeline.run()` warms up its components for you.
+
+## Usage
+
+### On its own
+
+```python
+from haystack_integrations.components.audio.funasr import FunASRTranscriber
+
+transcriber = FunASRTranscriber()
+transcriber.warm_up()
+
+result = transcriber.run(sources=["speech.wav"])
+print(result["documents"][0].content)
+```
+
+### In a pipeline
+
+```python
+from haystack import Pipeline
+from haystack.components.fetchers import LinkContentFetcher
+from haystack_integrations.components.audio.funasr import FunASRTranscriber
+
+pipe = Pipeline()
+pipe.add_component("fetcher", LinkContentFetcher())
+pipe.add_component("transcriber", FunASRTranscriber())
+
+pipe.connect("fetcher.streams", "transcriber.sources")
+
+result = pipe.run(
+    data={
+        "fetcher": {
+            "urls": ["https://example.com/interview.wav"],
+        },
+    }
+)
+print(result["transcriber"]["documents"][0].content)
+```
@@ -162,6 +162,7 @@
             "id": "pipeline-components/audio"
           },
           "items": [
+            "pipeline-components/audio/funasrtranscriber",
             "pipeline-components/audio/localwhispertranscriber",
             "pipeline-components/audio/remotewhispertranscriber",
             "pipeline-components/audio/external-integrations-audio"