feat(mimo): adapt MiMo-V2.5-TTS series with voicedesign and voiceclone support by xiangyuw1 · Pull Request #8428 · AstrBotDevs/AstrBot

xiangyuw1 · 2026-05-29T22:50:26Z

Summary

This PR adapts the MiMo TTS provider to support the MiMo-V2.5-TTS series, including voicedesign and voiceclone models.

Changes

1. V2.5 Style Format Support

V2.5 models use parentheses （...） instead of <style>...</style> tags for style control
Added _is_v2_5() method to detect v2.5 models
Updated _build_style_prefix() to use the correct format based on model version
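
A minimal sketch of the version switch described above, for readers who want to see the two formats side by side. The standalone function names and the model names in the usage note are assumptions for illustration; only the substring check on "v2.5", the full-width parentheses, and the <style> tags come from this PR. The real _build_style_prefix() may handle more cases (dialects, singing).

```python
def is_v2_5(model_name: str) -> bool:
    """Return True for v2.5 series models (plain substring check)."""
    return "v2.5" in model_name


def build_style_prefix(model_name: str, style: str) -> str:
    """Wrap a style hint in the format the model version expects.

    Hypothetical helper, not the provider's actual method.
    """
    style = style.strip()
    if not style:
        # No style requested: emit no prefix at all.
        return ""
    if is_v2_5(model_name):
        # v2.5 models: full-width parentheses around the style text.
        return f"（{style}）"
    # Earlier models: <style>...</style> tags.
    return f"<style>{style}</style>"
```

With a hypothetical model name such as "mimo-v2.5-tts", build_style_prefix(name, "cheerful") yields the parenthesized form; any name without "v2.5" yields the tagged form. The check is case-sensitive, so a name spelled "V2.5" would not match.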

2. Voicedesign Model Support

Added mimo-tts-user-prompt config field for custom user prompts
Required for voicedesign models to describe the desired voice via natural language
Falls back to seed text for other models when user prompt is empty
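
The fallback rule above can be sketched as follows. This is an illustration under assumptions: resolve_user_prompt is a hypothetical name, and the sketch ignores the model type, whereas the provider may treat voicedesign models differently when the prompt is empty.

```python
from typing import Optional


def resolve_user_prompt(user_prompt: Optional[str], seed_text: str) -> str:
    """Prefer the configured user prompt; fall back to the seed text.

    Hypothetical helper. None and whitespace-only prompts count as empty.
    """
    prompt = (user_prompt or "").strip()
    return prompt if prompt else seed_text
```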

3. Voiceclone Model Support

Added mimo-tts-voice-audio-path config field for voice audio file path
Reads audio file and encodes to DataURL format (data:audio/wav;base64,...) at runtime
Falls back to preset voice when no audio file is specified

4. Metadata and i18n Updates

Updated config metadata with new fields and descriptions
Added translations for zh-CN, en-US, and ru-RU locales
Updated hints to explain v2/v2.5 format differences

5. Tests

Added comprehensive tests for all new features
18 tests passing, covering v2.5 style, voicedesign, and voiceclone scenarios

Testing

All existing tests pass
New tests added for v2.5 style format, voicedesign user prompt, and voiceclone audio path
Tested with actual MiMo API calls

Summary by Sourcery

Adapt the MiMo TTS provider to support the MiMo-V2.5-TTS series, including new style formatting, voicedesign prompts, and voiceclone audio cloning, with corresponding config and test updates.

New Features:

Support MiMo V2.5 TTS models that use parentheses-based style and dialect prefixes instead of style tags.
Add custom user prompt configuration for voicedesign TTS models, falling back to seed text when absent.
Add voice cloning support for voiceclone TTS models by loading a configurable local audio file as the voice source, with fallback to preset voices.

Enhancements:

Extend default configuration and metadata with new MiMo TTS fields and hints describing v2 versus v2.5 formatting and model-specific options.
Update i18n config metadata for MiMo TTS options across zh-CN, en-US, and ru-RU locales.

Tests:

Add tests covering V2.5 style and singing behavior, voicedesign user prompt and seed text fallback, and voiceclone audio file and voice fallback handling.

…e support
- Add v2.5 style format support: use parentheses （...） instead of <style> tags for v2.5 models
- Add voicedesign model support: custom user prompt field for voice description
- Add voiceclone model support: voice audio file path field with DataURL encoding
- Update metadata and i18n translations (zh-CN, en-US, ru-RU)
- Add comprehensive tests for all new features

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds support for MiMo TTS v2.5 series models, including voicedesign and voiceclone variants. The v2.5 models use parentheses （...） instead of <style>...</style> tags for style prefixes, accept a custom user prompt (required for voicedesign), and allow specifying a local audio file path for voice cloning.

Changes:

Updated mimo_tts_api_source.py to detect v2.5 models, route style prefix formatting, and add user-prompt / voice-audio handling.
Added two new config keys (mimo-tts-user-prompt, mimo-tts-voice-audio-path) with defaults, schema entries, and localized hints (zh-CN, en-US, ru-RU).
Added unit tests covering v2.5 style/singing parentheses, voicedesign user prompt fallback, and voiceclone audio base64 encoding.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

File	Description
astrbot/core/provider/sources/mimo_tts_api_source.py	Core logic for v2.5 detection, parentheses style prefix, user prompt precedence, and voice audio reading.
astrbot/core/config/default.py	Adds new config defaults and schema metadata for the two new fields.
dashboard/src/i18n/locales/zh-CN/features/config-metadata.json	Updates Chinese hints and adds entries for new fields.
dashboard/src/i18n/locales/en-US/features/config-metadata.json	Updates English hints and adds entries for new fields.
dashboard/src/i18n/locales/ru-RU/features/config-metadata.json	Updates Russian hints and adds entries for new fields.
tests/test_mimo_api_sources.py	Tests for v2.5 parentheses, voicedesign, and voiceclone payload building.

+            if "voiceclone" in self.model_name:
+                voice_audio_b64 = self._read_voice_audio_base64()
+                if voice_audio_b64:
+                    audio_params["voice"] = voice_audio_b64


+    def _is_v2_5(self) -> bool:
+        """Check if the current model is a v2.5 series model."""
+        return "v2.5" in self.model_name


+    )
+    try:
+        payload = provider._build_payload("hello")
+        import base64


+        try:
+            suffix = path.suffix.lower().lstrip(".")
+            mime_map = {"wav": "audio/wav", "mp3": "audio/mpeg", "ogg": "audio/ogg"}
+            mime = mime_map.get(suffix, "audio/wav")


sourcery-ai

Hey - I've left some high level feedback:

Model-type checks are currently done via string containment (e.g., 'voicedesign' in self.model_name, 'voiceclone' in self.model_name, 'v2.5' in self.model_name); consider centralizing these into small helper methods (e.g., _is_voicedesign(), _is_voiceclone()) both to avoid typos and to make future model naming changes easier to accommodate.
In _read_voice_audio_base64, the bare except Exception can hide non-I/O bugs (e.g., programming errors); consider narrowing the exception type (e.g., to OSError/IOError) or re-raising unexpected exceptions after logging so that genuine issues are not silently swallowed.
For _read_voice_audio_base64, you might want to validate that the path refers to a regular file (path.is_file()) rather than only checking exists(), to avoid confusing behavior if a directory or special file is provided.
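
One way the three points above could be combined is sketched below. It is written as standalone functions so it runs on its own; the function names are hypothetical and this is not the provider's code. The MIME map is the one from the PR, the is_file() check and the narrowed OSError handler follow the feedback.

```python
import base64
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Same suffix-to-MIME mapping as in the PR; unknown suffixes default to WAV.
MIME_MAP = {"wav": "audio/wav", "mp3": "audio/mpeg", "ogg": "audio/ogg"}


def is_voicedesign(model_name: str) -> bool:
    """Central place for the voicedesign substring check."""
    return "voicedesign" in model_name


def is_voiceclone(model_name: str) -> bool:
    """Central place for the voiceclone substring check."""
    return "voiceclone" in model_name


def read_voice_audio_data_url(raw_path: str) -> str:
    """Return the audio file as a data URL, or "" if it cannot be used."""
    raw_path = raw_path.strip()
    if not raw_path:
        return ""
    path = Path(raw_path)
    # is_file() rejects directories and missing paths in one check.
    if not path.is_file():
        logger.warning("Voice audio path is not a regular file: %s", path)
        return ""
    try:
        data = path.read_bytes()
    except OSError as exc:
        # Only I/O failures are swallowed; programming errors still surface.
        logger.warning("Failed to read voice audio file %s: %s", path, exc)
        return ""
    mime = MIME_MAP.get(path.suffix.lower().lstrip("."), "audio/wav")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```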

gemini-code-assist

Code Review

This pull request introduces support for MiMo TTS v2.5 models, custom user prompts, and voice cloning, along with corresponding configuration updates, localizations, and unit tests. The review feedback recommends defensively handling potentially null configuration values to avoid AttributeError crashes, caching the base64-encoded voice audio to prevent blocking the asyncio event loop with repeated disk reads, and using the standard mimetypes module for more robust MIME type detection.

gemini-code-assist · 2026-05-29T22:52:05Z

+        self.user_prompt = provider_config.get("mimo-tts-user-prompt", "")
+        self.voice_audio_path = provider_config.get("mimo-tts-voice-audio-path", "")


Defensively handle cases where mimo-tts-user-prompt or mimo-tts-voice-audio-path might be configured as None (e.g., when cleared in the UI or parsed from null JSON values). Using or "" ensures they are always initialized as strings, preventing potential AttributeError crashes when calling .strip() later. Also, initialize a cache variable for the base64-encoded voice audio to avoid repeated disk reads.

Suggested change

-        self.user_prompt = provider_config.get("mimo-tts-user-prompt", "")
-        self.voice_audio_path = provider_config.get("mimo-tts-voice-audio-path", "")
+        self.user_prompt = provider_config.get("mimo-tts-user-prompt") or ""
+        self.voice_audio_path = provider_config.get("mimo-tts-voice-audio-path") or ""
+        self._voice_audio_cache: str | None = None
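
To illustrate why the default argument alone is not enough: dict.get() only uses its default when the key is missing, not when the key is present with a None value. The config dict below is a made-up example of a field that was saved as a JSON null.

```python
# Hypothetical config where the field exists but holds None (a JSON null).
provider_config = {"mimo-tts-user-prompt": None}

# The default is ignored because the key is present.
with_default = provider_config.get("mimo-tts-user-prompt", "")

# `or ""` normalizes None (and any other falsy value) to an empty string.
with_or = provider_config.get("mimo-tts-user-prompt") or ""

print(repr(with_default))  # None
print(repr(with_or))       # ''
```

Calling with_default.strip() here would raise AttributeError, while with_or.strip() is safe.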

gemini-code-assist · 2026-05-29T22:52:05Z

+    def _read_voice_audio_base64(self) -> str:
+        if not self.voice_audio_path.strip():
+            return ""
+        path = Path(self.voice_audio_path.strip())
+        if not path.exists():
+            logger.warning("Voice audio file not found: %s", path)
+            return ""
+        try:
+            suffix = path.suffix.lower().lstrip(".")
+            mime_map = {"wav": "audio/wav", "mp3": "audio/mpeg", "ogg": "audio/ogg"}
+            mime = mime_map.get(suffix, "audio/wav")
+            b64 = base64.b64encode(path.read_bytes()).decode("utf-8")
+            return f"data:{mime};base64,{b64}"
+        except Exception as exc:
+            logger.warning("Failed to read voice audio file %s: %s", path, exc)
+            return ""


Reading the voice audio file from disk and base64-encoding it on every single payload construction is highly inefficient and blocks the single-threaded asyncio event loop. Since the clone audio file is static for the lifetime of the provider, we should cache the base64-encoded result after the first read. Because this helper is a synchronous function, modifying the shared cache state is safe from race conditions in the single-threaded asyncio event loop.

Additionally, instead of hardcoding a limited set of audio formats in mime_map, we can use Python's standard library mimetypes module to dynamically and robustly guess the correct MIME type.

    def _read_voice_audio_base64(self) -> str:
        if self._voice_audio_cache is not None:
            return self._voice_audio_cache
        if not self.voice_audio_path.strip():
            self._voice_audio_cache = ""
            return ""
        path = Path(self.voice_audio_path.strip())
        if not path.exists():
            logger.warning("Voice audio file not found: %s", path)
            self._voice_audio_cache = ""
            return ""
        try:
            import mimetypes

            mime, _ = mimetypes.guess_type(str(path))
            mime = mime or "audio/wav"
            b64 = base64.b64encode(path.read_bytes()).decode("utf-8")
            self._voice_audio_cache = f"data:{mime};base64,{b64}"
            return self._voice_audio_cache
        except Exception as exc:
            logger.warning("Failed to read voice audio file %s: %s", path, exc)
            self._voice_audio_cache = ""
            return ""

References

In a single-threaded asyncio event loop, synchronous functions (code blocks without 'await') are executed atomically and will not be interrupted by other coroutines. Therefore, they are safe from race conditions when modifying shared state within that block.
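
A self-contained sketch of the caching idea, written as a small standalone class so the effect can be observed outside the provider. The class name and the disk_reads counter are hypothetical and exist only for demonstration; the cache-then-return flow and the mimetypes fallback follow the suggestion above.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


class CachedVoiceAudio:
    """Hypothetical stand-in for the provider's voice audio handling."""

    def __init__(self, voice_audio_path: Optional[str]) -> None:
        self.voice_audio_path = voice_audio_path or ""
        self._cache: Optional[str] = None  # None means "not loaded yet"
        self.disk_reads = 0  # demonstration only: counts successful reads

    def data_url(self) -> str:
        """Return the cached data URL, loading it on the first call only."""
        if self._cache is None:
            self._cache = self._load()
        return self._cache

    def _load(self) -> str:
        raw = self.voice_audio_path.strip()
        if not raw:
            return ""
        path = Path(raw)
        if not path.is_file():
            return ""
        try:
            data = path.read_bytes()
        except OSError:
            return ""
        self.disk_reads += 1
        # guess_type() returns (None, None) for unknown suffixes.
        mime, _ = mimetypes.guess_type(path.name)
        mime = mime or "audio/wav"
        encoded = base64.b64encode(data).decode("ascii")
        return f"data:{mime};base64,{encoded}"
```

Two consequences are worth keeping in mind. The guessed MIME type depends on the Python version and the system's MIME tables, so it may not match the hardcoded values in the original mime_map. And because failures are cached as an empty string too, a file that is added or replaced later is not picked up until the provider object is recreated.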

Copilot AI review requested due to automatic review settings May 29, 2026 22:50

dosubot Bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.) labels May 29, 2026

Copilot AI reviewed May 29, 2026

sourcery-ai Bot reviewed May 29, 2026

gemini-code-assist Bot reviewed May 29, 2026

xiangyuw1 wants to merge 1 commit into AstrBotDevs:master from xiangyuw1:feat/mimo-v2.5-tts-adaptation

2 participants
