feat(tools): add image caption fallback for FileReadTool when provider lacks image modality by kawayiYokami · Pull Request #8425 · AstrBotDevs/AstrBot

kawayiYokami · 2026-05-29T18:42:14Z

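Motivation

Currently, when FileReadTool reads an image file, it returns ImageContent directly to the LLM. If the active provider does not support multimodal input (the image modality), the image content is cached but the model never sees it, so the read is wasted.

This PR adds fallback logic:

- If the current provider supports images → behavior is unchanged; ImageContent is returned
- If it does not support images but an image-captioning model is configured (default_image_caption_provider_id) → the captioning model is called to generate a text description, which is returned instead
- If it does not support images and no image-captioning model is configured → an explicit error is returned: "Your provider does not support multimodal input, and no image-captioning model is currently configured; unable to read the image"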
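The three cases reduce to a small decision function. The sketch below is illustrative only: `ReadImageAction` and `choose_image_action` are hypothetical names, not part of AstrBot, and the two inputs stand in for the provider capability check and the `default_image_caption_provider_id` setting.

```python
from enum import Enum
from typing import Optional


class ReadImageAction(Enum):
    """Which branch FileReadTool takes for an image file (hypothetical names)."""

    RETURN_IMAGE = "return_image"  # provider accepts images: behavior unchanged
    CAPTION = "caption"            # fall back to the image-captioning model
    ERROR = "error"                # neither option is available


def choose_image_action(
    provider_supports_image: bool,
    caption_provider_id: Optional[str],
) -> ReadImageAction:
    """Map the two inputs described in the PR onto its three outcomes."""
    if provider_supports_image:
        return ReadImageAction.RETURN_IMAGE
    if caption_provider_id:  # an empty string counts as "not configured"
        return ReadImageAction.CAPTION
    return ReadImageAction.ERROR
```

Checking image support first means a configured caption model is never consulted for providers that can already see the image.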

Modifications

Modified astrbot/core/tools/computer_tools/fs.py:
- Added helper _provider_supports_image(): checks whether the current provider supports the image modality
- Added helper _caption_image_fallback(): calls the configured image-captioning provider to generate an image description
- Changed FileReadTool.call(): checks the provider's capabilities before returning ImageContent and falls back when necessary
This is NOT a breaking change.
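A minimal sketch of the kind of check `_provider_supports_image()` performs. It assumes (this is not confirmed by the PR) that a provider declares its capabilities as a `modalities` list in its config dict, and `provider_supports_image` is a hypothetical name. This version logs and returns `False` when detection fails, so an unknown capability degrades to the caption path instead of being treated as supported.

```python
import logging
from collections.abc import Mapping
from typing import Any

logger = logging.getLogger(__name__)


def provider_supports_image(provider_config: Mapping[str, Any]) -> bool:
    """Return True only if the provider config explicitly lists "image".

    Assumption: capabilities live under a "modalities" key as a list of strings.
    A missing or malformed value is logged and treated as "no image support".
    """
    try:
        modalities = provider_config["modalities"]
        # A bare string would make `"image" in "image_gen"` true, so reject it.
        if isinstance(modalities, str) or not isinstance(modalities, (list, tuple, set)):
            raise TypeError(f"expected a list of modalities, got {type(modalities).__name__}")
        return "image" in modalities
    except (KeyError, TypeError) as exc:
        logger.warning("Could not determine image modality support: %r", exc)
        return False
```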

Screenshots or Test Results

11 passed (image test included), 3 pre-existing failures unrelated to this change

Checklist

My changes have been well-tested, and verification steps and screenshots have been provided above.
I have ensured that no new dependencies are introduced.
My changes do not introduce malicious code.

Summary by Sourcery

Add an image-captioning fallback for FileReadTool so image files remain usable when the active provider lacks image modality support.

New Features:

Provide automatic image captioning for image reads when a dedicated image caption provider is configured but the main provider does not support image modality.

Enhancements:

Detect provider image-modality support before returning ImageContent from FileReadTool and transparently degrade to text captions or clear error messages when unsupported.


sourcery-ai

Hey - I've left some high-level feedback:

- In `_provider_supports_image`, swallowing all exceptions and defaulting to `True` can silently mask misconfigurations; consider at least logging the exception and/or defaulting to `False` when capability detection fails so that missing image modality is not accidentally treated as supported.
- The broad `except Exception` in `_caption_image_fallback` will also catch programming errors (e.g., wrong types from `text_chat`); narrowing this to expected error types or re-raising after logging unexpected exceptions would make failures easier to diagnose.
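One way to act on the second point, sketched with hypothetical names (`caption_or_error`, `EXPECTED_CAPTION_ERRORS`): catch only the failures a remote caption call is expected to produce and let everything else propagate. Which exception types count as "expected" depends on the provider client in use; timeouts and connection errors are assumptions here.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

# Hypothetical set of "expected" failures from a remote caption request.
EXPECTED_CAPTION_ERRORS = (asyncio.TimeoutError, ConnectionError)


async def caption_or_error(
    caption_call: Callable[[], Awaitable[str]],
    timeout: float = 30.0,
) -> str:
    """Run a caption request, turning only expected failures into tool text.

    Programming errors (TypeError, AttributeError, ...) propagate unchanged, so
    they show up as tracebacks instead of being reported as a caption failure.
    """
    try:
        caption = (await asyncio.wait_for(caption_call(), timeout)).strip()
    except EXPECTED_CAPTION_ERRORS as exc:
        logger.error("Image captioning failed: %r", exc)
        return f"Error: failed to generate image description: {exc!r}"
    if not caption:
        return "Error: image caption provider returned an empty description."
    return f"[Image description]: {caption}"
```

If `caption_call` returns `None` instead of a string, `.strip()` raises `AttributeError`, which is not caught: exactly the kind of wrong-type bug the review says should stay visible.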


gemini-code-assist

Code Review

This pull request introduces a fallback mechanism to caption images when the active provider does not support image modality. The review comments point out a critical issue: passing the file path directly to the caption provider will fail in sandbox environments because the host cannot access the sandbox filesystem. The reviewer suggests extracting the base64 image data from the tool result and passing it as a data URI instead, and recommends adding unit tests for this new functionality.

gemini-code-assist · 2026-05-29T18:43:55Z

+async def _caption_image_fallback(
+    context: ContextWrapper[AstrAgentContext],
+    image_path: str,
+) -> ToolExecResult:
+    """Try to caption an image using the configured image caption provider.
+
+    Returns the caption text or an error message if no caption provider is available.
+    """
+    from astrbot.core.provider.provider import Provider
+
+    umo = context.context.event.unified_msg_origin
+    cfg = context.context.context.get_config(umo=umo)
+    provider_settings = cfg.get("provider_settings", {})
+    caption_provider_id = provider_settings.get("default_image_caption_provider_id", "")
+
+    if not caption_provider_id:
+        return (
+            "Error: your provider does not support image modality, "
+            "and no image caption provider is configured. Unable to read image file."
+        )
+
+    caption_provider = context.context.context.get_provider_by_id(caption_provider_id)
+    if caption_provider is None or not isinstance(caption_provider, Provider):
+        return (
+            "Error: your provider does not support image modality, "
+            f"and the configured image caption provider `{caption_provider_id}` is not available. "
+            "Unable to read image file."
+        )
+
+    caption_prompt = provider_settings.get(
+        "image_caption_prompt", "Please describe the image."
+    )
+
+    try:
+        llm_resp = await caption_provider.text_chat(
+            prompt=caption_prompt,
+            image_urls=[image_path],
+        )
+        caption = (llm_resp.completion_text or "").strip()
+        if not caption:
+            return "Error: image caption provider returned an empty description."
+        return f"[Image description]: {caption}"
+    except Exception as exc:
+        logger.error(f"Image captioning failed: {exc}")
+        return f"Error: failed to generate image description: {exc}"


In sandbox mode (local_env = False), the image file resides inside the sandbox container/environment, while the host running astrbot executes the provider's text_chat. Passing image_path (which is a sandbox path) directly to the provider will fail because the host cannot access the sandbox filesystem directly.

Since read_file_tool_result already reads and compresses the image into base64 format (returned as mcp.types.ImageContent), we should pass the base64 data URI (e.g., data:{mimeType};base64,{data}) to _caption_image_fallback instead of the file path. This avoids re-reading the file and works seamlessly in both local and sandbox environments.
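A dependency-free sketch of the conversion the reviewer describes. `ImageItem` is a hypothetical stand-in for `mcp.types.ImageContent`, modelling only its `data` (base64 string) and `mimeType` fields; `first_image_data_uri` is likewise a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ImageItem:
    """Stand-in for mcp.types.ImageContent (only the fields used here)."""

    data: str      # base64-encoded image bytes
    mimeType: str  # e.g. "image/png"


def first_image_data_uri(content: Sequence[object]) -> Optional[str]:
    """Build a data URI from the first image item, or return None if there is none.

    The base64 bytes are already in memory on the host, so this works whether the
    file was read locally or from inside a sandbox.
    """
    for item in content:
        if isinstance(item, ImageItem):
            return f"data:{item.mimeType};base64,{item.data}"
    return None
```

In `FileReadTool.call()` the resulting string would be passed to `_caption_image_fallback` in place of `normalized_path`, as the reviewer suggests.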

Suggested change

async def _caption_image_fallback(
    context: ContextWrapper[AstrAgentContext],
    image_url: str,
) -> ToolExecResult:
    """Try to caption an image using the configured image caption provider.

    Returns the caption text or an error message if no caption provider is available.
    """
    from astrbot.core.provider.provider import Provider

    umo = context.context.event.unified_msg_origin
    cfg = context.context.context.get_config(umo=umo)
    provider_settings = cfg.get("provider_settings", {})
    caption_provider_id = provider_settings.get("default_image_caption_provider_id", "")

    if not caption_provider_id:
        return (
            "Error: your provider does not support image modality, "
            "and no image caption provider is configured. Unable to read image file."
        )

    caption_provider = context.context.context.get_provider_by_id(caption_provider_id)
    if caption_provider is None or not isinstance(caption_provider, Provider):
        return (
            "Error: your provider does not support image modality, "
            f"and the configured image caption provider `{caption_provider_id}` is not available. "
            "Unable to read image file."
        )

    caption_prompt = provider_settings.get(
        "image_caption_prompt", "Please describe the image."
    )

    try:
        llm_resp = await caption_provider.text_chat(
            prompt=caption_prompt,
            image_urls=[image_url],
        )
        caption = (llm_resp.completion_text or "").strip()
        if not caption:
            return "Error: image caption provider returned an empty description."
        return f"[Image description]: {caption}"
    except Exception as exc:
        logger.error(f"Image captioning failed: {exc}")
        return f"Error: failed to generate image description: {exc}"

gemini-code-assist · 2026-05-29T18:43:55Z

+            if (
+                isinstance(result, mcp.types.CallToolResult)
+                and result.content
+                and any(
+                    isinstance(item, mcp.types.ImageContent) for item in result.content
+                )
+                and not _provider_supports_image(context)
+            ):
+                return await _caption_image_fallback(context, normalized_path)
+
+            return result


Update the fallback call to extract the base64 image data from the CallToolResult and pass it as a data URI to _caption_image_fallback. This ensures compatibility with sandbox environments where the host cannot directly access the local file path. Additionally, please ensure this new attachment handling functionality is accompanied by corresponding unit tests.

Suggested change

             if (
                 isinstance(result, mcp.types.CallToolResult)
                 and result.content
                 and any(
                     isinstance(item, mcp.types.ImageContent) for item in result.content
                 )
                 and not _provider_supports_image(context)
             ):
-                return await _caption_image_fallback(context, normalized_path)
+                image_item = next(
+                    item for item in result.content if isinstance(item, mcp.types.ImageContent)
+                )
+                image_url = f"data:{image_item.mimeType};base64,{image_item.data}"
+                return await _caption_image_fallback(context, image_url)

             return result

References

New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
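A sketch of the kind of unit test the reviewer asks for. To stay self-contained it tests a simplified, dependency-free restatement of the fallback rather than AstrBot's real `_caption_image_fallback`: `FakeCaptionProvider`, `caption_image_fallback`, and the abbreviated error strings are hypothetical. The first test checks that the caption provider receives the data URI rather than a file path.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeResponse:
    completion_text: Optional[str]


@dataclass
class FakeCaptionProvider:
    """Records what it was asked to caption; stands in for a real Provider."""

    reply: Optional[str] = "a red square"
    calls: list = field(default_factory=list)

    async def text_chat(self, prompt: str, image_urls: list) -> FakeResponse:
        self.calls.append((prompt, list(image_urls)))
        return FakeResponse(self.reply)


async def caption_image_fallback(
    provider_settings: dict,
    get_provider_by_id: Callable[[str], Optional[FakeCaptionProvider]],
    image_url: str,
) -> str:
    """Simplified restatement of the PR's fallback, for testing only."""
    provider_id = provider_settings.get("default_image_caption_provider_id", "")
    if not provider_id:
        return "Error: no image caption provider is configured."
    provider = get_provider_by_id(provider_id)
    if provider is None:
        return f"Error: image caption provider `{provider_id}` is not available."
    prompt = provider_settings.get("image_caption_prompt", "Please describe the image.")
    resp = await provider.text_chat(prompt=prompt, image_urls=[image_url])
    caption = (resp.completion_text or "").strip()
    if not caption:
        return "Error: image caption provider returned an empty description."
    return f"[Image description]: {caption}"


def test_fallback_passes_data_uri_to_caption_provider() -> None:
    provider = FakeCaptionProvider()
    settings = {"default_image_caption_provider_id": "cap"}
    data_uri = "data:image/png;base64,iVBORw0KGgo="
    result = asyncio.run(caption_image_fallback(settings, {"cap": provider}.get, data_uri))
    assert result == "[Image description]: a red square"
    assert provider.calls == [("Please describe the image.", [data_uri])]


def test_fallback_without_caption_provider_returns_error() -> None:
    result = asyncio.run(caption_image_fallback({}, lambda _id: None, "data:,"))
    assert result.startswith("Error:")
```

Under pytest the two `test_` functions would be collected automatically; the fake provider's `calls` list is what lets the test assert on the argument actually sent.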

feat(tools): add image caption fallback for FileReadTool when provider lacks image modality (commit 5fffbb4)

dosubot added the labels size:M (this PR changes 30-99 lines, ignoring generated files) and area:core (the bug/feature is about AstrBot's core backend) on May 29, 2026

sourcery-ai Bot reviewed May 29, 2026


gemini-code-assist Bot reviewed May 29, 2026


Soulter approved these changes May 30, 2026


dosubot added the lgtm label (this PR has been approved by a maintainer) on May 30, 2026

kawayiYokami wants to merge 1 commit into AstrBotDevs:master from kawayiYokami:feat/file-read-image-fallback
