feat(tools): add image caption fallback for FileReadTool when provider lacks image modality #8425
Hey - I've left some high level feedback:
- In `_provider_supports_image`, swallowing all exceptions and defaulting to `True` can silently mask misconfigurations; consider at least logging the exception and/or defaulting to `False` when capability detection fails so that missing image modality is not accidentally treated as supported.
- The broad `except Exception` in `_caption_image_fallback` will also catch programming errors (e.g., wrong types from `text_chat`); narrowing this to expected error types or re-raising after logging unexpected exceptions would make failures easier to diagnose.
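Both points of this feedback can be illustrated with a small, self-contained sketch. It is not the PR's code: `supports_image`, `caption_or_error`, the `get_modalities` and `describe` callables, and the `EXPECTED_ERRORS` tuple are hypothetical names chosen for illustration, and treating an undeclared modality list as "no image support" is an assumption.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable, Optional

logger = logging.getLogger("image_fallback_sketch")

# Errors we expect from a remote captioning call (hypothetical choice).
EXPECTED_ERRORS = (ConnectionError, TimeoutError)


def supports_image(get_modalities: Callable[[], Optional[Iterable[str]]]) -> bool:
    """Fail closed: report no image support when detection itself fails."""
    try:
        modalities = get_modalities()
    except Exception:
        # Log with traceback instead of silently assuming support.
        logger.exception("Could not detect provider modalities")
        return False
    # Assumption: an undeclared modality list counts as "not supported".
    return modalities is not None and "image" in modalities


async def caption_or_error(describe: Callable[[], Awaitable[str]]) -> str:
    """Turn expected failures into an error string; re-raise anything else."""
    try:
        caption = (await describe()).strip()
    except EXPECTED_ERRORS as exc:
        logger.error("Image captioning failed: %s", exc)
        return f"Error: failed to generate image description: {exc}"
    except Exception:
        # Programming errors (wrong types, etc.) are logged and propagated.
        logger.exception("Unexpected error while captioning image")
        raise
    if not caption:
        return "Error: image caption provider returned an empty description."
    return f"[Image description]: {caption}"
```

With this shape, a network failure still produces a readable tool result, while a wrong return type from the provider surfaces as a traceback rather than as a caption error.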
Code Review
This pull request introduces a fallback mechanism to caption images when the active provider does not support image modality. The review comments point out a critical issue: passing the file path directly to the caption provider will fail in sandbox environments because the host cannot access the sandbox filesystem. The reviewer suggests extracting the base64 image data from the tool result and passing it as a data URI instead, and recommends adding unit tests for this new functionality.
async def _caption_image_fallback(
    context: ContextWrapper[AstrAgentContext],
    image_path: str,
) -> ToolExecResult:
    """Try to caption an image using the configured image caption provider.

    Returns the caption text or an error message if no caption provider is available.
    """
    from astrbot.core.provider.provider import Provider

    umo = context.context.event.unified_msg_origin
    cfg = context.context.context.get_config(umo=umo)
    provider_settings = cfg.get("provider_settings", {})
    caption_provider_id = provider_settings.get("default_image_caption_provider_id", "")

    if not caption_provider_id:
        return (
            "Error: your provider does not support image modality, "
            "and no image caption provider is configured. Unable to read image file."
        )

    caption_provider = context.context.context.get_provider_by_id(caption_provider_id)
    if caption_provider is None or not isinstance(caption_provider, Provider):
        return (
            "Error: your provider does not support image modality, "
            f"and the configured image caption provider `{caption_provider_id}` is not available. "
            "Unable to read image file."
        )

    caption_prompt = provider_settings.get(
        "image_caption_prompt", "Please describe the image."
    )

    try:
        llm_resp = await caption_provider.text_chat(
            prompt=caption_prompt,
            image_urls=[image_path],
        )
        caption = (llm_resp.completion_text or "").strip()
        if not caption:
            return "Error: image caption provider returned an empty description."
        return f"[Image description]: {caption}"
    except Exception as exc:
        logger.error(f"Image captioning failed: {exc}")
        return f"Error: failed to generate image description: {exc}"
In sandbox mode (`local_env = False`), the image file resides inside the sandbox container/environment, while the host running astrbot executes the provider's `text_chat`. Passing `image_path` (which is a sandbox path) directly to the provider will fail because the host cannot access the sandbox filesystem directly.
Since `read_file_tool_result` already reads and compresses the image into base64 format (returned as `mcp.types.ImageContent`), we should pass the base64 data URI (e.g., `data:{mimeType};base64,{data}`) to `_caption_image_fallback` instead of the file path. This avoids re-reading the file and works seamlessly in both local and sandbox environments.
async def _caption_image_fallback(
context: ContextWrapper[AstrAgentContext],
image_url: str,
) -> ToolExecResult:
"""Try to caption an image using the configured image caption provider.
Returns the caption text or an error message if no caption provider is available.
"""
from astrbot.core.provider.provider import Provider
umo = context.context.event.unified_msg_origin
cfg = context.context.context.get_config(umo=umo)
provider_settings = cfg.get("provider_settings", {})
caption_provider_id = provider_settings.get("default_image_caption_provider_id", "")
if not caption_provider_id:
return (
"Error: your provider does not support image modality, "
"and no image caption provider is configured. Unable to read image file."
)
caption_provider = context.context.context.get_provider_by_id(caption_provider_id)
if caption_provider is None or not isinstance(caption_provider, Provider):
return (
"Error: your provider does not support image modality, "
f"and the configured image caption provider `{caption_provider_id}` is not available. "
"Unable to read image file."
)
caption_prompt = provider_settings.get(
"image_caption_prompt", "Please describe the image."
)
try:
llm_resp = await caption_provider.text_chat(
prompt=caption_prompt,
image_urls=[image_url],
)
caption = (llm_resp.completion_text or "").strip()
if not caption:
return "Error: image caption provider returned an empty description."
return f"[Image description]: {caption}"
except Exception as exc:
logger.error(f"Image captioning failed: {exc}")
        return f"Error: failed to generate image description: {exc}"

if (
    isinstance(result, mcp.types.CallToolResult)
    and result.content
    and any(
        isinstance(item, mcp.types.ImageContent) for item in result.content
    )
    and not _provider_supports_image(context)
):
    return await _caption_image_fallback(context, normalized_path)

return result
Update the fallback call to extract the base64 image data from the `CallToolResult` and pass it as a data URI to `_caption_image_fallback`. This ensures compatibility with sandbox environments where the host cannot directly access the local file path. Additionally, please ensure this new attachment handling functionality is accompanied by corresponding unit tests.
if (
    isinstance(result, mcp.types.CallToolResult)
    and result.content
    and any(
        isinstance(item, mcp.types.ImageContent) for item in result.content
    )
    and not _provider_supports_image(context)
):
    image_item = next(
        item for item in result.content if isinstance(item, mcp.types.ImageContent)
    )
    image_url = f"data:{image_item.mimeType};base64,{image_item.data}"
    return await _caption_image_fallback(context, image_url)
return result
References
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
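As a rough idea of what such unit tests could look like, the sketch below exercises the caption path against a fake provider. Everything here is an assumption made for illustration: `FakeCaptionProvider`, `FakeResponse`, and `caption_with` are hypothetical, and `caption_with` is a simplified stand-in for the `try` block of `_caption_image_fallback`, not the real function or the project's actual test setup.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeResponse:
    completion_text: str


class FakeCaptionProvider:
    """Hypothetical test double exposing only text_chat()."""

    def __init__(self, reply: str = "", error: Optional[Exception] = None):
        self.reply = reply
        self.error = error
        self.seen_urls: List[str] = []

    async def text_chat(self, prompt: str, image_urls: List[str]) -> FakeResponse:
        self.seen_urls = list(image_urls)  # record what the caller passed
        if self.error is not None:
            raise self.error
        return FakeResponse(self.reply)


async def caption_with(provider, image_url: str) -> str:
    """Simplified stand-in for the try block of _caption_image_fallback."""
    try:
        resp = await provider.text_chat(
            prompt="Please describe the image.", image_urls=[image_url]
        )
        caption = (resp.completion_text or "").strip()
        if not caption:
            return "Error: image caption provider returned an empty description."
        return f"[Image description]: {caption}"
    except Exception as exc:
        return f"Error: failed to generate image description: {exc}"


def test_caption_is_returned_and_data_uri_is_forwarded():
    provider = FakeCaptionProvider(reply=" a red bicycle ")
    result = asyncio.run(caption_with(provider, "data:image/png;base64,aGk="))
    assert result == "[Image description]: a red bicycle"
    assert provider.seen_urls == ["data:image/png;base64,aGk="]


def test_empty_caption_is_an_error():
    result = asyncio.run(caption_with(FakeCaptionProvider(reply="  "), "data:,"))
    assert result == "Error: image caption provider returned an empty description."


def test_provider_failure_is_reported():
    provider = FakeCaptionProvider(error=ConnectionError("down"))
    result = asyncio.run(caption_with(provider, "data:,"))
    assert result == "Error: failed to generate image description: down"


test_caption_is_returned_and_data_uri_is_forwarded()
test_empty_caption_is_an_error()
test_provider_failure_is_reported()
```

The functions use plain `assert` statements, so a runner such as pytest could collect them as-is; the explicit calls at the end only make the sketch runnable as a script.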
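Motivation
Currently, when `FileReadTool` reads an image file, it returns `ImageContent` directly to the LLM. If the active provider does not support image modality, the image content is cached but the model never sees it, so the read is wasted. This PR adds fallback logic:
- Provider supports image modality → return `ImageContent` as before
- Provider lacks image modality and an image caption provider is configured (`default_image_caption_provider_id`) → call the image-to-text model and return the generated text description
Modifications
Changes to `astrbot/core/tools/computer_tools/fs.py`:
- `_provider_supports_image()` helper: checks whether the current provider supports image modality
- `_caption_image_fallback()` helper: calls the configured image-to-text provider to generate an image description
- `FileReadTool.call()`: checks provider capability before returning `ImageContent` and falls back when necessary
This is NOT a breaking change.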
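The fallback order described here can be summarized as a small decision function. This is a sketch under stated assumptions rather than the PR's implementation: `ReadOutcome` and `decide` are hypothetical names, and the error branch reflects the "no image caption provider is configured" message in the reviewed code.

```python
from enum import Enum
from typing import Optional


class ReadOutcome(Enum):
    RETURN_IMAGE = "return_image"  # hand ImageContent to the model unchanged
    CAPTION = "caption"            # ask the caption provider for a description
    ERROR = "error"                # no way to make the image readable


def decide(provider_supports_image: bool, caption_provider_id: Optional[str]) -> ReadOutcome:
    """Pick the outcome for an image read, in the order the PR describes."""
    if provider_supports_image:
        return ReadOutcome.RETURN_IMAGE
    if caption_provider_id:
        return ReadOutcome.CAPTION
    return ReadOutcome.ERROR
```

Keeping the decision separate from the I/O makes each branch easy to cover with a unit test.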
Screenshots or Test Results
Checklist
Summary by Sourcery
Add an image-captioning fallback for FileReadTool so image files remain usable when the active provider lacks image modality support.
New Features:
Enhancements: