Skip to content

fix: add timeout protection for event hook calls#8431

Open
leafliber wants to merge 2 commits into
AstrBotDevs:masterfrom
leafliber:fix/hook-timeout
Open

fix: add timeout protection for event hook calls#8431
leafliber wants to merge 2 commits into
AstrBotDevs:masterfrom
leafliber:fix/hook-timeout

Conversation

@leafliber
Copy link
Copy Markdown

@leafliber leafliber commented May 30, 2026

Modifications / 改动点

  1. 新增timeout参数用于配置单个事件钩子的超时时间,默认300秒
  2. 超时后会打印警告日志并跳过该钩子继续执行
  3. 优化日志输出,提前缓存插件名和处理器名减少重复计算
  4. 补全了函数文档注释

Fixes: #8430

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

uv run pyt
est tests/unit/test_session_lock.py -v 2>&1 | tail -30'
platform darwin -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0 -- /Users/cassia/AstrBot/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/cassia/AstrBot
configfile: pyproject.toml
plugins: cov-7.1.0, asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 20 items

tests/unit/test_session_lock.py::TestSessionLockManagerBasic::test_init PASSED [ 5%]
tests/unit/test_session_lock.py::TestSessionLockManagerBasic::test_acquire_release_lock PASSED [ 10%]
tests/unit/test_session_lock.py::TestSessionLockManagerBasic::test_lock_is_reusable PASSED [ 15%]
tests/unit/test_session_lock.py::TestCrossLoopIsolation::test_different_loops_have_different_managers PASSED [ 20%]
tests/unit/test_session_lock.py::TestCrossLoopIsolation::test_locks_isolated_across_loops PASSED [ 25%]
tests/unit/test_session_lock.py::TestCrossLoopIsolation::test_same_loop_blocks_on_same_session PASSED [ 30%]
tests/unit/test_session_lock.py::TestConcurrency::test_concurrent_acquisitions_same_loop PASSED [ 35%]
tests/unit/test_session_lock.py::TestConcurrency::test_thread_safety_of_loop_manager_creation PASSED [ 40%]
tests/unit/test_session_lock.py::TestEventLoopCleanup::test_weakref_cleanup_on_loop_close PASSED [ 45%]
tests/unit/test_session_lock.py::TestEventLoopCleanup::test_access_after_loop_close_in_new_loop_works PASSED [ 50%]
tests/unit/test_session_lock.py::TestIssue5464::test_multiple_event_loops_no_cross_loop_error PASSED [ 55%]
tests/unit/test_session_lock.py::TestIssue5464::test_lock_object_not_shared_across_loops PASSED [ 60%]
tests/unit/test_session_lock.py::TestIssue5464::test_concurrent_access_same_session_different_loops PASSED [ 65%]
tests/unit/test_session_lock.py::TestEdgeCases::test_empty_session_id PASSED [ 70%]
tests/unit/test_session_lock.py::TestEdgeCases::test_special_characters_in_session_id PASSED [ 75%]
tests/unit/test_session_lock.py::TestEdgeCases::test_very_long_session_id PASSED [ 80%]
tests/unit/test_session_lock.py::TestEdgeCases::test_lock_not_held_after_context_exit PASSED [ 85%]
tests/unit/test_session_lock.py::TestEdgeCases::test_exception_during_lock PASSED [ 90%]
tests/unit/test_session_lock.py::TestEdgeCases::test_nested_lock_different_sessions PASSED [ 95%]
tests/unit/test_session_lock.py::TestEdgeCases::test_reentrant_lock_same_session PASSED [100%]

============================== 20 passed in 0.54s ==============================


Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Add per-hook timeout handling to event hook execution to prevent long-running handlers from blocking the pipeline.

New Features:

  • Introduce a configurable timeout parameter for individual event hook handlers with a default of 300 seconds.

Enhancements:

  • Improve event hook logging by caching plugin and handler names and extending function documentation.

Tests:

  • Add unit tests covering event hook timeout behavior, propagation control, logging, and argument passing.

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend labels May 30, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In call_event_hook, plugin_name and handler_name are defined inside the try but used afterward in the event.is_stopped() block; if an exception occurs before assignment (e.g., the assert ever fails), this will raise UnboundLocalError, so consider defining them before the try or guarding their use.
  • The new timeout logic treats any timeout <= 0 as ‘no timeout’; if you want to avoid surprising behavior for None or non-float values coming from configuration, you might want to normalize/validate timeout at the function boundary before the handler loop.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `call_event_hook`, `plugin_name` and `handler_name` are defined inside the `try` but used afterward in the `event.is_stopped()` block; if an exception occurs before assignment (e.g., the `assert` ever fails), this will raise `UnboundLocalError`, so consider defining them before the `try` or guarding their use.
- The new timeout logic treats any `timeout <= 0` as ‘no timeout’; if you want to avoid surprising behavior for `None` or non-float values coming from configuration, you might want to normalize/validate `timeout` at the function boundary before the handler loop.

## Individual Comments

### Comment 1
<location path="astrbot/core/pipeline/context_utils.py" line_range="122-123" />
<code_context>
+                    )
+            else:
+                await handler.handler(event, *args, **kwargs)
         except BaseException:
             logger.error(traceback.format_exc())

         if event.is_stopped():
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Catching `BaseException` here may unintentionally swallow cancellations and system-level exceptions.

This also catches `asyncio.CancelledError` and critical signals like `KeyboardInterrupt`, preventing proper task cancellation and potentially hiding serious failures. Consider catching `Exception` instead so normal handler errors are logged but cancellations and critical exceptions still propagate. If you must keep `BaseException`, explicitly re-raise `CancelledError` and `SystemExit` after logging.

```suggestion
        except Exception:
            logger.error(traceback.format_exc())
```
</issue_to_address>

### Comment 2
<location path="tests/unit/test_call_event_hook.py" line_range="61-76" />
<code_context>
+
+
+@pytest.mark.asyncio
+async def test_hook_timeout_skips_handler(mock_star_map, mock_handlers_registry):
+    async def slow_handler(*args, **kwargs):
+        await asyncio.sleep(10)
+
+    handler_md = _make_handler_metadata(slow_handler)
+    mock_handlers_registry.get_handlers_by_event_type = MagicMock(
+        return_value=[handler_md]
+    )
+    event = _make_event()
+
+    result = await call_event_hook(event, EventType.OnLLMRequestEvent, timeout=0.5)
+
+    assert result is False
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider avoiding long real sleeps in timeout tests to keep the suite fast and robust if timeout logic regresses.

Here, `slow_handler` uses `await asyncio.sleep(10)` and relies on `asyncio.wait_for` to interrupt it. If the timeout logic ever regresses, this will actually sleep for 10 seconds per test, which can slow or stall CI. Instead, consider waiting on an `asyncio.Event()` that is never set (or a very large sleep that is cancelled externally), e.g.:

```python
aSync def slow_handler(*args, **kwargs):
    event = asyncio.Event()
    await event.wait()  # only wait_for controls duration
```

This still exercises the timeout/cancellation behavior without depending on a long real sleep.

```suggestion
@pytest.mark.asyncio
async def test_hook_timeout_skips_handler(mock_star_map, mock_handlers_registry):
    async def slow_handler(*args, **kwargs):
        # Wait indefinitely; the test's timeout mechanism (e.g. asyncio.wait_for)
        # is responsible for cancelling this.
        event = asyncio.Event()
        await event.wait()

    handler_md = _make_handler_metadata(slow_handler)
    mock_handlers_registry.get_handlers_by_event_type = MagicMock(
        return_value=[handler_md]
    )
    event = _make_event()

    result = await call_event_hook(event, EventType.OnLLMRequestEvent, timeout=0.5)

    assert result is False


```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/pipeline/context_utils.py
Comment thread tests/unit/test_call_event_hook.py
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a timeout protection mechanism for event hook execution in call_event_hook using asyncio.wait_for (defaulting to 300 seconds) and adds comprehensive unit tests. The review feedback highlights several important improvements: replacing except BaseException: with except Exception: to avoid swallowing critical system exceptions like asyncio.CancelledError; using .get() on star_map to prevent potential KeyError exceptions; renaming the timeout parameter to hook_timeout to avoid naming conflicts with **kwargs passed to hook handlers; and safely handling timeout=None to prevent a TypeError during comparison.

Comment thread astrbot/core/pipeline/context_utils.py
Comment thread astrbot/core/pipeline/context_utils.py
Comment thread astrbot/core/pipeline/context_utils.py Outdated
Comment thread astrbot/core/pipeline/context_utils.py Outdated
1. 提取默认超时时间为常量_DEFAULT_HOOK_TIMEOUT
2. 将timeout参数重命名为hook_timeout提升可读性
3. 添加参数合法性校验,非法值自动回退到默认超时
4. 更新所有测试用例适配参数变更
5. 新增测试用例验证非法超时参数的处理逻辑
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] 添加超时保护:在会话锁内部调用的插件事件钩子(OnLLMRequestEvent 和 OnWaitingLLMRequestEvent)

1 participant