fix: add timeout protection for event hook calls#8431
Conversation
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
call_event_hook,plugin_nameandhandler_nameare defined inside thetrybut used afterward in theevent.is_stopped()block; if an exception occurs before assignment (e.g., theassertever fails), this will raiseUnboundLocalError, so consider defining them before thetryor guarding their use. - The new timeout logic treats any
timeout <= 0as ‘no timeout’; if you want to avoid surprising behavior forNoneor non-float values coming from configuration, you might want to normalize/validatetimeoutat the function boundary before the handler loop.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `call_event_hook`, `plugin_name` and `handler_name` are defined inside the `try` but used afterward in the `event.is_stopped()` block; if an exception occurs before assignment (e.g., the `assert` ever fails), this will raise `UnboundLocalError`, so consider defining them before the `try` or guarding their use.
- The new timeout logic treats any `timeout <= 0` as ‘no timeout’; if you want to avoid surprising behavior for `None` or non-float values coming from configuration, you might want to normalize/validate `timeout` at the function boundary before the handler loop.
## Individual Comments
### Comment 1
<location path="astrbot/core/pipeline/context_utils.py" line_range="122-123" />
<code_context>
+ )
+ else:
+ await handler.handler(event, *args, **kwargs)
except BaseException:
logger.error(traceback.format_exc())
if event.is_stopped():
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Catching `BaseException` here may unintentionally swallow cancellations and system-level exceptions.
This also catches `asyncio.CancelledError` and critical signals like `KeyboardInterrupt`, preventing proper task cancellation and potentially hiding serious failures. Consider catching `Exception` instead so normal handler errors are logged but cancellations and critical exceptions still propagate. If you must keep `BaseException`, explicitly re-raise `CancelledError` and `SystemExit` after logging.
```suggestion
except Exception:
logger.error(traceback.format_exc())
```
</issue_to_address>
### Comment 2
<location path="tests/unit/test_call_event_hook.py" line_range="61-76" />
<code_context>
+
+
+@pytest.mark.asyncio
+async def test_hook_timeout_skips_handler(mock_star_map, mock_handlers_registry):
+ async def slow_handler(*args, **kwargs):
+ await asyncio.sleep(10)
+
+ handler_md = _make_handler_metadata(slow_handler)
+ mock_handlers_registry.get_handlers_by_event_type = MagicMock(
+ return_value=[handler_md]
+ )
+ event = _make_event()
+
+ result = await call_event_hook(event, EventType.OnLLMRequestEvent, timeout=0.5)
+
+ assert result is False
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider avoiding long real sleeps in timeout tests to keep the suite fast and robust if timeout logic regresses.
Here, `slow_handler` uses `await asyncio.sleep(10)` and relies on `asyncio.wait_for` to interrupt it. If the timeout logic ever regresses, this will actually sleep for 10 seconds per test, which can slow or stall CI. Instead, consider waiting on an `asyncio.Event()` that is never set (or a very large sleep that is cancelled externally), e.g.:
```python
aSync def slow_handler(*args, **kwargs):
event = asyncio.Event()
await event.wait() # only wait_for controls duration
```
This still exercises the timeout/cancellation behavior without depending on a long real sleep.
```suggestion
@pytest.mark.asyncio
async def test_hook_timeout_skips_handler(mock_star_map, mock_handlers_registry):
async def slow_handler(*args, **kwargs):
# Wait indefinitely; the test's timeout mechanism (e.g. asyncio.wait_for)
# is responsible for cancelling this.
event = asyncio.Event()
await event.wait()
handler_md = _make_handler_metadata(slow_handler)
mock_handlers_registry.get_handlers_by_event_type = MagicMock(
return_value=[handler_md]
)
event = _make_event()
result = await call_event_hook(event, EventType.OnLLMRequestEvent, timeout=0.5)
assert result is False
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request introduces a timeout protection mechanism for event hook execution in call_event_hook using asyncio.wait_for (defaulting to 300 seconds) and adds comprehensive unit tests. The review feedback highlights several important improvements: replacing except BaseException: with except Exception: to avoid swallowing critical system exceptions like asyncio.CancelledError; using .get() on star_map to prevent potential KeyError exceptions; renaming the timeout parameter to hook_timeout to avoid naming conflicts with **kwargs passed to hook handlers; and safely handling timeout=None to prevent a TypeError during comparison.
1. 提取默认超时时间为常量_DEFAULT_HOOK_TIMEOUT 2. 将timeout参数重命名为hook_timeout提升可读性 3. 添加参数合法性校验,非法值自动回退到默认超时 4. 更新所有测试用例适配参数变更 5. 新增测试用例验证非法超时参数的处理逻辑
Modifications / 改动点
Fixes: #8430
Screenshots or Test Results / 运行截图或测试结果
uv run pyt
est tests/unit/test_session_lock.py -v 2>&1 | tail -30'
platform darwin -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0 -- /Users/cassia/AstrBot/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/cassia/AstrBot
configfile: pyproject.toml
plugins: cov-7.1.0, asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 20 items
tests/unit/test_session_lock.py::TestSessionLockManagerBasic::test_init PASSED [ 5%]
tests/unit/test_session_lock.py::TestSessionLockManagerBasic::test_acquire_release_lock PASSED [ 10%]
tests/unit/test_session_lock.py::TestSessionLockManagerBasic::test_lock_is_reusable PASSED [ 15%]
tests/unit/test_session_lock.py::TestCrossLoopIsolation::test_different_loops_have_different_managers PASSED [ 20%]
tests/unit/test_session_lock.py::TestCrossLoopIsolation::test_locks_isolated_across_loops PASSED [ 25%]
tests/unit/test_session_lock.py::TestCrossLoopIsolation::test_same_loop_blocks_on_same_session PASSED [ 30%]
tests/unit/test_session_lock.py::TestConcurrency::test_concurrent_acquisitions_same_loop PASSED [ 35%]
tests/unit/test_session_lock.py::TestConcurrency::test_thread_safety_of_loop_manager_creation PASSED [ 40%]
tests/unit/test_session_lock.py::TestEventLoopCleanup::test_weakref_cleanup_on_loop_close PASSED [ 45%]
tests/unit/test_session_lock.py::TestEventLoopCleanup::test_access_after_loop_close_in_new_loop_works PASSED [ 50%]
tests/unit/test_session_lock.py::TestIssue5464::test_multiple_event_loops_no_cross_loop_error PASSED [ 55%]
tests/unit/test_session_lock.py::TestIssue5464::test_lock_object_not_shared_across_loops PASSED [ 60%]
tests/unit/test_session_lock.py::TestIssue5464::test_concurrent_access_same_session_different_loops PASSED [ 65%]
tests/unit/test_session_lock.py::TestEdgeCases::test_empty_session_id PASSED [ 70%]
tests/unit/test_session_lock.py::TestEdgeCases::test_special_characters_in_session_id PASSED [ 75%]
tests/unit/test_session_lock.py::TestEdgeCases::test_very_long_session_id PASSED [ 80%]
tests/unit/test_session_lock.py::TestEdgeCases::test_lock_not_held_after_context_exit PASSED [ 85%]
tests/unit/test_session_lock.py::TestEdgeCases::test_exception_during_lock PASSED [ 90%]
tests/unit/test_session_lock.py::TestEdgeCases::test_nested_lock_different_sessions PASSED [ 95%]
tests/unit/test_session_lock.py::TestEdgeCases::test_reentrant_lock_same_session PASSED [100%]
============================== 20 passed in 0.54s ==============================
Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Add per-hook timeout handling to event hook execution to prevent long-running handlers from blocking the pipeline.
New Features:
Enhancements:
Tests: