Fix anyio patch: narrow to zombie-scope skip (preserve self-delivery + re-delivery)#13
Merged
Merged
Conversation
Previous patch skipped three task categories — done, current, _must_cancel — and only set should_retry=True on actual cancel delivery. Two of those skips break legitimate anyio cancellation: * Skipping current_task strands deferred self-delivery. The pattern `with CancelScope() as s: s.cancel(); await sleep(N)` relies on a call_soon reschedule firing on the next tick (when current_task() is None) to actually cancel the host task. The wider patch refused to reschedule from the synchronous pass and the await ran to completion. * Skipping _must_cancel strands re-delivery. anyio's contract is to keep redelivering until the scope exits; tasks that swallow the first CancelledError need a second one. The wider patch dropped should_retry on the _must_cancel pass and the second delivery never came (Artem's review on ClickHouse#128). The done-task skip is the only deviation the production observation actually requires (April 24: three zombie-scopes, 55k epoll_pwait/sec). Narrow the patch to just that — every other branch is byte-for-byte upstream anyio 4.13.0 — so deferred self-delivery and re-delivery behave exactly like upstream. Tests: * Drop the two stub tests that enshrined the broken behavior (current task → no retry; _must_cancel → no retry). * Add three real-anyio regression tests covering the patterns Artem identified: self-cancel+await, task cancels own TG scope+await, swallowed CancelledError re-delivery. * Add a mixed done+live stub test asserting the done-skip doesn't short-circuit re-delivery for other tasks in the same scope. 10/10 anyio tests pass. Full suite: 1300 passed, 2 skipped, 0 failed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Addresses the review feedback on ClickHouse/nerve#128. The previous monkeypatch was too aggressive — it skipped three task categories (done, current,
_must_cancel) and only setshould_retry=Truewhen it actually calledtask.cancel(). Two of those skips broke legitimate anyio cancellation:Deferred self-delivery:
with anyio.CancelScope() as s: s.cancel(); await sleep(N)ran the full sleep instead of cancelling immediately.s.cancel()calls_deliver_cancellationsynchronously whilecurrent_task()points at the host task; anyio relies on acall_soonreschedule firing on the next tick (whencurrent_task()is None) to land the cancel. Skipping the current task withoutshould_retry=Truekilled that reschedule.Re-delivery after swallowed CancelledError: anyio's contract is to keep redelivering until the scope exits. Skipping
_must_canceltasks withoutshould_retry=Truestranded any task that caught the firstCancelledErrorand looped.Fix
nerve/_anyio_patch.pyis now byte-for-byte identical to upstream anyio 4.13.0CancelScope._deliver_cancellationexcept for oneif task.done(): continueat the top of the loop body, beforeshould_retry = Trueis set. That's the only deviation the production observation actually requires — the April 24 zombie-scope shape (three simultaneous scopes, ~55k epoll_pwait/sec combined, 100% CPU on one core, each containing a single done task).Module docstring rewritten to explain the narrower scope and why the wider skips were reverted.
Tests
tests/test_anyio_patch.py:test_no_hot_loop_when_only_task_is_current,test_no_hot_loop_when_current_task_has_must_cancel) — they enshrined the now-removed broken behavior.test_self_cancel_then_await_sleep_cancels_immediately—with CancelScope() as s: s.cancel(); await sleep(5)returns in <0.5s.test_task_cancels_own_taskgroup_scope_then_awaits— same shape via TaskGroup; awaited sleep must not run to completion.test_swallowed_cancelled_error_is_redelivered— task catches firstCancelledError, loops, gets redelivered (verified ≥3 deliveries insidefail_after(2.0)).test_live_task_still_reschedules_alongside_done_task— a mixed done +_must_cancel-flagged stub scope must still reschedule (proves the done-skip doesn't short-circuit re-delivery for other tasks).test_no_hot_loop_when_only_task_is_done) and the three "upstream behavior still works" tests (fail_after,move_on_after, task-group cancel).Verification
pytest tests/test_anyio_patch.py -v— 10/10 pass.pytest -q— 1300 passed, 2 skipped, 0 failed.Notes for the upstream PR
Phase 1: this PR opens against the fork (
constkolesnyak/nerve:main) for review. Phase 2 (separate) will push the same change toupstream/anyio-cpu-fixand updateClickHouse/nerve#128.🤖 Generated with Claude Code