gh-148820: Fix _PyRawMutex use-after-free on spurious semaphore wakeup #148852

Merged
colesbury merged 1 commit into python:main from colesbury:gh-148820-raw-mutex-park-intr
Apr 22, 2026
Contributor

@colesbury colesbury commented Apr 21, 2026

_PyRawMutex_UnlockSlow CAS-removes the waiter from the list and then calls _PySemaphore_Wakeup, with no handshake. If _PySemaphore_Wait returns Py_PARK_INTR, the waiter can destroy its stack-allocated semaphore before the unlocker's Wakeup runs, causing a fatal error from ReleaseSemaphore / sem_post.

Loop in _PyRawMutex_LockSlow until _PySemaphore_Wait returns Py_PARK_OK, which is only signalled when a matching Wakeup has been observed.

Also include GetLastError() and the handle in the Windows fatal messages in _PySemaphore_Init, _PySemaphore_Wait, and _PySemaphore_Wakeup to make similar races easier to diagnose in the future.
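
The retry loop described above can be illustrated with a minimal, self-contained sketch. This is not the CPython source: `fake_sema`, `fake_sema_wait`, `wait_for_wakeup`, and the `PARK_OK` / `PARK_INTR` constants are hypothetical stand-ins for the real semaphore and `Py_PARK_*` values, and the "interrupted" returns are simulated with a counter rather than real signals. It only shows the shape of the idea: the waiter keeps waiting until it sees the OK result, so its semaphore is not torn down while an unlocker may still be about to signal it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Py_PARK_OK / Py_PARK_INTR. */
enum park_result { PARK_OK = 0, PARK_INTR = 1 };

/* Hypothetical model of a waiter's stack-allocated semaphore. */
struct fake_sema {
    int spurious_left;  /* simulated interrupted returns before the wakeup */
    int wait_calls;     /* how many times the wait function was called */
};

/* Simulated wait: reports "interrupted" a fixed number of times,
 * then reports that the matching wakeup was observed. */
static enum park_result fake_sema_wait(struct fake_sema *s)
{
    s->wait_calls++;
    if (s->spurious_left > 0) {
        s->spurious_left--;
        return PARK_INTR;
    }
    return PARK_OK;
}

/* Keep waiting until the wakeup is observed. Returning on PARK_INTR
 * would let the caller destroy the semaphore while another thread
 * could still be about to signal it. Returns the number of waits. */
static int wait_for_wakeup(struct fake_sema *s)
{
    assert(s != NULL);
    while (fake_sema_wait(s) != PARK_OK) {
        /* Interrupted without a matching wakeup: wait again. */
    }
    return s->wait_calls;
}
```

With three simulated interruptions, `wait_for_wakeup` calls the wait function four times before returning; with none, it returns after a single call.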

@colesbury
Contributor Author

I played around with adding a regression test for this, but I think it's tricky to get right. You have to raise a SIGINT at just the right point. However, if you raise SIGINT too frequently, you can cause a stack overflow in the main thread due to Python signal handlers being called within Python signal handlers...

@colesbury colesbury marked this pull request as ready for review April 22, 2026 00:58
@colesbury colesbury requested a review from mpage April 22, 2026 00:58
Contributor

@mpage mpage left a comment

LGTM

@colesbury colesbury merged commit ad3c5b7 into python:main Apr 22, 2026
66 checks passed
@miss-islington-app

Thanks @colesbury for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @colesbury, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker ad3c5b7958b890382f431a53349320cb7c84d405 3.14

@colesbury colesbury deleted the gh-148820-raw-mutex-park-intr branch April 22, 2026 18:31
@miss-islington-app

Sorry, @colesbury, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker ad3c5b7958b890382f431a53349320cb7c84d405 3.13

@bedevere-app

bedevere-app Bot commented Apr 22, 2026

GH-148884 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.14 bugs and security fixes label Apr 22, 2026
@bedevere-app

bedevere-app Bot commented Apr 22, 2026

GH-148885 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.13 bugs and security fixes label Apr 22, 2026
colesbury added a commit that referenced this pull request Apr 22, 2026
…e wakeup (gh-148852) (#148885)

(cherry picked from commit ad3c5b7)
colesbury added a commit that referenced this pull request Apr 22, 2026
…e wakeup (gh-148852) (#148884)

(cherry picked from commit ad3c5b7)