Optimize _WS_EXT_RE backtracking on Python 3.11+ #12346
HarshithReddy01 wants to merge 8 commits into aio-libs:master
Use an atomic group for Python 3.11+ in websocket extension parsing and add focused tests to validate behavior and guard against worst-case backtracking regressions.
Include contributor attribution and a misc changelog note for the websocket extension regex optimization change.
Merging this PR will not alter performance
❌ 12 Tests Failed
The Test (ubuntu, 3.14t, false) failure is pre-existing on master and unrelated to this PR: test_parse_set_cookie_headers_uses_unquote_with_octal fails because of Python 3.14t's new strict control character check in http.cookies.Morsel. The fail-fast strategy cancelled all other jobs. Happy to rerun CI once confirmed.
Yeah, it's a security patch to 3.14 that's broken it. We'll need to sort a separate PR for that shortly.
Seems to have broken something on PyPy...
@HarshithReddy01 Any idea why PyPy is not parsing these correctly? Maybe the condition needs to exclude PyPy as well as Python 3.10.
What do these changes do?
This change optimizes WebSocket extension parsing in aiohttp/_websocket/helpers.py by using an atomic outer group for Python 3.11+ in _WS_EXT_RE.
The previous pattern used a repeating outer non-atomic group: (?:;\s*(?:...))*.
With many valid extension tokens followed by an invalid suffix, regex matching could spend extra time backtracking across prior iterations before failing.
On Python 3.11+, the outer group is changed to the atomic form (?>...)*, which prevents that backtracking path while preserving the same matching intent.
On Python 3.10 and lower, behavior is unchanged because atomic groups are not supported there.
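As a rough illustration of the version gate described above, here is a minimal sketch. It is not the actual _WS_EXT_RE from aiohttp/_websocket/helpers.py: the alternatives are a simplified stand-in without capture groups, and the names DEMO_WS_EXT_RE and params_are_valid are hypothetical.

```python
import re
import sys

# Hypothetical, simplified alternatives for illustration only;
# the real pattern in aiohttp is not reproduced here.
_PARAMS = (
    r"server_no_context_takeover|"
    r"client_no_context_takeover|"
    r"server_max_window_bits(?:=\d+)?|"
    r"client_max_window_bits(?:=\d+)?"
)
_BODY = rf";\s*(?:{_PARAMS})"

# The re module only understands atomic groups, (?>...), from Python 3.11.
# Older versions fall back to the plain non-capturing group.
if sys.version_info >= (3, 11):
    _OUTER = rf"(?>{_BODY})*"
else:
    _OUTER = rf"(?:{_BODY})*"

DEMO_WS_EXT_RE = re.compile(rf"^{_OUTER}$")


def params_are_valid(params: str) -> bool:
    """Return True if every ';'-separated parameter is a known one."""
    return DEMO_WS_EXT_RE.match(params) is not None
```

Both branches share the same body, so the only difference between them is the group opener. This gate only checks the language version; whether it also needs to check the interpreter implementation is an open question in the discussion above.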
Are there changes in behavior for the user?
No user-facing behavior change is intended.
Accepted/rejected extension strings remain the same for valid inputs.
This is a performance-oriented change focused on reducing worst-case backtracking on failing matches.
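A regression guard for the failing-match case could look roughly like the sketch below. It assumes a simplified two-parameter pattern rather than the real _WS_EXT_RE, and build_pattern, worst_case_input and time_rejection are hypothetical helper names. It records timings but deliberately asserts nothing about them, since thresholds depend on the machine.

```python
import re
import sys
import time

# Hypothetical, simplified body; not the pattern used by aiohttp.
_BODY = r";\s*(?:server_no_context_takeover|client_max_window_bits(?:=\d+)?)"

# Atomic groups are only available in the re module from Python 3.11.
ATOMIC_SUPPORTED = sys.version_info >= (3, 11)


def build_pattern(atomic: bool) -> "re.Pattern[str]":
    """Compile the same body with an atomic or a plain outer group."""
    opener = "(?>" if atomic else "(?:"
    return re.compile(rf"^{opener}{_BODY})*$")


def worst_case_input(repeats: int) -> str:
    """Many valid tokens followed by one invalid suffix."""
    return "; server_no_context_takeover" * repeats + "; not_a_real_param"


def time_rejection(pattern: "re.Pattern[str]", text: str):
    """Return (matched, elapsed_seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


text = worst_case_input(500)
plain_matched, plain_seconds = time_rejection(build_pattern(atomic=False), text)
if ATOMIC_SUPPORTED:
    atomic_matched, atomic_seconds = time_rejection(build_pattern(atomic=True), text)
```

The point of the comparison is that both variants must reject the same input; any timing difference is informational only.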
Is it a substantial burden for the maintainers to support this?
No.
The implementation is a small version-gated regex compile-time branch, with both variants kept structurally identical except for the atomic-group syntax on 3.11+.
Related issue number
Discussed in GHSA-qhr8-wxhx-9q9w (closed).
This PR is submitted as a performance improvement, not a security fix/CVE claim.
Checklist
CONTRIBUTORS.txt
CHANGES/ folder