Reject URLs with multiple brackets in host component #1661
rodrigobnogueira wants to merge 2 commits into aio-libs:master
Fixes host-confusion parsing where URLs containing multiple bracket characters in the authority (e.g. http://[:localhost[]].google:80) were silently canonicalized to an unintended host.

Both split_url() and split_netloc() now raise ValueError when:
- more than one '[' or ']' appears in the netloc/hostinfo, or
- '[' does not start the host subcomponent (per RFC 3986 IP-literal)

Adds 7 regression tests covering the affected code paths.
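The two rejection rules listed above can be sketched as a small standalone check. This is only an illustration under stated assumptions: `check_host_brackets` is a hypothetical helper name, not a function from the library, and it assumes `hostinfo` is the authority with any userinfo already removed. The error message mirrors the one quoted in this PR.

```python
def check_host_brackets(hostinfo: str) -> None:
    """Hypothetical helper illustrating the two rules; not the library's code.

    Assumes `hostinfo` is the authority with any userinfo already stripped.
    """
    # Rule 1: more than one '[' or more than one ']' is rejected.
    if hostinfo.count("[") > 1 or hostinfo.count("]") > 1:
        raise ValueError("Invalid IPv6 URL")
    # Rule 2: if '[' is present at all, it must start the host subcomponent.
    if "[" in hostinfo and not hostinfo.startswith("["):
        raise ValueError("Invalid IPv6 URL")


def is_rejected(hostinfo: str) -> bool:
    """Return True when the sketch above would raise for `hostinfo`."""
    try:
        check_host_brackets(hostinfo)
    except ValueError:
        return True
    return False


print(is_rejected("[:localhost[]].google:80"))  # True: two '[' and two ']'
print(is_rejected("[::1]:80"))                  # False: one well-placed pair
print(is_rejected("example.com:80"))            # False: no brackets at all
```

The sketch only covers bracket counting and placement. Whether the bracketed content is a valid IPv6 address is a separate check that it does not attempt.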
Just a heads-up regarding the CI failure: the This is a known upstream incompatibility between
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| Metric | master | #1661 | +/- |
| --- | --- | --- | --- |
| Coverage | 99.47% | 99.48% | |
| Files | 30 | 30 | |
| Lines | 5942 | 5988 | +46 |
| Branches | 283 | 286 | +3 |
| Hits | 5911 | 5957 | +46 |
| Misses | 22 | 22 | |
| Partials | 9 | 9 | |
What do these changes do?
Fixes a parsing edge case where URLs containing multiple bracket characters in the host authority component were silently canonicalized to an unintended host.
For example, `split_netloc` used `str.partition` to extract the content between the first `[` and the first `]`. When more than one bracket pair was present, this picked up content like `:localhost[`, which contains a colon and so passed the IPv6 content check, ultimately resolving to `localhost` after bracket-stripping in the encoder.

Both `split_url()` and `split_netloc()` now validate that:
- no more than one `[` and one `]` appear in the authority/hostinfo, and
- `[` starts the host subcomponent (per the RFC 3986 `IP-literal` grammar)

This is a companion fix to #1654, which addressed text before a single bracket pair; this PR addresses multiple bracket pairs.
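The `str.partition` behaviour described above can be reproduced in isolation. The snippet below is a minimal sketch, not the library's actual parsing code: `first_bracket_span` is a hypothetical name, and it only mimics the "first `[` to first `]`" extraction to show why the malformed authority yields `:localhost[`.

```python
def first_bracket_span(netloc: str) -> str:
    """Hypothetical stand-in: return the text between the first '[' and first ']'."""
    # Everything after the first '['.
    _, _, rest = netloc.partition("[")
    # Everything before the first ']' that follows it.
    inner, _, _ = rest.partition("]")
    return inner


inner = first_bracket_span("[:localhost[]].google:80")
print(repr(inner))   # ':localhost['
print(":" in inner)  # True, so a colon-based IPv6 content check is satisfied
```

Because the extracted span still contains a stray `[`, any later step that strips brackets can end up with a hostname the caller never intended, which is why counting brackets up front is the simpler guard.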
Are there changes in behavior for the user?
Yes. Previously accepted malformed URLs like `http://[:localhost[]].google:80` now raise `ValueError: Invalid IPv6 URL`, consistent with other malformed IPv6 literals.
Related issues and pull requests
Complements #1654.