[issue35377] urlparse doesn't validate the scheme
Steven D'Aprano
report at bugs.python.org
Mon Dec 3 17:51:57 EST 2018
Steven D'Aprano <steve+python at pearwood.info> added the comment:
I'm changing the title to better describe the problem, and suggesting a better solution.
The urlparse.urlsplit and .urlunsplit functions currently don't validate the scheme argument, if given. According to the RFC:
Scheme names consist of a sequence of characters. The lower case
letters "a"--"z", digits, and the characters plus ("+"), period
("."), and hyphen ("-") are allowed. For resiliency, programs
interpreting URLs should treat upper case letters as equivalent to
lower case in scheme names (e.g., allow "HTTP" as well as "http").
https://www.ietf.org/rfc/rfc1738.txt
If the scheme is specified, I suggest it should be normalised to lowercase and validated, something like this:
    # untested
    if scheme:
        # scheme_chars already defined in module
        badchars = set(scheme) - set(scheme_chars)
        if badchars:
            raise ValueError('"%c" is invalid in URL schemes' % badchars.pop())
        scheme = scheme.lower()
This will help avoid errors such as passing 'http://' as the scheme.
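
For anyone who wants to try the idea outside the module, here is a minimal standalone sketch of the same check. It is not the actual patch: normalize_scheme is a hypothetical helper name, and SCHEME_CHARS is my assumption of what the module's scheme_chars constant contains (ASCII letters, digits, "+", "-" and "."). The offending character is picked with sorted() rather than set.pop() so that the error message is deterministic.

```python
import string

# Assumption: mirrors the scheme_chars constant referred to above
# (ASCII letters, digits, plus, hyphen, period).
SCHEME_CHARS = string.ascii_letters + string.digits + "+-."


def normalize_scheme(scheme):
    """Hypothetical helper: validate a URL scheme and return it lowercased.

    An empty scheme is passed through unchanged, as in the snippet above.
    """
    if not scheme:
        return scheme
    badchars = set(scheme) - set(SCHEME_CHARS)
    if badchars:
        # Report the lowest-sorting bad character so the message is stable.
        bad = sorted(badchars)[0]
        raise ValueError('"%c" is invalid in URL schemes' % bad)
    return scheme.lower()


if __name__ == "__main__":
    print(normalize_scheme("HTTP"))
    try:
        normalize_scheme("http://")
    except ValueError as exc:
        print("rejected:", exc)
```

With this sketch, "HTTP" is normalised to "http", while "http://" raises ValueError because ":" and "/" are not permitted scheme characters.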
----------
keywords: -patch
stage: patch review ->
title: urlsplit scheme argument broken -> urlparse doesn't validate the scheme
versions: +Python 3.8 -Python 2.7, Python 3.7
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35377>
_______________________________________