[issue35377] urlparse doesn't validate the scheme

Steven D'Aprano report at bugs.python.org
Mon Dec 3 17:51:57 EST 2018


Steven D'Aprano <steve+python at pearwood.info> added the comment:

I'm changing the title to better describe the problem, and suggesting a better solution.

The urlsplit and urlunsplit functions (urllib.parse in Python 3) currently don't validate the scheme argument, if given. According to RFC 1738:

   Scheme names consist of a sequence of characters. The lower case
   letters "a"--"z", digits, and the characters plus ("+"), period
   ("."), and hyphen ("-") are allowed. For resiliency, programs
   interpreting URLs should treat upper case letters as equivalent to
   lower case in scheme names (e.g., allow "HTTP" as well as "http").

https://www.ietf.org/rfc/rfc1738.txt

If the scheme is specified, I suggest it should be normalised to lowercase and validated, something like this:

    # untested
    if scheme:
        # scheme_chars already defined in module
        badchars = set(scheme) - set(scheme_chars)
        if badchars:
            raise ValueError('"%c" is invalid in URL schemes' % badchars.pop())
        scheme = scheme.lower()


This will help avoid errors such as passing 'http://' as the scheme.
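
For illustration, the suggestion above can be turned into a standalone sketch. This is not the stdlib code or a proposed patch: normalize_scheme and SCHEME_CHARS are hypothetical names, and the character set is written out by hand from the RFC excerpt (plus upper case letters, which are accepted and then lowercased) rather than taken from the module.

```python
import string

# Assumption: allowed characters per the RFC 1738 excerpt quoted above,
# with upper case letters accepted so they can be normalised afterwards.
SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+-.")


def normalize_scheme(scheme):
    """Hypothetical helper: validate a scheme and return it lowercased."""
    if not scheme:
        # An empty scheme means "not given"; pass it through unchanged.
        return scheme
    badchars = set(scheme) - SCHEME_CHARS
    if badchars:
        # min() makes the reported character deterministic.
        raise ValueError("%r is invalid in URL schemes" % min(badchars))
    return scheme.lower()
```

With this sketch, normalize_scheme("HTTP") returns "http", while normalize_scheme("http://") raises ValueError because ":" and "/" are not permitted in a scheme.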

----------
keywords:  -patch
stage: patch review -> 
title: urlsplit scheme argument broken -> urlparse doesn't validate the scheme
versions: +Python 3.8 -Python 2.7, Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35377>
_______________________________________

