[New-bugs-announce] [issue40409] urllib.parse.urlsplit parses schemes that do not begin with letters

Samani Gikandi report at bugs.python.org
Mon Apr 27 15:59:49 EDT 2020


New submission from Samani Gikandi <samani at gojulas.com>:

RFC 3986 (STD 66) says that a URL scheme must begin with a letter; however, urllib.parse.urlsplit (and urlparse) accept strings that do not follow this rule and return the non-conforming prefix as a valid scheme.

Example from Python 3.8 using "+git+ssh://git@github.com/user/project.git":

>>> from urllib.parse import urlsplit, urlparse
>>> urlparse("+git+ssh://git@github.com/user/project.git")
ParseResult(scheme='+git+ssh', netloc='git@github.com', path='/user/project.git', params='', query='', fragment='')
>>> urlsplit("+git+ssh://git@github.com/user/project.git")
SplitResult(scheme='+git+ssh', netloc='git@github.com', path='/user/project.git', query='', fragment='')

I double-checked this behavior: parsers in a number of other languages (Rust, Go, JavaScript, Ruby) all report an error if you try to parse this URL.
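For comparison, a Python caller who wants behavior closer to those parsers could pre-check the text before the first ":" and only then hand the URL to urlsplit. This is a minimal sketch under stated assumptions, not a proposed patch: `strict_urlsplit` is a hypothetical helper (not part of urllib.parse) that raises ValueError when the prefix consists only of scheme characters but does not start with an ASCII letter, and otherwise defers to urlsplit unchanged.

```python
import string
from urllib.parse import SplitResult, urlsplit

# Characters RFC 3986 allows anywhere in a scheme: ALPHA / DIGIT / "+" / "-" / "."
_SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+-.")


def strict_urlsplit(url: str) -> SplitResult:
    """Hypothetical wrapper: reject scheme-shaped prefixes that start with a non-letter."""
    prefix, sep, _rest = url.partition(":")
    # Only treat the prefix as a candidate scheme if it is non-empty, is followed
    # by ":", and is built entirely from scheme characters.
    if sep and prefix and all(c in _SCHEME_CHARS for c in prefix):
        # RFC 3986 section 3.1 additionally requires the first character to be ALPHA.
        if prefix[0] not in string.ascii_letters:
            raise ValueError(f"URL scheme must begin with a letter: {prefix!r}")
    return urlsplit(url)


if __name__ == "__main__":
    print(strict_urlsplit("git+ssh://git@github.com/user/project.git"))
    try:
        strict_urlsplit("+git+ssh://git@github.com/user/project.git")
    except ValueError as exc:
        print("rejected:", exc)
```

Prefixes that contain other characters (for example the "//host" in "//host:80/path") are left to urlsplit, so the check only targets the case reported here.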

For reference, RFC 3986, section 3.1:

   Scheme names consist of a sequence of characters beginning with a
   letter and followed by any combination of letters, digits, plus
   ("+"), period ("."), or hyphen ("-").

   [...]

   scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
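The ABNF rule maps directly onto a regular expression, which makes the difference from the parsed result above easy to check. A minimal sketch, assuming ALPHA and DIGIT carry their RFC 5234 core-rule meanings (ASCII letters and ASCII digits only); `is_valid_scheme` is a hypothetical name used for illustration:

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# The character classes are spelled out as ASCII ranges because RFC 5234's
# ALPHA is %x41-5A / %x61-7A; str.isalpha() would also accept non-ASCII letters.
# The "-" is placed last in the second class so it is a literal hyphen.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Return True if `scheme` matches the RFC 3986 section 3.1 grammar."""
    # fullmatch anchors both ends, so trailing invalid characters are rejected.
    return SCHEME_RE.fullmatch(scheme) is not None


if __name__ == "__main__":
    print(is_valid_scheme("git+ssh"))   # True
    print(is_valid_scheme("+git+ssh"))  # False: first character is "+"
```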

----------
components: Library (Lib)
messages: 367452
nosy: sgg
priority: normal
severity: normal
status: open
title: urllib.parse.urlsplit parses schemes that do not begin with letters
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40409>
_______________________________________

