[docs] [issue29651] Inconsistent/undocumented urlsplit/urlparse behavior on invalid inputs
Vasiliy Faronov
report at bugs.python.org
Sat Feb 25 14:45:27 EST 2017
New submission from Vasiliy Faronov:
There is a problem with the standard library's urlsplit and urlparse functions, in Python 2.7 (module urlparse) and 3.2+ (module urllib.parse).
The documentation for these functions [1] does not explain how they behave when given an invalid URL.
One could try invoking them manually and conclude that they tolerate anything thrown at them:
>>> import os
>>> from urllib.parse import urlparse  # Python 2.7: from urlparse import urlparse
>>> urlparse('http:////::\\\\!!::!!++///')
ParseResult(scheme='http', netloc='', path='//::\\\\!!::!!++///',
params='', query='', fragment='')
>>> urlparse(os.urandom(32).decode('latin-1'))
ParseResult(scheme='', netloc='', path='\x7f¼â1gdä»6\x82', params='',
query='', fragment='\n\xadJ\x18+fli\x9cÛ\x9ak*ÄÅ\x02³F\x85Ç\x18')
Without studying the source code, it is impossible to know that there is a very narrow class of inputs on which they raise ValueError [2]:
>>> urlparse('http://[')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/urllib/parse.py", line 295, in urlparse
splitresult = urlsplit(url, scheme, allow_fragments)
File "/usr/lib/python3.5/urllib/parse.py", line 345, in urlsplit
raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
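Until this is documented, callers who feed untrusted strings to these functions probably need to guard against the exception themselves. A minimal sketch, assuming Python 3 (the helper name try_urlsplit is made up for illustration and is not part of the standard library):

```python
from urllib.parse import urlsplit


def try_urlsplit(url):
    """Return the SplitResult for url, or None if urlsplit rejects it.

    Hypothetical helper: it only catches the ValueError described above
    and makes no attempt to validate the URL in any other way.
    """
    try:
        return urlsplit(url)
    except ValueError:
        return None
```

With this, try_urlsplit('http://[') returns None instead of raising, while ordinary inputs come back as the usual SplitResult.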
This could be viewed as a documentation issue, but it could also be viewed as an implementation issue. Instead of raising ValueError on those square brackets, urlsplit could simply treat them as *invalid* parts of an RFC 3986 reg-name and lump them into netloc, as it already does with other *invalid* characters:
>>> urlparse('http://\0\0æí\n/')
ParseResult(scheme='http', netloc='\x00\x00æí\n', path='/', params='',
query='', fragment='')
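To illustrate what such a non-raising split might look like, here is a rough sketch built on the regular expression from RFC 3986, Appendix B. It is not how urlsplit is implemented, and the name lenient_urlsplit is hypothetical. Unlike urlsplit, it does not lowercase the scheme or apply any other normalization:

```python
import re
from urllib.parse import SplitResult

# Regular expression from RFC 3986, Appendix B. Every component is
# optional, so it matches any string, including the empty one.
_RFC3986_SPLIT = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?',
    re.DOTALL,
)


def lenient_urlsplit(url):
    """Split url into five components without ever raising ValueError.

    Hypothetical sketch: square brackets (and anything else) found in
    the authority are simply kept as part of netloc.
    """
    match = _RFC3986_SPLIT.match(url)
    scheme, netloc, path, query, fragment = match.group(2, 4, 5, 7, 9)
    # Groups that did not participate in the match are None; map them
    # to empty strings, as urlsplit does for missing components.
    return SplitResult(scheme or '', netloc or '', path,
                       query or '', fragment or '')
```

Under this sketch, lenient_urlsplit('http://[') yields scheme='http' and netloc='[' with every other component empty.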
Note that the raising behavior was introduced in Python 2.7/3.2.
See also issue 8721 [3].
[1] https://docs.python.org/3/library/urllib.parse.html
[2] https://github.com/python/cpython/blob/e32ec93/Lib/urllib/parse.py#L406-L408
[3] http://bugs.python.org/issue8721
----------
assignee: docs at python
components: Documentation, Library (Lib)
messages: 288577
nosy: docs at python, vfaronov
priority: normal
severity: normal
status: open
title: Inconsistent/undocumented urlsplit/urlparse behavior on invalid inputs
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue29651>
_______________________________________