[issue19451] urlparse accepts invalid hostnames

Terry J. Reedy report at bugs.python.org
Sat Nov 2 00:39:40 CET 2013


Terry J. Reedy added the comment:

The 3.4 urllib.parse.urlparse doc says "The module has been designed to match the Internet RFC on Relative Uniform Resource Locators. It supports the following URL schemes: <list of 24, including 'file:'>".

To me, 'support' means 'accept every valid URL for the particular scheme' but not necessarily 'reject every URL that is invalid for the particular scheme'.
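As a quick illustration of that distinction, here is a minimal sketch. It assumes the invalid hostnames at issue are ones containing underscores (which the RFC 952/1123 hostname rules do not allow); the URL is made up for the example.

```python
from urllib.parse import urlparse

# Made-up URL whose host contains an underscore. urlparse splits it
# into components without validating the hostname characters.
result = urlparse("http://foo_bar.example.com/index.html")

print(result.netloc)    # the raw network-location component
print(result.hostname)  # the lowercased host part of netloc
```

Nothing is raised here: the parser only splits the string, which is consistent with reading 'support' as 'accept' rather than 'validate'.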

The other RFC references are these:
"Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’." and
"The fragment is now parsed for all URL schemes (unless allow_fragment is false), in accordance with RFC 3986."
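A small sketch of the two documented behaviours quoted above, using made-up URLs. Note that the keyword argument in the actual function signature is spelled allow_fragments.

```python
from urllib.parse import urlparse

# netloc is recognized only when introduced by '//'.
with_slashes = urlparse("//www.example.com/path#frag")
without_slashes = urlparse("www.example.com/path#frag")

print(with_slashes.netloc, with_slashes.path)
print(repr(without_slashes.netloc), without_slashes.path)

# With allow_fragments=False the '#frag' part is not split off;
# it stays attached to the preceding component (here, the path).
no_fragment = urlparse("http://www.example.com/path#frag",
                       allow_fragments=False)
print(no_fragment.path, repr(no_fragment.fragment))
```

Without the leading '//', the whole of 'www.example.com/path' ends up in the path component and netloc is empty.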

I currently see this, at best, as a request to deprecate 'over-acceptance', to be removed in the future. But if there are URLs in the wild that use underscores in hostnames, then practicality says that this should be closed as invalid.
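To make the deprecation idea concrete, here is one hypothetical shape it could take. This is only a sketch, not anything urllib.parse does or is proposed to do: the wrapper name and the letters-digits-hyphen (LDH) check are assumptions for illustration, and the check deliberately skips IP literals and would also flag non-ASCII (IDN) hosts.

```python
import re
import warnings
from urllib.parse import urlparse

# Hypothetical LDH label pattern; not part of urllib.parse.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def urlparse_warn_on_bad_host(url):
    """Parse like urlparse, but warn if a hostname label is not LDH."""
    result = urlparse(url)
    host = result.hostname
    # Skip empty hosts and IPv6 literals (which contain ':').
    if host and ":" not in host:
        for label in host.rstrip(".").split("."):
            if not _LDH_LABEL.fullmatch(label):
                warnings.warn(
                    f"hostname label {label!r} is not letters/digits/hyphen",
                    DeprecationWarning,
                    stacklevel=2,
                )
                break
    # The parse result is returned unchanged either way.
    return result
```

The URL is still accepted and parsed; only a warning is emitted, so existing URLs in the wild would keep working during any deprecation period.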

----------
nosy: +terry.reedy
type: behavior -> enhancement
versions:  -Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19451>
_______________________________________

