[issue22852] urllib.parse wrongly strips empty #fragment

Fri Mar 13 01:41:00 CET 2015

Martin Panter added the comment:

A few recent bug reports (Issue 23505, Issue 23636) may be solved by the has_netloc proposal, so I am posting a patch that implements it. The changes were more involved than I anticipated, but the patch should still be usable.
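For illustration, this uses only the unpatched standard library, not the patch. It shows the information that has_netloc is meant to capture being lost today. The two file URLs below differ only in whether the "//" authority marker is present, yet they split to identical results and recombine to the same string:

```python
from urllib.parse import urlsplit, urlunsplit

# Two file URLs that differ only in whether the "//" authority marker is present
with_slashes = urlsplit("file:///path")
without_slashes = urlsplit("file:/path")

# Both split to an empty netloc, so the SplitResult values are indistinguishable
print(with_slashes.netloc == without_slashes.netloc == "")   # True
print(with_slashes == without_slashes)                        # True

# Recombining therefore yields the same string for both inputs
print(urlunsplit(with_slashes))      # file:///path
print(urlunsplit(without_slashes))   # file:///path
```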

I reused some of Stian’s tests; however, the results in my patch differ slightly, matching the existing behaviour:

* Never sets netloc, query, fragment to None
* Always leaves hostname as None rather than ""
* Retains username, password and port components in netloc
* Converts hostname to lowercase
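The existing behaviour listed above can be observed with the unpatched module. This is a sketch; the URLs are arbitrary examples. The last comparison shows why an empty fragment cannot be detected when absent components are represented as "":

```python
from urllib.parse import urlsplit

parts = urlsplit("http://User:Pass@Example.COM:8080/path")

# netloc keeps the userinfo, port and original case exactly as written
print(parts.netloc)     # User:Pass@Example.COM:8080
# hostname is derived from netloc and lowercased; username and port are parsed out
print(parts.hostname)   # example.com
print(parts.username)   # User
print(parts.port)       # 8080

# An absent netloc is "" rather than None ...
mailto = urlsplit("mailto:chris@example.com")
print(repr(mailto.netloc))   # ''
# ... while hostname is None rather than "" when there is no host
print(mailto.hostname)       # None

# An empty fragment is also "", so a trailing "#" leaves no trace
with_hash = urlsplit("http://example.com/path#")
without_hash = urlsplit("http://example.com/path")
print(with_hash == without_hash)   # True
```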

Unfortunately, I discovered that you cannot add non-empty __slots__ to namedtuple() subclasses; see Issue 17295 and Issue 1173475. Therefore my patch removes __slots__ from the SplitResult etc. classes, so that their instances can gain the has_netloc etc. attributes.
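The following is a minimal reproduction of the __slots__ limitation. It uses a hypothetical two-field namedtuple called Base in place of the real base class of SplitResult. An empty `__slots__ = ()`, as the unpatched classes use, is still accepted. Only a non-empty one is rejected, because tuple is a variable-size type. Dropping __slots__ altogether gives instances a __dict__, which is what allows extra attributes:

```python
from collections import namedtuple

# Hypothetical stand-in for the namedtuple base that SplitResult derives from
Base = namedtuple("Base", "scheme netloc")

class EmptySlots(Base):
    __slots__ = ()          # accepted: adds no per-instance storage

slot_error = None
try:
    class ExtraSlot(Base):
        __slots__ = ("has_netloc",)   # rejected: tuple subtypes cannot add slots
except TypeError as exc:
    slot_error = exc

print(type(slot_error).__name__)      # TypeError

class NoSlots(Base):
    pass                    # no __slots__, so instances get a __dict__

r = NoSlots("file", "")
r.has_netloc = True         # stored in the instance __dict__
print(r, r.has_netloc)      # NoSlots(scheme='file', netloc='') True

e = EmptySlots("file", "")
try:
    e.has_netloc = True     # no __dict__ and no slot, so this fails
except AttributeError:
    print("EmptySlots instances cannot gain attributes")
```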

I based the default has_netloc value on the existing urlunsplit() behaviour:

>>> empty_netloc = ""
>>> SplitResult("mailto", empty_netloc, "chris@example.com", "", "").has_netloc
False
>>> SplitResult("file", empty_netloc, "/path", "", "").has_netloc
True
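The urlunsplit() behaviour referred to here can be checked with the unpatched module. For a scheme listed in urllib.parse.uses_netloc, such as "file", it emits the "//" marker even for an empty netloc. For "mailto" it does not. The helper below, default_has_netloc(), is a hypothetical sketch of deriving a default from that behaviour. It is not the code in the patch:

```python
from urllib.parse import urlunsplit

# Existing behaviour: "//" appears for file: but not for mailto:
print(urlunsplit(("file", "", "/path", "", "")))                 # file:///path
print(urlunsplit(("mailto", "", "chris@example.com", "", "")))   # mailto:chris@example.com

def default_has_netloc(scheme, netloc):
    """Hypothetical helper: True if urlunsplit() would emit a "//" authority."""
    # Use a fixed absolute path so only scheme and netloc affect the result
    rebuilt = urlunsplit((scheme, netloc, "/", "", ""))
    prefix = scheme + "://" if scheme else "//"
    return rebuilt.startswith(prefix)

print(default_has_netloc("mailto", ""))              # False
print(default_has_netloc("file", ""))                # True
print(default_has_netloc("mailto", "example.com"))   # True
```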

I found out that the “urllib.robotparser” module uses a urlunparse(urlparse()) combination to normalize URLs, so it had to be changed. This is a backwards incompatibility introduced by this proposal.
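The following is a rough illustration of what that round trip does today, using the unpatched module and arbitrary robots.txt-style paths. urlunparse(urlparse()) discards empty "?" and "#" delimiters, so all of these spellings normalize to the same path. The normalize() name is hypothetical. The assumption is that any change making the round trip preserve such delimiters would alter this normalized value:

```python
from urllib.parse import urlparse, urlunparse

def normalize(path):
    # Hypothetical wrapper around the parse-then-reassemble round trip
    return urlunparse(urlparse(path))

for raw in ["/private", "/private?", "/private#", "/private?#"]:
    print(repr(raw), "->", repr(normalize(raw)))   # each maps to '/private'
```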

----------
keywords: +patch
type:  -> enhancement
Added file: http://bugs.python.org/file38465/has_netloc.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22852>
_______________________________________