[issue34276] urllib.parse doesn't round-trip file URI's with multiple leading slashes

Mon Jul 30 09:10:24 EDT 2018

Martin Panter <vadmium+py at gmail.com> added the comment:

This may be a very old regression (from 2002) caused by Issue 591713 and Mercurial rev. 554f975073a0. The original check for the double slash, added in 0d6bd391acd8, “escapes” a path beginning with a double slash by prefixing it with two more slashes (empty “netloc”). This should round-trip Chris’s problem URLs.
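To make the symptom concrete, here is a minimal sketch using only “urlsplit” and “urlunsplit”. The helper name “round_trips” is made up for illustration. The result for the four-slash URL depends on the Python version in use, so no output is claimed for it.

```python
from urllib.parse import urlsplit, urlunsplit

def round_trips(url):
    """Return True if splitting and re-joining url gives back the same string."""
    # Hypothetical helper, not part of urllib.
    return urlunsplit(urlsplit(url)) == url

# Splitting keeps the extra slashes in the path; the netloc is empty.
parts = urlsplit("file:////foo/bar")
print(repr(parts.netloc), repr(parts.path))

# Whether this prints True depends on the Python version in use.
print(round_trips("file:////foo/bar"))
```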

I think the logic in “urlunsplit” should always add the extra double slash for the netloc, regardless of path, at least if a scheme is present and it is registered in “uses_netloc”. This should fix Chris’s instance of the bug, since “file:” is registered. There is already a patch in Issue 1722348 which should do this (although it includes other changes as well).
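A rough sketch of the proposed rule, applied when joining the components back together, might look like the following. The function “unsplit_sketch” is hypothetical. It is not the stdlib code, nor the patch from Issue 1722348, and it only handles str input.

```python
from urllib.parse import urlsplit, uses_netloc

def unsplit_sketch(scheme, netloc, path, query="", fragment=""):
    """Join URL components, always writing '//' for registered schemes."""
    # Hypothetical helper illustrating the proposal; not urllib's implementation.
    url = path
    # Add the netloc marker regardless of what the path starts with.
    if netloc or (scheme and scheme in uses_netloc):
        if url and not url.startswith("/"):
            url = "/" + url
        url = "//" + netloc + url
    if scheme:
        url = scheme + ":" + url
    if query:
        url += "?" + query
    if fragment:
        url += "#" + fragment
    return url

# The empty netloc is written out, so the leading slashes in the path survive.
rebuilt = unsplit_sketch(*urlsplit("file:////foo/bar"))
print(rebuilt)
```

Because “file” is in “uses_netloc”, the path “//foo/bar” gets the empty-netloc prefix and the URL comes back unchanged.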

The double slash should also be escaped if no scheme is present. (The empty scheme string is already in “uses_netloc”.) This might satisfy Issue 23505.
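For the no-scheme case, a small check along these lines shows what would need escaping. The manual re-join at the end is only an illustration of the escaping idea, not what “urllib” does.

```python
from urllib.parse import urlsplit, uses_netloc

# The empty scheme string is registered alongside "file", "http", etc.
print("" in uses_netloc)

# A scheme-less reference with four leading slashes splits into an
# empty netloc and a path that itself begins with a double slash.
ref = "////foo/bar"
parts = urlsplit(ref)
print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))

# Illustration only: writing the empty netloc back as "//" restores the input.
escaped = "//" + parts.netloc + parts.path
print(escaped == ref)
```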

IMO it would be better to do the escaping by default, for all schemes unknown to “urllib”, and to blacklist specific schemes like “mailto:” instead. But that would be out of scope for a bug fix.

----------
nosy: +martin.panter

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34276>
_______________________________________