[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

Mike Lissner report at bugs.python.org
Wed May 5 14:35:05 EDT 2021


Mike Lissner <mlissner at michaeljaylissner.com> added the comment:

> Instead of the patches as you see them, we could've raised an exception.

In my mind the definition of a valid URL is what browsers recognize. They're moving towards the WHATWG definition, and so too must we. 

If we make python raise an exception when a URL has a newline in the scheme (e..g: "htt\np"), we'd be raising exceptions for *valid* URLs as browsers define them. That doesn't seem right at all to me. I'd be frustrated to have to catch such an exception, and I'd wonder how to pass through valid exceptions without urlparse raising something.


> Making the output 'sanitized' means that invalid input is converted into valid output.  This goes against the principle of least surprise.

Well, not quite, right? The URLs this fixes *are* valid according to browsers. Browsers say these tabs and newlines are OK. 

----

I agree though that there's an issue with the approach of stripping input in a way that affects output. That doesn't seem right. 

I think the solution I'd favor (and I imagine what's coming in 43883) is to do this properly so that newlines are preserved in the output, but so that the scheme is also placed properly in the scheme attribute. 

So instead of this (from the initial report):

> In [9]: from urllib.parse import urlsplit
> In [10]: urlsplit("java\nscript:alert('bad')")
> Out[10]: SplitResult(scheme='', netloc='', path="java\nscript:alert('bad')", query='', fragment='')

We get something like this:

> In [10]: urlsplit("java\nscript:alert('bad')")
> Out[10]: SplitResult(scheme='java\nscript', netloc='', path="alert('bad')", query='', fragment='')

In other words, keep the funky characters and parse properly.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43882>
_______________________________________


More information about the Python-bugs-list mailing list