[issue35748] urlparse library detecting wrong hostname leads to open redirect vulnerability

Steven D'Aprano report at bugs.python.org
Sat Jan 19 00:34:41 EST 2019


Steven D'Aprano <steve+python at pearwood.info> added the comment:

I believe that Python's behaviour here is correct. You are supplying a netloc which includes a username "www.google.com\" with no password. That might be what you intend to do, or it might be malicious data. That depends on context, and the urlparse module can't tell what the context is and has no reason to assume malice.

If I am reading this correctly:

https://tools.ietf.org/html/rfc1738#section-3.1

the colon after the username can be omitted, so the URL is legal and Python has returned the correct value for the netloc.
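To make the parsing concrete, here is a small sketch of how the standard library splits such a URL. The host "example.org" and the path are placeholders chosen for illustration, not the values from the original report.

```python
from urllib.parse import urlsplit

# Hypothetical URL: "example.org" stands in for whatever follows the "@".
# The doubled backslash is Python string escaping for a single "\".
url = "https://www.google.com\\@example.org/path"
parts = urlsplit(url)

print(parts.netloc)    # the whole authority, userinfo included
print(parts.username)  # everything before the "@"
print(parts.password)  # None, because the userinfo contains no ":"
print(parts.hostname)  # everything after the "@"
```

The backslash is not a delimiter in the netloc, so it stays part of the username and the host is taken from the text after the "@".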

As Christian says, Python is not an end-user application like a browser. It is right and proper for a browser to expect that the user is non-technical and may not have noticed the @ sign, to expect malicious behaviour, or to assume that backslash \ is a typo for forward slash /. But Python programmers are by definition technical users, and it is their responsibility to validate their data.

There are legitimate uses for the userinfo component (user:password@hostname) and it is not the library's responsibility to assume that backslashes are typos for forward slashes.
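As a rough sketch of what caller-side validation might look like for a redirect target: the function name is_safe_redirect, the allow-list, and the policy of rejecting userinfo and backslashes outright are all assumptions made for this example. They are an application-level choice, not something urllib.parse does or should do.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real application would supply its own hosts.
ALLOWED_HOSTS = {"www.google.com"}

def is_safe_redirect(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only if url points at an allowed host (illustrative policy)."""
    # Assumed policy: this application never expects a backslash in a redirect.
    if "\\" in url:
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some malformed netlocs.
        return False
    if parts.scheme not in ("http", "https"):
        return False
    # Assumed policy: redirect targets never carry userinfo.
    if parts.username is not None or parts.password is not None:
        return False
    return parts.hostname in allowed_hosts

print(is_safe_redirect("https://www.google.com/search"))
print(is_safe_redirect("https://www.google.com\\@example.org/"))
```

An application that does accept userinfo would drop the username and password check and compare only parts.hostname against its allow-list.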

So I think that the behaviour here is correct, and this should be closed. But if you disagree, please explain what you think the library should do, and why. When you do, remember that:

* there are legitimate uses for user:password@hostname;
* either the user name or the password can contain backslashes.
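To illustrate both points, the sketch below parses a URL whose user name and password each contain a backslash. The credentials and the host are invented for the example.

```python
from urllib.parse import urlsplit

# Hypothetical credentials: a Windows-style DOMAIN\user name and a password
# containing a backslash. Doubled backslashes are Python string escaping.
cred_url = "https://CORP\\alice:pa\\ss@example.org/"
cred_parts = urlsplit(cred_url)

print(cred_parts.username)  # user name, backslash preserved
print(cred_parts.password)  # password, backslash preserved
print(cred_parts.hostname)  # host taken from the text after the "@"
```

A library that rewrote backslashes to forward slashes would change both the user name and the password here.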

----------
nosy: +steven.daprano

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35748>
_______________________________________

