[issue3991] urllib.request.urlopen does not handle non-ASCII characters

Toshio Kuratomi report at bugs.python.org
Tue Sep 30 20:11:24 CEST 2008


Toshio Kuratomi <a.badger at gmail.com> added the comment:

The purpose of such a function would be to take something that is not a
valid uri but 1) is a common way of expressing the way to get to the
resource and 2) follows certain rules and turns that into something that
is a valid uri.  non-ASCii strings in the path are a good example of
this since there is a well defined method to encode the strings into the
URL if you are given a character encoding to apply to it.

My first, naive thought is that if the input can be parsed by
urlparse(), then there is a very good chance that we have the ability to
escape the string properly.  Looking at the invalid uri that I gave, for
instance, if you additionally specified an encoding for the path element
there's no reason a function couldn't do the escaping.

What are example inputs that you are concerned about?  I'll see if I can
come up with code that works with them.

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3991>
_______________________________________


More information about the Python-bugs-list mailing list