[issue3991] urllib.request.urlopen does not handle non-ASCII characters

Bill Janssen report at bugs.python.org
Mon Sep 29 22:47:32 CEST 2008


Bill Janssen <bill.janssen at gmail.com> added the comment:

As I read RFC 2396,

1.5:  "A URI is a sequence of characters from a very
   limited set, i.e. the letters of the basic Latin alphabet, digits,
   and a few special characters."

2.4:  "Data must be escaped if it does not have a representation using an
   unreserved character; this includes data that does not correspond to
   a printable character of the US-ASCII coded character set, or that
   corresponds to any US-ASCII character that is disallowed, as
   explained below."

So your URL string is invalid.  You need to percent-escape the non-ASCII
characters before passing the URL to urlopen.

(RFC 2396 is what the HTTP RFC cites as its authority on URLs.)
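
For illustration, here is a minimal sketch of one way to do that escaping
with urllib.parse.quote.  The helper name escape_url and the example.com
URL are hypothetical, and it assumes the non-ASCII characters should be
encoded as UTF-8 before being percent-escaped (RFC 2396 does not fix an
encoding).  It leaves the host name alone, so an internationalized domain
name would need separate handling.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_url(url):
    """Percent-escape non-ASCII characters in a URL's path, query and fragment.

    Hypothetical helper, not part of urllib. Assumes UTF-8 as the byte
    encoding and leaves the scheme and host untouched.
    """
    parts = urlsplit(url)
    # Keep "%" safe so that escapes already present are not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&+%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# "\u00e9" is e-acute and "\u00ef" is i-diaeresis.
print(escape_url("http://example.com/caf\u00e9?q=na\u00efve"))
# prints http://example.com/caf%C3%A9?q=na%C3%AFve
```

The escaped string contains only US-ASCII characters, so it can be passed
to urllib.request.urlopen as usual.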

----------
nosy: +janssen

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3991>
_______________________________________

