Problem: neither urllib2.quote nor urllib.quote encodes unicode string arguments

Jerry Hill malaclypse2 at gmail.com
Fri Oct 3 19:37:18 EDT 2008


On Fri, Oct 3, 2008 at 5:38 PM, Valery Khamenya <khamenya at gmail.com> wrote:
> Hi all
> things like urllib.quote(u"пиво Müller ") fail with error message:
> <type 'exceptions.KeyError'>: u'\u043f'
> Similarly with urllib2.
> Anyone got a hint?? I need it to form the URI containing non-ascii chars.

Do you know exactly what you'd like the result to be?  How non-ASCII
characters should be encoded into a URI is not well defined, so you
have to pick a byte encoding yourself before quoting.  My understanding
is that the most common choice is to percent-encode UTF-8, like this:

>>> u = u"Müller"
>>> import urllib
>>> urllib.quote(u.encode('utf8'))
'M%C3%BCller'

If you need to, you can encode your unicode string differently, like this:

>>> urllib.quote(u.encode('latin-1'))
'M%FCller'
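
If you do this in more than one place, the two steps can be wrapped in
a small helper.  This is only a sketch: the name quote_text is made up
for illustration, and the try/except import is an assumption added so
the same code also runs on Python 3, where quote lives in urllib.parse.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3 (assumption, not in the original post)


def quote_text(text, encoding='utf-8'):
    """Encode a unicode string to bytes, then percent-encode those bytes.

    quote_text is a hypothetical helper name; UTF-8 is the default
    because it is the most common choice for URIs.
    """
    return quote(text.encode(encoding))


print(quote_text(u"M\xfcller"))              # same as the UTF-8 example above
print(quote_text(u"M\xfcller", 'latin-1'))   # same as the latin-1 example above
```

Whichever encoding you pick, the server on the other end has to expect
the same one when it decodes the URI.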

-- 
Jerry

