utf8 encoding problem

Erik Max Francis max at alcyone.com
Thu Jan 22 06:07:31 EST 2004


Wichert Akkerman wrote:

> I'm struggling with what should be a trivial problem but I can't seem to
> come up with a proper solution: I am working on a CGI that takes utf-8
> input from a browser. The input is nicely encoded so you get something
> like this:
> 
>   firstname=t%C3%A9s
> 
> where %C3%A9 is a single character in utf-8 encoding. Passing this
> through urllib.unquote does not help:
> 
>   >>> urllib.unquote(u't%C3%A9st')
>   u't%C3%A9st'

Unquote it as a normal byte string first, then decode the result as
UTF-8 to get a Unicode string.

>>> import urllib
>>> x = 't%C3%A9s'
>>> y = urllib.unquote(x)
>>> y
't\xc3\xa9s'
>>> z = unicode(y, 'utf-8')
>>> z
u't\xe9s'
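
For anyone reading this on Python 3, where the unquoting functions live in
urllib.parse, the same two steps can be sketched roughly as below. The
helper name decode_form_value is made up for illustration and is not part
of urllib; the sketch also assumes the browser really did send UTF-8.

```python
from urllib.parse import unquote_to_bytes


def decode_form_value(quoted):
    """Hypothetical helper: percent-decode to bytes, then decode as UTF-8."""
    # Step 1: undo the %XX escapes, which yields raw bytes.
    raw = unquote_to_bytes(quoted)
    # Step 2: interpret those bytes as UTF-8 (assumed encoding of the form data).
    return raw.decode('utf-8')


if __name__ == '__main__':
    print(repr(decode_form_value('t%C3%A9s')))
```

Keeping the two steps separate makes it easy to swap in a different
encoding, or to catch UnicodeDecodeError, if the input turns out not to be
UTF-8.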

-- 
 __ Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
/  \ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
\__/ I do not promise to consider race or religion in my appointments.
    I promise only that I will not consider them. -- John F. Kennedy


