utf8 encoding problem

Denis S. Otkidach ods at strana.ru
Thu Jan 22 08:47:00 EST 2004


On Thu, 22 Jan 2004, Wichert Akkerman wrote:

WA> I'm struggling with what should be a trivial problem but I can't
WA> seem to come up with a proper solution: I am working on a CGI that
WA> takes utf-8 input from a browser. The input is nicely encoded so
WA> you get something like this:
WA>
WA>   firstname=t%C3%A9s
WA>
WA> where %C3%A9 is a single character in utf-8 encoding. Passing this
WA> through urllib.unquote does not help:
WA>
WA>   >>> urllib.unquote(u't%C3%A9st')
WA>   u't%C3%A9st'

You have to pass an 8-bit string, not a unicode string. The
following code works as expected:

>>> urllib.unquote('t%C3%A9st').decode('utf-8')
u't\xe9st'
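The example above is Python 2. As a sketch only, assuming Python 3's
urllib.parse module (which was not part of the original answer), the
same two-step idea still works: turn the %XX escapes into raw bytes,
then decode those bytes as UTF-8. In Python 3, unquote() can also do
both steps itself, because it decodes the escaped bytes with UTF-8 by
default.

```python
from urllib.parse import unquote, unquote_to_bytes

raw = 't%C3%A9st'

# Two-step approach, mirroring the Python 2 recipe:
# percent-escapes -> raw bytes -> text decoded as UTF-8.
as_bytes = unquote_to_bytes(raw)          # b't\xc3\xa9st'
value_two_step = as_bytes.decode('utf-8')

# One-step approach: unquote() decodes the escaped bytes itself,
# using the given encoding (UTF-8 is also the default).
value_one_step = unquote(raw, encoding='utf-8')

print(repr(value_two_step))              # 'tést'
print(value_one_step == value_two_step)  # True
```

The two-step form is handy when the page encoding is not UTF-8: only
the final .decode() call needs to change.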

P.S. According to the HTML 4 specification, form data sent with the
application/x-www-form-urlencoded content type is restricted to
ASCII codes:
http://www.w3.org/TR/html4/interact/forms.html#form-data-set
http://www.w3.org/TR/html4/interact/forms.html#submit-format
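In practice browsers commonly work around this by encoding non-ASCII
characters in the page's charset and then percent-escaping the bytes,
so the body itself stays pure ASCII. The sketch below assumes Python 3
and a UTF-8 page; it is an illustration, not something the HTML 4
specification guarantees. It builds such a body with urlencode() and
decodes a whole query string with parse_qs().

```python
from urllib.parse import urlencode, parse_qs

# Roughly what a UTF-8 page submits for firstname "tés": the character
# is UTF-8 encoded and then percent-escaped, leaving an ASCII-only body.
body = urlencode({'firstname': 't\u00e9s'}, encoding='utf-8')
print(body)            # firstname=t%C3%A9s
print(body.isascii())  # True

# On the receiving side, parse_qs() splits the name=value pairs,
# unescapes them and decodes the resulting bytes with the given encoding.
fields = parse_qs(body, encoding='utf-8')
print(fields['firstname'][0])  # tés
```

Note that parse_qs() returns a list per field name, because a form may
submit the same name more than once.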

-- 
Denis S. Otkidach
http://www.python.ru/      [ru]
