utf8 encoding problem
Denis S. Otkidach
ods at strana.ru
Thu Jan 22 08:47:00 EST 2004
On Thu, 22 Jan 2004, Wichert Akkerman wrote:
WA> I'm struggling with what should be a trivial problem but I can't seem to
WA> come up with a proper solution: I am working on a CGI that takes utf-8
WA> input from a browser. The input is nicely encoded so you get something
WA> like this:
WA>
WA> firstname=t%C3%A9st
WA>
WA> where %C3%A9 is a single character in utf-8 encoding. Passing this
WA> through urllib.unquote does not help:
WA>
WA> >>> urllib.unquote(u't%C3%A9st')
WA> u't%C3%A9st'
You have to pass an 8-bit string, not a unicode object. The following
code works as expected:
>>> urllib.unquote('t%C3%A9st').decode('utf-8')
u't\xe9st'
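
For readers on Python 3, where str is already Unicode and the urllib
functions live in urllib.parse, the same two-step idea (undo the %XX
escapes to get raw bytes, then decode those bytes as UTF-8) might be
sketched as follows. This is only a minimal illustration under that
assumption; the helper name decode_percent_utf8 is made up for the example.

```python
from urllib.parse import parse_qs, unquote_to_bytes


def decode_percent_utf8(quoted):
    """Hypothetical helper: percent-decode to bytes, then decode as UTF-8."""
    raw = unquote_to_bytes(quoted)  # 't%C3%A9st' becomes b't\xc3\xa9st'
    return raw.decode('utf-8')      # the two bytes C3 A9 become one character


# Decode a single percent-encoded value.
name = decode_percent_utf8('t%C3%A9st')

# parse_qs applies the same unquote-and-decode step to each form field.
fields = parse_qs('firstname=t%C3%A9st', encoding='utf-8')

print(repr(name))
print(fields)
```

Keeping the percent-decoding at the byte level and decoding afterwards
mirrors the Python 2 one-liner above, where unquote returns an 8-bit
string that is then passed to decode('utf-8').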
P.S. According to the HTML standard, form data submitted with the
application/x-www-form-urlencoded content type are restricted to
ASCII codes:
http://www.w3.org/TR/html4/interact/forms.html#form-data-set
http://www.w3.org/TR/html4/interact/forms.html#submit-format
--
Denis S. Otkidach
http://www.python.ru/ [ru]