utf8 encoding problem

Wichert Akkerman wichert at wiggy.net
Thu Jan 22 09:16:28 EST 2004


Previously Denis S. Otkidach wrote:
> You have to pass an 8-bit string, not unicode.  The following
> code works as expected:
> 
> >>> urllib.unquote('t%C3%A9st').decode('utf-8')
> u't\xe9st'

Ah, that does work indeed, thanks.
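For readers on Python 3, here is a minimal sketch of the same idea, assuming the form data was submitted as UTF-8. Percent-decode to raw bytes first, then decode those bytes with the right charset. `urllib.parse.unquote_to_bytes` and the `encoding` parameter of `urllib.parse.unquote` are standard library; the variable names are only illustrative.

```python
from urllib.parse import unquote, unquote_to_bytes

encoded = "t%C3%A9st"

# Step 1: percent-decode to raw bytes; each %XX escape is one byte.
raw = unquote_to_bytes(encoded)   # b't\xc3\xa9st'

# Step 2: decode those bytes using the charset the form was submitted in
# (assumed to be UTF-8 here).
text = raw.decode("utf-8")        # 't\xe9st', i.e. 'tést'

# unquote() does both steps at once; its encoding parameter defaults to 'utf-8'.
assert unquote(encoded, encoding="utf-8") == text

# Decoding the same bytes with the wrong charset yields mojibake.
wrong = raw.decode("latin-1")     # 'tÃ©st'
print(text, wrong)
```

The key point is the same as in the Python 2 snippet quoted above: the percent escapes describe bytes, so the charset decision happens only after unquoting.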

> P.S. According to the HTML standard, form data sent with the
> application/x-www-form-urlencoded content type are
> restricted to ASCII codes:
> http://www.w3.org/TR/html4/interact/forms.html#form-data-set
> http://www.w3.org/TR/html4/interact/forms.html#submit-format

Luckily that is not true; otherwise it would be completely impossible
to have websites that accept non-ASCII input. To be specific, the
encoding used for HTML forms is determined by:

1. the accept-charset attribute of the form element, if present. Not
   all browsers honour this, though.
2. the encoding of the HTML page containing the form
3. ASCII otherwise

This is specified in section 17.3 of the HTML 4.01 standard you are
referring to.
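A rough sketch of how a server-side handler might apply that order when decoding a submission, assuming Python 3. The helper `pick_form_charset` is hypothetical, and taking the first entry of accept-charset is a simplification of this sketch (a browser may pick any listed encoding it supports). `urllib.parse.parse_qs` and its `encoding` parameter are standard library.

```python
import re
from urllib.parse import parse_qs


def pick_form_charset(accept_charset=None, page_encoding=None):
    """Hypothetical helper: guess the charset a form submission used,
    following the order listed above."""
    # 1. accept-charset on the form element: a space- and/or
    #    comma-separated list. This sketch simply takes the first entry.
    if accept_charset:
        candidates = [c for c in re.split(r"[\s,]+", accept_charset) if c]
        if candidates:
            return candidates[0]
    # 2. the encoding of the page that contained the form
    if page_encoding:
        return page_encoding
    # 3. fall back to ASCII
    return "ascii"


# Example: a form on a UTF-8 page with no accept-charset attribute.
body = "name=t%C3%A9st&city=K%C3%B6ln"
charset = pick_form_charset(page_encoding="utf-8")
fields = parse_qs(body, encoding=charset)
print(charset, fields)
```

The same body decoded with the wrong charset would come out garbled, which is why the handler has to know (or assume) what the form page declared.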

Wichert.

-- 
Wichert Akkerman <wichert at wiggy.net>    It is simple to make things.
http://www.wiggy.net/                   It is hard to make things simple.

More information about the Python-list mailing list