utf8 encoding problem

Denis S. Otkidach ods at strana.ru
Thu Jan 22 09:50:53 EST 2004


On Thu, 22 Jan 2004, Wichert Akkerman wrote:

WA> > P.S. According to the HTML standard, with the
WA> > application/x-www-form-urlencoded content type form data are
WA> > restricted to ASCII codes:
WA> >
WA> > http://www.w3.org/TR/html4/interact/forms.html#form-data-set
WA> > http://www.w3.org/TR/html4/interact/forms.html#submit-format
WA>
WA> Luckily that is not true, otherwise it would be completely
WA> impossible to

It's true: "The "get" method restricts form data set values to
ASCII characters. Only the "post" method (with
enctype="multipart/form-data") is specified to cover the entire
[ISO10646] character set."

But almost nobody follows this rule.

WA> have websites using non-ascii input. To be specific, the

No, you can use the "post" method with the multipart/form-data
content type to do that.

WA> encoding used for HTML forms is determined by:
WA>
WA> 1. accept-charset attribute of the form element if present.
WA>    This is not handled by all browsers though.

The accept-charset attribute can list several encodings.  Which
one should the script choose?
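
To illustrate the ambiguity, here is a rough sketch (Python 3).
The helper names parse_accept_charset and plausible_encodings and
the sample attribute value are made up for this example.  It
splits an accept-charset value into candidates and keeps those
under which the submitted bytes decode without error:

```python
import re

def parse_accept_charset(value):
    # HTML 4 allows a space- and/or comma-delimited list of encodings.
    return [name for name in re.split(r"[,\s]+", value.strip()) if name]

def plausible_encodings(raw, candidates):
    # Keep every candidate under which the bytes decode cleanly.
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue
        ok.append(name)
    return ok

# Hypothetical form attribute and a value submitted in koi8-r.
candidates = parse_accept_charset("utf-8, koi8-r windows-1251")
raw = "привет".encode("koi8-r")
matches = plausible_encodings(raw, candidates)
print(matches)
```

Here utf-8 is rejected, but both single-byte encodings accept the
same bytes, so decoding alone does not tell the script which one
the browser actually used.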

WA> 2. the encoding used for the html page containing the form

Pages are often re-coded by proxies, so the encoding of the
submitted data is not always the one assumed by the
form-processing script.  The same issue applies to cross-site
forms.
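
A small sketch of what goes wrong (Python 3, urllib.parse).  The
scenario is an assumption for illustration: the page reached the
browser as koi8-r, while the script still expects windows-1251.

```python
from urllib.parse import quote, unquote

text = "привет"

# What the browser sends if it saw the page in koi8-r.
wire = quote(text, encoding="koi8-r")

# What the script gets if it assumes windows-1251 instead.
wrong = unquote(wire, encoding="windows-1251")

# What it gets when the assumption matches the real encoding.
right = unquote(wire, encoding="koi8-r")

print(wire)
print(wrong == text, right == text)
```

The data on the wire is pure ASCII either way; nothing in it says
which encoding produced the percent-escaped bytes, and the wrong
guess decodes without any error.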

WA> 3. ascii otherwise

This is the only reliable encoding :)

WA> this is specified in section 17.3 of the HTML 4.01 standard
WA> you are referring to.

Sorry, there is no such specification in section 17.3 of the HTML
4.01 standard.  Of course, your method of determining the
encoding is OK for most cases.
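
For reference, the three-step fallback could be sketched like
this.  The function name guess_form_encoding is hypothetical, and
taking the first accept-charset entry is my assumption, since
neither the list above nor the standard says which entry to use:

```python
def guess_form_encoding(accept_charset=None, page_encoding=None):
    # 1. accept-charset attribute of the form, if present.
    #    Assumption: take the first listed encoding.
    if accept_charset:
        names = accept_charset.replace(",", " ").split()
        if names:
            return names[0]
    # 2. Encoding of the page that contained the form.
    if page_encoding:
        return page_encoding
    # 3. ASCII otherwise.
    return "ascii"

print(guess_form_encoding("utf-8, koi8-r", "windows-1251"))
print(guess_form_encoding(None, "koi8-r"))
print(guess_form_encoding())
```

This is only a guess at what the browser did, so the script
should still be ready for data that does not decode as expected.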

-- 
Denis S. Otkidach
http://www.python.ru/      [ru]




