[Python-Dev] Can the cgi module be made Unicode-aware?

Skip Montanaro skip@pobox.com
Fri, 12 Apr 2002 17:54:47 -0500


    Alex> http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.3

    Martin> The same document (at #submit-format) also explains that
    Martin> application/x-www-form-urlencoded only supports ASCII, so Skip
    Martin> shouldn't be too surprised that his form fails for non-ASCII
    Martin> text.

Are you misinterpreting what part has to be ASCII?  If I submit a form
containing the word

    leiß

it appears that the last letter is not encoded as &#223; before being
urlencoded.  Instead, the bytes that represent that character in the desired
encoding are encoded using the usual % notation.  For example, if the
charset is Latin-1, the encoded string is "lei%DF", not "lei%26%23223%3B".
That may not be the correct way to do it, but the meager empirical evidence
I was able to gather from Mozilla and Opera suggests that's how it's done.
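
The two candidate encodings can be compared with a short sketch.  It
assumes a modern Python 3 and urllib.parse.quote, neither of which
existed when this message was written, and the variable names are only
illustrative.  It reproduces the two strings quoted above; it does not
show what any particular browser actually sends.

```python
from urllib.parse import quote, unquote

word = "lei\u00df"  # the example word; U+00DF is the sharp s

# Behaviour described above: take the bytes of the text in the form's
# charset (assumed here to be Latin-1) and percent-encode them.
latin1_encoded = quote(word, encoding="latin-1")

# The alternative: replace the character with a numeric character
# reference first, then percent-encode the resulting ASCII text.
ncr_encoded = quote("lei&#223;", safe="")

print(latin1_encoded)
print(ncr_encoded)

# The receiving side has to know (or guess) the charset to decode.
decoded = unquote(latin1_encoded, encoding="latin-1")
```

Decoding only round-trips if the server uses the same charset the
browser used, and the urlencoded body itself does not carry that
information.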

Skip