utf8 encoding problem

"Martin v. Löwis" martin at v.loewis.de
Sun Jan 25 03:58:11 EST 2004


Andrew Clover wrote:
> Quite so, in theory. Of course in reality, no browser today includes a
> Content-Type header in the subparts of a multipart/form-data submission,
> so there's nowhere to specify a charset here either! argh.

Right. In this case, the algorithm Wichert quotes should apply.

I once tried to study why browsers won't send Content-Type headers.
Actually, they *do* send Content-Type headers, but omit the charset=
parameter. I submitted various bug reports, and the Mozilla people
replied that they tried to, and found that various CGI scripts would
break when confronted with the standards-conforming request, but
work when they get the deprecated form.

So it looks like this situation will extend indefinitely.
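
To illustrate the situation described above, here is a minimal sketch of how
a server-side script might cope with a part whose Content-Type header carries
no charset= parameter. The function name part_charset and its fallback
argument are hypothetical, not taken from this thread; the fallback is assumed
to be whatever encoding the application expects (often the encoding of the
page that served the form).

```python
from email import message_from_string


def part_charset(part_headers: str, fallback: str = "utf-8") -> str:
    """Return the charset declared in a part's headers, or a fallback.

    part_headers is the raw header block of one multipart/form-data part.
    fallback is an assumption supplied by the caller, since the request
    itself does not say which encoding the browser used.
    """
    # Parse the header block with the standard email parser.
    msg = message_from_string(part_headers + "\n\n")
    # get_content_charset() returns None when no charset= parameter exists.
    return msg.get_content_charset() or fallback
```

With headers such as 'Content-Type: text/plain' and nothing else, the function
falls back to the caller's guess, which is all a script can do as long as
browsers omit the parameter.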

> multipart/form-data as implemented in current UAs is just as encoding-unaware
> as application/x-www-form-urlencoded, sadly. In practical terms it does not
> really matter much which is used.

Right - in practical terms, standards don't matter much. As this thread
shows, the form used *does* matter in practical terms though: users
of application/x-www-form-urlencoded are now confronted with the
unescaping-then-decoding issue, which apparently is a challenge.
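
A minimal sketch of the unescape-then-decode order for a single
application/x-www-form-urlencoded value, written for current Python 3 rather
than the Python of this thread. The function name decode_form_value is
hypothetical, and the charset default of UTF-8 is an assumption, since the
request does not state the encoding. The point is that %XX escapes stand for
bytes, so they must be turned back into bytes before any charset decoding.

```python
from urllib.parse import unquote_to_bytes


def decode_form_value(raw: str, charset: str = "utf-8") -> str:
    """Unescape a form-urlencoded value to bytes, then decode the bytes."""
    # In form-urlencoded data '+' stands for a space. Replace it before
    # unescaping so that an escaped plus sign (%2B) survives as '+'.
    raw_bytes = unquote_to_bytes(raw.replace("+", " "))
    # Only now interpret the bytes in the assumed character set.
    return raw_bytes.decode(charset)
```

For example, decode_form_value("L%C3%B6wis") yields "Löwis", and the same
name submitted from a Latin-1 page arrives as "L%F6wis" and needs
charset="iso-8859-1". Doing the steps in the other order, decoding first and
unescaping afterwards, leaves each escaped byte as a separate character.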

Regards,
Martin



