[Python-Dev] Encoding detection in the standard library?

Tue Apr 22 20:06:16 CEST 2008

> When a web browser POSTs data, there is no standard way of communicating
> which encoding it's using.

That's just not true. Web browsers should, and do, use the encoding of
the web page that originally contained the form.

> There are some hints which make it easier
> (accept-charset attributes, the encoding used to send the page to the
> browser), but no guarantees.

Not true. The latter is guaranteed (unless you assume bugs - but if
you do, can you present a specific browser that has that bug?)

> Email is a smaller problem, because it usually has a helpful
> content-type header, but that's no guarantee.

Then assume windows-1252. Mailers that don't use MIME for non-ASCII
characters mostly died 10 years ago; the people who continue to
use them can likely accept occasional mojibake (or else they would
have switched long ago).
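
A minimal sketch of that fallback, assuming Python's standard email
package. decode_text_part is a hypothetical helper name, not part of any
library. It trusts the declared charset when there is one and falls back
to windows-1252 otherwise:

```python
import email


def decode_text_part(part, fallback="windows-1252"):
    """Decode a non-multipart message part to str (hypothetical helper)."""
    # Raw bytes of the body, with any Content-Transfer-Encoding undone.
    payload = part.get_payload(decode=True)
    # None when the message carries no charset parameter at all.
    charset = part.get_content_charset()
    try:
        return payload.decode(charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: use the fallback, and replace
        # the few bytes that windows-1252 leaves undefined.
        return payload.decode(fallback, errors="replace")


# A message with a non-ASCII byte and no MIME headers.
raw = b"Subject: test\n\ncaf\xe9\n"
text = decode_text_part(email.message_from_bytes(raw))
```

The errors="replace" step is a design choice for the sketch: a handful of
byte values have no mapping in windows-1252, so a strict decode could
still fail on arbitrary input.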

> Now, at the moment, the only data I have to support this claim is my
> experience with DrProject in non-English locations.
> If I'm the only one who has had these sorts of problems, I'll go back to
> "Unicode for Dummies".

For web forms, I always encode the pages in UTF-8, and that always
works.
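
As a rough illustration of the UTF-8 approach (a sketch only; parse_form
and the field names are made up for the example): if the page carrying
the form was served as UTF-8, the POSTed body can be decoded as UTF-8
with urllib.parse.parse_qs:

```python
from urllib.parse import parse_qs


def parse_form(body: bytes) -> dict:
    """Parse a form-urlencoded POST body from a UTF-8 page (hypothetical helper)."""
    # The body itself is percent-encoded ASCII; the page encoding only
    # matters once the %XX escapes are turned back into characters.
    text = body.decode("ascii")
    # errors="strict" makes a non-UTF-8 submission fail loudly instead of
    # silently producing replacement characters.
    return parse_qs(text, keep_blank_values=True,
                    encoding="utf-8", errors="strict")


fields = parse_form(b"city=Z%C3%BCrich&note=")
```

With errors="strict", a body that was not encoded as UTF-8 (for example
b"city=Z%FCrich") raises UnicodeDecodeError rather than being accepted.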

For email, I once added encoding processing to pipermail (the
Mailman archiver), and that also always works.

> I'll go back and take another look at the problem, then come back if new
> revelations appear.

Good luck!

Martin