[Python-Dev] PEP 263 considered faulty (for some Japanese)

Jason Orendorff jason@jorendorff.com
Wed, 13 Mar 2002 06:27:46 -0600


Fredrik Lundh wrote:
> which reminds me: the HTTP protocol says that a charset specified
> at the HTTP protocol level should override any encoding specified in
> the document itself.

I believe HTTP (RFC 2616) rather meekly asserts that the HTTP
Content-Type header *always* defines the encoding of the body.
If no charset is specified, a text/* body is taken to be ISO-8859-1.
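Here is a minimal Python sketch of the header rule as stated above. It uses the standard library's `email.message.Message` to parse the Content-Type value, since HTTP borrows MIME's header syntax. The name `header_charset` is hypothetical. One assumption to flag: RFC 2616 section 3.7.1 gives the ISO-8859-1 default only to `text/*` media types, so the sketch returns `None` for any other type.

```python
from email.message import Message


def header_charset(content_type):
    """Charset implied by an HTTP Content-Type value under RFC 2616.

    Hypothetical helper: a sketch, not a full HTTP implementation.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # Lowercased value of the charset parameter, or None if it is absent.
    charset = msg.get_content_charset()
    if charset is not None:
        return charset
    # RFC 2616 sec. 3.7.1: text/* defaults to ISO-8859-1 when no charset is given.
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"
    # No default is defined for non-text media types.
    return None
```

So `header_charset("text/html")` gives `'iso-8859-1'`, while `header_charset("text/html; charset=Shift_JIS")` gives `'shift_jis'`.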

I believe this requirement is ignored in practice.  HTTP servers
don't correctly label outgoing documents, and HTTP clients ignore
whatever the HTTP server says.

Browsers usually search HTML documents for <meta> and XML documents
for <?xml ... encoding="..."?>, and I think they always prefer a document's
internal mark to what the HTTP headers say.  (Anyone know for sure?)
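A rough Python sketch of the precedence guessed at above: look for an XML declaration or an HTML `<meta>` charset near the start of the body, and fall back to the HTTP header's charset only if neither is found. The 1024-byte scan window, the regular expressions, and the names `sniff_internal_charset` and `effective_charset` are all assumptions for illustration. This is not a description of how any particular browser actually behaves.

```python
import re

# Hypothetical patterns; real browsers use far more careful scanners.
# XML declaration at the very start, e.g. <?xml version="1.0" encoding="EUC-JP"?>
XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
# <meta http-equiv="Content-Type" content="text/html; charset=..."> or <meta charset=...>
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def sniff_internal_charset(body, limit=1024):
    """Return the document's own encoding declaration, or None if none is found."""
    head = body[:limit]  # assumed scan window: only the first `limit` bytes
    m = XML_DECL.match(head)
    if m is None:
        m = META_CHARSET.search(head)
    return m.group(1).decode("ascii").lower() if m else None


def effective_charset(body, header_charset):
    """Prefer the internal mark; use the HTTP header's charset only as a fallback."""
    return sniff_internal_charset(body) or header_charset
```

Swapping the order of the two operands in `effective_charset` would give the header-wins behaviour Fredrik describes instead.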

Just another charset headache.

## Jason Orendorff    http://www.jorendorff.com/