[Python-Dev] Bytes path support

Marko Rauhamaa marko at pacujo.net
Sat Aug 23 11:46:34 CEST 2014


Isaac Morland <ijmorlan at uwaterloo.ca>:

>>  HTTP/1.1 200 OK
>>  Content-Type: text/html; charset=ISO-8859-1
>>
>>  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
>>  <html>
>>  <head>
>>  <meta http-equiv="Content-Type" content="text/html; charset=utf-16">
>
> For HTML it's not quite so bad.  According to the HTML 4 standard:
> [...]
>
> The Content-Type header takes precedence over a <meta> element. I
> thought I read once that the reason was to allow proxy servers to
> transcode documents but I don't have a cite for that. Also, the <meta>
> element "must only be used when the character encoding is organized
> such that ASCII-valued bytes stand for ASCII characters" so the
> initial UTF-16 example wouldn't be conformant in HTML.

That's not how I read it:

   The META declaration must only be used when the character encoding is
   organized such that ASCII characters stand for themselves (at least
   until the META element is parsed). META declarations should appear as
   early as possible in the HEAD element.

   <URL: http://www.w3.org/TR/1998/REC-html40-19980424/charset.html#doc-char-set>

IOW, you must obey the HTTP character encoding until you have parsed a
conflicting META content-type declaration.
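To make the "ASCII characters stand for themselves" condition concrete, here is a quick Python check. It is only an illustration: the helper name ascii_stands_for_itself is made up, and using string.printable as the ASCII sample is an assumption, not something the spec prescribes:

```python
import string


def ascii_stands_for_itself(encoding: str) -> bool:
    """Hypothetical helper: True if the codec encodes printable ASCII
    exactly as plain ASCII bytes (one byte per character, same values)."""
    sample = string.printable  # ASCII letters, digits, punctuation, whitespace
    try:
        return sample.encode(encoding) == sample.encode("ascii")
    except UnicodeEncodeError:
        # The codec cannot even represent some ASCII characters.
        return False


for enc in ("iso-8859-1", "utf-8", "utf-16"):
    print(enc, ascii_stands_for_itself(enc))

# A <meta> declaration written in UTF-16 has a zero byte next to every
# ASCII character (plus a BOM), so a scan of the raw bytes for b"<meta"
# finds nothing.
meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-16">'
print(b"<meta" in meta.encode("utf-16"))
```

ISO-8859-1 and UTF-8 pass; UTF-16 fails, and its <meta> bytes do not even contain b"<meta". That is the sense in which a UTF-16 document cannot announce itself through META.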

The author of the standard keeps a straight face and continues:

   For cases where neither the HTTP protocol nor the META element
   provides information about the character encoding of a document, HTML
   also provides the charset attribute on several elements. By combining
   these mechanisms, an author can greatly improve the chances that,
   when the user retrieves a resource, the user agent will recognize the
   character encoding.


Marko

