[Python-Dev] Bytes path support

"Martin v. Löwis" martin at v.loewis.de
Tue Aug 26 13:14:23 CEST 2014


On 24.08.14 03:11, Greg Ewing wrote:
> Isaac Morland wrote:
>> In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF
>> (byte order mark) is used:
>>
>> http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration
>>
>> Not sure about XML.
> 
> According to Appendix F here:
> 
> http://www.w3.org/TR/xml/#sec-guessing
> 
> an XML parser needs to be prepared to try all the encodings it
> supports until it finds one that works well enough to decode
> the XML declaration, then it can find out the exact encoding
> used.

That's not what this section says. Instead, it says that
you need to auto-detect UCS-4, UTF-16, and UTF-8 from the BOM
or, when there is no BOM, guess them (or EBCDIC) from how '<?'
is encoded. This should be enough to actually parse the encoding
declaration. Other non-ASCII-compatible encodings can only be
used if declared in an upper-level protocol (such as HTTP).

The parser is not expected to try out all encodings it supports.
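
To illustrate the detection step, here is a minimal Python sketch of
the idea in Appendix F. The function name guess_xml_encoding_family
and the returned labels are my own choices for illustration, not
taken from the spec or from any library. It is also simplified: it
ignores the unusual UCS-4 byte orders (2143 and 3412) listed in the
appendix, and it returns only an encoding family that is good enough
to read the XML declaration, not the final encoding.

```python
def guess_xml_encoding_family(data: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML entity.

    Hypothetical helper; simplified from XML 1.0 Appendix F.
    """
    head = data[:4]

    # With a byte order mark. The UCS-4 checks come first because the
    # UCS-4 little-endian BOM begins with the UTF-16 little-endian BOM.
    if head == b"\x00\x00\xfe\xff":
        return "utf-32-be"
    if head == b"\xff\xfe\x00\x00":
        return "utf-32-le"
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"

    # Without a BOM: look at how '<' or '<?' is encoded.
    if head == b"\x00\x00\x00<":
        return "utf-32-be"
    if head == b"<\x00\x00\x00":
        return "utf-32-le"
    if head == b"\x00<\x00?":
        return "utf-16-be"
    if head == b"<\x00?\x00":
        return "utf-16-le"
    if head == b"<?xm":
        # UTF-8, ASCII, ISO 8859-x, ...: read the declaration as ASCII.
        return "ascii-compatible"
    if head == b"\x4c\x6f\xa7\x94":
        # '<?xm' in EBCDIC; the exact code page comes from the declaration.
        return "ebcdic"

    # Anything else: UTF-8 without an encoding declaration, unless an
    # upper-level protocol (such as HTTP) says otherwise.
    return "utf-8"
```

Note that the function never loops over a list of supported codecs;
it only inspects a fixed set of byte patterns.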

Regards,
Martin


