[Python-Dev] Bytes path support
"Martin v. Löwis"
martin at v.loewis.de
Tue Aug 26 13:14:23 CEST 2014
On 24.08.14 03:11, Greg Ewing wrote:
> Isaac Morland wrote:
>> In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF
>> (byte order mark) is used:
>>
>> http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration
>>
>> Not sure about XML.
>
> According to Appendix F here:
>
> http://www.w3.org/TR/xml/#sec-guessing
>
> an XML parser needs to be prepared to try all the encodings it
> supports until it finds one that works well enough to decode
> the XML declaration, then it can find out the exact encoding
> used.
That's not what this section says. Instead, it says that
you need to auto-detect UCS-4, UTF-16, and UTF-8 from the BOM,
or, absent a BOM, guess them or EBCDIC from how '<?' is encoded.
This should be enough to actually parse the encoding declaration.
Other non-ASCII-compatible encodings can only be used if declared
in a higher-level protocol (such as HTTP).
The parser is not expected to try out all the encodings it supports.
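To illustrate, here is a rough sketch of the Appendix F procedure: a fixed lookup on the first four bytes picks an encoding family, and that family alone is used to read the encoding declaration, without cycling through codecs. The function names guess_xml_encoding and declared_encoding, the family labels ("ascii-compatible", "ebcdic", "ucs-4-unusual-order"), the use of Python's utf-32 codecs for UCS-4, and cp037 as the EBCDIC probe code page are all my own assumptions for illustration. None of this is an existing library API.

```python
import re

# Appendix F with a byte order mark. Order matters: the four-byte UCS-4
# marks must be tested before the two-byte UTF-16 marks they start with.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),            # UCS-4, 1234 order
    (b"\xff\xfe\x00\x00", "utf-32-le"),            # UCS-4, 4321 order
    (b"\x00\x00\xff\xfe", "ucs-4-unusual-order"),  # 2143, no Python codec
    (b"\xfe\xff\x00\x00", "ucs-4-unusual-order"),  # 3412, no Python codec
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xef\xbb\xbf", "utf-8"),
]

# Appendix F without a BOM: how '<?' (or '<?xm') is encoded.
_NO_BOM = {
    b"\x00\x00\x00\x3c": "utf-32-be",
    b"\x3c\x00\x00\x00": "utf-32-le",
    b"\x00\x00\x3c\x00": "ucs-4-unusual-order",
    b"\x00\x3c\x00\x00": "ucs-4-unusual-order",
    b"\x00\x3c\x00\x3f": "utf-16-be",
    b"\x3c\x00\x3f\x00": "utf-16-le",
    b"\x3c\x3f\x78\x6d": "ascii-compatible",  # UTF-8, ISO 8859-x, Shift_JIS, ...
    b"\x4c\x6f\xa7\x94": "ebcdic",            # code page still unknown
}


def guess_xml_encoding(data: bytes) -> str:
    """Return an encoding family for an XML entity (hypothetical helper)."""
    for bom, family in _BOMS:
        if data.startswith(bom):
            return family
    # Anything else: UTF-8 without a declaration (or a broken document).
    return _NO_BOM.get(data[:4], "utf-8")


# EncName production from the XML spec: [A-Za-z] ([A-Za-z0-9._] | '-')*
_DECL = re.compile(
    r"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

# Probe codecs used only to read the declaration. Assumption: every
# ASCII-compatible (resp. EBCDIC) code page spells the declaration alike.
_PROBE = {"ascii-compatible": "ascii", "ebcdic": "cp037"}


def declared_encoding(data: bytes):
    """Read the encoding declaration, if any, using only the guessed family."""
    family = guess_xml_encoding(data)
    if family == "ucs-4-unusual-order":
        return None  # Python has no codec for the 2143/3412 byte orders
    # 256 bytes is a multiple of 2 and 4, so UTF-16/UTF-32 slices stay aligned.
    head = data[:256].decode(_PROBE.get(family, family), errors="replace")
    match = _DECL.search(head)
    return match.group(1) if match else None
```

For example, declared_encoding('<?xml version="1.0" encoding="IBM037"?><a/>'.encode("cp037")) recognizes the EBCDIC family from the first four bytes and then reads "IBM037" from the declaration. A codec outside these families would have to be announced by the transport (e.g. HTTP), as noted above.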
Regards,
Martin