elementtree and gbk encoding

Fredrik Lundh fredrik at pythonware.com
Wed Mar 15 07:46:15 EST 2006


Diez B. Roggisch wrote:

> Interestingly enough, that has not to be the case. A document can very well
> be well-formed without a header. The constraints for well-formedness are
> scattered throughout the spec, so I'm not sure what they say about the used
> encoding in absence of a header.

if there's no header, and no external override, the document must use either
UTF-8 or UTF-16, and for UTF-16, a leading byte order mark must be present
(ASCII is of course a subset of UTF-8, but e.g. ISO-8859-1 isn't).

reading

    http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing

may also help (at least if you read between the lines).
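
a minimal sketch of the rule above in Python, for a document that has no
header and no external encoding information.  the name sniff_encoding is
hypothetical (it is not part of ElementTree), and the function only looks at
UTF-8 and UTF-16 byte order marks; it does not attempt the full detection
procedure described in the spec appendix:

```python
import codecs

def sniff_encoding(data):
    """Guess the encoding of an XML byte string that has no XML declaration.

    Assumes that no external override (such as an HTTP charset) exists.
    """
    # UTF-16 is only allowed here if a byte order mark is present
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # a UTF-8 byte order mark is optional; "utf-8-sig" strips it on decode
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # anything else has to be UTF-8 (plain ASCII is a subset of that)
    return "utf-8"

# a gbk-encoded document without a header does not decode as UTF-8
gbk_bytes = "<doc>\u4e2d\u6587</doc>".encode("gbk")
try:
    gbk_bytes.decode(sniff_encoding(gbk_bytes))
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

in other words, a gbk document needs an explicit encoding declaration (or an
external override, or transcoding to UTF-8) before a conforming parser can be
expected to accept it.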

> Boy, that XML-stuff is always full of surprises - even after so many years
> dealing with it..

a specification written for humans would have saved the world a lot of
confusion...

</F> 





