elementtree and gbk encoding
Diez B. Roggisch
deets at nospam.web.de
Wed Mar 15 07:29:54 EST 2006
> no, the parser must not choke on a file for which the encoding has been
> overridden.
>
> for example, the HTTP standard allows the transport layer to recode text/*
> resources as long as it updates the charset properly, so if you e.g. send
> an XML document as text/xml and charset=iso-8859-1, the transport layer
> can recode that to charset=utf-8, *without* rewriting the XML header.
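To make that scenario concrete, here is a minimal sketch. It assumes
Python's standard-library xml.etree.ElementTree and its
XMLParser(encoding=...) argument, neither of which is named in this thread,
and a made-up document whose header still says iso-8859-1 although the bytes
have been recoded to UTF-8:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the header still claims iso-8859-1 ...
declared = '<?xml version="1.0" encoding="iso-8859-1"?><root>caf\u00e9</root>'
# ... but the transport layer has recoded the bytes to UTF-8.
recoded = declared.encode("utf-8")

# Trusting the stale header decodes the UTF-8 bytes as Latin-1 (mojibake).
naive = ET.fromstring(recoded).text

# Overriding the encoding, as the transport's charset would tell us to.
parser = ET.XMLParser(encoding="utf-8")
overridden = ET.fromstring(recoded, parser=parser).text

print(repr(naive))
print(repr(overridden))
```

The override is what lets the parser accept the recoded bytes without the
header being rewritten.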
I have to correct myself: I was under the impression that XML _has_ to
contain an XMLDecl (which is the header, possibly with encoding) to be
well-formed.
Interestingly enough, that does not have to be the case. A document can very
well be well-formed without a header. The constraints for well-formedness are
scattered throughout the spec, so I'm not sure what they say about the
encoding used in the absence of a header.
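A quick check of the header-less case, as a sketch only: it assumes Python's
standard-library xml.etree.ElementTree (backed by expat) and a made-up
document, and it shows what this one parser does with UTF-8 bytes, not what
every parser will do:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with no XMLDecl at all; the bytes are UTF-8.
doc = "<root><item>caf\u00e9</item></root>".encode("utf-8")

# The parser accepts the document even though the header is missing.
root = ET.fromstring(doc)
text = root.find("item").text

print(root.tag)
print(repr(text))
```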
I am certain, though, that I've met parsers which weren't able to digest XML
without an XMLDecl, which is what formed my impression. But then, that
impression wasn't correct.
Boy, that XML stuff is always full of surprises, even after so many years of
dealing with it.
Diez