elementtree and gbk encoding

Diez B. Roggisch deets at nospam.web.de
Wed Mar 15 07:29:54 EST 2006


> no, the parser must not choke on a file for which the encoding has been
> overridden.
> 
> for example, the HTTP standard allows the transport layer to recode text/*
> resources as long as it updates the charset properly, so if you e.g. send
> an XML document as text/xml and charset=iso-8859-1, the transport layer
> can recode that to charset=utf-8, *without* rewriting the XML header.

I have to correct myself: I was under the impression that XML _has_ to
contain an XMLDecl (which is the header, possibly with encoding) to be
well-formed.

Interestingly enough, that need not be the case. A document can very well
be well-formed without a header. The constraints for well-formedness are
scattered throughout the spec, so I'm not sure what they say about which
encoding applies in the absence of a header.
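
To make the two cases concrete, here is a minimal sketch using the standard
library's xml.etree.ElementTree (not necessarily the elementtree release
discussed in this thread). As far as I can tell, XML 1.0 section 4.3.3 says
that an entity with no encoding declaration, no byte order mark and no
external encoding information must be UTF-8. The second half passes the
encoding in from outside, the way a transport layer's charset would be. The
sample documents and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEXT = "gr\u00fc\u00df"  # non-ASCII sample text (hypothetical content)

# Case 1: no XMLDecl at all. The bytes are UTF-8, which is what the
# parser assumes when nothing else tells it the encoding.
doc_utf8 = ("<greeting>" + TEXT + "</greeting>").encode("utf-8")
root_utf8 = ET.fromstring(doc_utf8)

# Case 2: still no XMLDecl, but the bytes are ISO-8859-1. The encoding
# is supplied externally and overrides whatever the document would
# imply on its own.
doc_latin1 = ("<greeting>" + TEXT + "</greeting>").encode("iso-8859-1")
parser = ET.XMLParser(encoding="iso-8859-1")
parser.feed(doc_latin1)
root_latin1 = parser.close()

print(root_utf8.text == root_latin1.text)
```

I deliberately used iso-8859-1 rather than gbk here: the underlying expat
parser only knows a handful of encodings natively, so a multi-byte encoding
may need to be decoded or recoded before parsing.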

I am certain, though, that I've met parsers which weren't able to digest XML
without an XMLDecl - which is what formed my impression. But that impression
wasn't correct.

Boy, that XML stuff is always full of surprises - even after so many years
of dealing with it.

Diez


