cElementTree encoding woes

Peter Otten __peter__ at web.de
Mon Feb 20 06:39:50 EST 2006


Diez B. Roggisch wrote:

> I've got to deal with a pretty huge XML-document, and to do so I use the
> cElementTree.iterparse functionality. Working great.
> 
> Only trouble: The guys creating that chunk of XML - well, let's just say
> they are "encodingly challenged", so they don't produce utf-8, but only
> cp1252 instead, together with some weird name (Windows-1252) for that.
> That is not part of the standard codecs module. cp1252 is, of course.
> 
> But that won't work for iterparse. So currently, I manually change the
> encoding given to utf-8, and use a stream-recoder.
> 
> However, I was wondering if I could teach cElementTree about that encoding
> name. I tried to register cp1252 under the name Windows-1252, but had no
> luck - cET won't buy it.
> 
> Any suggestions?

Both my python2.3 and python2.4 interpreters seem to know "Windows-1252":

>>> import codecs
>>> codecs.open("windows.xml", encoding="windows-1252")
<open file 'windows.xml', mode 'rb' at 0x403737e0>
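
The same check works without a file on disk by asking the codec registry
directly. A minimal sketch, written for a current Python 3 rather than the
2.3/2.4 interpreters above; canonical_codec_name is a made-up helper name,
not part of any library:

```python
import codecs

def canonical_codec_name(label):
    """Return the canonical codec name for label, or None if it is unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None

# Aliases are matched case-insensitively, with hyphens treated like underscores.
print(canonical_codec_name("Windows-1252"))   # cp1252
print(canonical_codec_name("no-such-codec"))  # None
```

If this gives None for "Windows-1252" on the affected machine, the
installation is the more likely culprit.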

Maybe the problem lies in the Python installation rather than in cElementTree?
Just guessing, though.
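
If the codec lookup is fine and the parser still refuses the name, the
workaround you describe (declare utf-8, recode the stream) could look roughly
like the sketch below. It is written for a current Python 3, with
xml.etree.ElementTree standing in for cElementTree. Utf8Recoded is a
hypothetical helper class, and it assumes the XML declaration fits entirely
inside the first chunk that gets read:

```python
import codecs
import io
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside the XML declaration.
_DECL_ENCODING = re.compile(rb'(<\?xml[^>]*?encoding=["\'])[^"\']+')

class Utf8Recoded:
    """File-like wrapper: reads cp1252 bytes, hands out UTF-8 bytes.

    Hypothetical helper; assumes the XML declaration lies in the first chunk.
    """

    def __init__(self, raw, source_encoding="cp1252"):
        # Bytes read from raw are decoded as source_encoding, re-encoded as UTF-8.
        self._recoder = codecs.EncodedFile(raw, "utf-8", source_encoding)
        self._first_chunk = True

    def read(self, size=-1):
        data = self._recoder.read(size)
        if self._first_chunk:
            self._first_chunk = False
            # Make the declaration agree with the bytes the parser now sees.
            data = _DECL_ENCODING.sub(rb"\g<1>utf-8", data, count=1)
        return data

# Small in-memory stand-in for the real (huge) document.
sample = (
    '<?xml version="1.0" encoding="Windows-1252"?>'
    "<root><item>caf\u00e9</item></root>"
).encode("cp1252")

for event, elem in ET.iterparse(Utf8Recoded(io.BytesIO(sample))):
    if elem.tag == "item":
        print(elem.text)
```

For a real file, the io.BytesIO object would be replaced by the file opened in
binary mode; iterparse then pulls the data through the wrapper chunk by chunk.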

Peter



