cElementTree encoding woes

Fredrik Lundh fredrik at pythonware.com
Mon Feb 20 06:55:39 EST 2006


Diez B. Roggisch wrote:

> I've got to deal with a pretty huge XML-document, and to do so I use the
> cElementTree.iterparse functionality. Working great.
>
> Only trouble: The guys creating that chunk of XML - well, let's just say they
> are "encodingly challenged", so they don't produce utf-8, but only cp1252
> instead, together with some weird name (Windows-1252) for that. That is not
> part of the standard codecs module. cp1252 is, of course.
>
> But that won't work for iterparse. So currently, I manually change the
> encoding given to utf-8, and use a stream-recoder.
>
> However, I was wondering if I could teach cElementTree about that encoding
> name. I tried to register cp1252 under the name Windows-1252, but had no
> luck - cET won't buy it.

you need cET 1.0.5 or later for a registered encoding name to be picked up.
for earlier versions, you have to use stream recoding:

    http://effbot.org/zone/celementtree-encoding.htm
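
A minimal sketch of the stream-recoding approach, for illustration only. It is
not the code from the page linked above. It assumes a current Python 3, where
xml.etree.ElementTree stands in for cElementTree, and the names Utf8Recoder and
iter_item_texts are made up for this example. The wrapper recodes cp1252 bytes
to UTF-8 on the fly and rewrites the encoding name in the XML declaration, on
the assumption that the declaration arrives in the first read.

```python
import codecs
import io
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside an XML declaration.
_DECL = re.compile(rb'(<\?xml[^>]*?encoding=["\'])[^"\']+(["\'])')


class Utf8Recoder:
    """Hypothetical file-like wrapper: reads source_encoding, returns UTF-8."""

    def __init__(self, raw, source_encoding="cp1252"):
        # EncodedFile decodes bytes read from `raw` as source_encoding
        # and hands them back re-encoded as UTF-8.
        self._stream = codecs.EncodedFile(raw, "utf-8", source_encoding)
        self._first_read = True

    def read(self, size=-1):
        data = self._stream.read(size)
        if self._first_read:
            # Assumption: the whole XML declaration is in the first chunk.
            self._first_read = False
            data = _DECL.sub(rb"\g<1>utf-8\g<2>", data, count=1)
        return data


def iter_item_texts(raw, tag):
    """Yield the text of every `tag` element, clearing each one afterwards."""
    for _event, elem in ET.iterparse(Utf8Recoder(raw)):
        if elem.tag == tag:
            yield elem.text
            elem.clear()  # keep memory use flat on huge documents


# Small in-memory stand-in for the large cp1252 document.
doc = (
    '<?xml version="1.0" encoding="Windows-1252"?>'
    "<root><item>caf\u00e9 \u20ac</item></root>"
).encode("cp1252")

texts = list(iter_item_texts(io.BytesIO(doc), "item"))
print(texts)
```

For a real file, pass a handle opened in binary mode, e.g. open(path, "rb"),
in place of the io.BytesIO object.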

</F> 





