cElementTree encoding woes

Diez B. Roggisch deets at nospam.web.de
Mon Feb 20 06:52:44 EST 2006


> Both my python2.3 and python2.4 interpreters seem to know "Windows-1252":
> 
> >>> import codecs
> >>> codecs.open("windows.xml", encoding="windows-1252")
> <open file 'windows.xml', mode 'rb' at 0x403737e0>
> 
> Maybe the problem lies in the python installation rather than
> cElementTree? Just guessing, though.

Hm. No idea why I was under the impression they weren't there. Still, it
doesn't work. This code

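import sys
import cElementTree

# Open the input XML as raw bytes and leave decoding to the parser.
inf = open(sys.argv[1], "rb")
# Recoding attempt, currently disabled:
#inf = codecs.StreamRecoder(inf,encoder, decoder, reader, writer)

for event, elem in cElementTree.iterparse(inf):
    pass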

pukes on me with

Traceback (most recent call last):
  File "./splitter.py", line 31, in ?
    for event, elem in cElementTree.iterparse(inf):
  File "<string>", line 61, in __iter__
SyntaxError: not well-formed (invalid token): line 35, column 34

That is the first French character encountered:

"""<title>Introduction aux Probabilités</title>"""


So the problem is not that the codec is being ignored; it simply does not
work.
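
A minimal sketch of one possible cause, which is an assumption and not
something confirmed for the file in question: if the XML carries no encoding
declaration, the parser treats the bytes as UTF-8, and a Windows-1252 "é"
(byte 0xE9) is then an invalid token. The sketch uses xml.etree.ElementTree
from the current standard library rather than the 2006 cElementTree module;
parse_titles is a hypothetical helper introduced only for illustration.

```python
import io
import xml.etree.ElementTree as ET

TITLE = u"<title>Introduction aux Probabilit\u00e9s</title>"


def parse_titles(xml_bytes):
    """Return the text of every element that iterparse reports."""
    source = io.BytesIO(xml_bytes)
    return [elem.text for event, elem in ET.iterparse(source)]


# Case 1: Windows-1252 bytes with no encoding declaration.
# The parser assumes UTF-8, so the lone 0xE9 byte is rejected.
raw = TITLE.encode("windows-1252")
try:
    parse_titles(raw)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# Case 2: the same bytes, preceded by an explicit encoding declaration.
declared = b'<?xml version="1.0" encoding="windows-1252"?>\n' + raw
declared_titles = parse_titles(declared)

print(undeclared_ok)
print(declared_titles)
```

If the first case matches the failing file, checking its first line for an
encoding="windows-1252" declaration would be a cheap test before looking
further at the codec machinery.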

Regards,

Diez


