elementtree and gbk encoding

Fredrik Lundh fredrik at pythonware.com
Wed Mar 15 13:55:50 EST 2006


Steven Bethard wrote:

> Hmm...  I downloaded the newest cElementTree (and I already had the
> newest ElementTree), and here's what I get:

>  >>> tree = myparser(filename, 'gbk')
> Traceback (most recent call last):
>    File "<interactive input>", line 1, in ?
>    File "<interactive input>", line 8, in myparser
> SyntaxError: not well-formed (invalid token): line 8, column 6
>
> FWIW, the file used above doesn't have an <?xml encoding?> header:
>
>  >>> open(filename).read()
> '<DOC>\n<DOCID>ART242</DOCID>\n<HEADER>\n
> <DATE></DATE>\n</HEADER>\n<BODY>\n<HEADLINE>\n<S ID=2566>

<S ID=2655> isn't a valid XML tag (the attribute value must be quoted)
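
a quick way to confirm that the unquoted attribute alone is enough to trigger the error; this is only a sketch, assuming a current Python where ElementTree ships as xml.etree.ElementTree (is_well_formed is a helper made up for this example, and the two sample strings are cut-down stand-ins for the real file):

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:  # ParseError is a subclass of SyntaxError
        return False
    return True


# Cut-down stand-ins for the real document (assumed, not the actual file).
unquoted = "<DOC><S ID=2566>text</S></DOC>"
quoted = '<DOC><S ID="2566">text</S></DOC>'

print(is_well_formed(unquoted))
print(is_well_formed(quoted))
```

the first string is rejected regardless of encoding, so the gbk question only matters once the markup itself is fixed.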

if I recode the file into UTF-8 and fix the two S tags, the result displays
just fine in IE and Firefox (I get a few boxes/question marks, but I assume
that's a font problem).
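
roughly what that recode-and-fix step could look like in code; a sketch only, assuming a current Python, and assuming the broken tags all have the form <S ID=digits> (parse_gbk_document, UNQUOTED_S_ID and the sample bytes are invented for illustration, not taken from the real file):

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the only malformed tags look like <S ID=1234> (unquoted digits).
UNQUOTED_S_ID = re.compile(r'(<S\s+ID=)(\d+)(\s*>)')


def parse_gbk_document(raw_bytes):
    """Decode gbk bytes, quote the S tag IDs, and parse the result as UTF-8."""
    text = raw_bytes.decode("gbk")
    fixed = UNQUOTED_S_ID.sub(r'\1"\2"\3', text)
    # With no <?xml encoding?> header, the parser assumes UTF-8 for bytes input.
    return ET.fromstring(fixed.encode("utf-8"))


# Hypothetical sample data: two Chinese characters inside one S element.
sample = (
    "<DOC>\n<DOCID>ART242</DOCID>\n<BODY>\n"
    "<S ID=2566>\u4e2d\u6587</S>\n</BODY>\n</DOC>"
).encode("gbk")

root = parse_gbk_document(sample)
print(root.find("BODY/S").get("ID"))
```

a regular expression is a blunt tool for repairing markup, so this only makes sense if the files are known to be broken in this one way.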

</F>





