ignoring chinese characters parsing xml file

Fabian López fabian at syameses.com
Mon Oct 22 17:29:35 EDT 2007


Thanks Marc, the code is like this. The "name" attribute is the problem:

from lxml import etree

context = etree.iterparse("file.xml")
for action, elem in context:
    if elem.tag == "weblog":
        print action, elem.tag, elem.attrib["name"], elem.attrib["url"], elem.attrib["rssUrl"]

And the XML file looks like:
<weblog name="xxxxxx" url="http://weblogli.com " when="4" />
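
For reference, here is a minimal sketch of one way to print such attributes without the console codec raising an error. It is written for Python 3 and uses the standard library's xml.etree.ElementTree as a stand-in for lxml (both offer iterparse). The sample data, the example.com URLs, and the helper names console_safe and weblog_rows are assumptions made up for illustration, not part of the original code.

```python
import io
import sys
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

# Hypothetical sample data; the name and URLs are made up for illustration.
SAMPLE = (
    '<weblogs>'
    '<weblog name="ヘアアイロン" url="http://example.com/" '
    'rssUrl="http://example.com/rss" when="4" />'
    '</weblogs>'
).encode("utf-8")


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks escaped."""
    encoding = encoding or sys.stdout.encoding or "ascii"
    # backslashreplace turns unencodable characters into \uXXXX escapes
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


def weblog_rows(source):
    """Yield (name, url, rssUrl) for every <weblog> element in source."""
    for _event, elem in ET.iterparse(source):
        if elem.tag == "weblog":
            # .get() returns None instead of raising KeyError when an
            # attribute such as rssUrl is absent.
            yield elem.get("name"), elem.get("url"), elem.get("rssUrl")
            elem.clear()  # free memory for elements already handled


if __name__ == "__main__":
    for name, url, rss in weblog_rows(io.BytesIO(SAMPLE)):
        print(console_safe(name), url, rss)
```

Escaping keeps the output readable on a console with a limited code page. Writing the values to a file opened with encoding="utf-8" would be another option if the original characters need to be preserved.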


22 Oct 2007 20:20:16 GMT, Marc 'BlackJack' Rintsch <bj_666 at gmx.net>:
>
> On Mon, 22 Oct 2007 21:24:40 +0200, Fabian López wrote:
>
> > I am parsing an XML file that includes Chinese characters, like ^
> > uu啖啖才是w.扉L锍才是 or ヘアアイロン... The problem is that I get an error like:
> > UnicodeEncodeError: 'charmap' codec can't encode characters in
> > position..
>
> You say you are *parsing* the file but this is an *encode* error.  Parsing
> means *decoding*.
>
> You have to show some code and the actual traceback to get help.  Crystal
> balls are not that reliable.  ;-)
>
> Ciao,
>         Marc 'BlackJack' Rintsch
> --
> http://mail.python.org/mailman/listinfo/python-list
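
The decode/encode distinction made in the quoted reply can be sketched as follows, in Python 3. The choice of cp1252 as the target codec is an assumption (it is a common Windows code page), and try_encode is a hypothetical helper used only for illustration.

```python
# Bytes as they would sit in a UTF-8 encoded file.
raw = "ヘアアイロン".encode("utf-8")

# Decoding (what a parser does): bytes -> text. This step succeeds.
text = raw.decode("utf-8")


def try_encode(s, encoding):
    """Return the encoded bytes, or the exception if encoding fails."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# Encoding (what printing to a console does): text -> bytes.
# cp1252 has no mapping for these characters, so this step fails.
result = try_encode(text, "cp1252")

if __name__ == "__main__":
    print(type(result).__name__)
```

So the parse itself can be fine while the failure only appears once the text is written out through a codec that cannot represent it.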