ignoring chinese characters parsing xml file

Fabian López fabian at syameses.com
Tue Oct 23 12:39:57 EDT 2007


Thanks, I have tried everything you suggested. The error was in the print
statement, so I decided to catch the UnicodeEncodeError that is raised
whenever an entry contains Chinese/Japanese characters. Those entries don't
interest me, so skipping them worked.
Ryan's strip_asian function didn't work well here, but it's a good idea to
keep in mind for later.
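
For the archive, here is a minimal sketch of the skip-on-UnicodeEncodeError
idea. It is not the exact code I ran. It assumes the target encoding is ASCII,
uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree,
and the helper names (encodable, weblog_lines) and the sample XML are made up
for illustration. It checks whether a line can be encoded before printing it,
which has the same effect as catching the exception around print.

```python
import io
import xml.etree.ElementTree as etree  # stdlib stand-in for lxml.etree (assumption)


def encodable(text, encoding):
    """Return True if `text` can be encoded in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def weblog_lines(source, encoding="ascii"):
    """Collect "name url" lines for <weblog> elements, skipping entries
    whose attributes cannot be represented in `encoding`."""
    lines = []
    for action, elem in etree.iterparse(source):
        if elem.tag != "weblog":
            continue
        line = u"%s %s" % (elem.attrib["name"], elem.attrib["url"])
        if encodable(line, encoding):
            lines.append(line)
    return lines


# Made-up sample data: one ASCII entry and one Japanese entry.
sample = (
    u'<weblogs>'
    u'<weblog name="Python" url="http://a.example/"/>'
    u'<weblog name="\u65e5\u672c\u8a9e" url="http://b.example/"/>'
    u'</weblogs>'
).encode("utf-8")

result = weblog_lines(io.BytesIO(sample))
for line in result:
    print(line)
```

Only the ASCII entry is printed; the Japanese one is silently dropped, which
is what I wanted here. If the characters should be kept in some form,
text.encode(encoding, "replace") would substitute "?" instead of skipping the
whole entry.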
Thanks a lot!
Fabian

2007/10/23, limodou <limodou at gmail.com>:
>
> On 10/23/07, Stefan Behnel <stefan.behnel-n05pAM at web.de> wrote:
> > Fabian López wrote:
> > > Thanks Mark, the code is like this. The attrib name is the problem:
> > >
> > > from lxml import etree
> > >
> > > context = etree.iterparse("file.xml")
> > > for action, elem in context:
> > >     if elem.tag == "weblog":
> > >         print action, elem.tag, elem.attrib["name"], elem.attrib["url"],
> >
> > The problem is the print statement. Looks like your terminal encoding
> > (that Python needs to encode the unicode string to) can't handle these
> > unicode characters.
> >
> I agree. For Japanese, you need to know the exact encoding name and
> convert the text, like this:
>
> print text.encode('encoding')
>
> --
> I like python!
> UliPad <<The Python Editor>>: http://code.google.com/p/ulipad/
> meide <<wxPython UI module>>: http://code.google.com/p/meide/
> My Blog: http://www.donews.net/limodou
>