ignoring chinese characters parsing xml file

Ryan Ginstrom software at ginstrom.com
Mon Oct 22 18:06:00 EDT 2007


> On Behalf Of Fabian Lopez
> like ^ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊÇÛ or ¥Ø¥¢¥¢¥¤¥í¥ó... The problem is that
> I get

Just thought I'd point out here that the second string is Japanese, not
Chinese.

From your second post, it appears that you've parsed the text without
problems -- it's when you go to print it out that you get the error. This
is most likely because your default encoding can't handle Chinese/Japanese
characters. I can imagine several ways to fix this, including encoding the
text as UTF-8 before printing it.
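
As a rough sketch of the encoding route (the helper name encode_for_output
and the sample string are made up for illustration, and I'm assuming the
parsed text is already a Unicode string), you can encode explicitly rather
than rely on the default encoding:

```python
import io

def encode_for_output(text, encoding="utf-8", errors="strict"):
    """Encode Unicode text explicitly instead of relying on the default encoding."""
    return text.encode(encoding, errors)

# Hypothetical sample: six katakana characters, written as escapes.
sample = u"\u30d8\u30a2\u30a2\u30a4\u30ed\u30f3"

# A narrow encoding such as ASCII cannot represent these characters.
try:
    encode_for_output(sample, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent any Unicode character, so this succeeds.
utf8_bytes = encode_for_output(sample)

# Another option: keep ASCII but substitute "?" for what it cannot encode.
replaced = encode_for_output(sample, "ascii", "replace")

# BytesIO stands in for a real byte stream (a file opened in binary mode, say).
stream = io.BytesIO()
stream.write(utf8_bytes + b"\n")
```

Whether the result displays properly still depends on whatever reads those
bytes (terminal, editor, browser) being set to UTF-8 as well.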

If you really want to strip out Asian characters, here's a way:

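def strip_asian(text):
    """Returns the Unicode string text, minus any Asian characters"""
    # Crude cutoff: keep only code points below U+3000, where the CJK Symbols
    # and Punctuation block begins; kana and the main ideograph blocks lie above.
    return u''.join([x for x in text if ord(x) < 0x3000])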
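
To show what that cutoff keeps and drops, here is a small usage sketch. The
function is repeated so the snippet runs on its own, and the mixed string is
an invented example, not text from the original XML file:

```python
def strip_asian(text):
    """Returns the Unicode string text, minus any character at or above U+3000"""
    return u"".join([x for x in text if ord(x) < 0x3000])

# Invented sample: accented Latin, two katakana, two Chinese ideographs, ASCII.
mixed = u"caf\u00e9 \u30d8\u30a2 \u4e2d\u6587 ok"

stripped = strip_asian(mixed)

# Accented Latin letters sit well below U+3000, so they survive the filter.
kept_accent = u"\u00e9" in stripped

# No remaining character is at or above the cutoff.
all_below = all(ord(ch) < 0x3000 for ch in stripped)
```

Note that the spaces around the removed characters stay behind, and that the
cutoff also drops anything else at or above U+3000 (Hangul syllables and
fullwidth forms, for instance), so it is a blunt instrument.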



Regards,
Ryan Ginstrom



