ignoring chinese characters parsing xml file
Ryan Ginstrom
software at ginstrom.com
Mon Oct 22 18:06:00 EDT 2007
> On Behalf Of Fabian Lopez
> like ^ÔuÔuà¢à¢²ÅÊDZw.¼ìéLï³²ÅÊÇÛ or ¥Ø¥¢¥¢¥¤¥í¥ó... The problem is that
> I get
Just thought I'd point out here that the second string is Japanese, not
Chinese.
From your second post, it appears that you've parsed the text without
problems -- it's when you go to print them out that you get the error. This
is no doubt because your default encoding can't handle Chinese/Japanese
characters. I can imagine several ways to fix this, including encoding the
text in utf-8 for printout.
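For the utf-8 route, something like the following might do. This is only a sketch, assuming Python 3; the helper names to_utf8 and print_utf8 are made up for illustration and are not from your code:

```python
import sys

def to_utf8(text):
    """Encode a Unicode string as UTF-8 bytes."""
    return text.encode("utf-8")

def print_utf8(text):
    """Write text plus a newline to stdout as UTF-8 bytes."""
    # Python 3 exposes the raw byte stream as sys.stdout.buffer;
    # fall back to sys.stdout itself where that attribute is missing
    # (as in Python 2, where stdout accepts byte strings directly).
    out = getattr(sys.stdout, "buffer", sys.stdout)
    out.write(to_utf8(text + u"\n"))
    out.flush()
```

Writing bytes this way sidesteps the console's default encoding, though whether the characters display correctly still depends on the terminal.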
If you really want to strip out Asian characters, here's a way:
def strip_asian(text):
    """Returns the Unicode string text, minus any Asian characters"""
    # Keeps only code points below U+3000, where CJK Symbols and Punctuation begins
    return u''.join([x for x in text if ord(x) < 0x3000])
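A quick check of the idea, with sample strings I made up (the function is repeated so the snippet runs on its own):

```python
def strip_asian(text):
    """Returns the Unicode string text, minus any Asian characters"""
    return u''.join([x for x in text if ord(x) < 0x3000])

# Katakana (U+30D8) and a CJK ideograph (U+4E2D) are dropped.
mixed = u"abc\u30d8\u4e2ddef"
stripped = strip_asian(mixed)

# Accented Latin letters sit below U+3000, so they are kept.
accented = strip_asian(u"caf\u00e9")
```

Note that the cutoff is crude: it removes everything at or above U+3000, not only Chinese and Japanese text.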
Regards,
Ryan Ginstrom