[Tutor] UnicodeEncodeError
Albert-Jan Roskam
fomcl at yahoo.com
Wed Nov 25 14:44:24 CET 2009
Hi,
I'm parsing an XML file using ElementTree, but it seems to get stuck on certain non-ASCII characters (for example: "ê"). I'm using Python 2.4. Here's the relevant code fragment:
# CODE:
for element in doc.getiterator():
    try:
        m = re.match(search_text, str(element.text))
    except UnicodeEncodeError:
        raise # I want to get rid of this exception.
# PRINTBACK:
m = re.match(search_text, str(element.text))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 4: ordinal not in range(128)
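For reference, the traceback above can be reproduced in isolation. In Python 2, calling str() on a unicode object encodes it with the default codec, which is ASCII, so any character above 127 raises. The sketch below makes that implicit step explicit by calling encode("ascii") directly, which fails the same way on Python 2 and Python 3. The sample string is a hypothetical stand-in for element.text, chosen so that u'\xea' sits at position 4 as in the traceback.

```python
# Hypothetical stand-in for element.text: "tempête", with u'\xea' at index 4.
text = u"temp\xeate"

# Encoding to ASCII is what str(text) does implicitly in Python 2.
try:
    text.encode("ascii")
    error_position = None
except UnicodeEncodeError as err:
    # err.start is the index of the first character that could not be encoded.
    error_position = err.start

# UTF-8 can represent the character, so this encode succeeds.
utf8_bytes = text.encode("utf-8")
```

Here error_position ends up as 4, matching "in position 4" in the traceback, while the UTF-8 encode goes through without an exception.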
How can I get rid of this Unicode encode error? I tried:
s = str(element.text)
s.encode("utf-8")
(and then feeding it into the regex)
The xml file is in UTF-8. Somehow I need to tell the program not to use ascii but utf-8, right?
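One possible way out, sketched here under some assumptions, is to drop the str() call and match the regular expression against the decoded text directly. The attempt above cannot help because str(element.text) has already raised before encode("utf-8") runs, and the result of encode is not assigned to anything in any case. The XML snippet and the pattern below are hypothetical stand-ins for the real file and search_text. The sketch is written for a current Python; on Python 2.4, ElementTree is the separate elementtree package and the loop would use getiterator() instead of iter().

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical UTF-8 document standing in for the real XML file.
xml_bytes = (
    u"<root><word>temp\xeate</word><word>storm</word><empty/></root>"
).encode("utf-8")
doc = ET.fromstring(xml_bytes)

# Hypothetical pattern standing in for search_text; a unicode pattern
# lets the regex engine work on decoded text instead of bytes.
search_text = re.compile(u"temp.te", re.UNICODE)

matches = []
for element in doc.iter():
    text = element.text
    # Elements without text have text set to None; skip them rather than
    # matching against the string "None" that str(None) would produce.
    if text is None:
        continue
    # No str() call here, so nothing is forced through the ASCII codec.
    if search_text.match(text):
        matches.append(text)
```

If a byte string is needed afterwards (for example, to write the match to a file), text.encode("utf-8") can be applied at that point, after the matching is done.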
Thanks in advance!
Cheers!!
Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the face of ambiguity, refuse the temptation to guess.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~