[Tutor] UnicodeEncodeError

Wed Nov 25 14:44:24 CET 2009

Hi,

I'm parsing an XML file using ElementTree, but it fails on certain non-ASCII characters (for example: "ê"). I'm using Python 2.4. Here's the relevant code fragment:

# CODE:
for element in doc.getiterator():
  try:
    m = re.match(search_text, str(element.text))
  except UnicodeEncodeError:
    raise # I want to get rid of this exception.

# TRACEBACK:
    m = re.match(search_text, str(element.text))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 4: ordinal not in range(128)

How can I get rid of this UnicodeEncodeError? I tried:
s = str(element.text)
s.encode("utf-8")
(and then feeding it into the regex)

The XML file is in UTF-8. Somehow I need to tell the program to use UTF-8 instead of ASCII, right?
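
For reference, here is a minimal sketch of what seems to be going on. It assumes Python 2 semantics, where str() on a unicode object implicitly encodes it with the ASCII codec; that would explain why the error is raised before the later encode("utf-8") call is ever reached. The sample XML, the pattern, and the helper name ascii_encode_fails are made up for illustration. The sketch uses xml.etree.ElementTree and iter(), which newer Pythons provide; on Python 2.4 the equivalents would be the separate elementtree package and getiterator(). One possible direction, not a confirmed fix, is to drop str() and hand the unicode text directly to the regex:

```python
# -*- coding: utf-8 -*-
import re
import xml.etree.ElementTree as ET  # Python 2.4: from elementtree import ElementTree as ET

# Hypothetical sample data standing in for the real UTF-8 file.
xml_bytes = u"<root><item>cr\xeape 42</item><item/></root>".encode("utf-8")
doc = ET.fromstring(xml_bytes)

# Hypothetical pattern; a unicode pattern is matched against unicode text.
search_text = u"cr.pe (\\d+)"


def ascii_encode_fails(text):
    """Return True if text cannot be encoded as ASCII.

    This is the explicit form of the encode that str(unicode_object)
    performs implicitly on Python 2.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


matches = []
for element in doc.iter():  # getiterator() on older ElementTree versions
    text = element.text
    if text is None:
        # Elements without text would otherwise become the string "None".
        continue
    # No str() call here, so no implicit ASCII encode takes place.
    m = re.match(search_text, text)
    if m:
        matches.append(m.group(1))
```

If a byte string is really needed somewhere, element.text.encode("utf-8") could be called on the unicode object itself and its return value kept, rather than calling str() first.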

Thanks in advance!

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the face of ambiguity, refuse the temptation to guess.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
