elementtree w/utf8

Tim Arnold tim.arnold at sas.com
Thu Oct 25 17:15:36 EDT 2007


Hi, I'm getting the by-now-familiar error:
return codecs.charmap_decode(input,errors,decoding_map)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 4615: ordinal not in range(128)

The HTML file I'm working with is in UTF-8. I open it with codecs and try to 
feed it to TidyHTMLTreeBuilder, but no luck. Here's my code:
import codecs
from elementtree import ElementTree as ET
from elementtidy import TidyHTMLTreeBuilder

            # excerpt from a method; htmfile is the path to the HTML file
            fd = codecs.open(htmfile, encoding='utf-8')
            tidyTree = TidyHTMLTreeBuilder.TidyHTMLTreeBuilder(encoding='utf-8')
            tidyTree.feed(fd.read())
            self.tree = tidyTree.close()
            fd.close()

What am I doing wrong? Thanks in advance.
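
For what it's worth, the same kind of error can be reproduced without 
elementtidy. This is only a sketch, assuming the builder ends up converting 
the text returned by codecs.open into a byte string with the default ASCII 
codec; I have not confirmed that this is what elementtidy does internally. 
The file name sample.htm and its contents are made up for illustration.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for the real HTML file: UTF-8 bytes containing U+00A9.
path = os.path.join(tempfile.mkdtemp(), "sample.htm")
with open(path, "wb") as f:
    f.write(u"<p>\xa9 2007</p>".encode("utf-8"))

# codecs.open(..., encoding='utf-8') returns decoded text, not raw bytes.
fd = codecs.open(path, encoding="utf-8")
text = fd.read()
fd.close()

# Encoding that text as ASCII fails on the copyright sign (assumed to be
# what happens implicitly somewhere inside the builder).
try:
    text.encode("ascii")
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__

# Reading in binary mode keeps the original UTF-8 bytes untouched.
with open(path, "rb") as f:
    raw = f.read()
```

If that assumption holds, feeding the raw bytes (raw above) to the builder 
instead of the decoded text might avoid the error.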

On a related note, I have another question: where/how can I get the 
cElementTree.py module? Sorry for something so basic. I tried installing 
cElementTree, and while I could compile it with setup.py build, I didn't end 
up with a cElementTree.py file anywhere. The directory structure on my system 
(HP-UX, but no root access) doesn't work well with setup.py install.

thanks,
--Tim Arnold




