insert comments into elementtree

Tim Arnold tim.arnold at sas.com
Fri Nov 16 12:54:23 EST 2007


Hi, I'm using the TidyHTMLTreeBuilder to generate some ElementTrees from
HTML. One side effect is that the comments embedded in the HTML are lost,
so I'm trying to put them back in, but I'm doing something wrong. Here's
the code snippet I use to generate the trees:

import os
from elementtree import ElementTree as ET
from elementtidy import TidyHTMLTreeBuilder
XHTML = "{http://www.w3.org/1999/xhtml}"

            # (inside a method, in a loop over the input filenames)
            htmfile = os.path.join(self.htmloc, filename)
            fd = open(htmfile)
            tidyTree = TidyHTMLTreeBuilder.TidyHTMLTreeBuilder('utf-8')
            tidyTree.feed(fd.read())
            fd.close()
            try:
                tmp = tidyTree.close()
            except:
                print 'Bad file: %s\nSkipping.' % filename
                continue
            tree = ET.ElementTree(tmp)
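If the input is already well-formed XHTML, another option is to keep the comments at parse time instead of re-inserting them later. The sketch below is not the elementtidy API; it uses the standard-library xml.etree.ElementTree in Python 3.8+, where TreeBuilder accepts insert_comments=True. The sample document is hypothetical, and tag-soup HTML would still need tidying before an XML parser accepts it.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical input: already well-formed XHTML (e.g. after tidying).
source = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><div class="remapped"><!-- stopindex --><p>text</p></div></body>
</html>"""

# insert_comments=True (Python 3.8+) tells TreeBuilder to keep <!-- ... -->
# nodes as Comment elements instead of dropping them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(source, parser=parser)
tree = ET.ElementTree(root)

div = tree.find("./%sbody/%sdiv" % (XHTML, XHTML))
first = div[0]  # the preserved comment is the div's first child
print(first.tag is ET.Comment, repr(first.text))
```

Comment nodes are identified by their tag being the ET.Comment factory itself, which is how the serializer knows to write them back out as comments.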

and here's the method I use to put the comments back in:

def addComments(self,tree):
        body = tree.find('./%sbody' % XHTML)
        for elem in body:
            if elem.tag == '%sdiv' % XHTML and elem.get('class'):
                if elem.get('class') == 'remapped':
                    comElem = ET.SubElement(elem,ET.Comment('stopindex'))

self.addComments(tree)
filename = os.path.join(self.deliverloc,name)
self.htmlcontent.write(tree, filename, encoding=self.encoding)

When I try this, I get an error from ElementTree's _write method:
TypeError: cannot concatenate 'str' and 'instance' objects
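A likely cause: SubElement(parent, tag) expects a tag name as its second argument, but ET.Comment('stopindex') already returns a complete comment element. The new subelement therefore gets an element object as its tag, and the writer later tries to treat that object as a string tag name; in the 2007-era elementtree, elements were class instances, which is consistent with the "'str' and 'instance'" message. The usual fix is to append the comment element directly. The sketch below assumes Python 3's standard-library xml.etree.ElementTree rather than the standalone elementtree package; add_comments is a standalone rewrite of addComments, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def add_comments(tree):
    """Append a 'stopindex' comment to each <div class="remapped"> directly under <body>."""
    body = tree.find("./%sbody" % XHTML)
    for elem in body:
        if elem.tag == "%sdiv" % XHTML and elem.get("class") == "remapped":
            # ET.Comment() returns a finished element, so append it as a child;
            # passing it to SubElement() would make it the *tag* of a new element.
            elem.append(ET.Comment("stopindex"))


# Hypothetical sample standing in for the tidied HTML.
root = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div class="remapped"><p>a</p></div><div class="other"/>'
    "</body></html>"
)
tree = ET.ElementTree(root)
add_comments(tree)

# Serialize XHTML as the default namespace instead of an ns0: prefix.
ET.register_namespace("", "http://www.w3.org/1999/xhtml")
out = ET.tostring(root, encoding="unicode")
print(out)
```

Only the remapped div gains a trailing <!--stopindex--> comment; the other div is left untouched.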

thanks for any help!
--Tim Arnold