elementtree question

Ivo ivonet at gmail.com
Fri Sep 21 15:56:47 EDT 2007


Tim Arnold wrote:
> Hi, I'm using elementtree and elementtidy to work with some HTML files. For 
> some of these files I need to enclose the body content in a new div tag, 
> like this:
> <body>
>   <div class="remapped">
>    original contents...
>   </div>
> </body>
> 
> I figure there must be a way to do it by creating a 'div' SubElement to the 
> 'body' tag and somehow copying the rest of the tree under that SubElement, 
> but it's beyond my comprehension.
> 
> How can I accomplish this?
> (I know I could put the class on the body tag itself, but that won't satisfy 
> the powers-that-be).
> 
> thanks,
> --Tim Arnold
> 
> 
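For reference, the ElementTree route described in the question (make a new 'div', move the body's content under it) could look roughly like the sketch below. It assumes the tree has a plain, non-namespaced 'body' tag directly under the root; if the tidied tree uses namespace-qualified tags, the lookup would need the namespace as well. wrap_body and css_class are names invented for this illustration, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def wrap_body(root, css_class="remapped"):
    """Move everything inside <body> into a new <div class=...> (hypothetical helper)."""
    body = root.find("body")  # assumes <body> is a direct child of the root
    if body is None:
        return root
    div = ET.Element("div", {"class": css_class})
    # Text right after the <body> tag lives in body.text, so it has to move too.
    div.text = body.text
    body.text = None
    # Re-parent the children; each child carries its own .tail text along.
    for child in list(body):
        body.remove(child)
        div.append(child)
    body.append(div)
    return root


html = ('<html><body bgcolor="#ffffff">original <i>italic</i> '
        '<b class="test">contents</b>...</body></html>')
root = wrap_body(ET.fromstring(html))
result = ET.tostring(root, encoding="unicode")
print(result)
```

The loop iterates over list(body) rather than body itself because the children are removed while looping.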

You could also try something like this:

from sgmllib import SGMLParser

class IParse(SGMLParser):
     """Echo the parsed markup back out, wrapping the body content in a div."""
     def __init__(self, verbose=0):
         SGMLParser.__init__(self, verbose)
         self.data = ""

     def _attr_to_str(self, attrs):
         # attrs is a list of (name, value) pairs; the leading space keeps
         # tags without attributes free of a stray blank ("<i>", not "<i >")
         return ''.join([' %s="%s"' % a for a in attrs])

     def start_body(self, attrs):
         self.data += "<body%s>" % self._attr_to_str(attrs)
         print "remapping"
         self.data += '<div class="remapped">'  # start remapping

     def end_body(self):
         self.data += "</div>"  # end remapping
         self.data += "</body>"

     def handle_data(self, data):
         self.data += data

     def unknown_starttag(self, tag, attrs):
         # every tag without its own start_xxx method ends up here
         self.data += "<%s%s>" % (tag, self._attr_to_str(attrs))

     def unknown_endtag(self, tag):
         self.data += "</%s>" % tag


if __name__ == "__main__":
     i = IParse()
     i.feed('''
<html>
     <body bgcolor="#ffffff">
         original
         <i>italic</i>
         <b class="test">contents</b>...
     </body>
</html>''')
     i.close()  # flush any buffered input before reading the result

     print i.data


Just look at the code of sgmllib (in the standard library); with it, making
a parser like this is very easy. The IParse class could still use some
much-needed refactoring, though.
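One caveat for anyone reading this later: sgmllib was removed in Python 3, so on a current interpreter the same pass-through idea would have to sit on html.parser.HTMLParser instead. The following is only a rough sketch of that port, not a drop-in replacement; BodyWrapper, css_class and parts are names made up for this illustration.

```python
from html.parser import HTMLParser


class BodyWrapper(HTMLParser):
    """Re-emit the markup, adding a <div> just inside <body> (illustrative sketch)."""

    def __init__(self, css_class="remapped"):
        # convert_charrefs=False leaves entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.css_class = css_class
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() gives the start tag exactly as it was in the input
        self.parts.append(self.get_starttag_text())
        if tag == "body":
            self.parts.append('<div class="%s">' % self.css_class)

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/> are copied through untouched
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.parts.append("</div>")
        self.parts.append("</%s>" % tag)  # note: tag arrives lowercased

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    def handle_comment(self, data):
        self.parts.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.parts.append("<!%s>" % decl)


sample = ('<html><body bgcolor="#ffffff">original <i>italic</i> &amp; '
          '<b class="test">contents</b>...</body></html>')
parser = BodyWrapper()
parser.feed(sample)
parser.close()  # flush buffered input before reading the result
wrapped = "".join(parser.parts)
print(wrapped)
```

Collecting the pieces in a list and joining once at the end avoids the repeated string concatenation of the original; otherwise the structure mirrors IParse.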



