elementtree question
Ivo
ivonet at gmail.com
Fri Sep 21 15:56:47 EDT 2007
Tim Arnold wrote:
> Hi, I'm using elementtree and elementtidy to work with some HTML files. For
> some of these files I need to enclose the body content in a new div tag,
> like this:
> <body>
> <div class="remapped">
> original contents...
> </div>
> </body>
>
> I figure there must be a way to do it by creating a 'div' SubElement to the
> 'body' tag and somehow copying the rest of the tree under that SubElement,
> but it's beyond my comprehension.
>
> How can I accomplish this?
> (I know I could put the class on the body tag itself, but that won't satisfy
> the powers-that-be).
>
> thanks,
> --Tim Arnold
>
>
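For comparison, here is a minimal sketch of the approach described in the question, written against the standard library's xml.etree.ElementTree rather than elementtidy itself. It creates a new div, moves the body's leading text and every child into it, and then puts the div back as the body's only child. Each child's tail text (the text after its closing tag) travels with the child, so nothing between the tags is lost. The function name wrap_body and its parameters are hypothetical. It assumes tags without a namespace. elementtidy output may put elements in the XHTML namespace, and in that case you would pass ns="{http://www.w3.org/1999/xhtml}".

```python
import xml.etree.ElementTree as ET


def wrap_body(root, css_class="remapped", ns=""):
    """Move everything inside <body> into a new <div class=...> (hypothetical helper).

    ns is an optional ElementTree namespace prefix such as
    "{http://www.w3.org/1999/xhtml}"; the default assumes plain tags.
    """
    # iter() also checks root itself, so this works whether root is <html> or <body>.
    body = next(root.iter(ns + "body"), None)
    if body is None:
        raise ValueError("no <body> element found")

    div = ET.Element(ns + "div", {"class": css_class})
    # Text that appears before the first child element belongs to body.text.
    div.text, body.text = body.text, None
    # Move the children; each child's .tail moves along with it.
    for child in list(body):
        body.remove(child)
        div.append(child)
    body.append(div)
    return root


if __name__ == "__main__":
    doc = ET.fromstring('<html><body bgcolor="#fffff">original <i>italic</i> '
                        '<b class="test">contents</b>...</body></html>')
    wrap_body(doc)
    print(ET.tostring(doc, encoding="unicode"))
```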
You could also try something like this:
from sgmllib import SGMLParser

class IParse(SGMLParser):
    def __init__(self, verbose=0):
        SGMLParser.__init__(self, verbose)
        self.data = ""

    def _attr_to_str(self, attrs):
        # Rebuild the attributes with a leading space each, so a tag
        # without attributes comes out as "<i>" rather than "<i >"
        return ''.join([' %s="%s"' % a for a in attrs])

    def start_body(self, attrs):
        self.data += "<body%s>" % self._attr_to_str(attrs)
        # open the wrapper div right after <body>
        self.data += '<div class="remapped">'

    def end_body(self):
        self.data += "</div>"  # end remapping
        self.data += "</body>"

    def handle_data(self, data):
        self.data += data

    # Caveat: with sgmllib's default handlers, comments are dropped and
    # entity references are decoded (&amp; -> &) or dropped (&nbsp;),
    # so this is not a lossless copy of the input.
    def unknown_starttag(self, tag, attrs):
        self.data += "<%s%s>" % (tag, self._attr_to_str(attrs))

    def unknown_endtag(self, tag):
        self.data += "</%s>" % tag

if __name__ == "__main__":
    i = IParse()
    i.feed('''
<html>
<body bgcolor="#fffff">
original
<i>italic</i>
<b class="test">contents</b>...
</body>
</html>''')
    i.close()  # flush any buffered input before using the result
    print i.data
Just look at the code of sgmllib (in the standard library); it is very easy
to make a parser with it. The code above could still do with some
much-needed refactoring, though.
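sgmllib only exists in Python 2; it was removed in Python 3. A rough Python 3 counterpart, swapping in html.parser.HTMLParser, might look like the sketch below. RemapParser and remap are hypothetical names. Instead of rebuilding each start tag from its attributes, it re-emits get_starttag_text(), which returns the raw text of the start tag as it appeared in the input. It also turns off convert_charrefs so that entity and character references such as &amp; pass through unchanged.

```python
from html.parser import HTMLParser


class RemapParser(HTMLParser):
    """Copy HTML through unchanged, wrapping the <body> contents in a div."""

    def __init__(self, css_class="remapped"):
        # convert_charrefs=False reports &name; and &#nn; as separate events,
        # so they can be written back out verbatim.
        super().__init__(convert_charrefs=False)
        self.css_class = css_class
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Re-emit the tag exactly as it appeared in the input.
        self.parts.append(self.get_starttag_text())
        if tag == "body":
            self.parts.append('<div class="%s">' % self.css_class)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit once, with no separate end tag.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.parts.append("</div>")  # close the wrapper before </body>
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    def handle_comment(self, data):
        self.parts.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.parts.append("<!%s>" % decl)

    def result(self):
        return "".join(self.parts)


def remap(html, css_class="remapped"):
    parser = RemapParser(css_class)
    parser.feed(html)
    parser.close()  # flush anything still buffered before reading the result
    return parser.result()


if __name__ == "__main__":
    print(remap('<html>\n<body bgcolor="#fffff">\noriginal\n<i>italic</i>\n'
                '<b class="test">contents</b>...\n</body>\n</html>'))
```

Unlike the sgmllib version, this one keeps comments and entity references intact.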