htmllib question

John Hunter jdhunter at nitace.bsd.uchicago.edu
Sun May 20 18:06:27 EDT 2001


I have some HTML that I need to parse.  I want to call some function
on all of the HTML except what is inside a PRE tag.  The PRE sections
I just want to output verbatim.

I have gotten far enough with htmllib that I can print out the
PRE and non-PRE sections separately, but I am losing all the HTML
markup.

Sample document:
...some_html_1...
<pre> some code 1</pre>
...some_html_2...
<pre> some code 2</pre>
...some_html_3...

And I want to output:
somefunc(...some_html_1...)
<pre> some code 1</pre>
somefunc(...some_html_2...)
<pre> some code 2</pre>
somefunc(...some_html_3...)


Any suggestions would be very welcome.

Thanks,
John Hunter


The PRE/non-PRE separator:

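#!/usr/local/bin/python
# Python 2 script: htmllib and the formatter module are Python 2 only.
import htmllib
import formatter

class Parser(htmllib.HTMLParser):

    def __init__(self, verbose=0):
        self.anchors = {}
        # NullFormatter discards formatted output; only saved text is used.
        f = formatter.NullFormatter()
        htmllib.HTMLParser.__init__(self, f, verbose)
        # Start collecting character data for the first non-PRE section.
        self.save_bgn()

    def start_pre(self, attrs):
        # A PRE section begins: flush the text collected before it.
        print
        print "Text: ", repr(self.save_end())
        self.save_bgn()

    def end_pre(self):
        # A PRE section ends: flush the code collected inside it.
        print
        print "CODE: ", repr(self.save_end())
        self.save_bgn()

infile = open("Edit.html")
html = infile.read()
infile.close()

p = Parser()
p.feed(html)
p.close()

# Flush the text that follows the last PRE section.
print
print "Text: ", repr(p.save_end())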
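
For comparison, here is a minimal sketch of the output I am after, done
with a regular-expression split instead of htmllib so that the markup
survives.  This is only an illustration under assumptions: PRE sections
are well formed and not nested, the names PRE_BLOCK and transform are
made up for this example, and somefunc is a hypothetical placeholder
that simply upper-cases its input.

```python
import re

# Matches one whole <pre>...</pre> block, case-insensitively and across
# newlines.  Assumption: PRE blocks are well formed and never nested.
PRE_BLOCK = re.compile(r"(<pre\b[^>]*>.*?</pre\s*>)", re.IGNORECASE | re.DOTALL)

def somefunc(fragment):
    # Hypothetical placeholder for the real transformation.
    return fragment.upper()

def transform(html, func=somefunc):
    # Splitting on a pattern with one capturing group returns non-PRE
    # text at even indexes and the matched PRE blocks at odd indexes.
    parts = PRE_BLOCK.split(html)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)        # PRE block: keep verbatim
        else:
            out.append(func(part))  # everything else: apply func
    return "".join(out)
```

Calling transform(html) on the contents of the file would then return
the whole document as one string, with the markup outside PRE passed
through somefunc along with the text.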


