htmllib question
John Hunter
jdhunter at nitace.bsd.uchicago.edu
Sun May 20 18:06:27 EDT 2001
I have some HTML that I need to parse. I want to call some function
on all of the HTML unless it is inside a PRE tag, in which case I just
want to output it verbatim.
I have gotten far enough with htmllib that I can print out the
PRE and non-PRE sections separately, but I am losing all the HTML
markup.
Sample document:
...some_html_1...
<pre> some code 1</pre>
...some_html_2...
<pre> some code 2</pre>
...some_html_3...
And I want to output:
somefunc(...some_html_1...)
<pre> some code 1</pre>
somefunc(...some_html_2...)
<pre> some code 2</pre>
somefunc(...some_html_3...)
Any suggestions would be most welcome.
Thanks,
John Hunter
The PRE/non-PRE separator:
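#!/usr/local/bin/python
# Python 2 script: htmllib and the print statement are not available in Python 3.
import htmllib
import formatter

class Parser(htmllib.HTMLParser):
    def __init__(self, verbose=0):
        self.anchors = {}
        f = formatter.NullFormatter()
        htmllib.HTMLParser.__init__(self, f, verbose)
        # Start capturing character data for the first non-PRE section.
        self.save_bgn()

    def start_pre(self, attrs):
        # A PRE block begins: emit the text saved so far, then capture the code.
        print
        print "Text: ", repr(self.save_end())
        self.save_bgn()

    def end_pre(self):
        # The PRE block ends: emit the saved code, then capture text again.
        print
        print "CODE: ", repr(self.save_end())
        self.save_bgn()

file = open("Edit.html")
html = file.read()
file.close()
p = Parser()
p.feed(html)
p.close()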
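
For comparison, here is a minimal sketch of one way to get the output described above while keeping the markup. It is not part of the original script and does not use htmllib: it swaps in a regular expression to split the raw text around PRE blocks, so nothing is parsed and no tags are dropped. It assumes every PRE block is properly closed and that PRE blocks are not nested. The names PRE_BLOCK and transform_outside_pre are hypothetical, and func stands in for the "somefunc" of the question.

```python
import re

# Matches one complete <pre ...>...</pre> block, case-insensitively and
# across line breaks. Assumption: PRE blocks are closed and never nested.
PRE_BLOCK = re.compile(r"(<pre\b[^>]*>.*?</pre\s*>)", re.IGNORECASE | re.DOTALL)


def transform_outside_pre(html, func):
    """Apply func to every stretch of html outside PRE blocks.

    PRE blocks, including their tags, are copied to the result unchanged.
    """
    # Because the pattern has one capturing group, re.split returns the
    # non-PRE text at even indices and the matched PRE blocks at odd indices.
    parts = PRE_BLOCK.split(html)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(part)        # PRE block: keep verbatim
        elif part:
            out.append(func(part))  # non-PRE text: transform (skip empty pieces)
    return "".join(out)
```

Calling transform_outside_pre(html, somefunc) on the contents of Edit.html would return a single string in which only the non-PRE stretches have passed through somefunc. A regular expression is a blunt tool for HTML, so this is only reasonable for simple, well-formed input of the kind shown in the sample document.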