[Tutor] re question
Alexandre Ratti
alex at gabuzomeu.net
Sat Aug 9 14:08:52 EDT 2003
Hello,
Jonathan Hayward wrote:
> I looked through the library docs on this, and tried to do it with
> re's because figuring out how to use HTMLParser looked like more work
> than using re's -- 3 hours' documentation search to avoid one hour of
> reinventing the wheel.
Here is an example I wrote last night to test the HTMLParser
module. (This module seems to have been added in Python 2.2, so we now
have two parsers for HTML code: "HTMLParser.HTMLParser" and
"htmllib.HTMLParser".)
###
from HTMLParser import HTMLParser

class CustomParser(HTMLParser):

    def __init__(self):
        # True while we are between <strong> and </strong>.
        self.inStrong = False
        # Text chunks collected from inside <strong> elements.
        self.payload = []
        HTMLParser.__init__(self)

    def parse(self, text):
        self.reset()
        self.feed(text)

    def handle_starttag(self, tag, attr):
        if tag == "strong":
            self.inStrong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.inStrong = False

    def handle_data(self, data):
        # Keep only the text that appears inside a <strong> element.
        if self.inStrong:
            self.payload.append(data)

if __name__ == "__main__":
    s = """text text <strong>this is strong</strong> text <strong>this
is strong too</strong>"""
    p = CustomParser()
    p.parse(s)
    print p.payload
###
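For anyone reading this on Python 3, here is a rough sketch of the same idea, assuming the module now lives at html.parser. The class name StrongTextParser and the decision to clear the collected text on every call to parse() are my own choices for illustration, not part of the example above.

```python
from html.parser import HTMLParser  # Python 3 location of the module


class StrongTextParser(HTMLParser):  # hypothetical name, for illustration only
    """Collect the text found inside <strong> elements."""

    def __init__(self):
        super().__init__()
        self.in_strong = False
        self.payload = []

    def parse(self, text):
        # Start from a clean state so the same parser can be reused.
        self.reset()
        self.in_strong = False
        self.payload = []
        self.feed(text)
        self.close()
        return self.payload

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False

    def handle_data(self, data):
        # Keep only the text seen while inside a <strong> element.
        if self.in_strong:
            self.payload.append(data)


if __name__ == "__main__":
    s = "text text <strong>this is strong</strong> text <strong>this is strong too</strong>"
    print(StrongTextParser().parse(s))
```

Returning the list from parse() saves a separate attribute lookup, and resetting the state there means a second call does not carry over text from the first.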
Cheers.
Alexandre