[Tutor] re question

Sat Aug 9 14:08:52 EDT 2003

Hello,

Jonathan Hayward wrote:

> I looked through the library docs on this, and tried to do it with 
> re's because figuring out how to use HTMLParser looked like more work 
> than using re's -- 3 hours' documentation search to avoid one hour of 
> reinventing the wheel. 

Here is an example I wrote last night to test the HTMLParser 
module. (This module seems to have been added in Python 2.2, so we now 
have two parsers for HTML code: "HTMLParser.HTMLParser" and 
"htmllib.HTMLParser".)

###
from HTMLParser import HTMLParser

class CustomParser(HTMLParser):
    def __init__(self):
        self.inStrong = False
        self.payload = []
        HTMLParser.__init__(self)

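    def parse(self, text):
        # Start from a clean state so the same instance can be reused.
        self.reset()
        self.inStrong = False
        self.payload = []
        self.feed(text)
        # Flush any data still buffered at the end of the input.
        self.close()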

    def handle_starttag(self, tag, attr):
        if tag == "strong":
            self.inStrong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.inStrong = False

    def handle_data(self, data):
        if self.inStrong:
            self.payload.append(data)

if __name__ == "__main__":
    s = """text text <strong>this is strong</strong> text <strong>this 
is strong too</strong>"""
    p = CustomParser()
    p.parse(s)
    print p.payload
###
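
For anyone running a newer interpreter, here is a minimal sketch of the same idea for Python 3. It assumes the parser is imported from html.parser (where the module lives in Python 3) and that print is called as a function. The class name StrongTextParser and the decision to have parse() return the list are only illustrative choices of mine, not part of the example above.

```python
from html.parser import HTMLParser


class StrongTextParser(HTMLParser):
    """Collect the text found inside <strong> elements (illustrative name)."""

    def __init__(self):
        super().__init__()
        self.in_strong = False
        self.payload = []

    def parse(self, text):
        # Clear parser state and earlier results so the instance is reusable.
        self.reset()
        self.in_strong = False
        self.payload = []
        self.feed(text)
        # Flush anything still buffered at the end of the input.
        self.close()
        return self.payload

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so "<STRONG>" matches too.
        if tag == "strong":
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False

    def handle_data(self, data):
        # Keep only text seen while inside a <strong> element.
        if self.in_strong:
            self.payload.append(data)


if __name__ == "__main__":
    s = ("text text <strong>this is strong</strong> text "
         "<strong>this is strong too</strong>")
    print(StrongTextParser().parse(s))
```

Run as a script, this should print ['this is strong', 'this is strong too']. Note that a single flag like in_strong does not handle nested <strong> tags; a counter would be needed for that case.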

Cheers.

Alexandre