HTMLParser question

Rajarshi Guha rajarshi at presidency.com
Thu Aug 19 11:27:24 EDT 2004


Hi,
  I have some HTML that essentially consists of a series of <div>s,
each having one of two classes (tnt-question or tnt-answer).
I'm using HTMLParser to handle the tags as follows:

import HTMLParser

class MyHTMLParser(HTMLParser.HTMLParser):

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if len(attrs) == 1:
            cls, whichcls = attrs[0]
            if whichcls == 'tnt-question':
                print self.get_starttag_text(), self.getpos()

    def handle_endtag(self, tag):
        pass

    def handle_data(self, data):
        # called for every run of text in the document
        print data

if __name__ == '__main__':

    # read() keeps the file's text exactly as it is
    htmldata = open('tt.html', 'r').read()
    parser = MyHTMLParser()
    parser.feed(htmldata)
    parser.close()

However, what I would like is that when the parser reaches HTML like
this:

        <div class="tnt-question">
            How do I add a user to a MySQL system?
        </div>

I should get back the text between the opening and closing tags. Instead,
the code above prints the text contained in all tags, not just in the
<div> tags with class="tnt-question".
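Every piece of text gets printed because the parser, not the subclass, decides when handle_data() runs: it is called for each run of character data it meets, whatever tag encloses it. A small event logger makes the call order visible. This is a sketch written for Python 3, where the module is named html.parser; the class name EventLogger is illustrative only.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record each callback the parser makes, in the order it makes them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        # Skip whitespace-only runs such as newlines between tags.
        if data.strip():
            self.events.append(('data', data.strip()))


logger = EventLogger()
logger.feed('<div class="tnt-question">How?</div><p>Other text</p>')
logger.close()
for event in logger.events:
    print(event)
```

Both text runs arrive through the same handle_data() callback; the only thing that distinguishes them is which start tag was seen just before, so any filtering has to rely on state the subclass records for itself.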

Is there a way to call handle_data() when a specific tag is being handled?
Placing a call to handle_data() in handle_starttag() seems to be the way,
but I'm not sure how to actually do it: what data should I pass to the
call?
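One common pattern is not to call handle_data() yourself at all, but to let handle_starttag() set a flag that handle_data() checks, and clear it again in handle_endtag(). Below is a minimal sketch, assuming Python 3's html.parser and allowing for <div>s nested inside a question (hence a depth counter rather than a plain flag). Looking the class up with dict(attrs).get('class') also matches divs that carry more than one attribute. QuestionParser and its questions list are hypothetical names.

```python
from html.parser import HTMLParser


class QuestionParser(HTMLParser):
    """Collect the text of every <div class="tnt-question"> element."""

    def __init__(self):
        super().__init__()
        self.questions = []   # finished question texts
        self._depth = 0       # > 0 while inside a question div
        self._buffer = []     # text pieces of the current question

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        if self._depth:
            # A <div> nested inside a question: count it so that its
            # </div> does not end the question early.
            self._depth += 1
        elif dict(attrs).get('class') == 'tnt-question':
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag != 'div' or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Closing tag of the question div: collapse whitespace and store.
            self.questions.append(' '.join(''.join(self._buffer).split()))

    def handle_data(self, data):
        # The parser still calls this for all text; keep only question text.
        if self._depth:
            self._buffer.append(data)


html = '''
<div class="tnt-question">
    How do I add a user to a MySQL system?
</div>
<div class="tnt-answer">
    Answer text here.
</div>
'''

parser = QuestionParser()
parser.feed(html)
parser.close()
print(parser.questions)  # ['How do I add a user to a MySQL system?']
```

Only <div> tags affect the counter, so inline markup inside a question (a <b>, say) does not end it early; its text is simply collected with the rest. Collapsing whitespace in handle_endtag() is a choice made for this example; drop it if the original spacing matters.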

Any pointers would be appreciated.
Thanks,
Rajarshi





More information about the Python-list mailing list