confused by HTMLParser class

XLiIV Tymoteusz.Jankowski at gmail.com
Wed May 28 02:00:59 EDT 2008


On May 28, 3:20 am, globalrev <skanem... at yahoo.se> wrote:
> tried all kinds of combos to get this to work.
>
> http://docs.python.org/lib/module-HTMLParser.html
>
> from HTMLParser import HTMLParser
>
> class MyHTMLParser(HTMLParser):
>
>     def handle_starttag(self, tag, attrs):
>         print "Encountered the beginning of a %s tag" % tag
>
>     def handle_endtag(self, tag):
>         print "Encountered the end of a %s tag" % tag
>
> from HTMLParser import HTMLParser
> import urllib
> import myhtmlparser
>
> x = MyHTMLParser(HTMLParser())
> site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
> for row in site:
>     print x.handle_starttag()

This works fine for me:


from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

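#from HTMLParser import HTMLParser   # not needed, already imported above
import urllib
#import myhtmlparser                 # not needed, the class is defined in this file

site = urllib.urlopen("http://docs.python.org/lib/module-HTMLParser.html")
# MyHTMLParser already inherits from HTMLParser, so it takes no arguments.
x = MyHTMLParser()  # not: x = MyHTMLParser(HTMLParser())
# feed() parses the page and calls handle_starttag/handle_endtag itself,
# so there is no need to loop over the rows and call the handlers by hand.
x.feed(site.read())
x.close()
site.close()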
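
The point that is easy to miss is that you never call handle_starttag() yourself: feed() walks the markup and calls the handlers with the right arguments. Here is a rough sketch of the same idea that runs without a network connection. It assumes Python 3, where the module is named html.parser, and it feeds the parser a short inline string instead of a URL. The TagLogger class and its events list are just names I made up for illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical subclass that records tag events instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []  # filled in by the handlers below

    def handle_starttag(self, tag, attrs):
        # Called by feed() for every opening tag it finds.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called by feed() for every closing tag it finds.
        self.events.append(("end", tag))


parser = TagLogger()
parser.feed("<p>Hello <b>world</b></p>")  # the handlers fire during this call
parser.close()

for kind, tag in parser.events:
    print("Encountered the %s of a %s tag" % (kind, tag))
```

Collecting the events in a list rather than printing inside the handlers is only a convenience here; it makes it easy to check what the parser saw.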


You should also read this, for example:
http://www.diveintopython.org/html_processing/extracting_data.html


