HTML Parser - beginner needs help

Alex Martelli aleaxit at yahoo.com
Thu Sep 14 17:40:18 EDT 2000


"zet" <zet at i.com.ua> wrote in message
news:968957222.616980 at ipt2.iptelecom.net.ua...
> > if you want something special, maybe you'd be better off building your own
> > parsing class, it's not that difficult.
> Are there any examples of doing this?
>
> And what is all this stuff in HTMLParser for, then? do_tag(), start_tag(),
> end_tag() and so on?
> How do I use it? What is all of this for?

The various do_x methods (one for each tag x that has no closing tag
required) and start_y and end_y methods (one pair for each tag y that has
both an opening and a closing tag) are methods you can override, by deriving
your own class from HTMLParser, to process certain tags in special ways.
You can do that directly by subclassing sgmllib.SGMLParser, but you can also
choose to subclass htmllib.HTMLParser instead if it already does something
you can use.
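
As a rough illustration of that override pattern: sgmllib and htmllib were
removed in Python 3, so this sketch assumes html.parser.HTMLParser instead,
whose hooks are handle_starttag and handle_endtag rather than per-tag methods.
The DispatchingParser and LinkCollector classes are names made up for this
example; the dispatcher only imitates the start_x / end_x / do_x naming
convention so that a subclass can override per-tag methods in the same spirit.

```python
from html.parser import HTMLParser


class DispatchingParser(HTMLParser):
    """Hypothetical helper: route tags to start_x / end_x / do_x methods."""

    def handle_starttag(self, tag, attrs):
        # Prefer start_<tag> (tag with a closing partner), else do_<tag>.
        method = getattr(self, "start_" + tag, None) or getattr(self, "do_" + tag, None)
        if method is not None:
            method(attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()


class LinkCollector(DispatchingParser):
    """Override only the tags we care about; all others are ignored."""

    def __init__(self):
        super().__init__()
        self.links = []    # (href, link text) pairs
        self.images = []   # src attribute values
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text pieces seen inside that <a>

    def start_a(self, attrs):
        self._href = dict(attrs).get("href")
        self._text = []

    def end_a(self):
        if self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
        self._href = None

    def do_img(self, attrs):
        # <img> has no closing tag, so it gets a do_ method.
        self.images.append(dict(attrs).get("src"))

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


parser = LinkCollector()
parser.feed('<p>See <a href="http://www.python.org">Python</a> <img src="logo.gif"></p>')
parser.close()
print(parser.links, parser.images)
```

Tags without a matching method (the <p> here) simply fall through, which is
the point of the pattern: you write code only for the tags you want to treat
specially.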

I posted an example about a month ago where somebody's problem
was to extract data contained in <TABLE> tags into Python lists.  I
think deja.com's advanced search functions should make it easy
to find...
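
This is not that earlier post, just a minimal sketch of the same kind of
task, written under a few assumptions: it uses html.parser.HTMLParser from
Python 3 (where sgmllib and htmllib no longer exist), the TableExtractor
class name is invented for the example, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table seen
        self._row = None   # row currently being filled, if any
        self._cell = None  # text pieces of the cell currently open, if any

    def _close_cell(self):
        # Flush the open cell into the current row (closing tags may be omitted).
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_cell()
            self._row = None
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_cell()
            self._row = []
            self.tables[-1].append(self._row)
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_cell()
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


extractor = TableExtractor()
extractor.feed(
    "<TABLE><TR><TH>name</TH><TH>qty</TH></TR>"
    "<TR><TD>spam</TD><TD>3</TD></TR></TABLE>"
)
extractor.close()
print(extractor.tables)
```

The parser lowercases tag names before calling the handlers, so <TABLE> and
<table> are treated alike.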


Alex





