Help with parsing web page
RiGGa
rigga at hasnomail.com
Tue Jun 15 09:15:51 EDT 2004
Miki Tebeka wrote:
> Hello RiGGa,
>
>> Anyone? I have found out I can use sgmllib, but I find the documentation is
>> not that clear. If anyone knows of a tutorial or howto, it would be
>> appreciated.
> I'm not an expert but this is how I work:
>
> You make a subclass of HTMLParser and override the callback functions.
> Usually I use only start_<TAG>, end_<TAG> and handle_data.
> Since you don't know *when* each callback function will be called, you need
> to keep internal state. It can be a simple variable, or a stack if you
> want to deal with nested tags.
>
> A short example:
> #!/usr/bin/env python
>
> from htmllib import HTMLParser
> from formatter import NullFormatter
>
> class TitleParser(HTMLParser):
>     def __init__(self):
>         HTMLParser.__init__(self, NullFormatter())
>         self.state = ""  # name of the tag we are inside, "" if none
>         self.data = ""   # text collected inside <title>
>
>     def start_title(self, attrs):
>         self.state = "title"
>         self.data = ""
>
>     def end_title(self):
>         print "Title:", self.data.strip()
>         self.state = ""  # stop collecting once </title> is seen
>
>     def handle_data(self, data):
>         if self.state:
>             self.data += data
>
> if __name__ == "__main__":
>     from sys import argv
>
>     parser = TitleParser()
>     parser.feed(open(argv[1]).read())
>
> HTH.
> --
> -------------------------------------------------------------------------
> Miki Tebeka <miki.tebeka at zoran.com>
> The only difference between children and adults is the price of the toys.
Thanks for taking the time to help, it's appreciated. I am new to Python, so I
am a little confused by what you have posted; however, I will go through it
again and see if it makes more sense.
Many thanks
Rigga
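
For anyone reading this on Python 3, where htmllib and sgmllib are no longer
available, the same idea can probably be sketched with html.parser from the
standard library. This is only a rough illustration of the "keep a stack for
nested tags" approach described in the quoted message, not the original
poster's code; the names TitleParser, extract_title, stack and title are made
up for the example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside <title>, tracking open tags on a stack."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of the tags that are currently open
        self.title = ""  # text seen while <title> is the innermost open tag

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; a stray end tag is ignored.
        # Tags that never close (such as <meta>) are dropped along the way.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Only keep text whose innermost enclosing tag is <title>.
        if self.stack and self.stack[-1] == "title":
            self.title += data


def extract_title(html):
    """Hypothetical helper: feed a whole document and return its title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return parser.title.strip()
```

Calling extract_title(open(filename).read()) would mirror the command-line use
in the quoted example. For a single tag like <title> a plain flag is enough;
the stack only starts to matter when the tags you care about can be nested.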