Would anyone show me how to use htmllib?

jackxh at my-deja.com jackxh at my-deja.com
Tue Oct 31 00:59:13 EST 2000


Thank you for the example.
I went back and took another look at htmllib, and some parts make more
sense now. Here is what I want to do:

I noticed that there are lots of patterns in HTML pages, and I want to
extract information from HTML pages based on those patterns. I have done
this before using Perl's regular expressions. Now I am wondering whether
I can speed up development and have a standard approach to this problem
using Python's htmllib.

For reference, the htmllib library documentation mentions:
######################################################################
#This module defines a class which can serve as a base for parsing text
#files formatted in the HyperText Mark-up Language (HTML).
######################################################################

All of the examples I have seen extract URL links from an HTML
page. I was wondering if I can do more with this module.
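
To make the goal concrete, here is a minimal sketch of pulling something
other than links out of a page: the text of every table cell, grouped by
row. It is only an illustration under stated assumptions. htmllib was
removed in Python 3, so the sketch swaps in html.parser.HTMLParser from
the standard library, which follows the same subclass-and-override idea.
The names CellExtractor and extract_rows are hypothetical, and nested
tables are not handled.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the open cell, or None

    def _close_cell(self):
        # Store the open cell (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        # Store the open row (if any) in the result list.
        if self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()       # tolerate a missing </tr>
            self._row = []
        elif tag == "td" and self._row is not None:
            self._close_cell()      # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag == "tr":
            self._close_row()


def extract_rows(html_text):
    """Parse html_text and return a list of rows of cell strings."""
    parser = CellExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

Calling extract_rows() on a page's text returns a list of lists, one per
table row, which can then be filtered or matched like any other Python
data instead of being picked apart with one large regular expression.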

Jack X.



In article <8teabh0f6q at news1.newsguy.com>,
  "Alex Martelli" <aleaxit at yahoo.com> wrote:
> <jackxh at my-deja.com> wrote in message
> news:8te2ch$8ou$1 at nnrp1.deja.com...
> > Hi
> > I have read the python library reference. I am a python newbie, I
> > think I have to overload some functions to get it working. Could
> > anyone give me an example to show me how it works?
>
> Override, rather than overload.  Normally, yes.  Unless
> you just want the list of links from an HTML page, in
> which case this simple script will do it:
>
> import htmllib
> import formatter
>
> parser=htmllib.HTMLParser(formatter.NullFormatter())
> parser.feed(open('myfile.html').read())
> parser.close()
>
> print parser.anchorlist
>
> Now, if, instead of just instantiating HTMLParser, you
> instantiate a class of your own that derives from it
> and overrides the methods you're interested in, then
> you can do different things.  But it's hard to give a
> meaningful example without knowing what it is you
> want to do.  For some tasks, building your own
> formatter-class and using the plain parser-class from
> htmllib may be a simpler way, too.
>
> Alex
>
>
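
The derive-and-override approach described in the quoted reply can be
sketched roughly as follows. This is an assumption-laden illustration,
not the htmllib API itself: htmllib no longer exists in Python 3, so the
sketch uses html.parser.HTMLParser from the standard library instead,
and the class name LinkTextParser is hypothetical. Rather than the bare
URL list that anchorlist gives, it pairs each href with the text of the
link.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect (href, link text) pairs instead of bare URLs.

    Hypothetical example class; built on html.parser, not htmllib.
    """

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> being parsed, or None
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag is lowercased.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text while inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
```

Usage mirrors the script in the quoted reply: create the parser, feed()
it the page text, call close(), and then read parser.links.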

