[Tutor] parsing html
David Porter
jcm@bigskytel.com
Tue, 19 Sep 2000 19:39:22 -0600
* michaelbaker@operamail.com <michaelbaker@operamail.com>:
>
> >I've tried the docs and searching python.org, and I checked the tutor
> >archives back to Jan 2000 - I'm trying to write a little program that will
> >select keywords from a dictionary or list and submit them to
> >www.google.com. I can submit and read results from google using urllib and
> >file.read() just fine, but this returns raw html. I'd like to cut through
> >the html, but I can't get the sgmllib.SGLMParser to work :(. Can someone point
> >me to some examples of using sgmllib or suggest another way? Thanks in
> >advance, m baker
The following thread from comp.lang.python includes both explanations and
examples of using sgmllib and htmllib:
http://x51.deja.com/viewthread.xp?AN=669758132&search=thread&svcclass=dncurrent&ST=PS&CONTEXT=969413393.1966866459&HIT_CONTEXT=969413393.1966866459&HIT_NUM=3&recnum=%3cNQ9w5.346$n4.24503@newsc.telia.net%3e%231/1&group=comp.lang.python&frpage=viewthread.xp&back=clarinet
(The URL above is all one line, in case your mail reader wraps it.)
This example from the effbot would be very easy to alter:
http://www.deja.com/threadmsg_ct.xp?AN=669758132&fmt=text
Right now it extracts the strings from <img src=""> tags.
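
In case the links above are hard to reach, here is a rough sketch of the same
idea: subclass a parser and collect one attribute from one kind of tag. This is
not the effbot's code. It assumes a current Python, where html.parser.HTMLParser
from the standard library stands in for sgmllib, and the names AttrCollector
and extract_attr are made up for illustration.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the value of one attribute from every occurrence of one tag.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag        # tag name to watch for, e.g. "img"
        self.attr = attr      # attribute to pull out, e.g. "src"
        self.values = []      # collected attribute values, in document order

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != self.tag:
            return
        for name, value in attrs:
            if name == self.attr and value:
                self.values.append(value)


def extract_attr(html, tag, attr):
    """Return all values of `attr` found on `tag` elements in `html`."""
    parser = AttrCollector(tag, attr)
    parser.feed(html)
    parser.close()
    return parser.values
```

Calling extract_attr(page, "img", "src") gives the image sources; changing the
arguments to "a" and "href" should pull out the links instead, which is
probably closer to what you want from a search results page.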
David