[Tutor] parsing html

David Porter jcm@bigskytel.com
Tue, 19 Sep 2000 19:39:22 -0600


* michaelbaker@operamail.com <michaelbaker@operamail.com>:
> 
> >I've tried the docs, searched python.org, and checked the tutor 
> >archives back to Jan 2000. I'm trying to write a little program that will 
> >select keywords from a dictionary or list and submit them to 
> >www.google.com. I can submit and read results from google using urllib and 
> >file.read() just fine, but this returns raw html. I'd like to cut through 
> >the html, but I can't get the sgmllib.SGMLParser to work :(. Can someone point 
> >me to some examples of using sgmllib or suggest another way?  Thanks in 
> >advance, m baker

The following thread from comp.lang.python includes both explanations and 
examples of using sgmllib and htmllib:

http://x51.deja.com/viewthread.xp?AN=669758132&search=thread&svcclass=dncurrent&ST=PS&CONTEXT=969413393.1966866459&HIT_CONTEXT=969413393.1966866459&HIT_NUM=3&recnum=%3cNQ9w5.346$n4.24503@newsc.telia.net%3e%231/1&group=comp.lang.python&frpage=viewthread.xp&back=clarinet

The URL above is all one line.

This example from the effbot would be very easy to alter:

http://www.deja.com/threadmsg_ct.xp?AN=669758132&fmt=text

As written, it extracts the src strings from <img src=""> tags.
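
For illustration, here is a minimal sketch of the same idea (pulling the src
strings out of <img> tags). It is not the effbot's code. It assumes Python 3,
where sgmllib is no longer in the standard library, so it uses
html.parser.HTMLParser as a stand-in. The names ImageSourceParser and
extract_image_sources are made up for this sketch.

```python
from html.parser import HTMLParser


class ImageSourceParser(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us.
        # attrs is a list of (name, value) pairs.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_image_sources(html):
    """Return the src strings of all <img> tags in an HTML string."""
    parser = ImageSourceParser()
    parser.feed(html)
    parser.close()
    return parser.sources


if __name__ == "__main__":
    page = '<html><body><img src="logo.gif"><p>hello</p></body></html>'
    print(extract_image_sources(page))
```

To pull out something else, such as the links on a results page, the check in
handle_starttag could look for the "a" tag and the "href" attribute instead.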


  David