Get entire web page?

Steve Holden sholden at holdenweb.com
Fri Nov 3 23:44:04 EST 2000


Whatdoesthisbuttondo? wrote:
> 
> I'm trying to write a script so that I can do a throughput test for a
> firewall/proxy.  I know how to grab the text of a page, what I'm not
> sure about is how to get all of the referenced graphics.
> Any help or pointers to existing Python code would definitely be
> appreciated.
> 
> Thanks

The htmllib module lets you parse the returned HTML; what you then need
to do is handle the images.  This means writing a subclass of HTMLParser
that overrides the handle_image method (that is, defines its own version
of it): the parser will call handle_image each time it processes an
image tag.  From the 1.5.2 docs:

handle_image (source, alt[, ismap[, align[, width[, height]]]]) 
	This method is called to handle images. The default implementation
	simply passes the alt value to the handle_data() method. 
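The handle_image hook belongs to htmllib, which was deprecated in Python 2.6 and removed in Python 3.  What follows is a minimal sketch of the same idea for modern Python, assuming the standard-library html.parser.HTMLParser as a swapped-in replacement.  It has no handle_image, so the subclass watches handle_starttag for img tags instead.  The names ImageCollector, find_images and fetch_page_and_images are hypothetical.  The timing function only gives a rough throughput figure for the original question and is not a proper benchmark.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ImageCollector(HTMLParser):
    """Collect the absolute URL of every <img src="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and passes attrs
        # as a list of (name, value) pairs.  Self-closing <img ... /> tags
        # also arrive here, via the default handle_startendtag().
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references such as "logo.gif" against
                # the URL of the page that contained them.
                self.images.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    """Return the absolute image URLs referenced by an HTML string."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images


def fetch_page_and_images(page_url):
    """Download a page plus each distinct image it references.

    Returns (total_bytes, elapsed_seconds) as a crude throughput figure.
    """
    start = time.perf_counter()
    with urlopen(page_url) as resp:
        body = resp.read()
        base = resp.url  # final URL, after any redirects
        charset = resp.headers.get_content_charset() or "latin-1"
    total = len(body)
    html = body.decode(charset, errors="replace")
    # dict.fromkeys() drops repeats but keeps page order, so an image used
    # several times is fetched once, roughly as a caching browser would.
    for img_url in dict.fromkeys(find_images(html, base)):
        with urlopen(img_url) as img:
            total += len(img.read())
    return total, time.perf_counter() - start
```

For example, find_images('<img src="logo.gif">', "http://www.example.com/dir/index.html") returns ['http://www.example.com/dir/logo.gif'].  The sketch only looks at <img src>, so it ignores <base href>, CSS background images and similar references a real browser would also fetch.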

It's scary the first time you do it, but it turns out to be quite simple
once you overcome your fear.  Good luck!

By the way:  DON'T TOUCH THAT BUTTON!  ;-)

regards
 Steve
-- 
Helping people meet their information needs with training and technology.
703 967 0887      sholden at bellatlantic.net      http://www.holdenweb.com/