[Tutor] Writing a web bot.

Remco Gerlich scarblac@pino.selwerd.nl
Sat, 8 Jul 2000 23:49:09 +0200


On Fri, Jul 07, 2000 at 06:29:53PM -0400, Furmanek, Greg wrote:
> Hi all.
> 
> It appears I have found myself in a position
> where I could use some help.
> 
> The task I am trying to perform is write an
> internet bot.  I was going to use urllib for
> this project however one of the requirements
> is for the connection to be continuous during
> the session.  
> 
> Connect to a site.
> Get page, parse.
> Get another page, parse.
> use POST method, get another page, parse.
> Disconnect from the site.
> 
> The connection is not supposed to be dropped
> between the requests.
> 
> Is there a simple way to do this task???

I've never needed to do this and I haven't studied urllib. But can't
you change urllib so that it reuses an existing connection if it has
already opened one? Find out how it opens connections, then redefine those
functions or subclass the class in your own module, something like that.
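
To make the idea concrete, here is a minimal sketch in present-day Python.
It is a swapped-in approach, not a patch to urllib: it uses http.client
(the successor to httplib) and sends every request over one HTTPConnection
object. The names run_session and steps are made up for illustration. It
assumes the server honours HTTP/1.1 keep-alive; if the server closes the
socket, http.client quietly opens a new one on the next request, so this
does not by itself guarantee a single connection.

```python
import http.client
import urllib.parse


def run_session(host, steps, port=80):
    """Send each (method, path, form) step over one HTTPConnection.

    `steps` is a hypothetical list of tuples; `form` is a dict for POST
    data or None. Returns a list of (status, body_bytes) pairs.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    results = []
    try:
        for method, path, form in steps:
            body = None
            headers = {}
            if form:
                # Encode form fields the way an HTML form POST would.
                body = urllib.parse.urlencode(form)
                headers["Content-Type"] = "application/x-www-form-urlencoded"
            conn.request(method, path, body=body, headers=headers)
            resp = conn.getresponse()
            # The body must be read completely before the next request
            # can be sent on the same connection.
            results.append((resp.status, resp.read()))
    finally:
        conn.close()
    return results
```

Something like run_session("www.example.com", [("GET", "/", None),
("GET", "/page2", None), ("POST", "/form", {"q": "x"})]) would then cover
the get, get, post sequence from your list, with your parsing done on each
returned body.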

Also, websucker.py and webchecker.py have already been written (they're
in Python's Tools/ directory, maybe you need to download the source
distribution to get them). These tools download a whole site or check links
in a whole site. You probably want something like that. A script to parse
robots.txt files is also included.

But they use separate connections for each file, I think...
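
As for the robots.txt part: in current Python a parser for it is in the
standard library as urllib.robotparser, so a bot can check a URL before
fetching it. A small sketch, assuming the robots.txt text has already been
downloaded; the helper name allowed and the sample rules are made up for
illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, only for demonstration.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/data.html"):
        print(url, allowed(SAMPLE_RULES, "mybot", url))
```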

-- 
Remco Gerlich,  scarblac@pino.selwerd.nl