Web Spider

Peter Hansen peter at engcorp.com
Tue Jul 6 11:19:01 EDT 2004


Thomas Lindgaard wrote:

> A couple of questions:
> 
> 1) Why use the 
>   if __name__ == '__main__':
> construct?

Answered indirectly in this FAQ: 
http://www.python.org/doc/faq/programming.html#how-do-i-find-the-current-module-name
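
In short, __name__ is '__main__' only when the file is run directly; on import it holds the module's own name, so the guarded block is skipped. A minimal sketch, assuming a file called spider.py and a hypothetical crawl() function (neither name comes from the code under discussion):

```python
# spider.py (hypothetical file name, for illustration only)

def crawl(start_url):
    """Hypothetical stand-in for the spider's real entry point."""
    return "crawling " + start_url


if __name__ == "__main__":
    # Reached only via "python spider.py". After "import spider",
    # __name__ is "spider", so this block does not run and the
    # module can be reused or tested without starting a crawl.
    print(crawl("http://example.com/"))
```

That lets the same file serve as both a script and an importable module.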

> 2) In Retrievepool.__init__ the Retriever.__init__ is called with
> self.inputQueue and self.outputQueue as arguments. Does this mean that
> each Retriever thread has a reference to Retrievepool.inputQueue and
> Retrievepool.outputQueue?

Yes, and that's sort of the whole point of the thing.
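
Roughly, the arrangement looks like the sketch below. This is not the original code, only an illustration of the sharing: the class bodies, the None sentinel, the shutdown() method and the MAX_THREADS value are all assumptions, and the "fetch" is a placeholder that just echoes the URL.

```python
import queue
import threading

MAX_THREADS = 3  # assumed value, for illustration


class Retriever(threading.Thread):
    def __init__(self, inputQueue, outputQueue):
        super().__init__(daemon=True)
        # These are references to the pool's queues, not copies.
        self.inputQueue = inputQueue
        self.outputQueue = outputQueue

    def run(self):
        while True:
            url = self.inputQueue.get()
            if url is None:  # assumed sentinel meaning "stop"
                break
            # Placeholder for the real fetch: echo the URL back.
            self.outputQueue.put(url)


class Retrievepool:
    def __init__(self, numThreads):
        self.inputQueue = queue.Queue()
        self.outputQueue = queue.Queue()
        # Every Retriever is handed the same two Queue objects.
        self.retrievers = [Retriever(self.inputQueue, self.outputQueue)
                           for _ in range(numThreads)]
        for r in self.retrievers:
            r.start()

    def shutdown(self):
        # One sentinel per thread, then wait for all of them to exit.
        for _ in self.retrievers:
            self.inputQueue.put(None)
        for r in self.retrievers:
            r.join()


pool = Retrievepool(MAX_THREADS)
shared = all(r.inputQueue is pool.inputQueue and
             r.outputQueue is pool.outputQueue
             for r in pool.retrievers)

urls = ["http://a/", "http://b/", "http://c/", "http://d/"]
for url in urls:
    pool.inputQueue.put(url)
results = sorted(pool.outputQueue.get() for _ in urls)
pool.shutdown()
```

Because Queue does its own locking, any thread can take the next URL and post its result without extra synchronization.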

> 3) How many threads will be running? Spider.run initializes the
> Retrievepool and this will consist of MAX_THREADS threads, so once the
> crawler is running there will be the main thread (caught in the while loop
> in Spider.run) and MAX_THREADS Retriever threads running, right?

Yep.  Good analysis. :-)  You could inject this somewhere to
check:

    import threading
    print(len(threading.enumerate()), 'threads exist')
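
If you want to try the count outside the spider, here is a small self-contained sketch. The idle workers and the MAX_THREADS value are assumptions standing in for the Retriever threads; each worker simply waits on an Event until told to stop.

```python
import threading

MAX_THREADS = 4  # assumed value, for illustration

stop = threading.Event()

# Count before starting any workers (normally just the main thread).
before = len(threading.enumerate())

# Stand-ins for the Retriever threads: each blocks until stop is set.
workers = [threading.Thread(target=stop.wait, daemon=True)
           for _ in range(MAX_THREADS)]
for w in workers:
    w.start()

# Count again while all workers are alive.
during = len(threading.enumerate())
print(during, 'threads exist')

# Release the workers and wait for them to finish.
stop.set()
for w in workers:
    w.join()
```

The second count should exceed the first by exactly MAX_THREADS, matching the "main thread plus MAX_THREADS" analysis.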

-Peter


