Web Spider
Peter Hansen
peter at engcorp.com
Tue Jul 6 11:19:01 EDT 2004
Thomas Lindgaard wrote:
> A couple of questions:
>
> 1) Why use the
> if __name__ == '__main__':
> construct?
Answered indirectly in this FAQ:
http://www.python.org/doc/faq/programming.html#how-do-i-find-the-current-module-name
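In short: a module's __name__ is '__main__' only when the file is run directly as a script, so the guard lets the same file be imported (for testing, say) without starting the crawl. A minimal sketch, where main() is a hypothetical stand-in for whatever starts the spider:

```python
def main():
    # Hypothetical entry point; stands in for the code that starts the spider.
    return 'crawling'


if __name__ == '__main__':
    # Runs for "python spider.py", is skipped for "import spider".
    print(main())
```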
> 2) In Retrievepool.__init__ the Retriever.__init__ is called with
> self.inputQueue and self.outputQueue as arguments. Does this mean that
> each Retriever thread has a reference to Retrievepool.inputQueue and
> Retrievepool.outputQueue
Yes, and that's sort of the whole point of the thing.
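To make that concrete, here is a rough Python 3 sketch of the pattern, not the actual spider code. The class bodies, the MAX_THREADS value, the None sentinel and the 'fetched ...' strings are all assumptions made for illustration. The thing to notice is that each thread stores a reference to the pool's two queues rather than a copy, so work put on the input queue can be picked up by any thread:

```python
import queue
import threading

MAX_THREADS = 3  # assumed value for illustration


class Retriever(threading.Thread):
    def __init__(self, input_queue, output_queue):
        super().__init__(daemon=True)
        # References to the pool's queues, not copies.
        self.input_queue = input_queue
        self.output_queue = output_queue

    def run(self):
        while True:
            url = self.input_queue.get()
            if url is None:  # sentinel (an assumption here): stop this worker
                break
            # A real retriever would fetch the page here.
            self.output_queue.put('fetched ' + url)


class RetrievePool:
    def __init__(self, num_threads):
        self.input_queue = queue.Queue()
        self.output_queue = queue.Queue()
        self.threads = [Retriever(self.input_queue, self.output_queue)
                        for _ in range(num_threads)]
        for t in self.threads:
            t.start()

    def shutdown(self):
        # One sentinel per worker, then wait for all of them to exit.
        for _ in self.threads:
            self.input_queue.put(None)
        for t in self.threads:
            t.join()


pool = RetrievePool(MAX_THREADS)
# Every worker holds the very same queue objects as the pool.
shared = all(t.input_queue is pool.input_queue and
             t.output_queue is pool.output_queue for t in pool.threads)
for url in ['a', 'b', 'c', 'd']:
    pool.input_queue.put(url)
# Results arrive in whatever order the workers finish, so sort them.
results = sorted(pool.output_queue.get() for _ in range(4))
pool.shutdown()
```

Since the queues do their own locking, the threads need no further synchronization to share them.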
> 3) How many threads will be running? Spider.run initializes the
> Retrievepool and this will consist of MAX_THREADS threads, so once the
> crawler is running there will be the main thread (caught in the while loop
> in Spider.run) and MAX_THREADS Retriever threads running, right?
Yep. Good analysis. :-) You could inject this somewhere to
check:
    print len(threading.enumerate()), 'threads exist'
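That line uses the print statement, so under Python 3 it needs parentheses. A self-contained sketch of the same check, assuming Python 3 and using placeholder worker threads that just wait on an event (MAX_THREADS is set to an arbitrary value here):

```python
import threading

MAX_THREADS = 4  # assumed value for illustration

stop = threading.Event()
# Placeholder workers: each one simply blocks until the event is set.
workers = [threading.Thread(target=stop.wait) for _ in range(MAX_THREADS)]

before = len(threading.enumerate())
for w in workers:
    w.start()

# enumerate() lists every live thread, including the main one.
started = len(threading.enumerate()) - before
print(len(threading.enumerate()), 'threads exist')

# Let the workers finish and wait for them.
stop.set()
for w in workers:
    w.join()
```

Run as a plain script, the count should come out as MAX_THREADS plus one for the main thread.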
-Peter