Do I have to use threads?

Gary Herron gherron at islandtraining.com
Wed Jan 6 02:29:20 EST 2010


aditya shukla wrote:
> Hello people,
>
> I have 5 directories corresponding to 5 different URLs. I want to
> download images from those URLs and place them in the respective
> directories. I have to extract the contents and download them
> simultaneously. I can extract the contents and do them one by one. My
> question is: to do it simultaneously, do I have to use threads?
>
> Please point me in the right direction.
>
>
> Thanks
>
> Aditya

You've been given some bad advice here.

First -- threads are lighter-weight than processes, so threads are
probably *more* efficient.  However, with only five threads/processes,
the difference is probably not noticeable.  (If the prejudice against
threads comes from concerns over the GIL -- that also is a misplaced
concern in this instance.  Since you have only one network connection,
you will receive only one packet at a time, so only one thread will be
active at a time.  If the extraction process uses enough CPU time that
the extractions are all running at the same time *AND* you are running
on a machine with separate CPUs/cores *AND* you would like the
extractions to run truly in parallel on those separate cores, *THEN*,
and only then, will processes be more efficient than threads.)
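A minimal sketch of the thread-based approach, assuming Python 3 and the standard `concurrent.futures` module (the original post does not prescribe any particular API): one worker thread per (URL, directory) pair. The helper names `download` and `download_all`, and the rule of naming each saved file after the last segment of its URL, are hypothetical choices for illustration.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def download(url, dest_dir):
    """Fetch one URL and save it under dest_dir; runs in a worker thread."""
    os.makedirs(dest_dir, exist_ok=True)
    # Hypothetical naming rule: use the last path segment of the URL.
    filename = os.path.basename(url.rstrip("/")) or "index.html"
    path = os.path.join(dest_dir, filename)
    # The GIL is released while a thread blocks on network I/O,
    # so the other downloads keep making progress in the meantime.
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path


def download_all(jobs):
    """jobs: list of (url, dest_dir) pairs; one thread per job."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = [pool.submit(download, url, d) for url, d in jobs]
        # result() re-raises any exception from the worker thread.
        return [f.result() for f in futures]
```

With five jobs, all five transfers are in flight at once, yet only the thread whose data has just arrived does any work, which is the situation described above.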

Second -- running 5 wgets is equivalent to 5 processes, not 5 threads.
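To make the point concrete, here is a sketch that launches one child process per URL with `subprocess.Popen`, which is what starting five `wget` commands amounts to. So that it runs where `wget` is not installed, it substitutes the current Python interpreter as a hypothetical stand-in command that only reports its own PID and argument; in real use the command list would be something like `["wget", url]`. The name `run_fetchers` is hypothetical.

```python
import subprocess
import sys


def run_fetchers(urls):
    """Start one child process per URL, as launching five wgets would."""
    procs = []
    for url in urls:
        # Stand-in for ["wget", url]: the child prints its own PID and URL.
        cmd = [sys.executable, "-c",
               "import os, sys; print(os.getpid(), sys.argv[1])", url]
        procs.append(subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True))
    # All children are already running concurrently; now collect their output.
    results = []
    for p in procs:
        out, _ = p.communicate()
        pid, reported_url = out.split()  # assumes URLs contain no spaces
        results.append((int(pid), reported_url))
    return results
```

Each entry comes back with a distinct PID, different from the parent's: these are separate processes, each with its own interpreter and memory, rather than threads sharing one.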

And third -- you don't have to use either threads *or* processes.  There
is another possibility which is much more lightweight:  asynchronous
I/O, available through the low-level select module, or more usefully
via the higher-level asyncore module.  (Although the learning curve
might trip you up, and some people find the programming model for
asyncore hard to fathom, I find it more intuitive in this case than
threads/processes.)
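A minimal sketch of the low-level idea, assuming Python 3: a single thread waits on several sockets at once with `select.select()` and reads from whichever one has data ready. The helper name `read_all` is hypothetical, and in the usage below local `socket.socketpair()` connections stand in for the five HTTP connections so that it runs offline.

```python
import select
import socket


def read_all(socks):
    """Read every socket to EOF from a single thread using select()."""
    buffers = {s: b"" for s in socks}
    open_socks = list(socks)
    while open_socks:
        # Block until at least one socket is readable (data or EOF).
        readable, _, _ = select.select(open_socks, [], [])
        for s in readable:
            chunk = s.recv(4096)
            if chunk:
                buffers[s] += chunk
            else:
                # An empty read means the peer closed: this "download" is done.
                open_socks.remove(s)
                s.close()
    return [buffers[s] for s in socks]


if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(5)]
    for i, (_, writer) in enumerate(pairs):
        writer.sendall(f"image-{i}".encode())
        writer.close()
    print(read_all([reader for reader, _ in pairs]))
```

No thread or process is created: the interleaving of the five transfers comes entirely from `select()` reporting which socket is ready, which is what asyncore builds on.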

In fact, the asyncore manual page has a ~20-line class which implements
a web page retrieval.  You could replace that example's single call to
http_client with five calls, one for each of your URLs.  Then when you
reach the last line (that is, the asyncore.loop() call), all five will
be downloading simultaneously.
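The `asyncore` module has since been deprecated and was removed in Python 3.12, so the sketch below swaps in its recommended replacement, `asyncio`, to show the same idea: several HTTP requests started together and serviced by one event loop in one thread. It is not the asyncore manual's example; it speaks bare HTTP/1.0 over `asyncio.open_connection` with no redirects, chunked encoding, or error handling, and `fetch` and `fetch_all` are hypothetical names.

```python
import asyncio


async def fetch(host, port, path):
    """Minimal HTTP/1.0 GET; returns the response body as bytes."""
    reader, writer = await asyncio.open_connection(host, port)
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    writer.write(request.encode("ascii"))
    await writer.drain()
    # HTTP/1.0: the server closes the connection when the response is done,
    # so reading to EOF yields the whole response.
    response = await reader.read()
    writer.close()
    await writer.wait_closed()
    _headers, _, body = response.partition(b"\r\n\r\n")
    return body


async def fetch_all(host, port, paths):
    """Start every fetch at once; all of them share one event loop."""
    return await asyncio.gather(*(fetch(host, port, p) for p in paths))
```

For instance, `asyncio.run(fetch_all("www.example.com", 80, paths))` would return the bodies in the same order as `paths`, playing the role of the five http_client calls followed by `asyncore.loop()`.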

See http://docs.python.org/library/asyncore.html

Gary Herron
 



More information about the Python-list mailing list