Is it better to use threads or fork in the following case

Martin P. Hellwig martin.hellwig at dcuktec.org
Sun May 3 18:14:19 EDT 2009


grocery_stocker wrote:
> Let's say there is a new zip file with updated information every 30
> minutes on a remote website. Now, I wanna connect to this website
> every 30 minutes, download the file, extract the information, and then
> have the program search the file for certain items.
> 
> Would it be better to use threads to break this up? I have one thread
> download the data and then have another actually process the data.
> Or would it be better to use fork?
> 

I concur with Diez: I don't think threading/forking will bring 
significant advantages in this particular case.

That said, if you are thinking from a responsiveness perspective, I 
would definitely say threading.
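
To make the threading option concrete, here is a minimal sketch of one 
thread fetching while another searches. The names fetch, search and run 
are made up for illustration, and 'source' is just an iterable of byte 
blocks standing in for the real download, so treat it as an outline 
rather than a finished program.

```python
import queue
import threading


def fetch(source, out):
    """Downloader thread: push each block onto the queue, then a sentinel."""
    for block in source:  # 'source' is a stand-in for the real download
        out.put(block)
    out.put(None)  # tells the consumer that the download is finished


def search(blocks, needle):
    """Consume blocks as they arrive and collect those containing needle."""
    hits = []
    while True:
        block = blocks.get()  # waits until the downloader delivers something
        if block is None:
            break
        if needle in block:
            hits.append(block)
    return hits


def run(source, needle):
    """Start the downloader thread and search in the calling thread."""
    q = queue.Queue()
    t = threading.Thread(target=fetch, args=(source, q))
    t.start()
    hits = search(q, needle)  # runs while the downloader is still fetching
    t.join()
    return hits
```

Calling run([b"spam", b"eggs"], b"spam") returns the blocks that contain 
the search term. The queue does the locking for you, which is the main 
reason I would pick it over sharing a plain list between the threads.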

If you ask from a performance perspective, I would need to know what OS 
you are running (that is, whether forking is even supported), whether you 
have multiple CPUs, and whether you are actually planning on running that 
sub-process on a (possibly) different CPU from the parent process.

So the workflow would be something like this:
Download a block.
If the block has enough data to process, spawn a new process (using the 
multiprocessing module) and let it write the result back to x (requiring 
a lock and release).
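
The workflow above could be sketched roughly as follows. This assumes 
the result 'x' is just a shared match counter and that the blocks are 
already in memory; count_matches, worker and process_blocks are names I 
made up for the example, and a real program would start the processes 
as the blocks arrive from the download.

```python
import multiprocessing


def count_matches(block, needle):
    """Count occurrences of needle (bytes) in one block of data."""
    return block.count(needle)


def worker(block, needle, total):
    """Run in a child process: add this block's matches to the shared total."""
    found = count_matches(block, needle)
    with total.get_lock():  # acquire the lock, update, release on exit
        total.value += found


def process_blocks(blocks, needle):
    """Start one process per block and return the combined match count."""
    total = multiprocessing.Value("i", 0)  # shared integer, stands in for x
    procs = []
    for block in blocks:
        p = multiprocessing.Process(target=worker, args=(block, needle, total))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    return total.value


if __name__ == "__main__":
    # Hypothetical sample data; the guard is needed on platforms that
    # start child processes by re-importing this module.
    blocks = [b"spam eggs spam", b"eggs ham", b"spam"]
    print(process_blocks(blocks, b"spam"))
```

One process per block is only for illustration. With many small blocks 
the start-up cost of each process would dominate, and a fixed pool of 
workers would probably be the better choice.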

Things to keep in mind are the overheads of:
- Multiple interpreters running on the multiple CPUs
- IPC
- Locking/Releasing
Is all of that still cheaper than not using threading at all?

About forking: this usually means that the child process starts out as 
an exact copy of the parent process and ideally runs mostly independently 
of the parent. In the best case the child process can run fine without 
the presence of the parent process. Is this really what you want to do?
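
To illustrate the "exact copy" point, here is a small sketch using 
os.fork directly. It is Unix only, and run_in_child is a hypothetical 
helper name. The parent waits for the child here only so that the 
example is easy to follow; a truly independent child would not be 
waited on like this.

```python
import os


def run_in_child(task):
    """Fork, run task() in the child, and return the child's exit status."""
    pid = os.fork()  # Unix only; os.fork does not exist on Windows
    if pid == 0:
        # Child: works on its own copy of the parent's memory from here on.
        code = 1
        try:
            task()
            code = 0
        finally:
            os._exit(code)  # leave immediately, skipping the parent's cleanup
    # Parent: collect the child so it does not linger as a zombie.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

If the task appends to a list that the parent also holds, the parent's 
list stays unchanged afterwards, because the child only modified its 
own copy. Anything the child finds has to be sent back explicitly, for 
example through a pipe or a file.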

-- 
MPH
http://blog.dcuktec.com
'If consumed, best digested with added seasoning to own preference.'


