Multiple scripts versus single multi-threaded script

Roy Smith roy at panix.com
Thu Oct 3 12:41:51 EDT 2013


In article <f01b2e7a-9fc7-4138-bb6e-447d31179f2d at googlegroups.com>,
 JL <lightaiyee at gmail.com> wrote:

> What is the difference between running multiple python scripts and a single 
> multi-threaded script? May I know what are the pros and cons of each 
> approach? Right now, my preference is to run multiple separate python scripts 
> because it is simpler.

First, let's take a step back and think about multi-threading vs. 
multi-processing in general (i.e. in any language).

Threads are lighter-weight.  That means it's faster to start a new 
thread (compared to starting a new process), and a thread consumes fewer 
system resources than a process.  If you have lots of short-lived tasks 
to run, this can be significant.  If each task will run for a long time 
and do a lot of computation, the cost of startup becomes less of an 
issue because it's amortized over the longer run time.
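
As a rough illustration of the startup-cost point, here is a minimal
sketch that times starting a do-nothing thread against launching a
do-nothing Python interpreter.  The function names and repeat counts
are made up for this example, and the numbers will vary a lot by
machine and OS, so treat it as a way to measure, not as a benchmark.

```python
import subprocess
import sys
import threading
import time


def time_thread_start(n=20):
    """Average seconds to start and join a thread that does nothing."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n


def time_process_start(n=5):
    """Average seconds to launch a Python interpreter that does nothing."""
    start = time.perf_counter()
    for _ in range(n):
        # A fresh interpreter stands in for "running a separate script".
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print("thread:  %.6f s" % time_thread_start())
    print("process: %.6f s" % time_process_start())
```

If the per-task work is tiny, the gap between the two numbers is what
you pay over and over; if each task runs for minutes, it hardly matters.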

Threads can communicate with each other in ways that separate processes
can't easily match.  For example, file descriptors are shared by all the
threads in a
process, so one thread can open a file (or accept a network connection), 
then hand the descriptor off to another thread for processing.  Threads 
also make it easy to share large amounts of data because they all have 
access to the same memory.  You can do this between processes with 
shared memory segments, but it's more work to set up.
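
The descriptor hand-off described above might look something like the
following sketch.  It assumes a pipe as a stand-in for an opened file or
accepted connection; the names handoff, results, worker and run_demo
are invented for the example.

```python
import os
import queue
import threading

handoff = queue.Queue()   # carries file descriptors from one thread to another
results = {}              # ordinary dict, visible to every thread in the process


def worker():
    """Read everything from whatever descriptor another thread hands over."""
    fd = handoff.get()
    with os.fdopen(fd, "rb") as f:   # closes the descriptor when done
        results["data"] = f.read()


def run_demo():
    # The pipe stands in for a file or socket opened by the main thread.
    read_fd, write_fd = os.pipe()
    t = threading.Thread(target=worker)
    t.start()
    handoff.put(read_fd)             # hand the descriptor to the worker
    os.write(write_fd, b"hello from the main thread")
    os.close(write_fd)               # EOF lets the worker's read() return
    t.join()
    return results["data"]


if __name__ == "__main__":
    print(run_demo())
```

The descriptor is just an integer that means the same thing in both
threads, and the worker's result lands in a dict the main thread can
read directly.  Neither step needs any copying or serialization.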

The downside to threads is that all of this sharing makes them much 
more complicated to use properly.  You have to be aware of how all the 
threads are interacting, and mediate access to shared resources.  If you 
do that wrong, you get memory corruption, deadlocks, and all sorts of 
(extremely) difficult-to-debug problems.  A lot of the really hairy 
problems (e.g. one thread continuing to use memory which 
another thread has freed) are solved by using a high-level language like 
Python which handles all the memory allocation for you, but you can 
still get deadlocks and data corruption.
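
To make "mediate access to shared resources" concrete, here is a small
sketch of several threads updating one counter under a lock.  The names
counter, counter_lock, add_many and run_demo are hypothetical.  The
increment is a read-modify-write, so without the lock two threads may
read the same old value and one update can be lost.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, one locked step at a time."""
    global counter
    for _ in range(n):
        # Only one thread at a time may run the read-modify-write below.
        with counter_lock:
            counter += 1


def run_demo(num_threads=4, n=10000):
    """Run several incrementing threads and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_demo())
```

With the lock in place the result is always num_threads * n.  The catch
is that you have to remember the lock at every place the counter is
touched, and once there are two or more locks you can deadlock by
taking them in different orders in different threads.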

So, the full answer to your question is very complicated.  However, if 
you're looking for a short answer, I'd say just keep doing what you're 
doing (running multiple processes) and don't get into threading.


