multiprocessing vs thread performance

Nick Craig-Wood nick at craig-wood.com
Sat Jan 3 08:31:12 EST 2009


mk <mrkafk at gmail.com> wrote:
>  After reading http://www.python.org/dev/peps/pep-0371/ I was under
>  the impression that the performance of the multiprocessing package
>  is similar to that of the thread / threading modules. However, to
>  familiarize myself with both packages I wrote my own test of
>  spawning and returning 100,000 empty threads or processes,
>  respectively (while maintaining at most 100 processes / threads
>  active at any one time).
> 
>  The results I got are very different from the benchmark quoted in
>  PEP 371. On a twin-Xeon machine the threaded version executed in
>  5.54 secs, while the multiprocessing version took over 222 secs to
>  complete!
> 
>  Am I doing something wrong in the code below?

Yes!

The problem with your code is that you never start more than one
process at once in the multiprocessing example.  Just check ps when it
is running and you will see.

My conjecture is that this is due to the way fork() works under Unix.
I think that when the parent forks, it yields the CPU to the child.
Because you are giving the child effectively no work to do, it returns
immediately, re-awakening the parent and thus serialising your jobs.
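
If that conjecture is right, the effect is easy to observe with
multiprocessing.active_children().  Here is a minimal sketch (the loop
size and the print are mine, and what it actually reports will depend
on your scheduler):

from multiprocessing import Process, active_children

def no_work():
    pass    # an effectively empty child, as in the original test

if __name__ == '__main__':
    procs = [Process(target=no_work) for _ in range(100)]
    for p in procs:
        p.start()
        # If the parent yields to the child at fork() and the child
        # exits immediately, this rarely reports more than one live
        # child at a time.
        print("live children: %d" % len(active_children()))
    for p in procs:
        p.join()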

If you give the children some work to do you'll see quite a different
result.  I gave each child a time.sleep(1) to do and cut the total
number down to 10,000.
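
The harness was along these lines.  This is a reconstruction from the
description above, not the exact script; in particular the polling
loop and its interval are my assumptions:

import time
from multiprocessing import Process

def work(n):
    time.sleep(1)                       # the per-child workload
    if n % 1000 == 0:
        print("== Process %d working ==" % n)

if __name__ == '__main__':
    start = time.time()
    active = []
    for n in range(1, 10001):
        # Keep at most 100 children alive at any one time.
        while len(active) >= 100:
            active = [p for p in active if p.is_alive()]
            time.sleep(0.005)
        p = Process(target=work, args=(n,))
        p.start()
        active.append(p)
    print("=== Main thread waiting for all processes to finish ===")
    for p in active:
        p.join()
    print("Total time: %s" % (time.time() - start))

The threading version is the same shape with threading.Thread swapped
in for Process.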

$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==
=== Main thread waiting for all processes to finish ===
Total time: 101.382129192

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time:  100.659118176

So, almost identical results, and as expected: we ran 10,000
sleep(1)s in 100 seconds, so we must have been running 100
simultaneously.

If you replace the "time.sleep(1)" with "for _ in xrange(1000000):
pass" you get this much more interesting answer on my dual-core Linux
laptop, showing nicely the effect of contention on the Python global
interpreter lock (GIL) and how multiprocessing avoids it.
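
That is, with the worker body in the harness sketch above swapped for
a busy loop (the original used Python 2's xrange; range behaves the
same here):

def work(n):
    # A CPU-bound busy loop instead of sleeping: threads now contend
    # for the GIL, while separate processes each get their own
    # interpreter and run truly in parallel.
    for _ in range(1000000):
        pass
    if n % 1000 == 0:
        print("== Process %d working ==" % n)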

$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==
=== Main thread waiting for all processes to finish ===
Total time: 266.808327913

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time:  834.81882

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


