Parallel Python

parallelpython at gmail.com
Thu Jan 11 18:41:16 EST 2007


sturlamolden wrote:
> parallelpyt... at gmail.com wrote:
>
> >    That's right. ppsmp starts multiple interpreters in separate
> > processes and organize communication between them through IPC.
>
> Thus you are basically reinventing MPI.
>
> http://mpi4py.scipy.org/
> http://en.wikipedia.org/wiki/Message_Passing_Interface

Thanks for bringing that into consideration.

I am well aware of MPI and have written several programs in C/C++ and
Fortran which use it.
I would agree that MPI is the most common solution for running software
on a cluster (computers connected by a network). There is, however,
another parallelization approach: PVM (Parallel Virtual Machine),
http://www.csm.ornl.gov/pvm/pvm_home.html. I would say ppsmp is more
similar to the latter.

By the way, there are links to different Python parallelization
techniques (including MPI) on the PP site:
http://www.parallelpython.com/component/option,com_weblinks/catid,14/Itemid,23/

The main difference between MPI-based Python solutions and ppsmp is
that with MPI you have to organize both the computation
{MPI_Comm_rank(MPI_COMM_WORLD, &id); if id==1 then ... else ....} and
the data distribution (MPI_Send / MPI_Recv) yourself, while with ppsmp
you just submit a function with its arguments to the execution server
and retrieve the results later.
That makes the transition from serial Python software to parallel much
simpler with ppsmp than with MPI.

To make this point clearer, here is a short example:
--------------------serial code 2 lines------------------
for input in inputs:
    print "Sum of primes below", input, "is", sum_primes(input)
--------------------parallel code 3 lines----------------
jobs = [(input, job_server.submit(sum_primes,(input,), (isprime,),
("math",))) for input in inputs]
for input, job in jobs:
    print "Sum of primes below", input, "is", job()
---------------------------------------------------------------
In this example parallel execution was added at the cost of 1 line of
code!
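
For readers who want to try the same submit-now, collect-later pattern
without ppsmp installed, it can be sketched in Python 3 with the
standard library's concurrent.futures module. This is only a stand-in
and not ppsmp's API. The bodies of isprime and sum_primes are assumed
here (the example above only names them), and make_executor,
sum_primes_serial and sum_primes_submitted are hypothetical helper
names chosen for illustration.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def isprime(n):
    """Return True if n is prime (assumed body: trial division)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def sum_primes(n):
    """Return the sum of all primes strictly below n (assumed body)."""
    return sum(x for x in range(2, n) if isprime(x))


def make_executor(use_processes=True, workers=None):
    """Hypothetical helper: separate processes for CPU-bound work,
    threads for quick checks where process start-up is unwanted."""
    cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    return cls(max_workers=workers)


def sum_primes_serial(inputs):
    """Serial version: one input after another."""
    return [(n, sum_primes(n)) for n in inputs]


def sum_primes_submitted(inputs, executor):
    """Submit every job first, then collect the results in order."""
    jobs = [(n, executor.submit(sum_primes, n)) for n in inputs]
    # result() blocks until that particular job has finished.
    return [(n, job.result()) for n, job in jobs]
```

To use separate processes, call sum_primes_submitted(inputs,
make_executor()) from under an if __name__ == "__main__": guard, since
some platforms re-import the main module in each worker process.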

The other difference from MPI is that ppsmp decides dynamically where
to run each job. For example, if other active processes are running on
the system, ppsmp will make greater use of the processors that are
free. Since in MPI the whole task is usually divided equally between
processors at the beginning, the overall runtime is determined by the
slowest process (the one that shares its processor with another running
program). In this particular case ppsmp will outperform MPI.
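
The effect of a busy processor can be illustrated with a toy timing
model in Python 3. It does not use ppsmp or MPI, and the names
static_makespan and dynamic_makespan, the task costs and the relative
speeds are all assumptions made for the sketch. It only compares an
equal split decided up front with handing each job to whichever worker
becomes free first.

```python
import heapq


def static_makespan(task_costs, speeds):
    """Equal contiguous shares are fixed at the start, one per worker;
    the run ends when the slowest worker finishes its share."""
    n = len(speeds)
    share = -(-len(task_costs) // n)  # ceiling division
    finish_times = []
    for i, speed in enumerate(speeds):
        chunk = task_costs[i * share:(i + 1) * share]
        finish_times.append(sum(chunk) / speed)
    return max(finish_times)


def dynamic_makespan(task_costs, speeds):
    """Each task goes to the worker that becomes free earliest."""
    # Heap entries are (time at which the worker is free, worker index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    for cost in task_costs:
        t, i = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + cost / speeds[i], i))
    return max(t for t, _ in free_at)
```

With eight unit-cost jobs and two processors, one of which runs at half
speed because it is shared with another program (speeds [1.0, 0.5]),
static_makespan gives 8.0 time units and dynamic_makespan gives 6.0.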

The third, probably less important, difference is that with MPI-based
parallel Python code you must have MPI installed on the system.

Overall, ppsmp is still a work in progress, and there are other
interesting features which I would like to implement. This is the main
reason why I do not open the source of ppsmp - to have better control
of its future development, as advised here:
http://en.wikipedia.org/wiki/Freeware :-)

Best regards,
Vitalii



