parallel programming in Python

Dave Angel d at davea.name
Thu May 10 08:34:26 EDT 2012


On 05/10/2012 08:14 AM, Jabba Laci wrote:
> Hi,
>
> I would like to do some parallel programming with Python but I don't
> know how to start. There are several ways to go but I don't know what
> the differences are between them: threads, multiprocessing, gevent,
> etc.
>
> I want to use a single machine with several cores. I want to solve
> problems like this: iterate over a loop (with millions of steps) and
> do some work at each step. The steps are independent, so here I would
> like to process several steps in parallel. I want to store the results
> in a global list (which should be "synchronised"). Typical use case:
> crawl webpages, extract images and collect the images in a list.
>
> What's the best way?
>
> Thanks,
>
> Laszlo

There's no single best way.  The first question is your programming
environment: the OS you're running, and the version number and
implementation of Python.
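If you're not sure what to report, one quick way to gather those facts from inside Python is sketched below. It assumes Python 3.6 or later (`os.cpu_count` and f-strings aren't available in 2.7), and `describe_environment` is just an illustrative name:

```python
import os
import platform

def describe_environment():
    """Collect the facts worth mentioning when asking about parallelism."""
    return {
        "os": platform.system(),                             # e.g. 'Linux', 'Windows', 'Darwin'
        "implementation": platform.python_implementation(),  # e.g. 'CPython', 'PyPy'
        "version": platform.python_version(),                # e.g. '2.7.3'
        "cpu_count": os.cpu_count(),                         # None if it can't be determined
    }

if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```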

I'll assume you're using CPython 2.7 on Linux, which is what I have the
most experience with.  But after you answer, others will probably make
suggestions appropriate to whatever you're actually using.

The next question is whether the problem you're solving at any given
moment is CPU-bound or I/O-bound.  I'll try to answer for both cases here.

CPU-bound:
In CPython 2.7, there's a GIL (global interpreter lock), which prevents
more than one thread from executing Python bytecode at the same time.
It's more complex than that, but the bottom line is that multiple threads
won't help (and might hurt) a CPU-bound program, even on a multi-core
machine.  So use multiple processes, and cooperate between them with
queues, shared memory, or even files.  In many cases you can even use
multiple computers and communicate using sockets.

IO-bound:
This is what CPython is good at solving with threads.  When a thread makes
a blocking I/O call, the C code involved usually releases the GIL, so
other threads can run in the meantime.  In this situation, the fact that
threads can share data structures directly makes them a performance win.
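For the I/O-bound case, here's a sketch with plain `threading` and a lock-protected shared list, assuming Python 3. `fetch` is hypothetical, and `time.sleep` stands in for a real blocking call; like blocking I/O, it releases the GIL, so the threads' waits overlap:

```python
import threading
import time

def fetch(item):
    """Hypothetical I/O-bound step; sleep stands in for e.g. a socket read."""
    time.sleep(0.1)
    return item * 2

def run_threads(items, n_threads=4):
    results = []                  # one list shared by every thread
    lock = threading.Lock()       # serialises updates to `results`

    def worker(chunk):
        for item in chunk:
            value = fetch(item)   # threads overlap while blocked here
            with lock:
                results.append(value)

    # Deal the items out round-robin, one slice per thread.
    threads = [threading.Thread(target=worker, args=(items[i::n_threads],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # wait until every thread has finished
    return results

if __name__ == "__main__":
    start = time.perf_counter()
    out = run_threads(list(range(8)))
    print(sorted(out), f"{time.perf_counter() - start:.2f}s")
```

With 8 items and 4 threads this should take roughly 0.2 s instead of the 0.8 s a sequential loop would need, though exact timing varies. The order of `results` depends on thread scheduling, hence the `sorted`. In CPython a single `list.append` is already atomic because of the GIL, but the lock makes the synchronisation explicit and stays correct if the update grows into several steps.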

Web crawling is likely to be I/O-bound, but I wanted to be as complete as
I could.
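Applied to your crawling example, a thread pool keeps the code short. This is a sketch assuming Python 3 (on 2.7 the module names differ, and `concurrent.futures` is only available as a third-party backport); it decodes pages as UTF-8 and does no retries or rate limiting. `crawl`, `images_on` and `ImageCollector` are illustrative names, and the `fetch` parameter lets you swap in a different downloader, or a fake one for testing:

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page's own URL.
                self.images.append(urljoin(self.base_url, src))

def fetch_page(url, timeout=10):
    """Download one page; the blocking read releases the GIL."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def images_on(url, fetch=fetch_page):
    collector = ImageCollector(url)
    collector.feed(fetch(url))
    return collector.images

def crawl(urls, fetch=fetch_page, max_workers=8):
    # pool.map runs images_on in worker threads and yields the results in
    # input order; only the main thread builds the final list, so no lock.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_page = pool.map(lambda u: images_on(u, fetch), urls)
        return [img for imgs in per_page for img in imgs]
```

For instance, `crawl(["http://example.com/"])` would return the absolute URLs of the images on that page. Any exception raised while fetching a page is re-raised in the main thread when its result is collected.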

-- 

DaveA



