parallel programming in Python

Jabba Laci jabba.laci at gmail.com
Thu May 10 08:46:31 EDT 2012


Hi,

Thanks for the answer. I use Linux with CPython 2.7. I plan to work
on both CPU-bound and I/O-bound problems. Which packages should I use
in these cases? Could you point me to some guides? When should I use
multiprocessing, and when gevent?

Thanks,

Laszlo


On Thu, May 10, 2012 at 2:34 PM, Dave Angel <d at davea.name> wrote:
> On 05/10/2012 08:14 AM, Jabba Laci wrote:
>> Hi,
>>
>> I would like to do some parallel programming with Python but I don't
>> know how to start. There are several ways to go but I don't know what
>> the differences are between them: threads, multiprocessing, gevent,
>> etc.
>>
>> I want to use a single machine with several cores. I want to solve
>> problems like this: iterate over a loop (with millions of steps) and
>> do some work at each step. The steps are independent, so here I would
>> like to process several steps in parallel. I want to store the results
>> in a global list (which should be "synchronised"). Typical use case:
>> crawl webpages, extract images and collect the images in a list.
>>
>> What's the best way?
>>
>> Thanks,
>>
>> Laszlo
>
> There's no single best way.  The first question is your programming
> environment.  That includes the OS you're running, and the version and
> implementation of Python.
>
> I'll assume you're using CPython 2.7 on Linux, which is what I have the
> most experience with.  But after you answer, others will probably make
> suggestions appropriate to whatever you're actually using.
>
> The next question is whether the problem you're solving at any given
> moment is CPU-bound or I/O-bound.  I'll try to answer for both cases here.
>
> CPU-bound:
> In CPython 2.7, there's a GIL (global interpreter lock), which prevents
> more than one thread from executing Python bytecode at the same time.
> It's more complex than that, but the bottom line is that multiple threads
> won't help (and might hurt) a CPU-bound program, even on a multi-core
> machine.  So use multiple processes, and coordinate them with queues,
> shared memory, or even files.  In fact, in many cases you can use
> multiple computers and communicate over sockets.
>
> I/O-bound:
> This is what CPython is good at solving with threads.  Once you make a
> blocking I/O call, the C code involved usually releases the GIL, and
> other threads can run.  For this situation, the fact that you can share
> data structures makes threads a performance win.
>
> Web crawling is likely to be I/O-bound, but I wanted to be as complete
> as I could.
>
> --
>
> DaveA
>

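To make the CPU-bound advice quoted above concrete, here is a minimal
sketch using the standard multiprocessing module. The names cpu_heavy
and run_in_processes are made up for illustration and are not from the
thread. The sum-of-squares loop stands in for whatever work each
independent step does. The code avoids features missing from 2.7, so it
should run on both CPython 2.7 and 3.x.

```python
import multiprocessing


def cpu_heavy(n):
    """Hypothetical CPU-bound step: sum of the squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_processes(inputs, workers=2):
    """Apply cpu_heavy to every input using a pool of worker processes."""
    pool = multiprocessing.Pool(processes=workers)
    try:
        # map() hands the inputs out to the workers and returns the
        # results as a list in the same order as the inputs.
        return pool.map(cpu_heavy, inputs)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # The guard matters on platforms where worker processes are started
    # by re-importing the main module.
    print(run_in_processes([10, 100, 1000]))
```

Because pool.map() collects the return values in the parent process, no
shared "synchronised" list is needed in this pattern. Each worker has
its own interpreter and its own GIL, so the steps can run on separate
cores.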

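For the I/O-bound case described in the quoted reply, a thread-based
sketch might look like the following. It is an illustration under
stated assumptions, not a real crawler: fake_fetch is a hypothetical
stand-in that simulates a blocking download with time.sleep, and crawl
is a made-up helper name. The worker threads share one results list,
guarded by a lock, which matches the "synchronised global list" from
the original question.

```python
import threading
import time

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7


def fake_fetch(url):
    """Hypothetical stand-in for a blocking download."""
    time.sleep(0.05)        # sleeping releases the GIL, like blocking I/O
    return "contents of " + url


def crawl(urls, num_threads=4):
    """Fetch every URL with a few threads; return (url, page) pairs."""
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    results = []                      # shared by all worker threads
    results_lock = threading.Lock()   # guards appends to results

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return                # no work left, thread exits
            page = fake_fetch(url)
            with results_lock:
                results.append((url, page))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The results arrive in completion order, not input order, so each page
is stored together with its URL. While one thread waits in fake_fetch,
the others are free to run, which is where the speedup comes from.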
