[Python-ideas] Parallel processing with Python

Adam Olsen rhamph at gmail.com
Thu Feb 19 03:34:30 CET 2009


On Wed, Feb 18, 2009 at 4:34 PM, Sturla Molden <sturla at molden.no> wrote:
> Thus it is quite easy to make multiple, independent Python interpreters
> live isolated lives in the same process. As opposed to multiple processes,
> they can communicate without involving any IPC. It would also be possible
> to design proxy objects allowing one interpreter access to an object in
> another. Immutable objects such as strings would be particularly easy to
> share.
>
> This very simple scheme should allow parallel processing with Python
> similar to how it's done in Erlang, without the GIL getting in our way. At
> least on Windows this can be done without touching the CPython source at
> all. I am not sure about Linux though. It may be necessary to patch the
> CPython source to make it work there.

To clarify:
* Erlang's modules/classes/functions are not first-class objects, so
it doesn't need a copy of them.  Python does, so each interpreter
would have a memory footprint about the same as a true process.
* Any communication requires a serialize/copy/deserialize sequence.
You don't need a full context switch, but it's still not cheap.
* It's probably not worth sharing even str objects.  You'd need atomic
refcounting and a hack in Py_TYPE to always give the local type, both
of which would slow everything down.
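The second point can be illustrated with a minimal sketch. It models the serialize/copy/deserialize sequence inside a single process using the standard `pickle` module. `send_to_other_interpreter` is a hypothetical helper, not part of any real interpreter-isolation API. A real channel between interpreters might move the bytes differently, but under the assumption that no objects are shared, the receiver still ends up with a fresh copy rather than the sender's object.

```python
import pickle


def send_to_other_interpreter(obj):
    """Hypothetical helper: model handing obj to an isolated interpreter
    that shares no Python objects with the sender."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)  # serialize
    transferred = bytearray(payload)  # copy the bytes into a separate buffer
    return pickle.loads(transferred)  # deserialize into brand-new objects


if __name__ == "__main__":
    original = {"name": "example", "values": list(range(5))}
    received = send_to_other_interpreter(original)
    print(received == original)  # True: same contents
    print(received is original)  # False: a distinct object
    received["values"].append(99)
    print(original["values"])  # the sender's list is unaffected
```

Every message is rebuilt on the receiving side, so the work grows with the size of the data being passed. That is why skipping the context switch still doesn't make communication cheap.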

The real use case here is when you have a large, existing library that
you're not willing to modify to use a custom (shared-memory)
allocator.  That library must hold a large data set (too large to
duplicate in each process) that can't live in an external database, must
be multithreaded in a scalable way, and yet be too performance-sensitive
for real IPC.  Also, any data shared between interpreters must be part of
that large, existing library rather than Python objects.  Finally,
since each interpreter uses as much memory as a process, you must only
need a small, fixed number of interpreters, preferably long-running
(or at least pooled, like a thread pool).

If that describes your use case then I'm happy for you, go ahead and
use this DLL trick.


-- 
Adam Olsen, aka Rhamphoryncus


