[Python-ideas] solving multi-core Python

Nick Coghlan ncoghlan at gmail.com
Sun Jun 21 15:09:35 CEST 2015


On 21 June 2015 at 21:41, Sturla Molden <sturla.molden at gmail.com> wrote:
> On 20/06/15 23:42, Eric Snow wrote:
>>
>> tl;dr Let's exploit multiple cores by fixing up subinterpreters,
>> exposing them in Python, and adding a mechanism to safely share
>> objects between them.
>>
>> This proposal is meant to be a shot over the bow, so to speak.  I plan
>> on putting together a more complete PEP some time in the future, with
>> content that is more refined along with references to the appropriate
>> online resources.
>>
>> Feedback appreciated!  Offers to help even more so! :)
>
>
>
> From the perspective of software design, it would be good if the CPython
> interpreter provided an environment instead of using global objects. It
> would mean that all functions in the C API would need to take the
> environment pointer as their first argument, which would be a major rewrite.
> It would also allow a "one interpreter per thread" design similar to Tcl
> and .NET application domains.
>
> However, from the perspective of multi-core parallel computing, I am not
> sure what this offers over using multiple processes.
>
> Yes, you avoid the process startup time, but on POSIX systems a fork is very
> fast. And certainly, forking is much more efficient than serializing Python
> objects. It then boils down to a workaround for the fact that Windows cannot
> fork, which makes Windows particularly bad for running CPython. You also
> have to start up a subinterpreter and a thread, which is not instantaneous.
> So I am not sure there is a lot to gain here over calling os.fork.
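
A minimal sketch of the fork-based approach just described, for
illustration (POSIX only, since os.fork is unavailable on Windows); the
child inherits the parent's existing objects via copy-on-write, with no
serialisation step:

    import os

    data = {"answer": 42}    # built once in the parent

    pid = os.fork()          # POSIX only: a fast copy-on-write clone
    if pid == 0:
        # Child process: sees the parent's objects directly; nothing
        # was pickled or explicitly copied.
        print("child sees", data["answer"])
        os._exit(0)
    else:
        os.waitpid(pid, 0)   # parent waits for the child to finish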

Please give Eric and me the courtesy of assuming we know how CPython
works. This article, which is an update of a Python 3 Q&A answer I
wrote some time ago, goes into more detail on the background of this
proposed investigation:
http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python.html

> An invalid argument for this kind of design is that only code which uses
> threads for parallel computing is "real" multi-core code, so Python does
> not support multiple cores because multiprocessing or os.fork is just
> faking it. This is an argument that belongs in the intellectual junkyard.
> It stems from the abuse of threads among Windows and Java developers, and
> is rooted in the absence of fork on Windows and the formerly slow fork on
> Solaris. And thus they are only able to think in terms of threads. If
> threading.Thread does not scale the way they want, they think multiple
> cores are out of reach.

Sturla, expressing out-and-out contempt for entire communities of
capable, competent developers (both the creators of Windows and Java,
and the users of those platforms) has no place on the core Python
mailing lists. Please refrain from casually insulting entire groups of
people merely because you don't approve of their technical choices.

> The reason IPC in multiprocessing is slow is the cost of calling pickle,
> not the IPC itself. A pipe or a Unix socket (a named pipe on Windows) has
> the overhead of a memcpy in the kernel, i.e. a memcpy plus some tiny
> constant overhead. And if you need two processes to share memory, there
> is something called shared memory. Thus, we can send data between
> processes just as fast as between subinterpreters.
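
To make the pickle cost concrete: a multiprocessing Pipe serialises
every object sent through it, so that cost is paid on top of the
kernel's memcpy. A minimal sketch:

    import multiprocessing as mp

    parent_conn, child_conn = mp.Pipe()
    parent_conn.send({"x": 1})   # the dict is pickled here, then the
                                 # bytes are copied through the pipe
    print(child_conn.recv())     # ...and unpickled on the other end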

Avoiding object serialisation is indeed the main objective. With
subinterpreters, we have a lot more options for that than we do with
any form of IPC, including shared references to immutable objects and
the PEP 3118 buffer API.
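
Within a single interpreter, the zero-copy behaviour the buffer API
provides already looks like this; how such views would be shared safely
across subinterpreters is exactly what a proposal like this needs to
spell out. A minimal sketch:

    data = bytearray(b"x" * 1000000)   # one large buffer

    view = memoryview(data)    # no copy: a view onto the same memory
    chunk = view[1000:2000]    # slicing a memoryview copies nothing
    chunk[0] = ord("y")        # writes through to the original buffer

    print(data[1000] == ord("y"))   # True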

> All in all, I think we are better off finding a better way to share Python
> objects between processes.

This is not an either/or question, as other folks remain free to work
on improving multiprocessing's IPC efficiency if they want to. We
don't seem to have folks clamouring at the door to work on that,
though.
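
For comparison, the stdlib does already support the shared-memory style
Sturla mentions, e.g. via multiprocessing.Array: the payload below is
written in place and never pickled, and only the small Array handle is
passed to the child. A minimal sketch:

    import multiprocessing as mp

    def worker(shared):
        # Write directly into the shared buffer; the data itself is
        # never serialised between the processes.
        for i in range(len(shared)):
            shared[i] = i * i

    if __name__ == "__main__":
        arr = mp.Array("d", 8)   # 8 C doubles in shared memory
        p = mp.Process(target=worker, args=(arr,))
        p.start()
        p.join()
        print(arr[:])            # [0.0, 1.0, 4.0, ..., 49.0]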

> P.S. Another thing to note is that with sub-interpreters, you can forget
> about using ctypes or anything else that uses the simplified GIL API (e.g.
> certain Cython generated extensions).

Those aren't fundamental conceptual limitations; they're incidental
limitations of the current design and implementation of the simplified
GIL state API. One of the benefits of introducing a Python level API
for subinterpreters is that it makes it easier to start testing, and
hence fixing, some of those limitations. (I actually just suggested to
Eric off list that adding subinterpreter controls to _testcapi might
be a good place to start, as that's beneficial regardless of what, if
anything, ends up happening from a public API perspective.)
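
For anyone who wants to experiment now: CPython's own test suite
already relies on a minimal hook in this vein,
_testcapi.run_in_subinterp(), which executes a source string in a
freshly created subinterpreter (a CPython-internal helper, not a
supported public API):

    import _testcapi

    # The source runs in a brand new subinterpreter with its own
    # copies of all modules; the call returns 0 on success.
    rc = _testcapi.run_in_subinterp("print('hi from a subinterpreter')")
    print("return code:", rc)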

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

