[Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

Stephan Houben stephanh42 at gmail.com
Fri May 26 08:08:30 EDT 2017


Hi all,

Personally I feel that the current subinterpreter support falls short
in that it still requires a single GIL shared across all interpreters.

If each interpreter had its own GIL, we could have true shared-nothing
multi-threading, similar to JavaScript's "Web Workers".
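
For concreteness, this is roughly what creating a subinterpreter looks
like today from an embedding application (a minimal sketch, error
handling omitted); note that the subinterpreter still runs under the
single process-wide GIL:

    #include <Python.h>

    int main(void)
    {
        Py_Initialize();
        PyThreadState *main_ts = PyThreadState_Get();

        /* Today's API: the subinterpreter gets its own modules, builtins,
           etc., but it still shares the one process-wide GIL. */
        PyThreadState *sub_ts = Py_NewInterpreter();
        PyRun_SimpleString("print('hello from a subinterpreter')");
        Py_EndInterpreter(sub_ts);

        /* Restore the main interpreter's thread state before shutdown. */
        PyThreadState_Swap(main_ts);
        Py_Finalize();
        return 0;
    }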

Here is a point-wise overview of what I am imagining.
I realize the following is very ambitious, but I would like to bring
it to your consideration.

1. Multiple interpreters can be instantiated, each of which is
completely independent.
   To this end, all global interpreter state needs to go into an
interpreter structure, including the GIL
   (which becomes per-interpreter).
   Interpreters share no state whatsoever.
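
Purely as an illustration of point 1 (the struct and its fields are
invented here; this is not how CPython currently lays things out):

    /* Hypothetical per-interpreter state: every formerly-global piece of
       interpreter state, including the GIL, would live in here. */
    struct MPyInterpreter {
        PyThreadState *tstate_head;    /* threads belonging to this interpreter */
        PyObject      *modules;        /* this interpreter's sys.modules */
        PyThread_type_lock gil;        /* the GIL, now per-interpreter */
        /* ... all other state that is currently process-global ... */
    };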

2. PyObjects are tied to a particular interpreter and cannot be
shared between interpreters.
   (This is because each interpreter now has its own GIL.)
   I imagine a special debug build would actually store the
interpreter pointer in the PyObject and would assert everywhere
   that the PyObject is only manipulated by its owning interpreter.

3. Practically all existing APIs, including Py_INCREF and Py_DECREF,
need to get an additional explicit interpreter argument.
    I imagine that we would have a new prefix, say MPy_, because the
existing APIs must be left for backward compatibility.
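
To make points 2 and 3 concrete, a purely illustrative sketch (all the
MPy_ names below are invented for this email; nothing like them exists
in CPython today):

    #include <Python.h>  /* for PyObject */

    /* An opaque handle to one interpreter, which owns its own GIL
       (the structure sketched under point 1). */
    typedef struct MPyInterpreter MPyInterpreter;

    /* Interpreter-qualified variants of familiar calls (point 3). */
    void      MPy_INCREF(MPyInterpreter *interp, PyObject *obj);
    void      MPy_DECREF(MPyInterpreter *interp, PyObject *obj);
    PyObject *MPy_Long_FromLong(MPyInterpreter *interp, long value);

    /* In a debug build each PyObject could record its owning interpreter,
       so every MPy_ call can assert that interp matches (point 2). */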

4. At most one interpreter can be designated the "main" interpreter.
    This is for backward compatibility of existing extension modules ONLY.
    All the existing Py_* APIs operate implicitly on this main interpreter.

5. Extension modules need to explicitly advertise multiple interpreter support.
    If they don't, they can only be imported in the main interpreter.
    However, in that case they can safely use the existing Py_ APIs.
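
The closest existing mechanism for such advertising is PEP 489
multi-phase initialization; a minimal skeleton of a module that opts in
looks roughly like this (such a module gets a fresh module object per
interpreter instead of a cached process-wide singleton):

    #include <Python.h>

    /* Per-module (and therefore per-interpreter) setup goes here. */
    static int
    example_exec(PyObject *module)
    {
        return PyModule_AddIntConstant(module, "answer", 42);
    }

    static PyModuleDef_Slot example_slots[] = {
        {Py_mod_exec, example_exec},
        {0, NULL}
    };

    static struct PyModuleDef example_def = {
        PyModuleDef_HEAD_INIT,
        "example",        /* m_name */
        NULL,             /* m_doc */
        0,                /* m_size: no process-global module state */
        NULL,             /* m_methods */
        example_slots,
        NULL, NULL, NULL
    };

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        /* Multi-phase init: return the def, not a module object. */
        return PyModuleDef_Init(&example_def);
    }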

6. Since PyObjects cannot be shared across interpreters, there needs to be an
    explicit function which takes a PyObject in interpreter A and constructs a
    similar object in interpreter B.

    Conceptually this would be equivalent to pickling in A and
unpickling in B, but presumably more efficient.
    It would use the copyreg registry in a similar way to pickle.

7. Extension modules would also be able to register their own functions
for copying custom types across interpreters.
      That would allow extension modules to provide custom types where
the underlying C object is in fact not copied
      but shared between interpreters.
      I would imagine we would have a "shared memory" memoryview object
      and also mutexes and other locking constructs which would work
across interpreters.
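
Again purely as a sketch of what points 6 and 7 might look like (every
name below is invented for illustration):

    /* Hypothetical cross-interpreter copy API, analogous to copyreg. */
    PyObject *MPy_Object_CopyTo(MPyInterpreter *src, PyObject *obj,
                                MPyInterpreter *dst);

    /* A per-type copier that an extension module could register.  A type
       whose underlying C data is shareable (point 7) would return a new
       wrapper in dst that points at the same C-level object instead of
       copying it. */
    typedef PyObject *(*MPy_copyfunc)(MPyInterpreter *src, PyObject *obj,
                                      MPyInterpreter *dst);

    int MPy_RegisterCopyFunc(PyTypeObject *type, MPy_copyfunc copier);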

8. Finally, the main application: functionality similar to the current
`multiprocessing' module, but with
    multiple interpreters on multiple threads in a single process.
    This would presumably be more efficient than `multiprocessing' and
also allow extra functionality, since the underlying C objects
    can in fact be shared.
    (Imagine two interpreters operating in parallel on a single OpenCL context.)



Stephan



On 26 May 2017 at 10:41 a.m., "Petr Viktorin" <encukou at gmail.com> wrote:
>
> On 05/25/2017 09:01 PM, Eric Snow wrote:
>>
>> On Thu, May 25, 2017 at 11:19 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>>
>>> My impression is that the code to support them inside CPython is fine, but
>>> they're broken and not very useful in the sense that lots of C extensions
>>> don't really support them, so in practice you can't reliably use them to run
>>> arbitrary code. Numpy for example definitely has lots of
>>> subinterpreter-related bugs, and when they get reported we close them as
>>> WONTFIX.
>>>
>>> Based on conversations at last year's pycon, my impression is that numpy
>>> probably *could* support subinterpreters (i.e. the required apis exist), but
>>> none of us really understand the details, it's the kind of problem that
>>> requires a careful whole-codebase audit, and a naive approach might make
>>> numpy's code slower and more complicated for everyone. (For example, there
>>> are lots of places where numpy keeps a little global cache that I guess
>>> should instead be per-subinterpreter caches, which would mean adding an
>>> extra lookup operation to fast paths.)
>>
>>
>> Thanks for pointing this out.  You've clearly described probably the
>> biggest challenge for folks that try to use subinterpreters.  PEP 384
>> was meant to help with this, but seems to have fallen short.  PEP 489
>> can help identify modules that profess subinterpreter support, as well
>> as facilitating future extension module helpers to deal with global
>> state.  However, I agree that *right now* getting extension modules to
>> reliably work with subinterpreters is not easy enough.  Furthermore,
>> that won't change unless there is sufficient benefit tied to
>> subinterpreters, as you point out below.
>
>
> PEP 489 was a first step; the work is not finished. The next step is solving a major reason people are using global state in extension modules: per-module state isn't accessible from all the places it should be, namely in methods of classes. In other words, I don't think Python is ready for big projects like Numpy to start properly supporting subinterpreters.
>
> The work on fixing this has stalled, but it looks like I'll be getting back on track.
> Discussions about this are on the import-sig list, reach out there if you'd like to help.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

