[Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

Fri May 26 09:49:26 EDT 2017

Hi Nick,

As far as I understand, the (to me) essential difference between your
approach and my proposal is that:

Approach 1 (PEP-489):
   * Single (global) GIL.
   * PyObject's may be shared across interpreters (zero-copy transfer)

Approach 2 (mine):
   * Per-interpreter GIL.
   * PyObject's must be copied across interpreters.

To me, the per-interpreter GIL is the essential "target" I am aiming for,
and I am willing to sacrifice the zero-copy for that.
If the GIL is still shared, then I don't see much advantage of this
approach over just using the "threading" module with a single
interpreter.
(I realize it still gives you some isolation between interpreters.
 To me personally this is not very interesting, but this may be myopic.)
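
To make the copying in approach 2 a bit more concrete, here is a rough
Python sketch of the copy semantics only. It is not an existing
CPython API: `Point`, `reduce_point` and `copy_across` are names I made
up for illustration, and both "interpreters" are really the same one.
I use `copy.deepcopy` as the stand-in because it consults the same
copyreg registry as pickle but skips the byte serialisation.

```python
import copy
import copyreg


class Point:
    """Hypothetical custom type living in interpreter A."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


reducer_calls = []  # records that the registry was actually consulted


def reduce_point(p):
    # Same contract as a pickle reducer: (callable, args) to rebuild p.
    reducer_calls.append(p)
    return Point, (p.x, p.y)


# Register the reducer in the copyreg registry (shared by pickle and copy).
copyreg.pickle(Point, reduce_point)


def copy_across(obj):
    """Hypothetical stand-in for the A -> B copy function.

    Assumption: only the "result shares nothing with the original"
    semantics are modelled here, not real interpreter isolation.
    """
    return copy.deepcopy(obj)


original = Point(1, 2)
clone = copy_across(original)
clone.x = 10  # mutating the copy leaves the original untouched
```

The point is only that the receiving side gets an independent object,
built through a per-type hook that extension modules could register.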

> For the time being though, a single GIL remains
> much easier to manage.

"For the time being" suggests that you are intending approach 1 to be
ultimately a stepping stone to something similar to approach 2?

> Yes, something like Rust's ownership model is the gist of what we had
> in mind (i.e. allowing zero-copy transfer of ownership between
> subinterpreters, but only the owning interpreter is allowed to do
> anything else with the object).

This can be emulated in approach 2 by creating a wrapper C-level type which
contains a PyObject and its corresponding interpreter. So that interpreter A can
reference an object in interpreter B.
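
As a rough illustration of that wrapper idea, here is a Python model
(the real thing would be a C-level type). `Interpreter`, `ForeignRef`
and `unwrap` are hypothetical names, not existing APIs, and the
ownership check is my assumption about how such a handle would be
guarded.

```python
class Interpreter:
    """Hypothetical stand-in for an interpreter; only identity matters."""

    def __init__(self, name):
        self.name = name


class ForeignRef:
    """Opaque handle pairing an object with the interpreter owning it."""

    __slots__ = ("_owner", "_obj")

    def __init__(self, owner, obj):
        self._owner = owner
        self._obj = obj

    @property
    def owner(self):
        return self._owner

    def unwrap(self, current):
        # Only the owning interpreter may touch the wrapped object;
        # everybody else can merely hold and pass around the handle.
        if current is not self._owner:
            raise RuntimeError(
                f"object is owned by interpreter {self._owner.name!r}, "
                f"not {current.name!r}"
            )
        return self._obj


interp_a = Interpreter("A")
interp_b = Interpreter("B")

# Created in B, then handed to A as an opaque handle.
ref = ForeignRef(interp_b, [1, 2, 3])
```

Interpreter A can store `ref` and hand it back to B later, but calling
`ref.unwrap(interp_a)` raises, which is the kind of check I would
expect a debug build to make.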

>> 3. Practically all existing APIs, including Py_INCREF and Py_DECREF,
>> need to get an additional explicit interpreter argument.
>>     I imagine that we would have a new prefix, say MPy_, because the
>> existing APIs must be left for backward compatibility.
>
> This isn't necessary, as the active interpreter is already tracked as
> part of the thread local state (otherwise mod_wsgi et al wouldn't work
> at all).

I realize that it is possible to do it that way.
However this has some disadvantages:

* The interpreter becomes tied to a thread, or you need to have some
  way to switch interpreters on a thread.
  (Which makes your code look like OpenGL code ;-) )

* Once you are going to write code which manipulates objects in
  multiple interpreters (e.g. my proposed copy function or the
  "foreign interpreter wrapper" I discussed above), making the
  interpreter explicit probably avoids headaches.

* Explicit is better than implicit, as somebody once said. ;-)

Stephan

2017-05-26 15:17 GMT+02:00 Nick Coghlan <ncoghlan at gmail.com>:
> On 26 May 2017 at 22:08, Stephan Houben <stephanh42 at gmail.com> wrote:
>> Hi all,
>>
>> Personally I feel that the current subinterpreter support falls short
>> in the sense that it still requires
>> a single GIL across interpreters.
>>
>> If interpreters would have their own individual GIL,
>> we could have true shared-nothing multi-threaded support similar to
>> Javascript's "Web Workers".
>>
>> Here is a point-wise overview of what I am imagining.
>> I realize the following is very ambitious, but I would like to bring
>> it to your consideration.
>>
>> 1. Multiple interpreters can be instantiated, each of which is
>> completely independent.
>>    To this end, all  global interpreter state needs to go into an
>> interpreter structure, including the GIL
>>     (which becomes per-interpreter)
>>    Interpreters share no state whatsoever.
>
> There'd still be true process global state (i.e. anything managed by
> the C runtime), so this would be a tiered setup with a read/write GIL
> and multiple SILs. For the time being though, a single GIL remains
> much easier to manage.
>
>> 2. PyObject's are tied to a particular interpreter and cannot be
>> shared between interpreters.
>>    (This is because each interpreter now has its own GIL.)
>>    I imagine a special debug build would actually store the
>> interpreter pointer in the PyObject and would assert everywhere
>>    that the PyObject is only manipulated by its owning interpreter.
>
> Yes, something like Rust's ownership model is the gist of what we had
> in mind (i.e. allowing zero-copy transfer of ownership between
> subinterpreters, but only the owning interpreter is allowed to do
> anything else with the object).
>
>> 3. Practically all existing APIs, including Py_INCREF and Py_DECREF,
>> need to get an additional explicit interpreter argument.
>>     I imagine that we would have a new prefix, say MPy_, because the
>> existing APIs must be left for backward compatibility.
>
> This isn't necessary, as the active interpreter is already tracked as
> part of the thread local state (otherwise mod_wsgi et al wouldn't work
> at all).
>
>> 4. At most one interpreter can be designated the "main" interpreter.
>>     This is for backward compatibility of existing extension modules ONLY.
>>     All the existing Py_* APIs operate implicitly on this main interpreter.
>
> Yep, this is part of the concept. The PEP 432 draft has more details
> on that: https://www.python.org/dev/peps/pep-0432/#interpreter-initialization-phases
>
>> 5. Extension modules need to explicitly advertise multiple interpreter support.
>>     If they don't, they can only be imported in the main interpreter.
>>     However, in that case they can safely use the existing Py_ APIs.
>
> This is the direction we started moving in with the multi-phase
> initialisation PEP for extension modules:
> https://www.python.org/dev/peps/pep-0489/
>
> As Petr noted, the main missing piece there now is the fact that
> object methods (as opposed to module level functions) implemented in C
> currently don't have ready access to the module level state for the
> modules where they're defined.
>
>> 6. Since PyObject's cannot be shared across interpreters, there needs to be an
>>     explicit function which takes a PyObject in interpreter A and constructs a
>>     similar object in interpreter B.
>>
>>     Conceptually this would be equivalent to pickling in A and
>> unpickling in B, but presumably more efficient.
>>     It would use the copyreg registry in a similar way to pickle.
>
> This would be an ownership transfer rather than a copy (which carries
> the implication that all the subinterpreters would still need to share
> a common memory allocator)
>
>> 7.    Extension modules would also be able to register their function
>> for copying custom types across interpreters.
>>       That would allow extension modules to provide custom types where
>> the underlying C object is in fact not copied
>>       but shared between interpreters.
>>       I would imagine we would have a "shared memory" memoryview object
>>       and also Mutex and other locking constructs which would work
>> across interpreters.
>
> We generally don't expect this to be needed given an ownership focused
> approach. Instead, the focus would be on enabling efficient channel
> based communication models that are cost-prohibitive when object
> serialisation is involved.
>
>> 8. Finally, the main application: functionality similar to the current
>> `multiprocessing'  module, but with
>>     multiple interpreters on multiple threads in a single process.
>>     This would presumably be more efficient than `multiprocessing' and
>> also allow extra functionality, since the underlying C objects
>>     can in fact be shared.
>>     (Imagine two interpreters operating in parallel on a single OpenCL context.)
>
> We're not sure how feasible it will be to enable this in general, but
> even without it, zero-copy ownership transfers enable a *lot* of
> interesting concurrency models that Python doesn't currently offer great
> primitives to support (they're mainly a matter of using threads in
> certain ways, which means they not only run afoul of the GIL, but you
> also don't get any assistance from the interpreter in strictly
> enforcing object ownership rules).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia