[Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.
Stephan Houben
stephanh42 at gmail.com
Fri May 26 09:49:26 EDT 2017
Hi Nick,
As far as I understand, the (to me) essential difference between your
approach and my proposal is that:
Approach 1 (PEP-489):
* Single (global) GIL.
* PyObject's may be shared across interpreters (zero-copy transfer)
Approach 2 (mine)
* Per-interpreter GIL.
* PyObject's must be copied across interpreters.
To me, the per-interpreter GIL is the essential "target" I am aiming for,
and I am willing to sacrifice the zero-copy for that.
If the GIL is still shared then I don't see much advantage of this
approach over just using the "threading" module
with a single interpreter.
(I realize it still gives you some isolation between interpreters.
To me personally this is not very interesting, but this may be myopic.)
> For the time being though, a single GIL remains
> much easier to manage.
"For the time being" suggests that you are intending approach 1 to be
ultimately a stepping stone to
something similar to approach 2?
> Yes, something like Rust's ownership model is the gist of what we had
> in mind (i.e. allowing zero-copy transfer of ownership between
> subinterpreters, but only the owning interpreter is allowed to do
> anything else with the object).
This can be emulated in approach 2 by creating a wrapper C-level type which
contains a PyObject and its corresponding interpreter. So that interpreter A can
reference an object in interpreter B.
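A minimal sketch of such a wrapper, modelled in Python rather than C for brevity. The names Interpreter and ForeignRef are hypothetical, and the per-interpreter lock merely stands in for a per-interpreter GIL; none of this is an existing CPython API.

```python
import threading


class Interpreter:
    """Hypothetical stand-in for an interpreter that has its own GIL."""

    def __init__(self, name):
        self.name = name
        self.gil = threading.Lock()  # models the per-interpreter GIL


class ForeignRef:
    """Pairs an object with the interpreter that owns it."""

    def __init__(self, owner, obj):
        self._owner = owner
        self._obj = obj

    @property
    def owner(self):
        return self._owner

    def call(self, func):
        # The wrapped object is only touched while the owner's lock is held.
        with self._owner.gil:
            return func(self._obj)


interp_a = Interpreter("A")
interp_b = Interpreter("B")

# Code running in A holds a reference to a list owned by B.
ref = ForeignRef(interp_b, [1, 2, 3])
total = ref.call(sum)
```

The point of the design is that A never touches the object directly; every access goes through the wrapper, which knows which interpreter's lock to take.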
>> 3. Practically all existing APIs, including Py_INCREF and Py_DECREF,
>> need to get an additional explicit interpreter argument.
>> I imagine that we would have a new prefix, say MPy_, because the
>> existing APIs must be left for backward compatibility.
>
> This isn't necessary, as the active interpreter is already tracked as
> part of the thread local state (otherwise mod_wsgi et al wouldn't work
> at all).
I realize that it is possible to do it that way.
However this has some disadvantages:
* The interpreter becomes tied to a thread, or you need to have some
way to switch interpreters on a thread.
(Which makes your code look like OpenGL code;-) )
* Once you are going to write code which manipulates objects in
multiple interpreters
(e.g. my proposed copy function or the "foreign interpreter
wrapper" I discussed above)
making the interpreter explicit probably avoids headaches.
* Explicit is better than implicit, as somebody once said. ;-)
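To illustrate the second point, here is a rough sketch of a copy function that takes both interpreters explicitly. It is modelled in Python, the Interpreter class and copy_object are hypothetical names, and the pickle round-trip is only the conceptual equivalent of the copy, not a proposed implementation.

```python
import pickle


class Interpreter:
    """Hypothetical stand-in that records which objects it owns."""

    def __init__(self, name):
        self.name = name
        self._owned = {}

    def adopt(self, obj):
        # Register obj as owned by this interpreter.
        self._owned[id(obj)] = obj
        return obj

    def owns(self, obj):
        return self._owned.get(id(obj)) is obj


def copy_object(src, dst, obj):
    """Copy obj, owned by src, into dst; both interpreters are explicit."""
    if not src.owns(obj):
        raise ValueError("object is not owned by the source interpreter")
    data = pickle.dumps(obj)             # "pickle in A"
    return dst.adopt(pickle.loads(data)) # "unpickle in B"


interp_a = Interpreter("A")
interp_b = Interpreter("B")

original = interp_a.adopt({"values": [1, 2, 3]})
clone = copy_object(interp_a, interp_b, original)
```

With the interpreters passed as arguments there is no "current interpreter" to switch back and forth while the copy is in progress.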
Stephan
2017-05-26 15:17 GMT+02:00 Nick Coghlan <ncoghlan at gmail.com>:
> On 26 May 2017 at 22:08, Stephan Houben <stephanh42 at gmail.com> wrote:
>> Hi all,
>>
>> Personally I feel that the current subinterpreter support falls short
>> in the sense that it still requires
>> a single GIL across interpreters.
>>
>> If interpreters would have their own individual GIL,
>> we could have true shared-nothing multi-threaded support similar to
>> Javascript's "Web Workers".
>>
>> Here is a point-wise overview of what I am imagining.
>> I realize the following is very ambitious, but I would like to bring
>> it to your consideration.
>>
>> 1. Multiple interpreters can be instantiated, each of which is
>> completely independent.
>> To this end, all global interpreter state needs to go into an
>> interpreter structure, including the GIL
>> (which becomes per-interpreter)
>> Interpreters share no state whatsoever.
>
> There'd still be true process global state (i.e. anything managed by
> the C runtime), so this would be a tiered setup with a read/write GIL
> and multiple SILs. For the time being though, a single GIL remains
> much easier to manage.
>
>> 2. PyObject's are tied to a particular interpreter and cannot be
>> shared between interpreters.
>> (This is because each interpreter now has its own GIL.)
>> I imagine a special debug build would actually store the
>> interpreter pointer in the PyObject and would assert everywhere
>> that the PyObject is only manipulated by its owning interpreter.
>
> Yes, something like Rust's ownership model is the gist of what we had
> in mind (i.e. allowing zero-copy transfer of ownership between
> subinterpreters, but only the owning interpreter is allowed to do
> anything else with the object).
>
>> 3. Practically all existing APIs, including Py_INCREF and Py_DECREF,
>> need to get an additional explicit interpreter argument.
>> I imagine that we would have a new prefix, say MPy_, because the
>> existing APIs must be left for backward compatibility.
>
> This isn't necessary, as the active interpreter is already tracked as
> part of the thread local state (otherwise mod_wsgi et al wouldn't work
> at all).
>
>> 4. At most one interpreter can be designated the "main" interpreter.
>> This is for backward compatibility of existing extension modules ONLY.
>> All the existing Py_* APIs operate implicitly on this main interpreter.
>
> Yep, this is part of the concept. The PEP 432 draft has more details
> on that: https://www.python.org/dev/peps/pep-0432/#interpreter-initialization-phases
>
>> 5. Extension modules need to explicitly advertise multiple interpreter support.
>> If they don't, they can only be imported in the main interpreter.
>> However, in that case they can safely use the existing Py_ APIs.
>
> This is the direction we started moving in with the multi-phase
> initialisation PEP for extension modules:
> https://www.python.org/dev/peps/pep-0489/
>
> As Petr noted, the main missing piece there now is the fact that
> object methods (as opposed to module level functions) implemented in C
> currently don't have ready access to the module level state for the
> modules where they're defined.
>
>> 6. Since PyObject's cannot be shared across interpreters, there needs to be an
>> explicit function which takes a PyObject in interpreter A and constructs a
>> similar object in interpreter B.
>>
>> Conceptually this would be equivalent to pickling in A and
>> unpickling in B, but presumably more efficient.
>> It would use the copyreg registry in a similar way to pickle.
>
> This would be an ownership transfer rather than a copy (which carries
> the implication that all the subinterpreters would still need to share
> a common memory allocator)
>
>> 7. Extension modules would also be able to register their function
>> for copying custom types across interpreters.
>> That would allow extension modules to provide custom types where
>> the underlying C object is in fact not copied
>> but shared between interpreters.
>> I would imagine we would have a "shared memory" memoryview object
>> and also Mutex and other locking constructs which would work
>> across interpreters.
>
> We generally don't expect this to be needed given an ownership focused
> approach. Instead, the focus would be on enabling efficient channel
> based communication models that are cost-prohibitive when object
> serialisation is involved.
>
>> 8. Finally, the main application: functionality similar to the current
>> `multiprocessing' module, but with
>> multiple interpreters on multiple threads in a single process.
>> This would presumably be more efficient than `multiprocessing' and
>> also allow extra functionality, since the underlying C objects
>> can in fact be shared.
>> (Imagine two interpreters operating in parallel on a single OpenCL context.)
>
> We're not sure how feasible it will be to enable this in general, but
> even without it, zero-copy ownership transfers enable a *lot* of
> interesting concurrency models that Python doesn't currently offer great
> primitives to support (they're mainly a matter of using threads in
> certain ways, which means they not only run afoul of the GIL, but you
> also don't get any assistance from the interpreter in strictly
> enforcing object ownership rules).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia