[Python-Dev] PEP 554 v3 (new interpreters module)

Thu Sep 14 20:44:40 EDT 2017

On 14 September 2017 at 15:27, Nathaniel Smith <njs at pobox.com> wrote:
> On Sep 13, 2017 9:01 PM, "Nick Coghlan" <ncoghlan at gmail.com> wrote:
>
> On 14 September 2017 at 11:44, Eric Snow <ericsnowcurrently at gmail.com>
> wrote:
>>    send(obj):
>>
>>        Send the object to the receiving end of the channel.  Wait until
>>        the object is received.  If the channel does not support the
>>        object then TypeError is raised.  Currently only bytes are
>>        supported.  If the channel has been closed then EOFError is
>>        raised.
>
> I still expect any form of object sharing to hinder your
> per-interpreter GIL efforts, so restricting the initial implementation
> to memoryview-only seems more future-proof to me.
>
>
> I don't get it. With bytes, you can either share objects or copy them and
> the user can't tell the difference, so you can change your mind later if you
> want.
> But memoryviews require some kind of cross-interpreter strong
> reference to keep the underlying buffer object alive. So if you want to
> minimize object sharing, surely bytes are more future-proof.

Not really, because the only way to ensure object separation (i.e no
refcounted objects accessible from multiple interpreters at once) with
a bytes-based API would be to either:

1. Always copy (eliminating most of the low overhead communications
benefits that subinterpreters may offer over multiple processes)
2. Make the bytes implementation more complicated by allowing multiple
bytes objects to share the same underlying storage while presenting as
distinct objects in different interpreters
3. Make the output on the receiving side not actually a bytes object,
but instead a view onto memory owned by another object in a different
interpreter (a "memory view", one might say)

And yes, using memory views for this does mean defining either a
subclass or a mediating object that not only keeps the originating
object alive until the receiving memoryview is closed, but also
retains a reference to the originating interpreter so that it can
switch to it when it needs to manipulate the source object's refcount
or call one of the buffer methods.

Yury and I are fine with that, since it means that either the sender
*or* the receiver can decide to copy the data (e.g. by calling
bytes(obj) before sending, or bytes(view) after receiving), and in the
meantime, the object holding the cross-interpreter view knows that it
needs to switch interpreters (and hence acquire the sending
interpreter's GIL) before doing anything with the source object.

The reason we're OK with this is that it means that only reading a new
message from a channel (i.e creating a cross-interpreter view) or
discarding a previously read message (i.e. closing a cross-interpreter
view) will be synchronisation points where the receiving interpreter
necessarily needs to acquire the sending interpreter's GIL.

By contrast, if we allow an actual bytes object to be shared, then
either every INCREF or DECREF on that bytes object becomes a
synchronisation point, or else we end up needing some kind of
secondary per-interpreter refcount where the interpreter doesn't drop
its shared reference to the original object in its source interpreter
until the internal refcount in the borrowing interpreter drops to
zero.

>> Handling an exception
>> ---------------------
> It would also be reasonable to simply not return any value/exception from
> run() at all, or maybe just a bool for whether there was an unhandled
> exception. Any high level API is going to be injecting code on both sides of
> the interpreter boundary anyway, so it can do whatever exception and
> traceback translation it wants to.

So any more detailed response would *have* to come back as a channel message?

That sounds like a reasonable option to me, too, especially since
module level code doesn't have a return value as such - you can really
only say "it raised an exception (and this was the exception it
raised)" or "it reached the end of the code without raising an
exception".

Given that, I think subprocess.run() (with check=False) is the right
API precedent here:
https://docs.python.org/3/library/subprocess.html#subprocess.run

That always returns subprocess.CompletedProcess, and then you can call
"cp.check_returncode()" to get it to raise
subprocess.CalledProcessError for non-zero return codes.

For interpreter.run(), we could keep the initial RunResult *really*
simple and only report back:

* source: the source code passed to run()
* shared: the keyword args passed to run() (name chosen to match
functools.partial)
* completed: completed execution without raising an exception? (True
if yes, False otherwise)

Whether or not to report more details for a raised exception, and
provide some mechanism to reraise it in the calling interpreter could
then be deferred until later.

The subprocess.run() comparison does make me wonder whether this might
be a more future-proof signature for Interpreter.run() though:

    def run(source_str, /, *, channels=None):
        ...

That way channels can be a namespace *specifically* for passing in
channels, and can be reported as such on RunResult. If we decide to
allow arbitrary shared objects in the future, or add flag options like
"reraise=True" to reraise exceptions from the subinterpreter in the
current interpreter, we'd have that ability, rather than having the
entire potential keyword namespace taken up for passing shared
objects.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia