[Python-ideas] PEP 550 v2

Yury Selivanov yselivanov.ml at gmail.com
Wed Aug 16 12:36:24 EDT 2017


On Wed, Aug 16, 2017 at 3:18 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
>> Hi,
>>
>> Here's the PEP 550 version 2.
>
> Awesome!

Thanks!

[..]
>>
>> * **Local Context**, or LC, is a key/value mapping that stores the
>>   context of a logical thread.
>
> If you're more familiar with dynamic scoping, then you can think of an
> LC as a single dynamic scope...
>
>> * **Execution Context**, or EC, is an OS-thread-specific dynamic
>>   stack of Local Contexts.
>
> ...and an EC as a stack of scopes. Looking up a ContextItem in an EC
> proceeds by checking the first LC (innermost scope), then if it
> doesn't find what it's looking for it checks the second LC (the
> next-innermost scope), etc.

Yes. We touched upon this topic in parallel threads, so I'll just
briefly mention it here: I deliberately avoided using "scope" in PEP
550 naming, as "scoping" in Python is usually associated with
names/globals/locals/nonlocals etc.  Adding another "level" of scoping
would be very confusing for users (IMO).

>
>> ``ContextItem`` objects have the following methods and attributes:
>>
>> * ``.description``: read-only description;
>>
>> * ``.set(o)`` method: set the value to ``o`` for the context item
>>   in the execution context.
>>
>> * ``.get()`` method: return the current EC value for the context item.
>>   Context items are initialized with ``None`` when created, so
>>   this method call never fails.
>
> Two issues here, that both require some expansion of this API to
> reveal a *bit* more information about the EC structure.
>
> 1) For trio's cancel scope use case I described in the last email, I
> actually need some way to read out all the values on the LocalContext
> stack. (It would also be helpful if there were some fast way to check
> the depth of the ExecutionContext stack -- or at least tell whether
> it's 1 deep or more-than-1 deep. I know that any cancel scopes that
> are in the bottommost LC will always be attached to the given Task, so
> I can set up the scope->task mapping once and re-use it indefinitely.
> OTOH for scopes that are stored in higher LCs, I have to check at
> every yield whether they're currently in effect. And I want to
> minimize the per-yield workload as much as possible.)

We can add an API for returning the full stack of values for a CI:

   ContextItem.iter_stack() -> Iterator
   # or
   ContextItem.get_stack() -> List

Because some of the LCs will be empty, what you'll get is a list with
some None values in it, like:

   [None, val1, None, None, val2]

The length of the list will tell you how deep the stack is.
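For illustration, with plain dicts standing in for LCs, the
hypothetical get_stack() is just a per-LC projection of one key (a toy
sketch, not the proposed C implementation):

```python
def get_stack(ec, key):
    # One entry per LC; None where the key is not set in that LC.
    return [lc.get(key) for lc in ec]

ec = [{}, {"x": "val1"}, {}, {}, {"x": "val2"}]
assert get_stack(ec, "x") == [None, "val1", None, None, "val2"]
assert len(get_stack(ec, "x")) == len(ec)  # length == stack depth
```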

>
> 2) For classic decimal.localcontext context managers, the idea is
> still that you save/restore the value, so that you can nest multiple
> context managers without having to push/pop LCs all the time. But the
> above API is not actually sufficient to implement a proper
> save/restore, for a subtle reason: if you do
>
> ci.set(ci.get())
>
> then you just (potentially) moved the value from a lower LC up to the top LC.
>
> Here's an example of a case where this can produce user-visible effects:
>
> https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py
>
> There are probably a bunch of options for fixing this. But basically
> we need some API that makes it possible to temporarily set a value in
> the top LC, and then restore that value to what it was before (either
> the previous value, or 'unset' to unshadow a value in a lower LC). One
> simple option would be to make the idiom be something like:
>
> @contextmanager
> def local_value(new_value):
>     state = ci.get_local_state()
>     ci.set(new_value)
>     try:
>         yield
>     finally:
>         ci.set_local_state(state)
>
> where 'state' is something like a tuple (ci in EC[-1],
> EC[-1].get(ci)). A downside with this is that it's a bit error-prone
> (very easy for an unwary user to accidentally use get/set instead of
> get_local_state/set_local_state). But I'm sure we can come up with
> something.

Yeah, this is tricky. The main issue is indeed the confusion of what
methods you need to call -- "get/set" or
"get_local_state/set_local_state".

On some level the problem is very similar to regular Python scoping rules:

1. we have local names
2. we have global names
3. we have the 'nonlocal' modifier

IOW, scoping isn't easy, and you need to be conscious of what you do.
It's just that we are so used to these scoping rules that they have a
low cognitive overhead for us.
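The ci.set(ci.get()) problem can be demonstrated with a toy
stack-of-dicts model (names made up): set() always writes to the top
LC, so a naive save/restore promotes a value from a lower LC and keeps
shadowing it even after the "restore":

```python
def ec_get(ec, key):
    # Search from the innermost (top) LC outward.
    for lc in reversed(ec):
        if key in lc:
            return lc[key]
    return None

def ec_set(ec, key, value):
    ec[-1][key] = value  # set() always writes to the top LC

ec = [{"prec": 28}, {}]          # value lives in a lower LC
saved = ec_get(ec, "prec")       # naive "save"
ec_set(ec, "prec", 4)            # temporarily override
ec_set(ec, "prec", saved)        # naive "restore"

# The top LC now shadows the lower one permanently:
assert ec[-1] == {"prec": 28}

# Changing the lower LC no longer has any visible effect:
ec[0]["prec"] = 10
assert ec_get(ec, "prec") == 28  # still the promoted copy
```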

One of the ideas that I have in mind is to add another level of
indirection to separate "global get" from "local set/get":

1. Rename ContextItem to ContextKey (reasoning for that in parallel thread)

2. Remove ContextKey.set() method

3. Add a new ContextKey.value() -> ContextValue

    ck = ContextKey()

    with ck.value() as val:
        val.set(spam)
        yield

or

    val = ck.value()
    val.set(spam)
    try:
        yield
    finally:
        val.clear()

Essentially ContextValue will be the only API to set values in
execution context. ContextKey.get() will be used to get them.
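To make the idea concrete, here's a rough pure-Python sketch of what
the ContextKey/ContextValue split could look like (all names and
semantics here are tentative; the real thing would live in the
interpreter and be tied to the EC rather than a simple list):

```python
class ContextKey:
    def __init__(self):
        self._stack = []          # toy stand-in for per-EC storage

    def get(self, default=None):
        # The only API for *reading* values.
        return self._stack[-1] if self._stack else default

    def value(self):
        return ContextValue(self)


class ContextValue:
    # In this sketch, the only API for *setting* values;
    # usable directly or as a context manager.
    def __init__(self, key):
        self._key = key
        self._depth = 0

    def set(self, value):
        self._key._stack.append(value)
        self._depth += 1

    def clear(self):
        # Undo every set() made through this ContextValue.
        for _ in range(self._depth):
            self._key._stack.pop()
        self._depth = 0

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.clear()


ck = ContextKey()
with ck.value() as val:
    val.set("spam")
    assert ck.get() == "spam"
assert ck.get() is None  # value restored on exit
```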

Nathaniel, Nick, what do you guys think?

[..]
>> * ``sys.get_execution_context()`` function.  The function returns a
>>   copy of the current EC: an ``ExecutionContext`` instance.
>
> If there are enough of these functions then it might make sense to
> stick them in their own module instead of adding more stuff to sys. I
> guess worrying about that can wait until the API details are more firm
> though.

I'm OK with this idea -- pystate.c is becoming way too crowded.

Maybe we should just put this stuff in _contextlib.c and expose it in
the contextlib module.

>
>>   * If ``coro.cr_local_context`` is an empty ``LocalContext`` object
>>     that ``coro`` was created with, the interpreter will set
>>     ``coro.cr_local_context`` to ``None``.
>
> I like all the ideas in this section, but this specific point feels a
> bit weird. Coroutine objects need a second hidden field somewhere to
> keep track of whether the object they end up with is the same one they
> were created with?

Yes, I planned to have a second hidden field, as coroutines will have
their cr_local_context set to NULL, and that will be their empty LC.
So a second internal field is needed to disambiguate between NULL
meaning an "empty context" and NULL meaning "use the outer local
context".

I omitted this from the PEP to make it a bit easier to digest, as it
seemed to be a low-level implementation detail.

>
> If I set cr_local_context to something else, and then set it back to
> the original value, does that trigger the magic await behavior or not?
> What if I take the initial LocalContext off of one coroutine and
> attach it to another, does that trigger the magic await behavior?
>
> Maybe it would make more sense to have two sentinel values:
> UNINITIALIZED and INHERIT?

All good questions. I don't like sentinels in general, I'd be more OK
with a "gi_isolated_local_context" flag (we're back to square one
here). But I don't think we should add it.

My thinking is that once you start writing to "gi_local_context" --
all bets are off, and you manage it yourself from then on (meaning
that an internal coroutine flag will be set to 1, and the interpreter
will never touch the local_context of this coroutine):

1. If you write None -- the generator/coroutine will not have its own
LC.

2. If you write your own LC object -- the generator/coroutine will use it.

>
>> To enable correct Execution Context propagation into Tasks, the
>> asynchronous framework needs to assist the interpreter:
>>
>> * When ``create_task`` is called, it should capture the current
>>   execution context with ``sys.get_execution_context()`` and save it
>>   on the Task object.
>
> I wonder if it would be useful to have an option to squash this
> execution context down into a single LocalContext, since we know we'll
> be using it for a while and once we've copied an ExecutionContext it
> becomes impossible to tell the difference between one that has lots of
> internal LocalContexts and one that doesn't. This could also be handy
> for trio/curio's semantics where they initialize a new task's context
> to be a shallow copy of the parent task: you could do
>
> new_task_coro.cr_local_context = get_current_context().squash()

I think this would be a bit too low-level.  I'd prefer to defer
solving the "squashing" problem until I have a reference
implementation and we can test this.

Essentially, this is an optimization problem -- the EC implementation
can just squash the chain itself when the chain grows longer than,
say, 5 LCs.  Or something like this.
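FWIW the squash operation itself is cheap to model: with plain dicts
standing in for LCs, it is just a merge of the chain into a single LC
with the same lookup results, which collections.ChainMap can express
directly (a toy model, not the proposed API):

```python
from collections import ChainMap

def squash(ec):
    # ec[-1] is the innermost LC, so it must come first in the
    # ChainMap (which searches its maps left to right).
    return [dict(ChainMap(*reversed(ec)))]

ec = [{"a": 1, "b": 2}, {"b": 3}, {"c": 4}]
squashed = squash(ec)
assert squashed == [{"a": 1, "b": 3, "c": 4}]  # inner "b" wins
```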

But exposing this at the Python level would be like letting a program
tinker with GCC -O flags after it's compiled, IMO.

[..]
>> Contrary to coroutines though, ``yield from o`` expression in
>> generators (that are not generator-based coroutines) is semantically
>> equivalent to ``for v in o: yield v``, therefore the interpreter does
>> not attempt to control their ``gi_local_context``.
>
> Hmm. I assume you're simplifying for expository purposes, but 'yield
> from' isn't the same as 'for v in o: yield v'. In fact PEP 380 says:
> "Motivation: [...] a piece of code containing a yield cannot be
> factored out and put into a separate function in the same way as other
> code. [...] If yielding of values is the only concern, this can be
> performed without much difficulty using a loop such as 'for v in g:
> yield v'. However, if the subgenerator is to interact properly with
> the caller in the case of calls to send(), throw() and close(), things
> become considerably more difficult. As will be seen later, the
> necessary code is very complicated, and it is tricky to handle all the
> corner cases correctly."
>
> So it seems to me that the whole idea of 'yield from' is that it's
> supposed to handle all the tricky bits needed to guarantee that if you
> take some code out of a generator and refactor it into a subgenerator,
> then everything works the same as before. This suggests that 'yield
> from' should do the same magic as 'await', where by default the
> subgenerator shares the same LocalContext as the parent generator.
> (And as a bonus it makes things simpler if 'yield from' and 'await'
> work the same.)

I see what you are saying here, but 'yield from' for generators is
still different from 'await', as you can partially iterate the
generator and *then* use "yield from" on it:

    def foo():
        g = gen()
        val1 = next(g)
        val2 = next(g)
        # do some computation?
        yield from g
        ...

    def gen():
        # messing with the EC between yields
        ...

In general, I still think that 'yield from g' is semantically
equivalent to 'for i in g: yield i' for most users.
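Both points are easy to demonstrate: for plain iteration the two forms
are interchangeable, while send() only round-trips through 'yield
from':

```python
def gen():
    yield 1
    yield 2

def via_loop():
    for v in gen():
        yield v

def via_yield_from():
    yield from gen()

# For plain iteration the two forms are equivalent:
assert list(via_loop()) == list(via_yield_from()) == [1, 2]

# But only 'yield from' forwards send() to the subgenerator:
def echo():
    x = yield "ready"
    yield x

def delegating():
    yield from echo()

g = delegating()
assert next(g) == "ready"
assert g.send("hello") == "hello"
```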

>
>> Asynchronous Generators
>> -----------------------
>>
>> Asynchronous Generators (AG) interact with the Execution Context
>> similarly to regular generators.
>>
>> They have an ``ag_local_context`` attribute, which, similarly to
>> regular generators, can be set to ``None`` to make them use the outer
>> Local Context.  This is used by the new
>> ``contextlib.asynccontextmanager`` decorator.
>>
>> The EC support of ``await`` expression is implemented using the same
>> approach as in coroutines, see the `Coroutine Object Modifications`_
>> section.
>
> You showed how to make an iterator that acts like a generator. Is it
> also possible to make an async iterator that acts like an async
> generator? It's not immediately obvious, because you need to make sure
> that the local context gets restored each time you re-enter the
> __anext__ generator. I think it's something like:
>
> class AIter:
>     def __init__(self):
>         self._local_context = ...
>
>     # Note: intentionally not async
>     def __anext__(self):
>         coro = self._real_anext()
>         coro.cr_local_context = self._local_context
>         return coro
>
>     async def _real_anext(self):
>         ...
>
> Does that look right?

Yes, seems to be correct.

>
>> ContextItem.get() Cache
>> -----------------------
>>
>> We can add three new fields to ``PyThreadState`` and
>> ``PyInterpreterState`` structs:
>>
>> * ``uint64_t PyThreadState->unique_id``: a globally unique
>>   thread state identifier (we can add a counter to
>>   ``PyInterpreterState`` and increment it when a new thread state is
>>   created.)
>>
>> * ``uint64_t PyInterpreterState->context_item_deallocs``: every time
>>   a ``ContextItem`` is GCed, all Execution Contexts in all threads
>>   will lose track of it.  ``context_item_deallocs`` will simply
>>   count all ``ContextItem`` deallocations.
>>
>> * ``uint64_t PyThreadState->execution_context_ver``: every time
>>   a new item is set, or an existing item is updated, or the stack
>>   of execution contexts is changed in the thread, we increment this
>>   counter.
>
> I think this can be refined further (and I don't understand
> context_item_deallocs -- maybe it's a mistake?).

Now that you've highlighted the deallocs counter and I've thought
about it a bit more, I don't think it's needed :) I'll remove it.

> AFAICT the things
> that invalidate a ContextItem's cache are:
>
> 1) switching threadstates
> 2) popping or pushing a non-empty LocalContext off the current
> threadstate's ExecutionContext
> 3) calling ContextItem.set() on *that* context item
>
> So I'd suggest tracking the thread state id, a counter of how many
> non-empty LocalContexts have been pushed/popped on this thread state,
> and a *per ContextItem* counter of how many times set() has been
> called.

Excellent idea, will be in the next version of the PEP.
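A sketch of the resulting cache check, as toy Python (the counter
fields mirror the ones discussed above; everything else -- the class
names, the storage dict -- is made up for illustration):

```python
class ToyThreadState:
    def __init__(self, uid):
        self.unique_id = uid      # globally unique thread state id
        self.stack_count = 0      # pushes/pops of non-empty LCs
        self.storage = {}         # stand-in for the real EC

class ContextItem:
    def __init__(self, ts):
        self._ts = ts
        self._set_count = 0       # bumped on every .set()
        self._cache = None        # ((ts_id, stack_count, set_count), value)

    def set(self, value):
        self._ts.storage[self] = value
        self._set_count += 1      # invalidates this item's cache

    def get(self):
        ts = self._ts
        key = (ts.unique_id, ts.stack_count, self._set_count)
        if self._cache is not None and self._cache[0] == key:
            return self._cache[1]          # cache hit: no EC walk
        value = ts.storage.get(self)       # "slow" lookup stand-in
        self._cache = (key, value)
        return value

ts = ToyThreadState(1)
ci = ContextItem(ts)
ci.set(42)
assert ci.get() == 42
assert ci.get() == 42     # second call served from the cache
ts.stack_count += 1       # pushing/popping an LC invalidates it
assert ci.get() == 42     # recomputed correctly
```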

>
>> Backwards Compatibility
>> =======================
>>
>> This proposal preserves 100% backwards compatibility.
>
> While this is mostly true in the strict sense, in practice this PEP is
> useless if existing thread-local users like decimal and numpy can't
> migrate to it without breaking backcompat. So maybe this section
> should discuss that?

The main purpose of this section is to state whether the PEP breaks
any existing code/patterns or imposes a significant performance
penalty.  PEP 550 does neither of these things.

If decimal/numpy simply switch to using the new APIs, everything
should work as expected for them, with the exception that assigning a
new decimal context (without a context manager) will be isolated in
generators.  Which I'd consider a bug fix.  We can add a new section
to discuss the specifics.

Yury
