[Python-ideas] PEP 550 v2
Neil Girdhar
mistersheik at gmail.com
Sat Aug 19 15:09:30 EDT 2017
Cool to see this on python-ideas. I'm really looking forward to this,
whether it ends up being PEP 550 or 521.
On Wednesday, August 16, 2017 at 3:19:29 AM UTC-4, Nathaniel Smith wrote:
>
> On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov <yseliv... at gmail.com> wrote:
> > Hi,
> >
> > Here's the PEP 550 version 2.
>
> Awesome!
>
> Some of the changes from v1 to v2 might be a bit confusing -- in
> particular the thing where ExecutionContext is now a stack of
> LocalContext objects instead of just being a mapping. So here's the
> big picture as I understand it:
>
> In discussions on the mailing list and off-line, we realized that the
> main reason people use "thread locals" is to implement fake dynamic
> scoping. Of course, generators/async/await mean that currently it's
> impossible to *really* fake dynamic scoping in Python -- that's what
> PEP 550 is trying to fix. So PEP 550 v1 essentially added "generator
> locals" as a refinement of "thread locals". But... it turns out that
> "generator locals" aren't enough to properly implement dynamic scoping
> either! So the goal in PEP 550 v2 is to provide semantics strong
> enough to *really* get this right.
>
> I wrote up some notes on what I mean by dynamic scoping, and why
> neither thread-locals nor generator-locals can fake it:
>
>
> https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb
>
> > Specification
> > =============
> >
> > Execution Context is a mechanism of storing and accessing data specific
> > to a logical thread of execution. We consider OS threads,
> > generators, and chains of coroutines (such as ``asyncio.Task``)
> > to be variants of a logical thread.
> >
> > In this specification, we will use the following terminology:
> >
> > * **Local Context**, or LC, is a key/value mapping that stores the
> > context of a logical thread.
>
> If you're more familiar with dynamic scoping, then you can think of an
> LC as a single dynamic scope...
>
> > * **Execution Context**, or EC, is an OS-thread-specific dynamic
> > stack of Local Contexts.
>
> ...and an EC as a stack of scopes. Looking up a ContextItem in an EC
> proceeds by checking the first LC (innermost scope), then if it
> doesn't find what it's looking for it checks the second LC (the
> next-innermost scope), etc.
>
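As a concrete illustration, that lookup rule can be modeled in pure Python
with the EC as a list of dicts (all names here are illustrative, not the
proposed API):

```python
# Sketch: an Execution Context as a stack of Local Contexts (dicts).
# The innermost scope is last in the list; lookup walks from the top down.
MISSING = object()  # sentinel for "not found in any LC"

def ec_lookup(execution_context, key):
    """Return the value for `key`, checking the innermost LC first."""
    for local_context in reversed(execution_context):
        if key in local_context:
            return local_context[key]
    return MISSING

ec = [{"a": 1, "b": 2}, {"b": 3}]  # outer LC first, inner LC last
print(ec_lookup(ec, "b"))  # inner LC shadows the outer one -> 3
print(ec_lookup(ec, "a"))  # falls through to the outer LC -> 1
```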
> > ``ContextItem`` objects have the following methods and attributes:
> >
> > * ``.description``: read-only description;
> >
> > * ``.set(o)`` method: set the value to ``o`` for the context item
> > in the execution context.
> >
> > * ``.get()`` method: return the current EC value for the context item.
> > Context items are initialized with ``None`` when created, so
> > this method call never fails.
>
> Two issues here, that both require some expansion of this API to
> reveal a *bit* more information about the EC structure.
>
> 1) For trio's cancel scope use case I described in my last email, I
> actually need some way to read out all the values on the LocalContext
> stack. (It would also be helpful if there were some fast way to check
> the depth of the ExecutionContext stack -- or at least tell whether
> it's 1 deep or more-than-1 deep. I know that any cancel scopes that
> are in the bottommost LC will always be attached to the given Task, so
> I can set up the scope->task mapping once and re-use it indefinitely.
> OTOH for scopes that are stored in higher LCs, I have to check at
> every yield whether they're currently in effect. And I want to
> minimize the per-yield workload as much as possible.)
>
> 2) For classic decimal.localcontext context managers, the idea is
> still that you save/restore the value, so that you can nest multiple
> context managers without having to push/pop LCs all the time. But the
> above API is not actually sufficient to implement a proper
> save/restore, for a subtle reason: if you do
>
> ci.set(ci.get())
>
> then you just (potentially) moved the value from a lower LC up to the top
> LC.
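To see concretely what goes wrong, here is that failure mode in the same
toy stack-of-dicts model (illustrative code only, not the proposed API): the
get/set "save" silently promotes an inherited value into the top LC, so the
outer scope can no longer show through.

```python
# Sketch of why ci.set(ci.get()) is lossy, using dicts as Local Contexts.
ec = [{"x": "outer"}, {}]          # the value lives in the outer LC only

def get(key):
    for lc in reversed(ec):
        if key in lc:
            return lc[key]
    return None

def set_item(key, value):
    ec[-1][key] = value            # set() always writes to the top LC

set_item("x", get("x"))            # the "save" half of a save/restore idiom
assert "x" in ec[-1]               # the value was copied into the top LC...
ec[0]["x"] = "changed"             # ...so outer-scope updates are now shadowed
assert get("x") == "outer"         # readers still see the stale copy
```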
>
I agree with Nathaniel that this is an issue with the current API. I don't
think it's a good idea to have set and get methods. It would be much
better to reflect the underlying ExecutionContext *stack* in the API by
exposing a mutating *context manager* on the Context Key object instead of
set. For example,
    my_context = sys.new_context_key('my_context')
    options = my_context.get()
    options.some_mutating_method()
    with my_context.mutate(options):
        # Do whatever you want with the mutated context.
    # Now, the context is reverted.

Similarly, instead of

    my_context.set('spam')

you would do

    with my_context.mutate('spam'):
        # Do whatever you want with the mutated context.
    # Now, the context is reverted.
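One way such a mutate() could be built on top of a push/pop primitive is
sketched below, again modeling the EC as a stack of dicts. Everything here
is hypothetical; nothing is part of the PEP as written.

```python
from contextlib import contextmanager

# Hypothetical sketch of the mutate() proposal. The module-level `ec` list
# stands in for the interpreter-managed Execution Context stack.
ec = [{}]

class ContextKey:
    def __init__(self, name):
        self.name = name

    def get(self):
        # Same innermost-first lookup as in the spec.
        for lc in reversed(ec):
            if self.name in lc:
                return lc[self.name]
        return None

    @contextmanager
    def mutate(self, value):
        ec.append({self.name: value})  # push a fresh LC holding the value
        try:
            yield value
        finally:
            ec.pop()                   # popping the LC reverts the context

my_context = ContextKey("my_context")
with my_context.mutate("spam"):
    assert my_context.get() == "spam"
assert my_context.get() is None        # reverted after the block
```

Because the temporary value lives in its own pushed LC, reverting is a pop
rather than a lossy set-back, which avoids the shadowing problem above.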
>
> Here's an example of a case where this can produce user-visible effects:
>
>
> https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py
>
> There are probably a bunch of options for fixing this. But basically
> we need some API that makes it possible to temporarily set a value in
> the top LC, and then restore that value to what it was before (either
> the previous value, or 'unset' to unshadow a value in a lower LC). One
> simple option would be to make the idiom be something like:
>
> @contextmanager
> def local_value(new_value):
>     state = ci.get_local_state()
>     ci.set(new_value)
>     try:
>         yield
>     finally:
>         ci.set_local_state(state)
>
> where 'state' is something like a tuple (ci in EC[-1],
> EC[-1].get(ci)). A downside with this is that it's a bit error-prone
> (very easy for an unwary user to accidentally use get/set instead of
> get_local_state/set_local_state). But I'm sure we can come up with
> something.
>
> > Manual Context Management
> > -------------------------
> >
> > Execution Context is generally managed by the Python interpreter,
> > but sometimes it is desirable for the user to take the control
> > over it. A few examples when this is needed:
> >
> > * running a computation in ``concurrent.futures.ThreadPoolExecutor``
> > with the current EC;
> >
> > * reimplementing generators with iterators (more on that later);
> >
> > * managing contexts in asynchronous frameworks (implement proper
> > EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)
> >
> > For these purposes we add a set of new APIs (they will be used in
> > later sections of this specification):
> >
> > * ``sys.new_local_context()``: create an empty ``LocalContext``
> > object.
> >
> > * ``sys.new_execution_context()``: create an empty
> > ``ExecutionContext`` object.
> >
> > * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque
> > to Python code, and there are no APIs to modify them.
> >
> > * ``sys.get_execution_context()`` function. The function returns a
> > copy of the current EC: an ``ExecutionContext`` instance.
>
> If there are enough of these functions then it might make sense to
> stick them in their own module instead of adding more stuff to sys. I
> guess worrying about that can wait until the API details are more firm
> though.
>
> > * If ``coro.cr_local_context`` is an empty ``LocalContext`` object
> > that ``coro`` was created with, the interpreter will set
> > ``coro.cr_local_context`` to ``None``.
>
> I like all the ideas in this section, but this specific point feels a
> bit weird. Coroutine objects need a second hidden field somewhere to
> keep track of whether the object they end up with is the same one they
> were created with?
>
> If I set cr_local_context to something else, and then set it back to
> the original value, does that trigger the magic await behavior or not?
> What if I take the initial LocalContext off of one coroutine and
> attach it to another, does that trigger the magic await behavior?
>
> Maybe it would make more sense to have two sentinel values:
> UNINITIALIZED and INHERIT?
>
> > To enable correct Execution Context propagation into Tasks, the
> > asynchronous framework needs to assist the interpreter:
> >
> > * When ``create_task`` is called, it should capture the current
> > execution context with ``sys.get_execution_context()`` and save it
> > on the Task object.
>
> I wonder if it would be useful to have an option to squash this
> execution context down into a single LocalContext, since we know we'll
> be using it for a while and once we've copied an ExecutionContext it
> becomes impossible to tell the difference between one that has lots of
> internal LocalContexts and one that doesn't. This could also be handy
> for trio/curio's semantics where they initialize a new task's context
> to be a shallow copy of the parent task: you could do
>
> new_task_coro.cr_local_context = get_current_context().squash()
>
> and then skip having to wrap every send() call in a run_in_context.
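In the toy dict model, the squash operation Nathaniel is describing would
amount to the following (the .squash() name and semantics are speculative):

```python
# Speculative sketch: flatten a stack of Local Contexts into a single LC.
# Later (inner) LCs win, matching the innermost-first lookup order.
def squash(execution_context):
    flat = {}
    for lc in execution_context:       # iterate outermost first...
        flat.update(lc)                # ...so inner values overwrite outer
    return [flat]                      # a new EC with exactly one LC

ec = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]
assert squash(ec) == [{"a": 1, "b": 3, "c": 4}]
```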
>
> > Generators
> > ----------
> >
> > Generators in Python, while similar to Coroutines, are used in a
> > fundamentally different way. They are producers of data, and
> > they use ``yield`` expression to suspend/resume their execution.
> >
> > A crucial difference between ``await coro`` and ``yield value`` is
> > that the former expression guarantees that the ``coro`` will be
> > executed fully, while the latter is producing ``value`` and
> > suspending the generator until it gets iterated again.
> >
> > Generators, similarly to coroutines, have a ``gi_local_context``
> > attribute, which is set to an empty Local Context when created.
> >
> > Contrary to coroutines though, ``yield from o`` expression in
> > generators (that are not generator-based coroutines) is semantically
> > equivalent to ``for v in o: yield v``, therefore the interpreter does
> > not attempt to control their ``gi_local_context``.
>
> Hmm. I assume you're simplifying for expository purposes, but 'yield
> from' isn't the same as 'for v in o: yield v'. In fact PEP 380 says:
> "Motivation: [...] a piece of code containing a yield cannot be
> factored out and put into a separate function in the same way as other
> code. [...] If yielding of values is the only concern, this can be
> performed without much difficulty using a loop such as 'for v in g:
> yield v'. However, if the subgenerator is to interact properly with
> the caller in the case of calls to send(), throw() and close(), things
> become considerably more difficult. As will be seen later, the
> necessary code is very complicated, and it is tricky to handle all the
> corner cases correctly."
>
> So it seems to me that the whole idea of 'yield from' is that it's
> supposed to handle all the tricky bits needed to guarantee that if you
> take some code out of a generator and refactor it into a subgenerator,
> then everything works the same as before. This suggests that 'yield
> from' should do the same magic as 'await', where by default the
> subgenerator shares the same LocalContext as the parent generator.
> (And as a bonus it makes things simpler if 'yield from' and 'await'
> work the same.)
>
> > Asynchronous Generators
> > -----------------------
> >
> > Asynchronous Generators (AG) interact with the Execution Context
> > similarly to regular generators.
> >
> > They have an ``ag_local_context`` attribute, which, similarly to
> > regular generators, can be set to ``None`` to make them use the outer
> > Local Context. This is used by the new
> > ``contextlib.asynccontextmanager`` decorator.
> >
> > The EC support of ``await`` expression is implemented using the same
> > approach as in coroutines, see the `Coroutine Object Modifications`_
> > section.
>
> You showed how to make an iterator that acts like a generator. Is it
> also possible to make an async iterator that acts like an async
> generator? It's not immediately obvious, because you need to make sure
> that the local context gets restored each time you re-enter the
> __anext__ generator. I think it's something like:
>
> class AIter:
>     def __init__(self):
>         self._local_context = ...
>
>     # Note: intentionally not async
>     def __anext__(self):
>         coro = self._real_anext()
>         coro.cr_local_context = self._local_context
>         return coro
>
>     async def _real_anext(self):
>         ...
> Does that look right?
>
> > ContextItem.get() Cache
> > -----------------------
> >
> > We can add three new fields to ``PyThreadState`` and
> > ``PyInterpreterState`` structs:
> >
> > * ``uint64_t PyThreadState->unique_id``: a globally unique
> > thread state identifier (we can add a counter to
> > ``PyInterpreterState`` and increment it when a new thread state is
> > created.)
> >
> > * ``uint64_t PyInterpreterState->context_item_deallocs``: every time
> > a ``ContextItem`` is GCed, all Execution Contexts in all threads
> > will lose track of it. ``context_item_deallocs`` will simply
> > count all ``ContextItem`` deallocations.
> >
> > * ``uint64_t PyThreadState->execution_context_ver``: every time
> > a new item is set, or an existing item is updated, or the stack
> > of execution contexts is changed in the thread, we increment this
> > counter.
>
> I think this can be refined further (and I don't understand
> context_item_deallocs -- maybe it's a mistake?). AFAICT the things
> that invalidate a ContextItem's cache are:
>
> 1) switching threadstates
> 2) popping or pushing a non-empty LocalContext off the current
> threadstate's ExecutionContext
> 3) calling ContextItem.set() on *that* context item
>
> So I'd suggest tracking the thread state id, a counter of how many
> non-empty LocalContexts have been pushed/popped on this thread state,
> and a *per ContextItem* counter of how many times set() has been
> called.
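Put together, that refinement amounts to a cache tag like the following
(class and field names are invented for illustration; the real thing would
live in C, keyed off PyThreadState):

```python
# Illustrative sketch of the per-ContextItem cache check being proposed.
# A cached value stays valid only while all three counters are unchanged.
class ContextItemCache:
    def __init__(self):
        self.cached_value = None
        self.cached_tstate_id = None      # 1) which thread state
        self.cached_stack_ver = None      # 2) non-empty LC pushes/pops
        self.cached_set_count = None      # 3) set() calls on this item

    def lookup(self, tstate_id, stack_ver, set_count, slow_lookup):
        if (tstate_id == self.cached_tstate_id
                and stack_ver == self.cached_stack_ver
                and set_count == self.cached_set_count):
            return self.cached_value      # fast path: cache still valid
        value = slow_lookup()             # slow path: walk the EC stack
        self.cached_value = value
        self.cached_tstate_id = tstate_id
        self.cached_stack_ver = stack_ver
        self.cached_set_count = set_count
        return value
```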
>
> > Backwards Compatibility
> > =======================
> >
> > This proposal preserves 100% backwards compatibility.
>
> While this is mostly true in the strict sense, in practice this PEP is
> useless if existing thread-local users like decimal and numpy can't
> migrate to it without breaking backcompat. So maybe this section
> should discuss that?
>
> (For example, one constraint on the design is that we can't provide
> only a pure push/pop API, even though that's what would be most
> convenient for context managers like decimal.localcontext or
> numpy.errstate, because we also need to provide some backcompat story
> for legacy functions like decimal.setcontext and numpy.seterr.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Python-ideas mailing list
> Python... at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>