[Python-ideas] PEP 550 v2
Neil Girdhar
mistersheik at gmail.com
Sat Aug 19 15:09:30 EDT 2017
Cool to see this on python-ideas. I'm really looking forward to this,
whether it ends up being PEP 550 or 521.
On Wednesday, August 16, 2017 at 3:19:29 AM UTC-4, Nathaniel Smith wrote:
>
> On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov <yseliv... at gmail.com> wrote:
> > Hi,
> >
> > Here's the PEP 550 version 2.
>
> Awesome!
>
> Some of the changes from v1 to v2 might be a bit confusing -- in
> particular the thing where ExecutionContext is now a stack of
> LocalContext objects instead of just being a mapping. So here's the
> big picture as I understand it:
>
> In discussions on the mailing list and off-line, we realized that the
> main reason people use "thread locals" is to implement fake dynamic
> scoping. Of course, generators/async/await mean that currently it's
> impossible to *really* fake dynamic scoping in Python -- that's what
> PEP 550 is trying to fix. So PEP 550 v1 essentially added "generator
> locals" as a refinement of "thread locals". But... it turns out that
> "generator locals" aren't enough to properly implement dynamic scoping
> either! So the goal in PEP 550 v2 is to provide semantics strong
> enough to *really* get this right.
>
> I wrote up some notes on what I mean by dynamic scoping, and why
> neither thread-locals nor generator-locals can fake it:
>
>
> https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb
>
> > Specification
> > =============
> >
> > Execution Context is a mechanism of storing and accessing data specific
> > to a logical thread of execution. We consider OS threads,
> > generators, and chains of coroutines (such as ``asyncio.Task``)
> > to be variants of a logical thread.
> >
> > In this specification, we will use the following terminology:
> >
> > * **Local Context**, or LC, is a key/value mapping that stores the
> > context of a logical thread.
>
> If you're more familiar with dynamic scoping, then you can think of an
> LC as a single dynamic scope...
>
> > * **Execution Context**, or EC, is an OS-thread-specific dynamic
> > stack of Local Contexts.
>
> ...and an EC as a stack of scopes. Looking up a ContextItem in an EC
> proceeds by checking the first LC (innermost scope), then if it
> doesn't find what it's looking for it checks the second LC (the
> next-innermost scope), etc.
>
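As a concrete illustration, that lookup rule can be modeled in pure Python
with the EC as a list of dicts (all names here are illustrative, not the
proposed API):

```python
# Sketch: an Execution Context as a stack of Local Contexts (dicts).
# The innermost scope is last in the list; lookup walks from the top down.
MISSING = object()  # sentinel for "not found in any LC"

def ec_lookup(execution_context, key):
    """Return the value for `key`, checking the innermost LC first."""
    for local_context in reversed(execution_context):
        if key in local_context:
            return local_context[key]
    return MISSING

ec = [{"a": 1, "b": 2}, {"b": 3}]  # outer LC first, inner LC last
print(ec_lookup(ec, "b"))  # inner LC shadows the outer one -> 3
print(ec_lookup(ec, "a"))  # falls through to the outer LC -> 1
```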
> > ``ContextItem`` objects have the following methods and attributes:
> >
> > * ``.description``: read-only description;
> >
> > * ``.set(o)`` method: set the value to ``o`` for the context item
> > in the execution context.
> >
> > * ``.get()`` method: return the current EC value for the context item.
> > Context items are initialized with ``None`` when created, so
> > this method call never fails.
>
> Two issues here, that both require some expansion of this API to
> reveal a *bit* more information about the EC structure.
>
> 1) For trio's cancel scope use case I described in my last email, I
> actually need some way to read out all the values on the LocalContext
> stack. (It would also be helpful if there were some fast way to check
> the depth of the ExecutionContext stack -- or at least tell whether
> it's 1 deep or more-than-1 deep. I know that any cancel scopes that
> are in the bottommost LC will always be attached to the given Task, so
> I can set up the scope->task mapping once and re-use it indefinitely.
> OTOH for scopes that are stored in higher LCs, I have to check at
> every yield whether they're currently in effect. And I want to
> minimize the per-yield workload as much as possible.)
>
> 2) For classic decimal.localcontext context managers, the idea is
> still that you save/restore the value, so that you can nest multiple
> context managers without having to push/pop LCs all the time. But the
> above API is not actually sufficient to implement a proper
> save/restore, for a subtle reason: if you do
>
> ci.set(ci.get())
>
> then you just (potentially) moved the value from a lower LC up to the top
> LC.
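To see concretely what goes wrong, here is that failure mode in the same
toy stack-of-dicts model (illustrative code only, not the proposed API): the
get/set "save" silently promotes an inherited value into the top LC, so the
outer scope can no longer show through.

```python
# Sketch of why ci.set(ci.get()) is lossy, using dicts as Local Contexts.
ec = [{"x": "outer"}, {}]          # the value lives in the outer LC only

def get(key):
    for lc in reversed(ec):
        if key in lc:
            return lc[key]
    return None

def set_item(key, value):
    ec[-1][key] = value            # set() always writes to the top LC

set_item("x", get("x"))            # the "save" half of a save/restore idiom
assert "x" in ec[-1]               # the value was copied into the top LC...
ec[0]["x"] = "changed"             # ...so outer-scope updates are now shadowed
assert get("x") == "outer"         # readers still see the stale copy
```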
>
I agree with Nathaniel that this is an issue with the current API. I don't
think it's a good idea to have set and get methods. It would be much
better to reflect the underlying ExecutionContext *stack* in the API by
exposing a mutating *context manager* on the Context Key object instead of
set. For example,
    my_context = sys.new_context_key('my_context')
    options = my_context.get()
    options.some_mutating_method()
    with my_context.mutate(options):
        # Do whatever you want with the mutated context.
    # Now, the context is reverted.

Similarly, instead of

    my_context.set('spam')

you would do

    with my_context.mutate('spam'):
        # Do whatever you want with the mutated context.
    # Now, the context is reverted.
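One way such a mutate() could be built on top of a push/pop primitive is
sketched below, again modeling the EC as a stack of dicts. Everything here
is hypothetical; nothing is part of the PEP as written.

```python
from contextlib import contextmanager

# Hypothetical sketch of the mutate() proposal. The module-level `ec` list
# stands in for the interpreter-managed Execution Context stack.
ec = [{}]

class ContextKey:
    def __init__(self, name):
        self.name = name

    def get(self):
        # Same innermost-first lookup as in the spec.
        for lc in reversed(ec):
            if self.name in lc:
                return lc[self.name]
        return None

    @contextmanager
    def mutate(self, value):
        ec.append({self.name: value})  # push a fresh LC holding the value
        try:
            yield value
        finally:
            ec.pop()                   # popping the LC reverts the context

my_context = ContextKey("my_context")
with my_context.mutate("spam"):
    assert my_context.get() == "spam"
assert my_context.get() is None        # reverted after the block
```

Because the temporary value lives in its own pushed LC, reverting is a pop
rather than a lossy set-back, which avoids the shadowing problem above.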
>
> Here's an example of a case where this can produce user-visible effects:
>
>
> https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py
>
> There are probably a bunch of options for fixing this. But basically
> we need some API that makes it possible to temporarily set a value in
> the top LC, and then restore that value to what it was before (either
> the previous value, or 'unset' to unshadow a value in a lower LC). One
> simple option would be to make the idiom be something like:
>
> @contextmanager
> def local_value(new_value):
>     state = ci.get_local_state()
>     ci.set(new_value)
>     try:
>         yield
>     finally:
>         ci.set_local_state(state)
>
> where 'state' is something like a tuple (ci in EC[-1],
> EC[-1].get(ci)). A downside with this is that it's a bit error-prone
> (very easy for an unwary user to accidentally use get/set instead of
> get_local_state/set_local_state). But I'm sure we can come up with
> something.
>
> > Manual Context Management
> > -------------------------
> >
> > Execution Context is generally managed by the Python interpreter,
> > but sometimes it is desirable for the user to take the control
> > over it. A few examples when this is needed:
> >
> > * running a computation in ``concurrent.futures.ThreadPoolExecutor``
> > with the current EC;
> >
> > * reimplementing generators with iterators (more on that later);
> >
> > * managing contexts in asynchronous frameworks (implement proper
> > EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)
> >
> > For these purposes we add a set of new APIs (they will be used in
> > later sections of this specification):
> >
> > * ``sys.new_local_context()``: create an empty ``LocalContext``
> > object.
> >
> > * ``sys.new_execution_context()``: create an empty
> > ``ExecutionContext`` object.
> >
> > * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque
> > to Python code, and there are no APIs to modify them.
> >
> > * ``sys.get_execution_context()`` function. The function returns a
> > copy of the current EC: an ``ExecutionContext`` instance.
>
> If there are enough of these functions then it might make sense to
> stick them in their own module instead of adding more stuff to sys. I
> guess worrying about that can wait until the API details are more firm
> though.
>
> > * If ``coro.cr_local_context`` is an empty ``LocalContext`` object
> > that ``coro`` was created with, the interpreter will set
> > ``coro.cr_local_context`` to ``None``.
>
> I like all the ideas in this section, but this specific point feels a
> bit weird. Coroutine objects need a second hidden field somewhere to
> keep track of whether the object they end up with is the same one they
> were created with?
>
> If I set cr_local_context to something else, and then set it back to
> the original value, does that trigger the magic await behavior or not?
> What if I take the initial LocalContext off of one coroutine and
> attach it to another, does that trigger the magic await behavior?
>
> Maybe it would make more sense to have two sentinel values:
> UNINITIALIZED and INHERIT?
>
> > To enable correct Execution Context propagation into Tasks, the
> > asynchronous framework needs to assist the interpreter:
> >
> > * When ``create_task`` is called, it should capture the current
> > execution context with ``sys.get_execution_context()`` and save it
> > on the Task object.
>
> I wonder if it would be useful to have an option to squash this
> execution context down into a single LocalContext, since we know we'll
> be using it for a while and once we've copied an ExecutionContext it
> becomes impossible to tell the difference between one that has lots of
> internal LocalContexts and one that doesn't. This could also be handy
> for trio/curio's semantics where they initialize a new task's context
> to be a shallow copy of the parent task: you could do
>
> new_task_coro.cr_local_context = get_current_context().squash()
>
> and then skip having to wrap every send() call in a run_in_context.
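In the toy dict model, the squash operation Nathaniel is describing would
amount to the following (the .squash() name and semantics are speculative):

```python
# Speculative sketch: flatten a stack of Local Contexts into a single LC.
# Later (inner) LCs win, matching the innermost-first lookup order.
def squash(execution_context):
    flat = {}
    for lc in execution_context:       # iterate outermost first...
        flat.update(lc)                # ...so inner values overwrite outer
    return [flat]                      # a new EC with exactly one LC

ec = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]
assert squash(ec) == [{"a": 1, "b": 3, "c": 4}]
```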
>
> > Generators
> > ----------
> >
> > Generators in Python, while similar to Coroutines, are used in a
> > fundamentally different way. They are producers of data, and
> > they use ``yield`` expression to suspend/resume their execution.
> >
> > A crucial difference between ``await coro`` and ``yield value`` is
> > that the former expression guarantees that the ``coro`` will be
> > executed fully, while the latter is producing ``value`` and
> > suspending the generator until it gets iterated again.
> >
> > Generators, similarly to coroutines, have a ``gi_local_context``
> > attribute, which is set to an empty Local Context when created.
> >
> > Contrary to coroutines though, ``yield from o`` expression in
> > generators (that are not generator-based coroutines) is semantically
> > equivalent to ``for v in o: yield v``, therefore the interpreter does
> > not attempt to control their ``gi_local_context``.
>
> Hmm. I assume you're simplifying for expository purposes, but 'yield
> from' isn't the same as 'for v in o: yield v'. In fact PEP 380 says:
> "Motivation: [...] a piece of code containing a yield cannot be
> factored out and put into a separate function in the same way as other
> code. [...] If yielding of values is the only concern, this can be
> performed without much difficulty using a loop such as 'for v in g:
> yield v'. However, if the subgenerator is to interact properly with
> the caller in the case of calls to send(), throw() and close(), things
> become considerably more difficult. As will be seen later, the
> necessary code is very complicated, and it is tricky to handle all the
> corner cases correctly."
>
> So it seems to me that the whole idea of 'yield from' is that it's
> supposed to handle all the tricky bits needed to guarantee that if you
> take some code out of a generator and refactor it into a subgenerator,
> then everything works the same as before. This suggests that 'yield
> from' should do the same magic as 'await', where by default the
> subgenerator shares the same LocalContext as the parent generator.
> (And as a bonus it makes things simpler if 'yield from' and 'await'
> work the same.)
>
> > Asynchronous Generators
> > -----------------------
> >
> > Asynchronous Generators (AG) interact with the Execution Context
> > similarly to regular generators.
> >
> > They have an ``ag_local_context`` attribute, which, similarly to
> > regular generators, can be set to ``None`` to make them use the outer
> > Local Context. This is used by the new
> > ``contextlib.asynccontextmanager`` decorator.
> >
> > The EC support of ``await`` expression is implemented using the same
> > approach as in coroutines, see the `Coroutine Object Modifications`_
> > section.
>
> You showed how to make an iterator that acts like a generator. Is it
> also possible to make an async iterator that acts like an async
> generator? It's not immediately obvious, because you need to make sure
> that the local context gets restored each time you re-enter the
> __anext__ generator. I think it's something like:
>
> class AIter:
>     def __init__(self):
>         self._local_context = ...
>
>     # Note: intentionally not async
>     def __anext__(self):
>         coro = self._real_anext()
>         coro.cr_local_context = self._local_context
>         return coro
>
>     async def _real_anext(self):
>         ...
> Does that look right?
>
> > ContextItem.get() Cache
> > -----------------------
> >
> > We can add three new fields to ``PyThreadState`` and
> > ``PyInterpreterState`` structs:
> >
> > * ``uint64_t PyThreadState->unique_id``: a globally unique
> > thread state identifier (we can add a counter to
> > ``PyInterpreterState`` and increment it when a new thread state is
> > created.)
> >
> > * ``uint64_t PyInterpreterState->context_item_deallocs``: every time
> > a ``ContextItem`` is GCed, all Execution Contexts in all threads
> > will lose track of it. ``context_item_deallocs`` will simply
> > count all ``ContextItem`` deallocations.
> >
> > * ``uint64_t PyThreadState->execution_context_ver``: every time
> > a new item is set, or an existing item is updated, or the stack
> > of execution contexts is changed in the thread, we increment this
> > counter.
>
> I think this can be refined further (and I don't understand
> context_item_deallocs -- maybe it's a mistake?). AFAICT the things
> that invalidate a ContextItem's cache are:
>
> 1) switching threadstates
> 2) popping or pushing a non-empty LocalContext off the current
> threadstate's ExecutionContext
> 3) calling ContextItem.set() on *that* context item
>
> So I'd suggest tracking the thread state id, a counter of how many
> non-empty LocalContexts have been pushed/popped on this thread state,
> and a *per ContextItem* counter of how many times set() has been
> called.
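Put together, that refinement amounts to a cache tag like the following
(class and field names are invented for illustration; the real thing would
live in C, keyed off PyThreadState):

```python
# Illustrative sketch of the per-ContextItem cache check being proposed.
# A cached value stays valid only while all three counters are unchanged.
class ContextItemCache:
    def __init__(self):
        self.cached_value = None
        self.cached_tstate_id = None      # 1) which thread state
        self.cached_stack_ver = None      # 2) non-empty LC pushes/pops
        self.cached_set_count = None      # 3) set() calls on this item

    def lookup(self, tstate_id, stack_ver, set_count, slow_lookup):
        if (tstate_id == self.cached_tstate_id
                and stack_ver == self.cached_stack_ver
                and set_count == self.cached_set_count):
            return self.cached_value      # fast path: cache still valid
        value = slow_lookup()             # slow path: walk the EC stack
        self.cached_value = value
        self.cached_tstate_id = tstate_id
        self.cached_stack_ver = stack_ver
        self.cached_set_count = set_count
        return value
```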
>
> > Backwards Compatibility
> > =======================
> >
> > This proposal preserves 100% backwards compatibility.
>
> While this is mostly true in the strict sense, in practice this PEP is
> useless if existing thread-local users like decimal and numpy can't
> migrate to it without breaking backcompat. So maybe this section
> should discuss that?
>
> (For example, one constraint on the design is that we can't provide
> only a pure push/pop API, even though that's what would be most
> convenient for context managers like decimal.localcontext or
> numpy.errstate, because we also need to provide some backcompat story
> for legacy functions like decimal.setcontext and numpy.seterr.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Python-ideas mailing list
> Python... at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>