[Python-Dev] PEP 567 v2

Guido van Rossum guido at python.org
Thu Jan 4 19:48:51 EST 2018


Your suggestions sound reasonable, but we are now running into a logistical
problem -- I don't want to decide this unilaterally but Yury is on vacation
until Jan 15. That gives us at most 2 weeks for approval of the PEP and
review + commit of the implementation (
https://github.com/python/cpython/pull/5027) before the 3.7.0 feature
freeze / beta (Jan 29).

Your approach to defaults *may* have another advantage: perhaps we could
get rid of reset() and Token. Code to set and restore a context var could
just obtain the old value with get() and later restore it with set(). OTOH
this would not be entirely transparent, since it would permanently add the
var to the keys of the Context if the var was previously not set.

[Later] Oh, drat! Writing that paragraph made me realize there was a bug in
my own reasoning that led to this question about variables that have no
value: they aren't present in the keys of _ContextData, so my concern about
keys and values being inconsistent was unfounded. So the only reason to
change the situation with defaults would be that it's more convenient.
Though I personally kind of like that you can cause var.get() to raise
LookupError if the value is not set -- just like all other variables.

[Even later] Re: your other suggestion, why couldn't the threadstate
contain just the Context? It seems one would just write
tstate->current_context->_data everywhere instead of
tstate->current_context_data.

On Thu, Jan 4, 2018 at 4:18 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Thu, Jan 4, 2018 at 8:30 AM, Guido van Rossum <guido at python.org> wrote:
> > On Wed, Jan 3, 2018 at 6:35 PM, Nathaniel Smith <njs at pobox.com> wrote:
> >> - Context is a mutable object representing a mapping
> >> - BUT it doesn't allow mutation through the MutableMapping interface;
> >> instead, the only way to mutate it is by calling Context.run and then
> >> ContextVar.set(). Funneling all 'set' operations through a single
> >> place makes it easier to do clever caching tricks, and it lets us
> >> avoid dealing with operations that we don't want here (like 'del')
> >> just because they happen to be in the MutableMapping interface.
> >> - OTOH we do implement the (read-only) Mapping interface because
> >> there's no harm in it and it's probably useful for debuggers.
> >
> >
> > I think that in essence what Victor saw is a cache consistency issue.
>
> Yeah, that's a good way to think about it.
>
> > If you
> > look at the implementation section in the PEP, the ContextVar.set()
> > operation mutates _ContextData, which is a private (truly) immutable data
> > structure that stands in for the HAMT, and the threadstate contains one
> of
> > these (not a Context). When you call copy_context() you get a fresh
> Context
> > that wraps the current _ContextData. Because the latter is effectively
> > immutable this is a true clone. ctx.run() manipulates the threadstate to
> > make the current _ContextData the one from ctx, then calls the function.
> If
> > the function calls var.set(), this will create a new _ContextData that is
> > stored in the threadstate, but it doesn't update the ctx. This is where
> the
> > current state and ctx go out of sync. Once the function returns or
> raises,
> > run() takes the _ContextData from the threadstate and stuffs it into ctx,
> > resolving the inconsistency. (It then also restores the previous
> > _ContextData that it had saved before any of this started.)
> >
> > So all in all Context is mutable but the only time it is mutated is when
> > run() returns.
> >
> > I think Yury's POV is that you rarely if ever want to introspect a
> Context
> > object that's not freshly obtained from copy_context(). I'm not sure if
> > that's really true; it means that introspecting the context stored in an
> > asyncio.Task may give incorrect results if it's the currently running
> task.
> >
> > Should we declare it a bug? The fix would be complex given the current
> > implementation (either the PEP's pseudo-code or Yury's actual HAMT-based
> > implementation). I think it would involve keeping track of the current
> > Context in the threadstate rather than just the _ContextData, and
> updating
> > the Context object on each var.set() call. And this is something that
> Yury
> > wants to avoid, so that he can do more caching for var.get() (IIUC).
>
> I think the fix is a little bit cumbersome, but straightforward, and
> actually *simplifies* caching. If we track both the _ContextData and
> the Context in the threadstate, then set() becomes something like:
>
> def set(self, value):
>     # These two lines are like the current implementation
>     new_context_data =
> tstate->current_context_data->hamt_clone_with_new_item(key=self,
> value=value)
>     tstate->current_context_data = new_context_data
>     # Update the Context to have the new _ContextData
>     tstate->current_context->data = new_context_data
>     # New caching: instead of tracking integer ids, we just need to
> track the Context object
>     # This relies on the assumption that self->last_value is updated
> every time any Context is mutated
>     self->last_value = value
>     self->last_context = tstate->current_context
>
> And then the caching in get() becomes:
>
> def get(self):
>     if tstate->current_context != self->last_context:
>         # Update cache
>         self->last_value = tstate->current_context_data->hamt_lookup(self)
>         self->last_context = tstate->current_context
>     return self->last_value
>
> (I think the current cache invalidation logic is necessary for a PEP
> 550 implementation, but until/unless we implement that we can get away
> with something simpler.) So I'd say yeah, let's declare it a bug.
>
> If it turns out that I'm wrong and there's some reason this is really
> difficult, then we could consider making introspection on a
> currently-in-use Context raise an error, instead of returning stale
> data. This should be pretty easy, since Contexts already track whether
> they're currently in use (so they can raise an error if you try to use
> the same Context in two threads simultaneously).
>
> > We could also add extra words to the PEP's spec for run() explaining this
> > temporary inconsistency.
> >
> > I think changing the introspection method from Mapping to something
> custom
> > won't fix the basic issue (which is that there's a difference between the
> > Context and the _ContextData, and ContextVar actually only manipulates
> the
> > latter, always accessing it via the threadstate).
> >
> > However there's another problem with the Mapping interface, which is:
> what
> > should it do with variables that are not set and have no default value?
> > Should they be considered to have a value equal to _NO_DEFAULT or
> > Token.MISSING? Or should they be left out of the keys altogether? The PEP
> > hand-waves on this issue (we didn't think of missing values when we made
> the
> > design).
>
> I've been thinking this over, and I don't *think* there are any design
> constraints that force us towards one approach or another, so it's
> just about what's most convenient for users.
>
> My preference for how missing values / defaults / etc. should be
> handled is, Context acts just like a dict that's missing its mutation
> methods, and ContextVar does:
>
> class ContextVar:
>      # Note: default=None instead of default=_MAGIC_SENTINEL_VALUE
>      # If users want to distinguish between unassigned and None, then they
> can
>      # pass their own sentinel value. IME this is very rare though.
>      def __init__(self, name, *, default=None):
>          self.name = name
>          self.default = default
>
>      # Note: no default= argument here, because merging conflicting
> default= values
>      # is inherently confusing, and not really needed.
>      def get(self):
>          return current_context().get(self, self.default)
>
> Rationale:
>
> I've never seen a thread local use case where you wanted *different*
> default values at different calls to getattr. I've seen lots of thread
> local use cases that jumped through hoops to make sure they used the
> same default everywhere, either by defining a wrapper around getattr()
> or by subclassing local to define fallback values.
>
> Likewise, I've seen lots of cases where having to check for whether a
> thread local attribute was actually defined or not was a nuisance, and
> never any where it was important to distinguish between missing and
> None.
>
> But, just in case someone does fine such a case, we should make it
> possible to distinguish. Allowing users to override the default= is
> enough to do this. And the default= argument is also useful on those
> occasions where someone wants a default value other than None, which
> does happen occasionally. For example,
> django.urls.resolvers.RegexURLResolver._local.populating is
> semantically a bool with a default value of False. Currently, it's
> always accessed by writing getattr(_local, "populating", False). With
> this approach it could instead use ContextVar("populating",
> default=False) and then just call get().
>
> Everything I just said is about the ergonomics for ContextVar users,
> so it makes sense to handle all this inside ContextVar.
>
> OTOH, Context is a low-level interface for people writing task
> schedulers and debuggers, so it makes sense to keep it as simple and
> transparent as possible, and "it's just like a dict" is about as
> simple and transparent as it gets.
>
> Also, this way the pseudocode is really really short.
>
> > Should it be possible to introspect a Context that's not the
> > current context?
>
> I think debuggers will definitely want to be able to do things like
> print Context values from arbitrary tasks.
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20180104/34e5f8e4/attachment-0001.html>


More information about the Python-Dev mailing list