[Python-Dev] PEP 567 -- Context Variables

Yury Selivanov yselivanov.ml at gmail.com
Tue Dec 12 12:33:24 EST 2017


Hi,

This is a new proposal to implement context storage in Python.

It's a successor of PEP 550 and builds on some of its API ideas and
data structures.  Unlike PEP 550, though, this proposal focuses only
on adding new APIs and implementing support for them in asyncio.  There
are no changes to the interpreter or to the behaviour of generator or
coroutine objects.


PEP: 567
Title: Context Variables
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury at magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Dec-2017
Python-Version: 3.7
Post-History: 12-Dec-2017


Abstract
========

This PEP proposes the new ``contextvars`` module and a set of new
CPython C APIs to support context variables.  This concept is
similar to thread-local storage (TLS) but, unlike TLS, it allows
correctly keeping track of values per asynchronous task, e.g.
``asyncio.Task``.

This proposal builds directly upon concepts originally introduced
in :pep:`550`.  The key difference is that this PEP is only concerned
with solving the case for asynchronous tasks, and not generators.
There are no proposed modifications to any built-in types or to the
interpreter.


Rationale
=========

Thread-local variables are insufficient for asynchronous tasks which
execute concurrently in the same OS thread.  Any context manager that
saves and restores a context value using ``threading.local()`` will
have its context values bleed to other code unexpectedly when used in
async/await code.
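
The bleed described above can be reproduced in a few lines.  The
following is a minimal sketch rather than part of the proposal: the
names ``_local``, ``handler`` and ``main`` are illustrative, and
``asyncio.run()`` assumes Python 3.7 or later.

```python
import asyncio
import threading

# One thread-local namespace shared by every task running in this thread.
_local = threading.local()


async def handler(name, results):
    _local.user = name          # "save" a per-request value
    await asyncio.sleep(0)      # yield to the event loop; other tasks run
    results[name] = _local.user  # read it back after being resumed


async def main():
    results = {}
    await asyncio.gather(handler("alice", results), handler("bob", results))
    return results


results = asyncio.run(main())
# Both tasks ran in the same OS thread, so "alice" observes the value
# that "bob" stored while "alice" was suspended.
print(results)
```

Both tasks share the single thread-local slot, so the value written
by the second task is what the first task reads once it resumes.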

A few examples where working context-local storage for
asynchronous code is desired:

* Context managers like decimal contexts and ``numpy.errstate``.

* Request-related data, such as security tokens and request
  data in web applications, language context for ``gettext`` etc.

* Profiling, tracing, and logging in large code bases.


Introduction
============

The PEP proposes a new mechanism for managing context variables.
The key classes involved in this mechanism are ``contextvars.Context``
and ``contextvars.ContextVar``.  The PEP also proposes some policies
for using the mechanism around asynchronous tasks.

The proposed mechanism for accessing context variables uses the
``ContextVar`` class.  A module (such as decimal) that wishes to
store a context variable should:

* declare a module-global variable holding a ``ContextVar`` to
  serve as a "key";

* access the current value via the ``get()`` method on the
  key variable;

* modify the current value via the ``set()`` method on the
  key variable.
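
The three steps above can be sketched as follows.  This is an
illustration only: the variable name ``_precision``, its default and
the two helper functions are hypothetical and do not reflect how
``decimal`` is actually structured.

```python
from contextvars import ContextVar

# Step 1: a module-global ContextVar serves as the "key".
_precision = ContextVar("precision", default=28)


def get_precision():
    # Step 2: read the current value through the key.
    return _precision.get()


def set_precision(value):
    # Step 3: modify the current value through the key.
    _precision.set(value)


before = get_precision()
set_precision(10)
after = get_precision()
```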

The notion of "current value" deserves special consideration:
different asynchronous tasks that exist and execute concurrently
may have different values.  This idea is well-known from thread-local
storage, but in this case the value is not necessarily local to a
thread.  Instead, there is the notion of the "current ``Context``",
which is stored in thread-local storage and is accessed via the
``contextvars.get_context()`` function.  Manipulation of the current
``Context`` is the responsibility of the task framework, e.g. asyncio.

A ``Context`` is conceptually a mapping, implemented using an
immutable dictionary.  The ``ContextVar.get()`` method does a
lookup in the current ``Context`` with ``self`` as a key.  If the key
is not found, it returns the default value specified in the
constructor, or raises a ``LookupError`` if no default was given.

The ``ContextVar.set(value)`` method clones the current ``Context``,
assigns the ``value`` to it with ``self`` as a key, and makes the
new ``Context`` the current one.  Because ``Context`` uses an
immutable dictionary, cloning it is O(1).
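
A short sketch of the observable behaviour of ``get()`` and ``set()``
follows.  It assumes the ``contextvars`` module as it shipped in
Python 3.7, where the snapshot function is named ``copy_context()``
rather than the ``get_context()`` used in this draft; the variable
names are illustrative.

```python
from contextvars import ContextVar, copy_context

var = ContextVar("var")                              # no default
with_default = ContextVar("with_default", default=42)

# A variable with no value and no default raises LookupError.
try:
    var.get()
    lookup_failed = False
except LookupError:
    lookup_failed = True

# A variable with no value falls back to its constructor default.
default_value = with_default.get()

# A snapshot taken before a later set() keeps the earlier value,
# because set() never mutates the mapping a snapshot refers to.
var.set("old")
snapshot = copy_context()
var.set("new")
```

The snapshot still maps ``var`` to ``"old"`` after the second
``set()``, which is the property that makes cloning cheap.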


Specification
=============

A new standard library module ``contextvars`` is added with the
following APIs:

1. The ``get_context() -> Context`` function returns the current
   ``Context`` object for the current OS thread.

2. The ``ContextVar`` class is used to declare and access context
   variables.

3. The ``Context`` class encapsulates context state.  Every OS thread
   stores a reference to its current ``Context`` instance.
   It is not possible to control that reference manually.
   Instead, the ``Context.run(callable, *args)`` method is used to run
   Python code in another context.


contextvars.ContextVar
----------------------

The ``ContextVar`` class has the following constructor signature:
``ContextVar(name, *, default=no_default)``.  The ``name`` parameter
is used only for introspection and debug purposes.  The ``default``
parameter is optional.  Example::

    # Declare a context variable 'var' with the default value 42.
    var = ContextVar('var', default=42)

``ContextVar.get()`` returns the value of the context variable in the
current ``Context``::

    # Get the value of `var`.
    var.get()

``ContextVar.set(value) -> Token`` is used to set a new value for
the context variable in the current ``Context``::

    # Set the variable 'var' to 1 in the current context.
    var.set(1)

``contextvars.Token`` is an opaque object that should be used to
restore the ``ContextVar`` to its previous value, or remove it from
the context if it was not set before.  The ``ContextVar.reset(token)``
method is used for that::

    old = var.set(1)
    try:
        ...
    finally:
        var.reset(old)
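
The ``set()``/``reset()`` pairing maps naturally onto a context
manager.  The sketch below is one possible way to package it; the
``override`` helper and the variable's name and default are
hypothetical and not part of the proposed API.

```python
from contextlib import contextmanager
from contextvars import ContextVar

var = ContextVar("var", default="outer")


@contextmanager
def override(value):
    # set() returns a Token that remembers the previous state.
    token = var.set(value)
    try:
        yield
    finally:
        # reset() restores the previous value, or removes the variable
        # from the context if it had no value before set().
        var.reset(token)


seen = []
with override("inner"):
    seen.append(var.get())
seen.append(var.get())
```

Because ``var`` had no value before the ``with`` block, ``reset()``
removes it again and ``get()`` falls back to the default.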

The ``Token`` API exists to make the current proposal forward
compatible with :pep:`550`, in case there is demand to support
context variables in generators and asynchronous generators in the
future.

The ``ContextVar`` design allows for a fast implementation of
``ContextVar.get()``, which is particularly important for modules
like ``decimal`` and ``numpy``.


contextvars.Context
-------------------

``Context`` objects are mappings of ``ContextVar`` to values.

To get the current ``Context`` for the current OS thread, use the
``contextvars.get_context()`` function::

    ctx = contextvars.get_context()

To run Python code in some ``Context``, use the ``Context.run()``
method::

    ctx.run(function)

Any changes that ``function`` makes to context variables will
be contained in the ``ctx`` context::

    var = ContextVar('var')
    var.set('spam')

    def function():
        assert var.get() == 'spam'

        var.set('ham')
        assert var.get() == 'ham'

    ctx = get_context()
    ctx.run(function)

    assert var.get() == 'spam'

Any changes to the context will be contained and persisted in the
``Context`` object on which ``run()`` is called.
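
The "contained and persisted" behaviour can be checked directly.
This sketch assumes the ``contextvars`` module as shipped in Python
3.7, where the snapshot function is ``copy_context()`` rather than the
draft's ``get_context()`` and where ``Context.run()`` returns the
result of the callable.

```python
from contextvars import ContextVar, copy_context

var = ContextVar("var")
var.set("spam")


def function():
    var.set("ham")
    return var.get()


ctx = copy_context()
result = ctx.run(function)

outer_value = var.get()   # the caller's context is untouched
persisted = ctx[var]      # the change lives on inside ``ctx``
```

Running another callable with ``ctx.run()`` afterwards would start
from the persisted state, so it would see ``"ham"``.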

``Context`` objects implement the ``collections.abc.Mapping`` ABC.
This can be used to introspect context objects::

    ctx = contextvars.get_context()

    # Print all context variables and their values in 'ctx':
    print(ctx.items())

    # Print the value of 'some_variable' in context 'ctx':
    print(ctx[some_variable])
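
A slightly fuller introspection sketch, assuming the shipped
``contextvars`` module (``copy_context()`` in place of the draft's
``get_context()``); the variable names are illustrative.

```python
from contextvars import ContextVar, copy_context

request_id = ContextVar("request_id")
user = ContextVar("user")
unset = ContextVar("unset")

request_id.set(42)
user.set("alice")

ctx = copy_context()

# Context behaves like a read-only mapping keyed by ContextVar objects.
by_name = {variable.name: value for variable, value in ctx.items()}
has_request_id = request_id in ctx
has_unset = unset in ctx          # never set, so not a key
fallback = ctx.get(unset, "n/a")  # Mapping.get() with a fallback
```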


asyncio
-------

``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``,
and ``Loop.call_at()`` to schedule the asynchronous execution of a
function.  ``asyncio.Task`` uses ``call_soon()`` to run the
wrapped coroutine.

We modify ``Loop.call_{at,later,soon}`` to accept the new
optional *context* keyword-only argument, which defaults to
the current context::

    def call_soon(self, callback, *args, context=None):
        if context is None:
            context = contextvars.get_context()

        # ... some time later
        context.run(callback, *args)

Tasks in asyncio need to maintain their own isolated context.
``asyncio.Task`` is modified as follows::

    class Task:
        def __init__(self, coro):
            ...
            # Get the current context snapshot.
            self._context = contextvars.get_context()
            self._loop.call_soon(self._step, context=self._context)

        def _step(self, exc=None):
            ...
            # Every advance of the wrapped coroutine is done in
            # the task's context.
            self._loop.call_soon(self._step, context=self._context)
            ...
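
The effect of these changes can be observed from user code.  The
sketch below assumes asyncio and ``contextvars`` as shipped in Python
3.7 (``copy_context()`` in place of the draft's ``get_context()``, and
``asyncio.run()`` as the entry point); ``request_id``, ``handle`` and
``main`` are illustrative names.

```python
import asyncio
from contextvars import ContextVar, copy_context

request_id = ContextVar("request_id", default=None)


async def handle(value, seen):
    request_id.set(value)       # visible only inside this task
    await asyncio.sleep(0)      # let the other task run
    seen[value] = request_id.get()


async def main():
    seen = {}
    # gather() wraps each coroutine in a Task with its own context.
    await asyncio.gather(handle(1, seen), handle(2, seen))

    # A callback can also be scheduled in an explicitly chosen context.
    loop = asyncio.get_running_loop()
    ctx = copy_context()
    ctx.run(request_id.set, "callback")
    done = loop.create_future()
    loop.call_soon(lambda: done.set_result(request_id.get()), context=ctx)
    from_callback = await done

    # Neither the tasks nor ``ctx`` changed main()'s own context.
    return seen, from_callback, request_id.get()


seen, from_callback, in_main = asyncio.run(main())
```

Each task reads back its own value even though both were suspended
and resumed in the same thread.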


CPython C API
-------------

TBD


Implementation
==============

This section explains high-level implementation details in
pseudo-code.  Some optimizations are omitted to keep this section
short and clear.

The internal immutable dictionary for ``Context`` is implemented
using Hash Array Mapped Tries (HAMT).  They allow for an O(log N)
``set`` operation and an O(1) ``get_context()`` function.  For the
purposes of this section, we implement an immutable dictionary using
``dict.copy()``::

    class _ContextData:

        def __init__(self):
            self.__mapping = dict()

        def get(self, key):
            return self.__mapping[key]

        def set(self, key, value):
            copy = _ContextData()
            copy.__mapping = self.__mapping.copy()
            copy.__mapping[key] = value
            return copy

        def delete(self, key):
            copy = _ContextData()
            copy.__mapping = self.__mapping.copy()
            del copy.__mapping[key]
            return copy
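
The copy-on-write contract of such a mapping can be exercised in
isolation.  The following standalone sketch uses a hypothetical
``FrozenMap`` class with the same three operations; like the
pseudo-code above it copies a ``dict`` on every update and is not the
HAMT used by the real implementation.

```python
class FrozenMap:
    """Hypothetical copy-on-write mapping; updates return new objects."""

    def __init__(self, items=None):
        self._items = dict(items or {})

    def get(self, key):
        return self._items[key]

    def set(self, key, value):
        new_items = self._items.copy()
        new_items[key] = value
        return FrozenMap(new_items)

    def delete(self, key):
        new_items = self._items.copy()
        del new_items[key]
        return FrozenMap(new_items)


empty = FrozenMap()
first = empty.set("var", "spam")
second = first.set("var", "ham")   # ``first`` is left unchanged
third = second.delete("var")       # ``second`` is left unchanged
```

Every earlier version stays valid after an update, which is what
allows a saved reference to act as a snapshot.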

Every OS thread has a reference to the current ``_ContextData``.
``PyThreadState`` is updated with a new ``context_data`` field that
points to a ``_ContextData`` object::

    PyThreadState:
        context_data : _ContextData

``contextvars.get_context()`` is implemented as follows::

    def get_context():
        ts : PyThreadState = PyThreadState_Get()

        if ts.context_data is None:
            ts.context_data = _ContextData()

        ctx = Context()
        ctx.__data = ts.context_data
        return ctx

``contextvars.Context`` is a wrapper around ``_ContextData``::

    class Context(collections.abc.Mapping):

        def __init__(self):
            self.__data = _ContextData()

        def run(self, callable, *args):
            ts : PyThreadState = PyThreadState_Get()
            saved_data : _ContextData = ts.context_data

            try:
                ts.context_data = self.__data
                return callable(*args)
            finally:
                self.__data = ts.context_data
                ts.context_data = saved_data

        # Mapping API methods are implemented by delegating
        # `get()` and other Mapping calls to `self.__data`.

``contextvars.ContextVar`` interacts with
``PyThreadState.context_data`` directly::

    class ContextVar:

        def __init__(self, name, *, default=NO_DEFAULT):
            self.__name = name
            self.__default = default

        @property
        def name(self):
            return self.__name

        def get(self, default=NO_DEFAULT):
            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            try:
                return data.get(self)
            except KeyError:
                pass

            if default is not NO_DEFAULT:
                return default

            if self.__default is not NO_DEFAULT:
                return self.__default

            raise LookupError

        def set(self, value):
            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            try:
                old_value = data.get(self)
            except KeyError:
                old_value = NO_VALUE

            ts.context_data = data.set(self, value)
            return Token(self, old_value)

        def reset(self, token):
            if token.__used:
                return

            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            if token.__old_value is NO_VALUE:
                ts.context_data = data.delete(token.__var)
            else:
                ts.context_data = data.set(token.__var,
                                           token.__old_value)

            token.__used = True


    class Token:

        def __init__(self, var, old_value):
            self.__var = var
            self.__old_value = old_value
            self.__used = False


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.

Libraries that use ``threading.local()`` to store context-related
values currently work correctly only for synchronous code.  Switching
them to use the proposed API will keep their behavior for synchronous
code unmodified, but will automatically enable support for
asynchronous code.
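
For synchronous, multi-threaded code a ``ContextVar`` keeps values
separate per thread, much as ``threading.local()`` does.  The sketch
below assumes the shipped ``contextvars`` module; ``current_user`` and
``worker`` are illustrative names.

```python
import threading
from contextvars import ContextVar

current_user = ContextVar("current_user", default="anonymous")


def worker(name, out):
    # Each thread has its own current context, so this set() is not
    # visible to the main thread or to the other worker.
    current_user.set(name)
    out[name] = current_user.get()


out = {}
threads = [
    threading.Thread(target=worker, args=(name, out))
    for name in ("alice", "bob")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

main_value = current_user.get()
```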


Appendix: HAMT Performance Analysis
===================================

.. figure:: pep-0550-hamt_vs_dict-v2.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [1]_.

The above chart demonstrates that:

* HAMT displays near O(1) performance for all benchmarked
  dictionary sizes.

* ``dict.copy()`` becomes very slow around 100 items.
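
The ``dict.copy()`` side of this comparison is easy to measure
locally.  The sketch below is not the benchmark referenced in Figure
1; it is a rough, hypothetical timing loop (the helper name, sizes and
repeat count are arbitrary) that only times the copy-then-set pattern
used by the pseudo-code in the Implementation section.

```python
import timeit


def copy_and_set_cost(size, repeats=2000):
    """Return the average seconds for one copy-then-set on a dict."""
    base = {i: i for i in range(size)}

    def step():
        clone = base.copy()   # O(size) copy on every update
        clone["new"] = 1

    return timeit.timeit(step, number=repeats) / repeats


costs = {size: copy_and_set_cost(size) for size in (1, 10, 100, 1000)}
for size, seconds in costs.items():
    print(f"{size:>5} items: {seconds * 1e9:.0f} ns per update")
```

Absolute numbers depend on the machine and the Python build, so they
are best read as a trend across sizes.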

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [2]_.

Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
immutable mapping.  HAMT lookup time is 30-40% slower than Python dict
lookups on average, which is a very good result, considering that the
latter is very well optimized.

The reference implementation of HAMT for CPython can be found here:
[3]_.


References
==========

.. [1] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [2] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [3] https://github.com/1st1/cpython/tree/hamt


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

