PEP 550 -- Execution Context

PEP: 550
Title: Execution Context
Author: Yury Selivanov <yury at magic.io>
Status: Draft
Type: Standards Track
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017, 15-Aug-2017

Abstract

This PEP proposes a new mechanism to manage execution state--the logical environment in which a function, a thread, a generator, or a coroutine executes.

A few examples of where reliable state storage is required:

  • Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings;
  • Storing request-related data such as security tokens and request data in web applications, implementing i18n;
  • Profiling, tracing, and logging in complex and large code bases.

The usual solution for storing state is Thread-local Storage (TLS), implemented in the standard library as threading.local(). Unfortunately, TLS does not work for state isolation in generators or asynchronous code, because such code executes concurrently within a single thread.

Rationale

Traditionally, Thread-local Storage (TLS) is used for storing state. However, the major flaw of TLS is that it works only for multi-threaded code: it is not possible to reliably contain state within a generator or a coroutine. For example, consider the following generator:

def calculate(precision, ...):
    with decimal.localcontext() as ctx:
        # Set the precision for decimal calculations
        # inside this block
        ctx.prec = precision

        yield calculate_something()
        yield calculate_something_else()

The decimal context uses TLS to store its state, and because TLS is not aware of generators, the state can leak. If a user interleaves two calculate() generators with different precisions, for instance with the zip() built-in, the above code will not work correctly. For example:

g1 = calculate(precision=100)
g2 = calculate(precision=50)

items = list(zip(g1, g2))

# items[0] will be a tuple of:
#   first value from g1 calculated with 100 precision,
#   first value from g2 calculated with 50 precision.
#
# items[1] will be a tuple of:
#   second value from g1 calculated with 50 precision (!!!),
#   second value from g2 calculated with 50 precision.

An even scarier example would be using decimals to represent money in an async/await application: decimal calculations can suddenly lose precision in the middle of processing a request. Currently, bugs like this are extremely hard to find and fix.

Another common need for web applications is to have access to the current request object, or security context, or, simply, the request URL for logging or submitting performance tracing data:

async def handle_http_request(request):
    context.current_http_request = request

    await ...
    # Invoke your framework code, render templates,
    # make DB queries, etc, and use the global
    # 'current_http_request' in that code.

    # This isn't currently possible to do reliably
    # in asyncio out of the box.

These are just a few of the many examples where a reliable way to store context data is needed.

The inability to use TLS for asynchronous code has led to a proliferation of ad-hoc solutions, which are limited in scope and do not support all required use cases.

The status quo is that any library (including the standard library) that uses TLS will likely not work as expected in asynchronous code or with generators (see [3] for an example issue).

Some languages that have coroutines or generators recommend manually passing a context object to every function; see [1] for a description of the pattern in Go. This approach, however, has limited use for Python, where a huge ecosystem was built around a TLS-like context. Moreover, passing the context explicitly does not work at all for libraries like decimal or numpy, which rely on operator overloading.
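
For instance, there is no place to thread a context argument through an overloaded arithmetic operator. A hypothetical illustration using the existing decimal module (calculate() is a made-up name):

import decimal

ctx = decimal.Context(prec=10)

def calculate(ctx, a, b):
    # Explicit context passing works for ordinary functions...
    return ctx.multiply(a, b)

calculate(ctx, decimal.Decimal('1.1'), decimal.Decimal('2.2'))

# ...but there is nowhere to pass 'ctx' in a plain expression:
# Decimal.__mul__ has to find the current decimal context
# implicitly (today, via TLS).
total = decimal.Decimal('1.1') * decimal.Decimal('2.2')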

The .NET runtime, which supports async/await, has a generic solution to this problem called ExecutionContext (see [2]). On the surface, working with it is very similar to working with TLS, but it explicitly supports asynchronous code.

Goals

The goal of this PEP is to provide a more reliable alternative to threading.local(). It should be explicitly designed to work with the Python execution model, equally supporting threads, generators, and coroutines.

An acceptable solution for Python should meet the following requirements:

  • Transparent support for code executing in threads, coroutines, and generators, with an easy-to-use API.
  • Negligible impact on the performance of the existing code or the code that will be using the new mechanism.
  • Fast C API for packages like decimal and numpy.

Explicit is still better than implicit, hence the new APIs should only be used when there is no acceptable way of passing the state explicitly.

Specification

Execution Context is a mechanism of storing and accessing data specific to a logical thread of execution. We consider OS threads, generators, and chains of coroutines (such as asyncio.Task) to be variants of a logical thread.

In this specification, we will use the following terminology:

  • Local Context, or LC, is a key/value mapping that stores the context of a logical thread.
  • Execution Context, or EC, is an OS-thread-specific dynamic stack of Local Contexts.
  • Context Item, or CI, is an object used to set and get values from the Execution Context.

Please note that throughout the specification we use simple pseudo-code to illustrate how the EC machinery works. The actual algorithms and data structures that we will use to implement the PEP are discussed in the Implementation Strategy section.
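
For readability of that pseudo-code, a Local Context can be thought of as a mutable mapping and an Execution Context as a per-thread stack of such mappings. The following is only an illustrative sketch of those two structures (the real objects are opaque and implemented differently):

class LocalContext(dict):
    """Maps ContextItem objects to their values."""


class ExecutionContext:
    """A per-thread stack of LocalContext objects."""

    def __init__(self, local_contexts=()):
        self.local_contexts = list(local_contexts)

    def push(self, local_context):
        self.local_contexts.append(local_context)

    def pop(self):
        return self.local_contexts.pop()

    def __getitem__(self, index):
        return self.local_contexts[index]

    def __reversed__(self):
        return reversed(self.local_contexts)

    def __bool__(self):
        return bool(self.local_contexts)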

Context Item Object

The sys.new_context_item(description) function creates a new ContextItem object. The description parameter is a str, explaining the nature of the context key for introspection and debugging purposes.

ContextItem objects have the following methods and attributes:

  • .description: read-only description;
  • .set(o) method: set the value to o for the context item in the execution context.
  • .get() method: return the current EC value for the context item. Context items are initialized with None when created, so this method call never fails.

Below is an example of how context items can be used:

my_context = sys.new_context_item(description='mylib.context')
my_context.set('spam')

# Later, to access the value of my_context:
print(my_context.get())

Thread State and Multi-threaded code

Execution Context is implemented on top of Thread-local Storage. For every thread there is a separate stack of Local Contexts -- mappings of ContextItem objects to their values. New threads always start with an empty EC.

For CPython:

PyThreadState:
    execution_context: ExecutionContext([
        LocalContext({ci1: val1, ci2: val2, ...}),
        ...
    ])

The ContextItem.get() and .set() methods are defined as follows (in pseudo-code):

class ContextItem:

    def get(self):
        tstate = PyThreadState_Get()

        for local_context in reversed(tstate.execution_context):
            if self in local_context:
                return local_context[self]

        return None

    def set(self, value):
        tstate = PyThreadState_Get()

        if not tstate.execution_context:
            tstate.execution_context = [LocalContext()]

        tstate.execution_context[-1][self] = value

With the semantics defined so far, the Execution Context can already be used as an alternative to threading.local():

def print_foo():
    print(ci.get() or 'nothing')

ci = sys.new_context_item(description='test')
ci.set('foo')

# Will print "foo":
print_foo()

# Will print "nothing":
threading.Thread(target=print_foo).start()

Manual Context Management

The Execution Context is generally managed by the Python interpreter, but sometimes it is desirable for the user to take control of it. A few examples of when this is needed:

  • running a computation in concurrent.futures.ThreadPoolExecutor with the current EC;
  • reimplementing generators with iterators (more on that later);
  • managing contexts in asynchronous frameworks (implementing proper EC support in asyncio.Task and asyncio.loop.call_soon).

For these purposes we add a set of new APIs (they will be used in later sections of this specification):

  • sys.new_local_context(): create an empty LocalContext object.

  • sys.new_execution_context(): create an empty ExecutionContext object.

  • Both LocalContext and ExecutionContext objects are opaque to Python code, and there are no APIs to modify them.

  • sys.get_execution_context() function. The function returns a copy of the current EC: an ExecutionContext instance.

    The runtime complexity of the actual implementation of this function can be O(1), but for the purposes of this section it is equivalent to:

    def get_execution_context():
        tstate = PyThreadState_Get()
        return copy(tstate.execution_context)
    
  • sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs) runs func(*args, **kwargs) in the provided execution context:

    def run_with_execution_context(ec, func, *args, **kwargs):
        tstate = PyThreadState_Get()
    
        old_ec = tstate.execution_context
    
        tstate.execution_context = ExecutionContext(
            ec.local_contexts + [LocalContext()]
        )
    
        try:
            return func(*args, **kwargs)
        finally:
            tstate.execution_context = old_ec
    

    Any changes that func makes to its Local Context will be discarded. This allows reusing one ExecutionContext object for multiple invocations of different functions, without them being able to affect each other's environment:

    ci = sys.new_context_item('example')
    ci.set('spam')
    
    def func():
        print(ci.get())
        ci.set('ham')
    
    ec = sys.get_execution_context()
    
    sys.run_with_execution_context(ec, func)
    sys.run_with_execution_context(ec, func)
    
    # Will print:
    #   spam
    #   spam
    
  • sys.run_with_local_context(lc: LocalContext, func, *args, **kwargs) runs func(*args, **kwargs) in the current execution context using the specified local context.

    Any changes that func makes to the local context will be persisted in lc. This behaviour differs from run_with_execution_context(), which always creates a new throw-away local context.

    In pseudo-code:

    def run_with_local_context(lc, func, *args, **kwargs):
        tstate = PyThreadState_Get()
    
        old_ec = tstate.execution_context
    
        tstate.execution_context = ExecutionContext(
            old_ec.local_contexts + [lc]
        )
    
        try:
            return func(*args, **kwargs)
        finally:
            tstate.execution_context = old_ec
    

    Using the previous example:

    ci = sys.new_context_item('example')
    ci.set('spam')
    
    def func():
        print(ci.get())
        ci.set('ham')
    
    ec = sys.get_execution_context()
    lc = sys.new_local_context()
    
    sys.run_with_local_context(lc, func)
    sys.run_with_local_context(lc, func)
    
    # Will print:
    #   spam
    #   ham
    

As an example, let's make a subclass of concurrent.futures.ThreadPoolExecutor that preserves the execution context for scheduled functions:

class Executor(concurrent.futures.ThreadPoolExecutor):

    def submit(self, fn, *args, **kwargs):
        context = sys.get_execution_context()

        fn = functools.partial(
            sys.run_with_execution_context, context,
            fn, *args, **kwargs)

        return super().submit(fn)
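
A hypothetical use of this Executor, assuming the semantics defined above:

ci = sys.new_context_item(description='example')
ci.set('spam')

def work():
    # Runs in a pool thread, but within the EC captured
    # when submit() was called.
    print(ci.get())

with Executor(max_workers=1) as executor:
    executor.submit(work).result()

# Will print:
#   spam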

EC Semantics for Coroutines

Python PEP 492 coroutines are used to implement cooperative multitasking. For a Python end-user they are similar to threads, especially when it comes to sharing resources or modifying the global state.

An event loop is needed to schedule coroutines. Coroutines that are explicitly scheduled by the user are usually called Tasks. When a coroutine is scheduled, it can schedule other coroutines using an await expression. In the async/await world, awaiting a coroutine is equivalent to a regular function call in synchronous code. Thus, Tasks are similar to threads.

By drawing a parallel between regular multithreaded code and async/await, it becomes apparent that any modification of the execution context within one Task should be visible to all coroutines scheduled within it. Any execution context modifications, however, must not be visible to other Tasks executing within the same OS thread.
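
For example, assuming an EC-aware event loop (see the Tasks section below), two Tasks modifying the same Context Item would not observe each other's changes. A sketch of the intended semantics:

ci = sys.new_context_item(description='example')

async def task_a():
    ci.set('a')
    await asyncio.sleep(0)
    print('task_a sees:', ci.get())   # a

async def task_b():
    ci.set('b')
    await asyncio.sleep(0)
    print('task_b sees:', ci.get())   # b

# Scheduled with loop.create_task(task_a()) and
# loop.create_task(task_b()): both run in the same OS thread,
# yet each sees only its own modification of 'ci'.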

Coroutine Object Modifications

To achieve this, a small set of modifications to the coroutine object is needed:

  • New cr_local_context attribute. This attribute is readable and writable for Python code.

  • When a coroutine object is instantiated, its cr_local_context is initialized with an empty Local Context.

  • Coroutine's .send() and .throw() methods are modified as follows (in pseudo-C):

    if coro.cr_local_context is not None:
        tstate = PyThreadState_Get()
    
        tstate.execution_context.push(coro.cr_local_context)
    
        try:
            # Perform the actual `Coroutine.send()` or
            # `Coroutine.throw()` call.
            return coro.send(...)
        finally:
            coro.cr_local_context = tstate.execution_context.pop()
    else:
        # Perform the actual `Coroutine.send()` or
        # `Coroutine.throw()` call.
        return coro.send(...)
    
  • When the Python interpreter encounters an await expression, it inspects the cr_local_context attribute of the coroutine that is about to be awaited. For await coro:

    • If coro.cr_local_context is an empty LocalContext object that coro was created with, the interpreter will set coro.cr_local_context to None.
    • If coro.cr_local_context was modified by Python code, the interpreter will leave it as is.

    This makes any changes to the execution context made by nested coroutine calls within a Task visible throughout the Task:

    ci = sys.new_context_item('example')
    
    async def nested():
        ci.set('nested')
    
    async def main():
        ci.set('main')
        print('before:', ci.get())
        await nested()
        print('after:', ci.get())
    
    # Will print:
    #   before: main
    #   after: nested
    

    Essentially, coroutines work with Execution Context items similarly to threads, and the await expression acts like a function call.

    This mechanism also works for yield from in generators decorated with @types.coroutine or @asyncio.coroutine, which are called "generator-based coroutines" according to PEP 492, and should be fully compatible with native async/await coroutines.

Tasks

In asynchronous frameworks like asyncio, coroutines are run by an event loop and need to be explicitly scheduled (in asyncio, coroutines are run by asyncio.Task).

With the currently defined semantics, the interpreter makes coroutines linked by an await expression share the same Local Context.

The interpreter, however, is not aware of the Task concept, and cannot ensure that new Tasks started in coroutines use the correct EC:

current_request = sys.new_context_item(description='request')

async def child():
    print('current request:', repr(current_request.get()))

async def handle_request(request):
    current_request.set(request)
    event_loop.create_task(child())

run(handle_request(some_request))

# Will print:
#   current request: None

To enable correct Execution Context propagation into Tasks, the asynchronous framework needs to assist the interpreter:

  • When create_task is called, it should capture the current execution context with sys.get_execution_context() and save it on the Task object.
  • When the Task object runs its coroutine object, it should execute .send() and .throw() methods within the captured execution context, using the sys.run_with_execution_context() function.

With help from the asynchronous framework, the above snippet will run correctly, and the child() coroutine will be able to access the current request object through the current_request Context Item.
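
A minimal sketch of such Task support (illustrative only, not the actual asyncio implementation; exception handling, futures, and rescheduling are omitted):

class Task:

    def __init__(self, coro):
        self.coro = coro
        # Capture the EC of the code that created the Task.
        self.ec = sys.get_execution_context()

    def step(self, value=None):
        # Resume the coroutine within the captured EC.
        return sys.run_with_execution_context(
            self.ec, self.coro.send, value)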

Event Loop Callbacks

Similarly to Tasks, functions like asyncio's loop.call_soon() should capture the current execution context with sys.get_execution_context() and execute callbacks within it with sys.run_with_execution_context().

This way the following code will work:

current_request = sys.new_context_item(description='request')

def log():
    request = current_request.get()
    print(request)

async def request_handler(request):
    current_request.set(request)
    get_event_loop().call_soon(log)
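
A sketch of what an EC-aware call_soon() could look like (illustrative only; '_schedule' is a stand-in for the loop's internal callback queue):

def call_soon(self, callback, *args):
    # Capture the EC of the caller...
    ec = sys.get_execution_context()
    # ...and run the callback within it on the next loop iteration.
    self._schedule(functools.partial(
        sys.run_with_execution_context, ec, callback, *args))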

Generators

Generators in Python, while similar to coroutines, are used in a fundamentally different way. They are producers of data, and they use the yield expression to suspend/resume their execution.

A crucial difference between await coro and yield value is that the former guarantees that coro will be executed to completion, while the latter produces a value and suspends the generator until it is iterated again.

Generators, similarly to coroutines, have a gi_local_context attribute, which is set to an empty Local Context when created.

Contrary to coroutines, though, the yield from o expression in generators (that are not generator-based coroutines) is semantically equivalent to for v in o: yield v; therefore the interpreter does not attempt to control their gi_local_context.

EC Semantics for Generators

Every generator object has its own Local Context that stores only its own local modifications of the context. When a generator is being iterated, its local context is pushed onto the EC stack of the current thread. This means that the generator is able to access items from the surrounding context:

local = sys.new_context_item("local")
glob = sys.new_context_item("global")   # 'global' is a reserved word

def gen():
    local.set('inside gen:')
    while True:
        print(local.get(), glob.get())
        yield

g = gen()

local.set('hello')
glob.set('spam')
next(g)

local.set('world')
glob.set('ham')
next(g)

# Will print:
#   inside gen: spam
#   inside gen: ham

Any changes to the EC in nested generators are invisible to the outer generator:

local = sys.new_context_item("local")

def inner_gen():
    local.set('spam')
    yield

def outer_gen():
    local.set('ham')
    yield from inner_gen()
    print(local.get())

list(outer_gen())

# Will print:
#   ham

Running generators without LC

Similarly to coroutines, generators with gi_local_context set to None simply use the outer Local Context.

The @contextlib.contextmanager decorator uses this mechanism to allow its generator to affect the EC:

item = sys.new_context_item('test')

@contextmanager
def context(x):
    old = item.get()
    item.set(x)
    try:
        yield
    finally:
        item.set(old)

with context('spam'):

    with context('ham'):
        print(1, item.get())

    print(2, item.get())

# Will print:
#   1 ham
#   2 spam

Implementing Generators with Iterators

The Execution Context API makes it possible to fully replicate the EC behaviour of generators with a regular Python iterator class:

class Gen:

    def __init__(self):
        self.local_context = sys.new_local_context()

    def __iter__(self):
        return self

    def __next__(self):
        return sys.run_with_local_context(
            self.local_context, self._next_impl)

    def _next_impl(self):
        # Actual __next__ implementation.
        ...
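
For example, such an iterator gets the same EC isolation as a generator (a sketch based on the semantics above; Counter is a made-up name):

ci = sys.new_context_item(description='example')

class Counter(Gen):

    def __init__(self, n):
        super().__init__()
        self.i = 0
        self.n = n

    def _next_impl(self):
        # Runs with self.local_context on top of the EC stack,
        # so this set() is not visible to the caller.
        ci.set('inside iterator')
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i

ci.set('outside')
print(list(Counter(3)))   # [1, 2, 3]
print(ci.get())           # outside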

Asynchronous Generators

Asynchronous Generators (AG) interact with the Execution Context similarly to regular generators.

They have an ag_local_context attribute, which, similarly to regular generators, can be set to None to make them use the outer Local Context. This is used by the new contextlib.asynccontextmanager decorator.

The EC support of await expression is implemented using the same approach as in coroutines, see the Coroutine Object Modifications section.

Greenlets

Greenlet is an alternative implementation of cooperative scheduling for Python. Although the greenlet package is not part of CPython, popular frameworks like gevent rely on it, and it is important that greenlet can be modified to support execution contexts.

In a nutshell, the greenlet design is very similar to the design of generators. The main difference is that for generators, the stack is managed by the Python interpreter, whereas greenlet works outside of the Python interpreter, manually saving some PyThreadState fields and pushing/popping the C stack. Thus the greenlet package can be updated to use the new low-level C API to enable full EC support.

New APIs

Python

The Python APIs were designed to completely hide the internal implementation details, but at the same time provide enough control over the EC and LC to re-implement the EC behaviour of Python built-in objects (such as generators) in pure Python.

  1. sys.new_context_item(description='...'): create a ContextItem object used to access/set values in EC.
  2. ContextItem:
    • .description: read-only attribute.
    • .get(): return the current value for the item.
    • .set(o): set the current value in the EC for the item.
  3. sys.get_execution_context(): return the current ExecutionContext.
  4. sys.new_execution_context(): create a new empty ExecutionContext.
  5. sys.new_local_context(): create a new empty LocalContext.
  6. sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs).
  7. sys.run_with_local_context(lc: LocalContext, func, *args, **kwargs).

C API

  1. PyContextItem * PyContext_NewItem(char *desc): create a PyContextItem object.
  2. PyObject * PyContext_GetItem(PyContextItem *): get the current value for the context item.
  3. int PyContext_SetItem(PyContextItem *, PyObject *): set the current value for the context item.
  4. PyLocalContext * PyLocalContext_New(): create a new empty PyLocalContext.
  5. PyExecutionContext * PyExecutionContext_New(): create a new empty PyExecutionContext.
  6. PyExecutionContext * PyExecutionContext_Get(): get the EC for the active thread state.
  7. int PyExecutionContext_Set(PyExecutionContext *): set the passed EC object as the current for the active thread state.
  8. int PyExecutionContext_SetWithLocalContext(PyExecutionContext *, PyLocalContext *): allows implementing the sys.run_with_local_context Python API.

Implementation Strategy

LocalContext is a Weak Key Mapping

Using a weak key mapping for the LocalContext implementation enables the following properties with regards to garbage collection (illustrated with a short sketch after this list):

  • ContextItem objects are strongly-referenced only from the application code, not from any of the Execution Context machinery or values they point to. This means that there are no reference cycles that could extend their lifespan longer than necessary, or prevent their garbage collection.
  • Values put in the Execution Context are guaranteed to be kept alive while there is a ContextItem key referencing them in the thread.
  • If a ContextItem is garbage collected, all of its values will be removed from all contexts, allowing them to be GCed if needed.
  • If a thread has ended its execution, its thread state will be cleaned up along with its ExecutionContext, cleaning up all values bound to all Context Items in the thread.
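
The weak-key behaviour itself can be illustrated with the standard library's weakref.WeakKeyDictionary (a sketch of the intended semantics only, not the actual implementation):

import weakref

class Key:
    """Stand-in for a ContextItem."""

key = Key()
lc = weakref.WeakKeyDictionary()
lc[key] = 'some value'

print(len(lc))   # 1
del key          # drop the only strong reference to the key
print(len(lc))   # 0 -- in CPython the entry (and its value)
                 # goes away as soon as the key is collected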

ContextItem.get() Cache

We can add three new fields to PyThreadState and PyInterpreterState structs:

  • uint64_t PyThreadState->unique_id: a globally unique thread state identifier (we can add a counter to PyInterpreterState and increment it when a new thread state is created.)
  • uint64_t PyInterpreterState->context_item_deallocs: every time a ContextItem is GCed, all Execution Contexts in all threads will lose track of it. context_item_deallocs will simply count all ContextItem deallocations.
  • uint64_t PyThreadState->execution_context_ver: every time a new item is set, or an existing item is updated, or the stack of execution contexts is changed in the thread, we increment this counter.

The above three fields allow implementing a fast cache path in ContextItem.get(), in pseudo-code:

class ContextItem:

    def get(self):
        tstate = PyThreadState_Get()

        if (self.last_tstate_id == tstate.unique_id and
            self.last_ver == tstate.execution_context_ver and
            self.last_deallocs ==
                tstate.interp.context_item_deallocs):
            return self.last_value

        value = None
        for mapping in reversed(tstate.execution_context):
            if self in mapping:
                value = mapping[self]
                break

        self.last_value = value  # borrowed ref
        self.last_tstate_id = tstate.unique_id
        self.last_ver = tstate.execution_context_ver
        self.last_deallocs = tstate.interp.context_item_deallocs

        return value

Note that last_value is a borrowed reference. The assumption is that if all counter checks pass, the object is still alive. This allows the CI values to be properly GCed.

This is similar to the trick that decimal C implementation uses for caching the current decimal context, and will have the same performance characteristics, but available to all Execution Context users.

Approach #1: Use a dict for LocalContext

The straightforward way of implementing the proposed EC mechanisms is to create a WeakKeyDict on top of the Python dict type.

To implement the ExecutionContext type we can use Python list (or a custom stack implementation with some pre-allocation optimizations).

This approach will have the following runtime complexity:

  • O(M) for ContextItem.get(), where M is the number of Local Contexts in the stack.

    It is important to note that ContextItem.get() will implement a cache making the operation O(1) for packages like decimal and numpy.

  • O(1) for ContextItem.set().

  • O(N) for sys.get_execution_context(), where N is the total number of items in the current execution context (see the sketch after this list).
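
Why sys.get_execution_context() is O(N) in this approach: every Local Context dict has to be copied so that later modifications in the thread do not leak into the returned snapshot (a sketch, using the pseudo-structures from the Specification):

def get_execution_context():
    tstate = PyThreadState_Get()
    # Copying each dict costs time proportional to the total
    # number of items across all Local Contexts in the stack.
    return ExecutionContext(
        [LocalContext(lc) for lc in tstate.execution_context])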

Approach #2: Use HAMT for LocalContext

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) to implement high performance immutable collections [5], [6].

Immutable mappings implemented with HAMT have O(log32N) performance for set(), get(), and merge() operations, which is essentially O(1) for relatively small mappings (see the Appendix: HAMT Performance section for HAMT performance in CPython).

In this approach we use the same ExecutionContext design as in Approach #1, but with an HAMT-backed weak key Local Context implementation. With that we will have the following runtime complexity:

  • O(M * log32N) for ContextItem.get(), where M is the number of Local Contexts in the stack, and N is the number of items in the EC. The operation will essentially be O(M), because execution contexts are normally not expected to have more than a few dozen items.

    (ContextItem.get() will have the same caching mechanism as in Approach #1.)

  • O(log32N) for ContextItem.set() where N is the number of items in the current local context. This will essentially be an O(1) operation most of the time.

  • O(log32N) for sys.get_execution_context(), where N is the total number of items in the current execution context.

Essentially, using HAMT for Local Contexts instead of Python dicts brings down the complexity of sys.get_execution_context() from O(N) to O(log32N) because of the more efficient merge algorithm.

Approach #3: Use HAMT and Immutable Linked List

We can make an alternative ExecutionContext design by using a linked list. Each LocalContext in the ExecutionContext object will be wrapped in a linked-list node.

LocalContext objects will use the HAMT-backed weak key implementation described in Approach #2.

Every modification to the current LocalContext will produce a new version of it, which will be wrapped in a new linked list node. Essentially, this means that ExecutionContext is an immutable forest of LocalContext objects, and can be safely copied by reference in sys.get_execution_context() (eliminating the expensive "merge" operation).

With this approach, sys.get_execution_context() will be an O(1) operation.
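
A sketch of this design: the EC is just a reference to the head of an immutable linked list of (HAMT-backed) Local Contexts, so capturing it is a single pointer copy:

class Node:
    # Immutable linked-list node wrapping one LocalContext (HAMT).
    def __init__(self, local_context, prev=None):
        self.local_context = local_context
        self.prev = prev

class ExecutionContext:
    # Alternative design: the EC only points at the head node.
    def __init__(self, head=None):
        self.head = head

def get_execution_context():
    tstate = PyThreadState_Get()
    # O(1): nodes are never mutated, so sharing them is safe.
    return ExecutionContext(tstate.execution_context.head)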

Summary

We believe that approach #3 enables an efficient and complete Execution Context implementation, with excellent runtime performance.

ContextItem.get() Cache enables fast retrieval of context items for performance critical libraries like decimal and numpy.

Fast sys.get_execution_context() enables efficient management of execution contexts in asynchronous libraries like asyncio.

Design Considerations

Can we fix PyThreadState_GetDict()?

PyThreadState_GetDict is a TLS, and some of its existing users might depend on it being just a TLS. Changing its behaviour to follow the Execution Context semantics would break backwards compatibility.

PEP 521

PEP 521 proposes an alternative solution to the problem: enhance the Context Manager Protocol with two new methods, __suspend__ and __resume__. To make it compatible with async/await, the Asynchronous Context Manager Protocol would also need to be extended with __asuspend__ and __aresume__.

This allows implementing context managers like the decimal context and numpy.errstate for generators and coroutines.

The following code:

x = sys.new_context_item(description='x')

class Context:

    def __enter__(self):
        self.old_x = x.get()
        x.set('something')

    def __exit__(self, *err):
        x.set(self.old_x)

would become this:

local = threading.local()

class Context:

    def __enter__(self):
        self.old_x = getattr(local, 'x', None)
        local.x = 'something'

    def __suspend__(self):
        local.x = self.old_x

    def __resume__(self):
        local.x = 'something'

    def __exit__(self, *err):
        local.x = self.old_x

Besides complicating the protocol, this approach will likely negatively impact the performance of coroutines, generators, and any code that uses context managers, and will notably complicate the interpreter implementation.

PEP 521 also does not provide any mechanism to propagate state in a local context, such as storing a request object in an HTTP request handler for better logging. Nor does it solve the state leaking problem for greenlet/gevent.

Can Execution Context be implemented outside of CPython?

Because async/await code needs an event loop to run it, an EC-like solution can be implemented in a limited way for coroutines.

Generators, on the other hand, do not have an event loop or trampoline, making it impossible to intercept their yield points outside of the Python interpreter.

Backwards Compatibility

This proposal preserves 100% backwards compatibility.

Appendix: HAMT Performance

First, while investigating how to implement an immutable mapping in CPython, we were able to make dict.copy() up to 5 times faster [4]. All benchmarks in this section were run against this optimized dict.

To assess if HAMT can be used for Execution Context, we implemented it in CPython [7].

[Image: pep-0550-hamt_vs_dict.png]

Figure 1. Benchmark code can be found here: [9].

Figure 1 shows that HAMT indeed displays O(1) performance for all benchmarked dictionary sizes. For dictionaries with fewer than 100 items, HAMT is a bit slower than a Python dict shallow copy.

[Image: pep-0550-lookup_hamt.png]

Figure 2. Benchmark code can be found here: [10].

Figure 2 shows a comparison of lookup costs between the Python dict and an HAMT-based immutable mapping. HAMT lookup time is 30-40% slower than Python dict lookups on average, which is a very good result, considering how well Python dicts are optimized.

Note that, according to [8], the HAMT design can be further improved.

There is a limitation of the Python dict design that makes HAMT the preferred choice for an immutable mapping implementation: dicts need to be resized periodically, and resizing is expensive. The dict.copy() optimization (see [4]) only works for dicts with no deleted items; dicts that have deleted items need to be resized during copy(), which makes the copy much slower.

Because adding and deleting items is a very common operation on a LocalContext, we would not always be able to use the optimized dict.copy(), frequently resorting to the slower version.

Acknowledgments

I thank Elvis Pranskevichus and Victor Petrovykh for countless discussions around the topic, and for PEP proofreading and edits.

Thanks to Nathaniel Smith for proposing the ContextItem design [17] [18], for pushing the PEP towards a more complete design, and coming up with the idea of having a stack of contexts in the thread state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the mailing list, and for coming up with a case that caused the complete rewrite of the initial PEP version [19].

Version History

  1. Posted on 11-Aug-2017, view it here: [20].

    The fundamental limitation that caused a complete redesign of the PEP was that it was not possible to implement an iterator that would interact with the EC in the same way as generators (see [19].)

  2. Posted on 15-Aug-2017: the current version.

Source: https://github.com/python/peps/blob/master/pep-0550.txt