[Python-checkins] peps: Add PEP 521 by Nathaniel Smith
berker.peksag
python-checkins at python.org
Thu Jun 9 05:08:50 EDT 2016
https://hg.python.org/peps/rev/c244d09c7874
changeset: 6359:c244d09c7874
user: Berker Peksag <berker.peksag at gmail.com>
date: Thu Jun 09 12:08:48 2016 +0300
summary:
Add PEP 521 by Nathaniel Smith
files:
pep-0521.txt | 387 +++++++++++++++++++++++++++++++++++++++
1 files changed, 387 insertions(+), 0 deletions(-)
diff --git a/pep-0521.txt b/pep-0521.txt
new file mode 100644
--- /dev/null
+++ b/pep-0521.txt
@@ -0,0 +1,387 @@
+PEP: 521
+Title: Managing global context via 'with' blocks in generators and coroutines
+Version: $Revision$
+Last-Modified: $Date$
+Author: Nathaniel J. Smith <njs at pobox.com>
+Status: Deferred
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 27-Apr-2015
+Python-Version: 3.6
+Post-History: 29-Apr-2015
+
+
+Abstract
+========
+
+While we generally try to avoid global state when possible, there
+nonetheless exist a number of situations where it is agreed to be the
+best approach. In Python, the standard way of handling such cases is
+to store the global state in global or thread-local storage, and then
+use ``with`` blocks to limit modifications of this global state to a
+single dynamic scope. Examples where this pattern is used include the
+standard library's ``warnings.catch_warnings`` and
+``decimal.localcontext``, NumPy's ``numpy.errstate`` (which exposes
+the error-handling settings provided by the IEEE 754 floating point
+standard), and the handling of logging context or HTTP request context
+in many server application frameworks.
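
As a quick, runnable illustration of the pattern with ``decimal.localcontext``
(current stdlib behavior, no proposed features involved):

```python
from decimal import Decimal, localcontext

# localcontext() scopes a change to the global decimal context to this
# ``with`` block; on exit the previous context is restored.
with localcontext() as ctx:
    ctx.prec = 4
    inside = Decimal(1) / Decimal(7)   # computed with 4 significant digits

outside = Decimal(1) / Decimal(7)      # back to the default 28 digits

print(inside)    # 0.1429
print(outside)   # 0.1428571428571428571428571429
```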
+
+However, there is currently no ergonomic way to manage such local
+changes to global state when writing a generator or coroutine. For
+example, this code::
+
+    def f():
+        with warnings.catch_warnings():
+            for x in g():
+                yield x
+
+may or may not successfully catch warnings raised by ``g()``, and may
+or may not inadvertently swallow warnings triggered elsewhere in the
+code. The context manager, which was intended to apply only to ``f``
+and its callees, ends up having a dynamic scope that encompasses
+arbitrary and unpredictable parts of its call\ **ers**. This problem
+becomes particularly acute when writing asynchronous code, where
+essentially all functions become coroutines.
+
+Here, we propose to solve this problem by notifying context managers
+whenever execution is suspended or resumed within their scope,
+allowing them to restrict their effects appropriately.
+
+
+Specification
+=============
+
+Two new, optional, methods are added to the context manager protocol:
+``__suspend__`` and ``__resume__``. If present, these methods will be
+called whenever a frame's execution is suspended or resumed from
+within the context of the ``with`` block.
+
+More formally, consider the following code::
+
+    with EXPR as VAR:
+        PARTIAL-BLOCK-1
+        f((yield foo))
+        PARTIAL-BLOCK-2
+
+Currently this is equivalent to the following code copied from PEP 343::
+
+    mgr = (EXPR)
+    exit = type(mgr).__exit__  # Not calling it yet
+    value = type(mgr).__enter__(mgr)
+    exc = True
+    try:
+        try:
+            VAR = value  # Only if "as VAR" is present
+            PARTIAL-BLOCK-1
+            f((yield foo))
+            PARTIAL-BLOCK-2
+        except:
+            exc = False
+            if not exit(mgr, *sys.exc_info()):
+                raise
+    finally:
+        if exc:
+            exit(mgr, None, None, None)
+
+This PEP proposes to modify ``with`` block handling to instead become::
+
+    mgr = (EXPR)
+    exit = type(mgr).__exit__  # Not calling it yet
+    ### --- NEW STUFF ---
+    if the_block_contains_yield_points:  # known statically at compile time
+        suspend = getattr(type(mgr), "__suspend__", lambda mgr: None)
+        resume = getattr(type(mgr), "__resume__", lambda mgr: None)
+    ### --- END OF NEW STUFF ---
+    value = type(mgr).__enter__(mgr)
+    exc = True
+    try:
+        try:
+            VAR = value  # Only if "as VAR" is present
+            PARTIAL-BLOCK-1
+            ### --- NEW STUFF ---
+            suspend(mgr)
+            tmp = yield foo
+            resume(mgr)
+            f(tmp)
+            ### --- END OF NEW STUFF ---
+            PARTIAL-BLOCK-2
+        except:
+            exc = False
+            if not exit(mgr, *sys.exc_info()):
+                raise
+    finally:
+        if exc:
+            exit(mgr, None, None, None)
+
+Analogous suspend/resume calls are also wrapped around the ``yield``
+points embedded inside the ``yield from``, ``await``, ``async with``,
+and ``async for`` constructs.
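
Since no interpreter support exists yet, the protocol can be previewed by
writing the expansion out by hand. This is a pure-Python sketch: the
``Tracker`` manager is hypothetical, and the explicit ``__suspend__`` /
``__resume__`` calls play the part the interpreter would play at each
yield point:

```python
# Hypothetical context manager implementing the proposed hooks; it just
# records every protocol event so the ordering can be inspected.
class Tracker:
    def __init__(self):
        self.events = []
    def __enter__(self):
        self.events.append("enter")
        return self
    def __exit__(self, *exc_info):
        self.events.append("exit")
        return False
    def __suspend__(self):
        self.events.append("suspend")
    def __resume__(self):
        self.events.append("resume")

def f(mgr):
    # Hand-expanded form of:  with mgr: yield 1; yield 2
    with mgr:
        mgr.__suspend__()
        yield 1
        mgr.__resume__()
        mgr.__suspend__()
        yield 2
        mgr.__resume__()

t = Tracker()
assert list(f(t)) == [1, 2]
print(t.events)
# ['enter', 'suspend', 'resume', 'suspend', 'resume', 'exit']
```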
+
+
+Nested blocks
+-------------
+
+Given this code::
+
+    def f():
+        with OUTER:
+            with INNER:
+                yield VALUE
+
+then we perform the following operations in the following sequence::
+
+    INNER.__suspend__()
+    OUTER.__suspend__()
+    yield VALUE
+    OUTER.__resume__()
+    INNER.__resume__()
+
+Note that this ensures that the following is a valid refactoring::
+
+    def f():
+        with OUTER:
+            yield from g()
+
+    def g():
+        with INNER:
+            yield VALUE
+
+Similarly, ``with`` statements with multiple context managers suspend
+from right to left, and resume from left to right.
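
The ordering can be checked with the same hand-expansion trick (a sketch;
the ``Named`` manager is hypothetical, and the explicit calls stand in for
the interpreter):

```python
# Two hypothetical managers share one event log; around the yield, the
# proposal suspends innermost-first and resumes outermost-first.
events = []

class Named:
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        return self
    def __exit__(self, *exc_info):
        return False
    def __suspend__(self):
        events.append(self.name + ".suspend")
    def __resume__(self):
        events.append(self.name + ".resume")

def f(outer, inner):
    # Hand-expanded form of:  with outer: with inner: yield VALUE
    with outer:
        with inner:
            inner.__suspend__()
            outer.__suspend__()
            yield "VALUE"
            outer.__resume__()
            inner.__resume__()

list(f(Named("OUTER"), Named("INNER")))
print(events)
# ['INNER.suspend', 'OUTER.suspend', 'OUTER.resume', 'INNER.resume']
```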
+
+
+Other changes
+-------------
+
+``__suspend__`` and ``__resume__`` methods are added to
+``warnings.catch_warnings`` and ``decimal.localcontext``.
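
As a sketch of what such methods could look like (hypothetical: this is
not the stdlib implementation, and ``SuspendableLocalContext`` is an
illustrative name), a ``localcontext``-style manager would hand the global
slot back on suspend and reclaim it on resume:

```python
import decimal

# Hypothetical __suspend__/__resume__ for a localcontext-style manager:
# park the local context while the frame is suspended.
class SuspendableLocalContext:
    def __init__(self, prec):
        self.local = decimal.Context(prec=prec)

    def __enter__(self):
        self.saved = decimal.getcontext()
        decimal.setcontext(self.local)
        return self.local

    def __exit__(self, *exc_info):
        decimal.setcontext(self.saved)
        return False

    def __suspend__(self):
        decimal.setcontext(self.saved)   # restore the caller's context

    def __resume__(self):
        decimal.setcontext(self.local)   # reinstall the local context

mgr = SuspendableLocalContext(prec=4)
with mgr:
    mgr.__suspend__()                                 # as if at a yield point
    prec_while_suspended = decimal.getcontext().prec  # caller's default: 28
    mgr.__resume__()
    prec_while_running = decimal.getcontext().prec    # 4

print(prec_while_suspended, prec_while_running)   # 28 4
```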
+
+
+Rationale
+=========
+
+In the abstract, we gave an example of plausible but incorrect code::
+
+    def f():
+        with warnings.catch_warnings():
+            for x in g():
+                yield x
+
+To make this correct in current Python, we need to instead write
+something like::
+
+    def f():
+        it = iter(g())
+        while True:
+            with warnings.catch_warnings():
+                try:
+                    x = next(it)
+                except StopIteration:
+                    break
+            yield x
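
A runnable check of this workaround, with a stand-in ``g()`` that warns
when resumed (the ``simplefilter("ignore")`` call is illustrative, showing
a filter that ``f`` wants confined to its own execution):

```python
import warnings

def g():
    warnings.warn("noisy")   # stand-in generator that warns when resumed
    yield 1

def f():
    it = iter(g())
    while True:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")   # confined to next(it)
            try:
                x = next(it)
            except StopIteration:
                break
        yield x   # the caller sees its own warning state here

with warnings.catch_warnings(record=True) as log:
    warnings.simplefilter("always")
    result = list(f())

print(result)     # [1]
print(len(log))   # 0: g's warning was suppressed inside f only
```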
+
+On the other hand, if this PEP is accepted then the original code will
+become correct as-is. Or if this isn't convincing, then here's another
+example of broken code; fixing it requires even greater gyrations, and
+these are left as an exercise for the reader::
+
+    def f2():
+        with warnings.catch_warnings(record=True) as w:
+            for x in g():
+                yield x
+        assert len(w) == 1
+        assert "xyzzy" in str(w[0].message)
+
+And notice that this last example isn't artificial at all -- if you
+squint, it turns out to be exactly how you write a test that an
+asyncio-using coroutine ``g`` correctly raises a warning. Similar
+issues arise for pretty much any use of ``warnings.catch_warnings``,
+``decimal.localcontext``, or ``numpy.errstate`` in asyncio-using code.
+So there's clearly a real problem to solve here, and the growing
+prominence of async code makes it increasingly urgent.
+
+
+Alternative approaches
+----------------------
+
+The main alternative that has been proposed is to create some kind of
+"task-local storage", analogous to "thread-local storage"
+[#yury-task-local-proposal]_. In essence, the idea would be that the
+event loop would take care to allocate a new "task namespace" for each
+task it schedules, and provide an API to fetch, at any given time, the
+namespace corresponding to the currently executing task. While there
+are many details to be worked out [#task-local-challenges]_, the basic
+idea seems doable, and it is an especially natural way to handle the
+kind of global context that arises at the top-level of async
+application frameworks (e.g., setting up context objects in a web
+framework). But it also has a number of flaws:
+
+* It only solves the problem of managing global state for coroutines
+ that ``yield`` back to an asynchronous event loop. But there
+ actually isn't anything about this problem that's specific to
+ asyncio -- as shown in the examples above, simple generators run
+ into exactly the same issue.
+
+* It creates an unnecessary coupling between event loops and code that
+ needs to manage global state. Obviously an async web framework needs
+ to interact with some event loop API anyway, so it's not a big deal
+ in that case. But it's weird that ``warnings`` or ``decimal`` or
+ NumPy should have to call into an async library's API to access
+ their internal state when they themselves involve no async code.
+ Worse, since there are multiple event loop APIs in common use, it
+ isn't clear how to choose which to integrate with. (This could be
+ somewhat mitigated by CPython providing a standard API for creating
+ and switching "task-local domains" that asyncio, Twisted, tornado,
+ etc. could then work with.)
+
+* It's not at all clear that this can be made acceptably fast. NumPy
+ has to check the floating point error settings on every single
+ arithmetic operation. Checking a piece of data in thread-local
+ storage is absurdly quick, because modern platforms have put massive
+ resources into optimizing this case (e.g. dedicating a CPU register
+ for this purpose); calling a method on an event loop to fetch a
+ handle to a namespace and then doing lookup in that namespace is
+ much slower.
+
+ More importantly, this extra cost would be paid on *every* access to
+ the global data, even for programs which are not otherwise using an
+ event loop at all. This PEP's proposal, by contrast, only affects
+ code that actually mixes ``with`` blocks and ``yield`` statements,
+ meaning that the users who experience the costs are the same users
+ who also reap the benefits.
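
The task-local storage alternative can be sketched in a few lines on top
of asyncio (hypothetical API: the ``task_local`` helper and the
per-task-dict scheme are illustrative, not an existing interface):

```python
import asyncio

# Hypothetical "task-local storage": one namespace per task, fetched
# through the currently running task.
_task_namespaces = {}

def task_local():
    task = asyncio.current_task()
    return _task_namespaces.setdefault(task, {})

async def handle(request_id):
    task_local()["request_id"] = request_id
    await asyncio.sleep(0)             # other tasks may run here...
    return task_local()["request_id"]  # ...but this task's namespace is intact

async def main():
    return await asyncio.gather(handle("a"), handle("b"))

print(asyncio.run(main()))   # ['a', 'b']
```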
+
+On the other hand, such tight integration between task context and the
+event loop does potentially allow other features that are beyond the
+scope of the current proposal. For example, an event loop could note
+which task namespace was in effect when a task called ``call_soon``,
+and arrange that the callback when run would have access to the same
+task namespace. Whether this is useful, or even well-defined in the
+case of cross-thread calls (what does it mean to have task-local
+storage accessed from two threads simultaneously?), is left as a
+puzzle for event loop implementors to ponder -- nothing in this
+proposal rules out such enhancements as well. It does seem though
+that such features would be useful primarily for state that already
+has a tight integration with the event loop -- while we might want a
+request id to be preserved across ``call_soon``, most people would not
+expect::
+
+    with warnings.catch_warnings():
+        loop.call_soon(f)
+
+to result in ``f`` being run with warnings disabled, which would be
+the result if ``call_soon`` preserved global context in general.
+
+
+Backwards compatibility
+=======================
+
+Because ``__suspend__`` and ``__resume__`` are optional and default to
+no-ops, all existing context managers continue to work exactly as
+before.
+
+Speed-wise, this proposal adds additional overhead when entering a
+``with`` block (where we must now check for the additional methods;
+failed attribute lookup in CPython is rather slow, since it involves
+allocating an ``AttributeError``), and additional overhead at
+suspension points. Since the position of ``with`` blocks and
+suspension points is known statically, the compiler can
+straightforwardly optimize away this overhead in all cases except
+where one actually has a ``yield`` inside a ``with``.
+
+
+Interaction with PEP 492
+========================
+
+PEP 492 added new asynchronous context managers, which are like
+regular context managers but instead of having regular methods
+``__enter__`` and ``__exit__`` they have coroutine methods
+``__aenter__`` and ``__aexit__``.
+
+There are a few options for how to handle these:
+
+1) Add ``__asuspend__`` and ``__aresume__`` coroutine methods.
+
+ One potential difficulty here is that this would add a complication
+ to an already complicated part of the bytecode
+ interpreter. Consider code like::
+
+    async def f():
+        async with MGR:
+            await g()
+
+    @types.coroutine
+    def g():
+        yield 1
+
+ In 3.5, ``f`` gets desugared to something like::
+
+    @types.coroutine
+    def f():
+        yield from MGR.__aenter__()
+        try:
+            yield from g()
+        finally:
+            yield from MGR.__aexit__()
+
+ With the addition of ``__asuspend__`` / ``__aresume__``, the
+ ``yield from`` would have to be replaced by something like::
+
+    for SUBVALUE in g():
+        yield from MGR.__asuspend__()
+        yield SUBVALUE
+        yield from MGR.__aresume__()
+
+ Notice that we've had to introduce a new temporary ``SUBVALUE`` to
+ hold the value yielded from ``g()`` while we yield from
+ ``MGR.__asuspend__()``. Where does this temporary go? Currently
+ ``yield from`` is a single bytecode that doesn't modify the stack
+ while looping. Also, the above code isn't even complete, because it
+ skips over the issue of how to direct ``send``/``throw`` calls to
+ the right place at the right time...
+
+2) Add plain ``__suspend__`` and ``__resume__`` methods.
+
+3) Leave async context managers alone for now until we have more
+ experience with them.
+
+It isn't entirely clear what use cases even exist in which an async
+context manager would need to set coroutine-local-state (= like
+thread-local-state, but for a coroutine stack instead of an OS
+thread), and couldn't do so via coordination with the coroutine
+runner. So this draft tentatively goes with option (3) and punts on
+this question until later.
+
+
+References
+==========
+
+.. [#yury-task-local-proposal] https://groups.google.com/forum/#!topic/python-tulip/zix5HQxtElg
+ https://github.com/python/asyncio/issues/165
+
+.. [#task-local-challenges] For example, we would have to decide
+ whether there is a single task-local namespace shared by all users
+ (in which case we need a way for multiple third-party libraries to
+ adjudicate access to this namespace), or else if there are multiple
+ task-local namespaces, then we need some mechanism for each library
+ to arrange for their task-local namespaces to be created and
+ destroyed at appropriate moments. The preliminary patch linked
+ from the github issue above doesn't seem to provide any mechanism
+ for such lifecycle management.
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ coding: utf-8
+ End:
--
Repository URL: https://hg.python.org/peps