[Python-checkins] peps: PEP 509: dict version is now global
victor.stinner
python-checkins at python.org
Thu Apr 14 11:13:47 EDT 2016
https://hg.python.org/peps/rev/4335314e45ef
changeset: 6280:4335314e45ef
user: Victor Stinner <victor.stinner at gmail.com>
date: Thu Apr 14 17:13:07 2016 +0200
summary:
PEP 509: dict version is now global
* Rename ma_version to ma_version_tag
* Add links to fatoptimizer and fat projects
files:
pep-0509.txt | 130 ++++++++++++++++++++++++--------------
1 files changed, 81 insertions(+), 49 deletions(-)
diff --git a/pep-0509.txt b/pep-0509.txt
--- a/pep-0509.txt
+++ b/pep-0509.txt
@@ -13,8 +13,9 @@
Abstract
========
-Add a new private version to builtin ``dict`` type, incremented at each
-change, to implement fast guards on namespaces.
+Add a new private version to the builtin ``dict`` type, incremented at
+each dictionary creation and at each dictionary change, to implement
+fast guards on namespaces.
Rationale
@@ -38,7 +39,9 @@
on namespaces.
Dictionary lookups can be skipped if the version does not change which
-is the common case for most namespaces. The performance of a guard does
+is the common case for most namespaces. Since the version is globally
+unique, the version is also enough to check if the namespace dictionary
+was not replaced with a new dictionary. The performance of a guard does
not depend on the number of watched dictionary entries, complexity of
O(1), if the dictionary version does not change.
@@ -71,9 +74,10 @@
self.version = dict_get_version(dict)
def check(self):
- """Return True if the dictionary entry did not changed."""
+ """Return True if the dictionary entry did not changed
+ and the dictionary was not replaced."""
- # read the version field of the dict structure
+ # read the version of the dict structure
version = dict_get_version(self.dict)
if version == self.version:
# Fast-path: dictionary lookup avoided
@@ -94,6 +98,22 @@
Usage of the dict version
=========================
+Speedup method calls 1.2x
+-------------------------
+
+Yury Selivanov wrote a patch to optimize method calls. The patch depends
+on the `implement per-opcode cache in ceval
+<https://bugs.python.org/issue26219>`_ patch which requires dictionary
+versions to invalidate the cache if the globals dictionary or the
+builtins dictionary has been modified.
+
+The cache also requires that the dictionary version is globally unique.
+It is possible to define a function in a namespace and call it
+in a different namespace: using ``exec()`` with the *globals* parameter
+for example. In this case, the globals dictionary was changed and the
+cache must be invalidated.
+
+
Specialized functions using guards
----------------------------------
@@ -102,8 +122,9 @@
specialized functions with guards. It allows to implement static
optimizers for Python without breaking the Python semantics.
-Example of a static Python optimizer: the astoptimizer of the `FAT
-Python <http://faster-cpython.readthedocs.org/fat_python.html>`_ project
+Example of a static Python optimizer: the `fatoptimizer
+<http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
+<http://faster-cpython.readthedocs.org/fat_python.html>`_ project
implements many optimizations which require guards on namespaces.
Examples:
@@ -128,7 +149,7 @@
Unladen Swallow
---------------
-Even if dictionary version was not explicitly mentioned, optimization
+Even if dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of
globals and builtins." Source: `Unladen Swallow ProjectPlan
@@ -143,9 +164,12 @@
Changes
=======
-Add a ``ma_version`` field to the ``PyDictObject`` structure with the C
-type ``PY_INT64_T``, 64-bit unsigned integer. New empty dictionaries are
-initilized to version ``0``. The version is incremented at each change:
+Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
+the C type ``PY_INT64_T``, 64-bit unsigned integer. Add also a global
+dictionary version. Each time a dictionary is created, the global
+version is incremented and the dictionary version is initialized to the
+global version. The global version is also incremented and copied to the
+dictionary version at each dictionary change:
* ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists
@@ -158,30 +182,27 @@
values are compared by identity, not by their content; the version can
be incremented multiple times
-.. note::
- The ``PyDictObject`` structure is not part of the stable ABI.
+The ``PyDictObject`` structure is not part of the stable ABI.
+
+The field is called ``ma_version_tag`` rather than ``ma_version`` to
+suggest to compare it using ``version_tag == old_version_tag`` rather
+than ``version <= old_version`` which makes the integer overflow much
+likely.
Example using an hypothetical ``dict_get_version(dict)`` function::
>>> d = {}
>>> dict_get_version(d)
- 0
+ 100
>>> d['key'] = 'value'
>>> dict_get_version(d)
- 1
+ 101
>>> d['key'] = 'new value'
>>> dict_get_version(d)
- 2
+ 102
>>> del d['key']
>>> dict_get_version(d)
- 3
-
-If a dictionary is created with items, the version is also incremented
-at each dictionary insertion. Example::
-
- >>> d = dict(x=7, y=33)
- >>> dict_get_version(d)
- 2
+ 103
The version is not incremented if an existing key is set to the same
value. For efficiency, values are compared by their identity:
@@ -192,10 +213,10 @@
>>> value = object()
>>> d['key'] = value
>>> dict_get_version(d)
- 1
+ 40
>>> d['key'] = value
>>> dict_get_version(d)
- 1
+ 40
.. note::
CPython uses some singleton like integers in the range [-5; 257],
@@ -204,10 +225,10 @@
singleton, the version is not modified.
-Implementation
-==============
+Implementation and Performance
+==============================
-The `issue #26058: PEP 509: Add ma_version to PyDictObject
+The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
<https://bugs.python.org/issue26058>`_ contains a patch implementing
this PEP.
@@ -221,23 +242,26 @@
ns, whereas the guard still only costs 3.8 ns when the version does not
change (39x as fast).
+The `fat module
+<http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
+such guards: ``fat.GuardDict`` is based on the dictionary version.
+
Integer overflow
================
-The implementation uses the C unsigned integer type ``PY_UINT64_T`` to
-store the version, a 64 bits unsigned integer. The C code uses
-``version++``. On integer overflow, the version is wrapped to ``0`` (and
-then continue to be incremented) according to the C standard.
+The implementation uses the C type ``PY_UINT64_T`` to store the version:
+a 64 bits unsigned integer. The C code uses ``version++``. On integer
+overflow, the version is wrapped to ``0`` (and then continue to be
+incremented) according to the C standard.
After an integer overflow, a guard can succeed whereas the watched
-dictionary key was modified. The bug occurs if the dictionary is
-modified at least ``2 ** 64`` times between two checks of the guard and
-if the new version (theoretical value with no integer overflow) is equal
-to the old version modulo ``2 ** 64``.
+dictionary key was modified. The bug only occurs at a guard check if
+there are exaclty ``2 ** 64`` dictionary creations or modifications
+since the previous guard check.
-If a dictionary is modified each nanosecond, an overflow takes longer
-than 584 years. Using a 32-bit version, the overflow occurs only after 4
+If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
+takes longer than 584 years. Using a 32-bit version, it only takes 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns.
@@ -264,11 +288,6 @@
* All Python implementations must implement this new property, it gives
more work to other implementations, whereas they may not use the
dictionary version at all.
-* The ``__version__`` can be wrapped on integer overflow. It is error
- prone: using ``dict.__version__ <= guard_version`` is wrong,
- ``dict.__version__ == guard_version`` must be used instead to reduce
- the risk of bug on integer overflow (even if the integer overflow is
- unlikely in practice).
* Exposing the dictionary version at Python level can lead the
false assumption on performances. Checking ``dict.__version__`` at
the Python level is not faster than a dictionary lookup. A dictionary
@@ -281,7 +300,13 @@
$ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
10000000 loops, best of 3: 0.0475 usec per loop
-Bikeshedding on the property name:
+* The ``__version__`` can be wrapped on integer overflow. It is error
+ prone: using ``dict.__version__ <= guard_version`` is wrong,
+ ``dict.__version__ == guard_version`` must be used instead to reduce
+ the risk of bug on integer overflow (even if the integer overflow is
+ unlikely in practice).
+
+Mandatory bikeshedding on the property name:
* ``__cache_token__``: name proposed by Nick Coghlan, name coming from
`abc.get_cache_token()
@@ -318,9 +343,10 @@
self.entry_version = dict_get_entry_version(dict, key)
def check(self):
- """Return True if the dictionary entry did not changed."""
+ """Return True if the dictionary entry did not changed
+ and the dictionary was not replaced."""
- # read the version field of the dict structure
+ # read the version of the dict structure
dict_version = dict_get_version(self.dict)
if dict_version == self.version:
# Fast-path: dictionary lookup avoided
@@ -486,8 +512,14 @@
Discussion
==========
-Thread on the python-ideas mailing list: `RFC: PEP: Add dict.__version__
-<https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_.
+Thread on the mailing lists:
+
+* python-dev: `PEP 509: Add a private version to dict
+ <https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
+ (january 2016)
+* python-ideas: `RFC: PEP: Add dict.__version__
+ <https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
+ (january 2016)
Copyright
--
Repository URL: https://hg.python.org/peps
More information about the Python-checkins
mailing list