[Python-Dev] PEP 509: Add a private version to dict (version 3)

Wed Apr 20 19:34:55 EDT 2016

Hi,

Guido van Rossum and Jim J. Jewett suggested me to *not require* to
always increase the dict version if a dict method does not modify its
content. I modified the Changes section to only require that the
version is increased when the dictionary content is modified.

I also explained the nice side effect of having an unique identifier
for two empty dictionaries: it avoids a strong reference when checking
if a namespace (dictionary) was replaced. Yury Selivanov's opcode
cache uses this property to avoid a strong refence on builtin and
global namespaces. It's also a written reply to Armin Rigo's
suggestion to use the version 0 for empty dictionaries (new empty dict
or for dict.clear()).

The modified Changes section:

Changes
=======

Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_UINT64_T``, 64-bit unsigned integer. Add also a global
dictionary version.

Each time a dictionary is created, the global version is incremented and
the dictionary version is initialized to the global version.

Each time the dictionary content is modified, the global version must be
incremented and copied to the dictionary version. Dictionary methods
which can modify its content:

* ``clear()``
* ``pop(key)``
* ``popitem()``
* ``setdefault(key, value)``
* ``__delitem__(key)``
* ``__setitem__(key, value)``
* ``update(...)``

The choice of increasing or not the version when a dictionary method
does not change its content is left to the Python implementation. A
Python implementation can decide to not increase the version to avoid
dictionary lookups in guards. Examples of cases when dictionary methods
don't modify its content:

* ``clear()`` if the dict is already empty
* ``pop(key)`` if the key does not exist
* ``popitem()`` if the dict is empty
* ``setdefault(key, value)`` if the key already exists
* ``__delitem__(key)`` if the key does not exist
* ``__setitem__(key, value)`` if the new value is identical to the
  current value
* ``update()`` if called without argument or if new values are identical
  to current values

Setting a key to a new value equals to the old value is also considered
as an operation modifying the dictionary content.

Two different empty dictionaries must have a different version to be
able to identify a dictionary just by its version. It allows to verify
in a guard that a namespace was not replaced without storing a strong
reference to the dictionary. Using a borrowed reference does not work:
if the old dictionary is destroyed, it is possible that a new dictionary
is allocated at the same memory address. By the way, dictionaries don't
support weak references.

The version increase must be atomic. In CPython, the Global Interpreter
Lock (GIL) already protects ``dict`` methods to make changes atomic.

Example using an hypothetical ``dict_get_version(dict)`` function::

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

The field is called ``ma_version_tag``, rather than ``ma_version``, to
suggest to compare it using ``version_tag == old_version_tag``, rather
than ``version <= old_version`` which becomes wrong after an integer
overflow.

--
Victor