[Python-Dev] RFC: PEP 509: Add a private version to dict

Fri Apr 15 13:54:59 EDT 2016

On Thu Apr 14 11:19:42 EDT 2016, Victor Stinner posted the latest
draft of PEP 509 (dict version_tag).

(1)  Meta Question:  If this is really only for CPython, then is
"Standards Track" the right classification?

(2)  Why *promise* not to update the version_tag when replacing a
value with itself?  Isn't that the sort of quality-of-implementation
issue that got pushed to a note for objects that happen to be
represented as singletons, such as small integers or ASCII chars?

I think it is a helpful optimization, and worth documenting ... I
just think it should be at the layer of "this particular patch",
rather than something that sounds like part of the contract.

e.g.,

... The global version is also incremented and copied to the
dictionary version at each dictionary change.  The following
dict methods can trigger changes:

* ``clear()`` 
* ``pop(key)``
* ``popitem()`` 
* ``setdefault(key, value)`` 
* ``__delitem__(key)``
* ``__setitem__(key, value)`` 
* ``update(...)``

.. note::  As a quality of implementation issue, the actual patch
does not increment the version_tag when it can prove that there
was no actual change.  For example, clear() on an already-empty
dict will not trigger a version_tag change, nor will updating a
dict with itself, since the values will be unchanged.  For efficiency,
the analysis considers only object identity (not equality) when
deciding whether to increment the version_tag.
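
To make the identity-versus-equality rule in that note concrete, here is
a rough pure-Python model.  The class name VersionedDict, the module-level
counter, and the public version_tag attribute are all hypothetical
stand-ins; the real field would be private and live in the C struct, and
this sketch covers only ``__setitem__``, ``__delitem__`` and ``clear()``.

```python
import itertools

# Hypothetical stand-in for the process-wide version counter.
_global_version = itertools.count(1)


class VersionedDict(dict):
    """Toy model: bump version_tag unless a change can be ruled out."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(_global_version)

    def _bump(self):
        self.version_tag = next(_global_version)

    def __setitem__(self, key, value):
        missing = object()               # sentinel for "key not present"
        old = self.get(key, missing)
        super().__setitem__(key, value)
        if old is not value:             # identity check, not equality
            self._bump()

    def __delitem__(self, key):
        super().__delitem__(key)         # raises KeyError before any bump
        self._bump()

    def clear(self):
        if self:                         # clearing an empty dict changes nothing
            super().clear()
            self._bump()


d = VersionedDict()
v0 = d.version_tag
d.clear()                                # already empty
same_after_empty_clear = d.version_tag == v0

value = [1, 2]
d["k"] = value
v1 = d.version_tag
d["k"] = value                           # same object stored again
same_after_identical = d.version_tag == v1
d["k"] = [1, 2]                          # equal, but a different object
changed_after_equal = d.version_tag != v1
```

In this model the last assignment bumps the tag even though the old and
new lists compare equal, which is exactly the behavior that question [2A]
asks whether the PEP wants to promise.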

[2A] Do you want to promise that replacing a value with a
non-identical object *will* trigger a version_tag update *even*
if the objects are equal?

I would vote no, but I realize backwards-compatibility may create
such a promise implicitly.

(3)  It is worth being explicit on whether empty dicts can share
a version_tag of 0.  If this PEP is about dict content, then that
seems fine, and it may well be worth optimizing dict creation.

There are times when it is important to keep the same empty dict;
I can't think of any use cases where it is important to verify
that some *other* code has done so, *and* I can't get a reference
to the correct dict for an identity check.

(4)  Please be explicit about the locking around version++; it
is enough to say that the relevant methods already need to hold
the GIL (assuming that is true).

(5)  I'm not sure I understand the arguments around a per-entry
version.

On the one hand, you never need a strong reference to the value;
if it has been collected, then it has obviously been removed from
the dict, and that removal should trigger a change even with a
per-dict version.

On the other hand, I'm not sure per-entry would really allow
finer-grained guards to avoid lookups; just because an entry hasn't
been modified doesn't prove it hasn't been moved to another location,
perhaps by replacing a dummy in a slot it would have preferred.

(6)  I'm also not sure why version_tag *doesn't* solve the problem
of dicts that fool the iteration guards by mutating without changing
size ( https://bugs.python.org/issue19332 ) ... are you just saying
that the iterator views aren't allowed to rely on the version_tag
remaining stable, because replacing a value (as opposed to a
key-value pair) is allowed?

I had always viewed the failing iterators as a supporting-this-case-
makes-the-code-too-slow-and-ugly limitation, rather than a data
integrity check.  When I do care about the data not changing,
(an exposed variant of) version_tag is as likely to be what I want as
a hypothetical keys_version_tag would be. 
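
A small pure-Python model of the distinction I mean.  CountingDict and
its public version_tag are hypothetical, and the model counts every
``__setitem__`` and ``__delitem__`` call rather than mirroring the actual
patch.  It compares a size-only guard with a version guard:

```python
class CountingDict(dict):
    """Toy dict whose version_tag counts item assignments and deletions."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version_tag += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version_tag += 1


d = CountingDict(a=1, b=2)

# Case 1: swap one key for another, so the size is unchanged.
size_before, version_before = len(d), d.version_tag
del d["a"]
d["c"] = 3
size_guard_tripped = len(d) != size_before
version_guard_tripped = d.version_tag != version_before

# Case 2: replace a value only; the key set is untouched.
keys_before, version_before = set(d), d.version_tag
d["b"] = 20
keys_changed = set(d) != keys_before
version_tripped_on_value = d.version_tag != version_before
```

In case 1 the size check misses the mutation while the version check
catches it.  In case 2 the version check also fires although the keys are
unchanged, which I assume is why the PEP says the iterators cannot simply
reuse version_tag.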

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ