[Python-ideas] RFC: PEP: Add dict.__version__

Victor Stinner victor.stinner at gmail.com
Sat Jan 9 08:24:07 EST 2016


2016-01-09 13:09 GMT+01:00 M.-A. Lemburg <mal at egenix.com>:
> * How would the implementation deal with wrap around of the
>   version number for fast changing dicts (esp. on 32-bit platforms) ?

Let me try to do some maths.

haypo at selma$ python3 -m timeit 'd={}' 'for i in range(2**16): d[i]=i'
100 loops, best of 3: 7.01 msec per loop

haypo at selma$ python3
Python 3.4.3 (default, Jun 29 2015, 12:16:01)
>>> t=7.01e-3 / 2**16
>>> t*1e9
106.964111328125

It looks like __setitem__() takes 107 in average. I guess that the
number depends a lot on the dictionary size, the number of required
resize (rehash all keys), etc. But well, it's just to have an
estimation.

>>> print(datetime.timedelta(seconds=2**32 * t))
0:07:39.407360

With a 32-bit version, less than 8 minutes are enough to hit the
integer overflow if each dict operation changes the dict version and
you modify a dict in a loop.

>>> print(2016 + datetime.timedelta(seconds=2**64 * t) / datetime.timedelta(days=365.25))
64541.02049400773

With a 64-bit version, the situation is very different: the next
overflow will not occur before the year 64 541 :-)

Maybe it's worth to use a 64-bit version on 32-bit platforms? Python
3.5 already uses a 64-bit integer on 32-bit platforms to store a
timestamp in the private "pytime" API.

Guard has only a bug on integer overflow if the new version modulo
2^32 (or modulo 2^64) is equal to the old version. The bet is also
that it's "unlikely".

> * Given that this is an optimization and not meant to be exact
>   science, why would we need 64 bits worth of version information ?

If a guard says that nothing changes where something changes, it is a
real issue for me. It means that the optimization changes the Python
semantic.

>   For an optimization it's good enough to get an answer "yes"
>   for slow changing dicts and "no" for all other cases. False
>   negatives don't really hurt. False positives are not allowed.

False negative means that you loose the optimization. It would be
annoying to see server performance degrades after N days before of an
integer overflow :-/ It can be a big issue. How do you choose the
number of servers if performances are not stable?

Victor


More information about the Python-ideas mailing list