[Python-ideas] RFC: PEP: Add dict.version

Sat Jan 9 08:32:21 EST 2016

On Sun, Jan 10, 2016 at 12:21 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sat, Jan 09, 2016 at 01:09:13PM +0100, M.-A. Lemburg wrote:
>
>> * Given that this is an optimization and not meant to be exact
>>   science, why would we need 64 bits worth of version information ?
>>
>>   AFAIK, you only need the version information to be able to
>>   answer the question "did anything change compared to last time
>>   I looked ?".
>>
>>   For an optimization it's good enough to get an answer "yes"
>>   for slow changing dicts and "no" for all other cases.
>
> I don't understand this. The question has nothing to do with
> how quickly or slowly the dict has changed, but only on whether or not
> it actually has changed. Maybe your dict has been stable for three
> hours, except for one change; or it changes a thousand times a second.
> Either way, it has still changed.
>
>
>>   False
>>   negatives don't really hurt. False positives are not allowed.
>
> I think you have this backwards. False negatives potentially will
> introduce horrible bugs. A false negative means that you fail to notice
> when the dict has changed, when it actually has. ("Has the dict
> changed?" "No.") The result of that will be to apply the optimization
> when you shouldn't, and that is potentially catastrophic (the entirely
> wrong function is mysteriously called).
>
> A false positive means you wrongly think the dict has changed when it
> hasn't. ("Has the dict changed?" "Yes.") That's still bad, because you
> miss out on the possibility of applying the optimization when you
> actually could have, but it's not so bad. So false positives (wrongly
> thinking the dict has changed when it hasn't) can be permitted, but
> false negatives shouldn't be.

I think we're getting caught in terminology a bit. The original
question was "why a 64-bit counter". Here's my take on it:

* If the dict has changed but we say it hasn't, this is a critical
failure. M-A L called this a "false positive", which works if the
question is "may we use the optimized version".

* If the dict has changed exactly N times since it was last checked,
where N is the integer wrap-around period of the counter, a naive
counter comparison will show that it has not changed.

Consequently, a small counter is more problematic than a large one. If
the counter has 2**8 states, then collisions will be frequent, and
that would be bad. If it has 2**32 states, then a slow-changing dict
will last longer than any typical run of a program (if it changes,
say, once per second, you get over a century of uptime before it's a
problem), but a fast-changing dict could run into issues (change every
millisecond and you'll run into trouble after a couple of months). A
64-bit counter could handle ridiculously fast mutation (say, every
nanosecond) for a ridiculously long time (hundreds of years).

That's the only way that fast-changing and slow-changing have any meaning.

ChrisA

[Python-ideas] RFC: PEP: Add dict.__version__

[Python-ideas] RFC: PEP: Add dict.version