[Python-Dev] RFC: PEP 509: Add a private version to dict

Victor Stinner victor.stinner at gmail.com
Fri Apr 15 19:31:45 EDT 2016


2016-04-15 23:45 GMT+02:00 Jim J. Jewett <jimjjewett at gmail.com>:
>> It's a useful property. For example, let's say that you have a guard
>> on globals()['value']. The guard is created with value=3. A unit test
>> replaces the value with 50, but then restores it to its previous
>> value (3). Later, the guard is checked to decide if an optimization
>> can be used.
>
>> If the dict version is increased, you need a lookup. If the dict
>> version is not increased, the guard is cheap.
>
> I would expect the version to be increased twice, and therefore to
> require a lookup.  Are you suggesting that unittest should provide an
> example of resetting the version back to the original value when it
> cleans up after itself?

Sorry, as I wrote in another email, I was wrong: if you modify the
value, the version is increased. The case discussed here is really a
corner case: the version does not change if the key is set again to
exactly the same value (the same object).

d[key] = value
d[key] = value

It's just that it's cheap to implement it :-)
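
To make the guard idea from the quoted example concrete, here is a
minimal sketch of such a guard. It assumes some way to read the private
version from Python, spelled here as a hypothetical get_dict_version()
helper (in CPython the version only exists at the C level):

class GlobalGuard:
    def __init__(self, namespace, key):
        self.namespace = namespace
        self.key = key
        self.value = namespace[key]                  # value at guard creation
        self.version = get_dict_version(namespace)   # version at guard creation

    def check(self):
        """Return True if the optimization based on the value is still valid."""
        if get_dict_version(self.namespace) == self.version:
            # Fast path: the dict was not modified at all, no lookup needed.
            return True
        # Slow path: the dict changed, look the key up again.
        if self.namespace.get(self.key) is self.value:
            # Same object (e.g. a test replaced it and then restored it):
            # re-arm the fast path for the next check.
            self.version = get_dict_version(self.namespace)
            return True
        return False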


>> In C, it's very cheap to implement the test "new_value == old_value",
>> it just compares two pointers.
>
> Yeah, I understand that it is likely a win in terms of performance,
> and a good way to start off (given that you're willing to do the
> work).
>
> I just worry that you may end up closing off even better optimizations
> later, if you make too many promises about exactly how you will do
> which ones.
>
> Today, dict only cares about ==, and you (reasonably) think that full
> == isn't always worth running ... but when it comes to which tests
> *are* worth running, I'm not confident that the answers won't change
> over the years.

I checked: currently there is no unit test for a == b, only for a is b.
I will add a test for a == b where a is not b, and ensure that the
version is increased.
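
A sketch of such a test, assuming a dict_get_version() helper that
exposes the private version to Python (in CPython this kind of helper
would live in _testcapi; the exact name is an implementation detail):

import unittest

# Assumed helper: exposes the private dict version to Python, e.g. from
# CPython's _testcapi module; the exact spelling is an implementation detail.
from _testcapi import dict_get_version


class DictVersionTest(unittest.TestCase):
    def test_equal_but_not_identical_value(self):
        d = {"key": 1000}
        version = dict_get_version(d)

        # Build an equal but distinct int object at runtime (1000 is
        # outside CPython's small-int cache, so this is a new object).
        new_value = int("1000")
        self.assertEqual(new_value, d["key"])
        self.assertIsNot(new_value, d["key"])

        # Replacing a value with an equal but non-identical object must
        # increase the version.
        d["key"] = new_value
        self.assertGreater(dict_get_version(d), version)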


>>> [2A] Do you want to promise that replacing a value with a
>>> non-identical object *will* trigger a version_tag update *even*
>>> if the objects are equal?
>
>> It's already written in the PEP:
>
> I read that as a description of what the code does, rather than a spec
> for what it should do... so it isn't clear whether I could count on
> that remaining true.
>
> For example, if I know that my dict values are all 4-digit integers,
> can I write:
>
>     d[k]  = d[k] + 0
>
> and be assured that the version_tag will bump?  Or is that something
> that a future optimizer might optimize out?

Hmm, I will try to clarify that in the PEP.
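
For concreteness, the case Jim describes looks like this (a sketch;
whether the addition returns the same int object is a CPython
implementation detail, not a language guarantee):

d = {"k": 1234}
old = d["k"]

d["k"] = d["k"] + 0   # equal value, but possibly a distinct int object

# Under the PEP's rule the version is only bumped if the new object is
# not identical (`is`) to the old one, so whether it is bumped here
# depends on whether the addition happened to return the same object.
changed = d["k"] is not old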


>>> (4)  Please be explicit about the locking around version++; it
>>> is enough to say that the relevant methods already need to hold
>>> the GIL (assuming that is true).
>
>> I don't think that it's important to mention it in the PEP. It's more
>> an implementation detail. The version can be protected by atomic
>> operations.
>
> Now I'm the one arguing from a specific implementation.  :D
>
> My thought was that any sort of locking (including atomic operations)
> is slow, but if the GIL is already held, then there is no *extra*
> locking cost. (Well, a slightly longer hold on the lock, but...)

Hmm, since the PEP clearly targets CPython, I will simply describe its
implementation and explain that the GIL ensures that version++ is
atomic.


>>> On the one hand, you never need a strong reference to the value;
>>> if it has been collected, then it has obviously been removed from
>>> the dict and should trigger a change even with per-dict.
>>
>> Let's say that you watch key1 of a dict. key2 is modified, which
>> increases the version. Later, you test the guard: to check whether key1
>> was modified, you need to look up the key and compare the value. You
>> need the value to compare it.
>
> And the value for key1 is still there, so you can.

Sorry, how do you want to check that the dict[key1] value didn't change
using only the value's identifier? With "dict[key1] is old_value_id"?

The problem with storing an identifier (a pointer in C) without holding
a strong reference is that when the object is destroyed, a new object
can get the same identifier. So "dict[key] is old_value_id" can be true
even though dict[key] is now a completely different object.
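
A small illustration of that identifier-reuse problem (whether the
address is actually reused depends on the allocator, so the final
comparison *may* be True):

class Value:
    pass

d = {"key": Value()}
old_value_id = id(d["key"])    # identifier kept without a strong reference

del d["key"]                   # last reference dropped: the object is destroyed
d["key"] = Value()             # a new object may be allocated at the same address

# Can print True even though the value is a brand new object, which is
# why a guard cannot rely on a bare identifier (pointer) alone.
print(id(d["key"]) == old_value_id)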


> The only reason you would notice that the key2 value had gone away is
> if you also care about key2 -- in which case the cached value is out
> of date, regardless of what specific value it used to hold.

I don't understand: technically, what do you mean by "out of date" for
an object?


>> If the dictionary values are modified during the loop, the dict
>> version is increased. But it's allowed to modify values when you
>> iterate on *keys*.
>
> Sure.  So?
>
> I see three cases:
>
> (A)  I don't care that the collection changed.  The python
> implementation might, but I don't.  (So no bug even today.)

I'm sorry, I don't understand your description. What do you mean by
"collection"? It makes a difference whether you modify dict *keys*,
dict *values*, or both.

Serhiy opened an issue because he wants to raise an exception if the
keys are modified while you iterate over the keys:
https://bugs.python.org/issue19332

But modifying only the values must *not* raise an exception.
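
For example (with CPython's current behaviour, where only a size change
during iteration is detected):

d = {"a": 1, "b": 2}

# Allowed: replacing values while iterating over the keys.
for key in d:
    d[key] = d[key] * 2

# Adding or removing keys during iteration changes the size and raises
# "RuntimeError: dictionary changed size during iteration" in CPython.
try:
    for key in d:
        d[key + "!"] = 0
except RuntimeError as exc:
    print(exc)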


> (B)  I want to process exactly the collection that I started with.  If
> some of the values get replaced, then I want to complain, even if
> python doesn't.  version_tag is what I want.

This is not issue #19332.


> (C)  I want to process exactly the original keys, but go ahead and use
> updated values.  The bug still bites, but ... I don't think this case
> is any more common than B.

I don't exactly understand your definition either. Maybe you need to
provide a code example.

Sorry, I don't understand why you want to discuss issue #19332 here. I
only mentioned it in "Prior Work" because the implementation is
*similar*, but PEP 509 is different and so it does not help to fix that
issue.

Do you want to modify PEP 509 to fix this issue? Or do you not
understand why PEP 509 cannot be used to fix it? I'm lost...

Victor

