[pypy-dev] Object identity and dict strategies

exarkun at twistedmatrix.com exarkun at twistedmatrix.com
Fri Jul 8 17:04:50 CEST 2011


On 02:17 pm, fijall at gmail.com wrote:
>On Fri, Jul 8, 2011 at 4:14 PM, Amaury Forgeot d'Arc 
><amauryfa at gmail.com> wrote:
>>2011/7/8 Cesare Di Mauro <cesare.di.mauro at gmail.com>:
>>>I fully agree. It's not an issue, but an implementation-specific 
>>>detail
>>>which programmers must not assume is always true.
>>>
>>>CPython can be compiled without "smallints" (-5..256, if I remember
>>>correctly) caching. There's a #define that can be disabled, so EVERY 
>>>int (or
>>>long) will be allocated, so using the is operator will return False 
>>>most of
>>>the time (unless you have just copied exactly the same object).
>>>
>>>The same applies for 1 character strings, which are USUALLY cached by
>>>CPython.
>>
>>But the problem here is not object cache, but preservation of object 
>>identity,
>>which is quite different.
>>Python containers are supposed to keep the objects you put inside:
>
>[citation needed] array.array does not for one

Yes, and array.array is weird. :)  It exists either as a memory 
optimization (i.e., I don't want objects) or as a way to lay out memory 
directly (to pass to a C API).  Either way, you can't put arbitrary 
objects into it - so it's already a little special, even if you 
disregard the fact that it doesn't preserve the identity of the objects 
you can put into it.

However, you're right.  It exists, and it has this 
non-identity-preserving behavior.  Is it a good thing, though?  Or just 
an accident of how someone tried to let CPython be faster for some types 
of problems?
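
To make the difference concrete, here is a minimal sketch comparing a 
list with array.array.  The helper names roundtrip_list and 
roundtrip_array are made up for illustration, and the result of the 
last check depends on the interpreter: CPython builds a fresh float on 
every read from the array, while an implementation that compares floats 
by value under "is" may answer differently.

```python
from array import array


def roundtrip_list(obj):
    """Store obj in a list and read it back (hypothetical helper)."""
    container = [obj]
    return container[0]


def roundtrip_array(value):
    """Store value in an array of C doubles and read it back (hypothetical helper)."""
    container = array('d', [value])
    return container[0]


x = 1003.5

# A list holds a reference, so the very same object comes back.
print(roundtrip_list(x) is x)

# An array stores the raw C double, so only the value is guaranteed to survive.
print(roundtrip_array(x) == x)

# Whether the identity survives is implementation-dependent.
print(roundtrip_array(x) is x)
```

The list case is the behavior Amaury describes; the array case is the 
exception fijal points at.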
>>
>>myList.append(x)
>>assert myList[-1] is x
>>
>>myDict[x] = 1
>>for key in myDict:
>>   if key is x:
>>       ...
>
>also dict doesn't work if you overwrite the key:
>
>d = {1003: None}
>x = 1003
>d[x] = None
>d.keys()[0] is x

This doesn't invalidate the original point, as far as I can tell.  It 
just demonstrates again that you can have two instances of 1003. 
Whether dict guarantees to always use the new key or the old key when an 
update is made is a separate question.
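
As a rough illustration of that separate question, the sketch below 
uses a hypothetical Key class (not from this thread) so that the result 
does not depend on how the interpreter treats ints.  It shows what 
CPython does today when an equal key is assigned again; I am not 
claiming the language guarantees it.

```python
class Key:
    """Hypothetical key type: equal by value, but each instance is a distinct object."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


old = Key(1003)
new = Key(1003)   # equal to old, but a different object

d = {old: None}
d[new] = "updated"   # update through the equal-but-distinct key

stored = next(iter(d))   # the key object the dict actually holds

print(len(d))          # still a single entry
print(d[old])          # the value was replaced
print(stored is old)   # on CPython the original key object is kept
print(stored is new)
```

Either way, both key objects exist with their own identities; the dict 
simply has to pick one of them to hold on to.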

I think it would be better if object identity didn't depend on this 
mysterious quality of "immutability".  The language is easier to 
understand (particularly for new programmers) if one can talk about 
objects and references without having to also explain that _some_ data 
types are represented using things that are sort of like objects but not 
quite (and worse if it depends on what types the JIT feels like playing 
with in any particular version of the interpreter).

Jean-Paul

