[Numpy-discussion] numarray: Possible hash collision problem
David M. Cooke
cookedm at physics.mcmaster.ca
Wed Sep 28 11:46:40 EDT 2005
"Edward C. Jones" <edcjones at comcast.net> writes:
> hash(numarray.arange(1000)) == hash(numarray.arange(10000))
>
> The hash value changes each time I enter the Python interpreter. I have
> always assumed that hashing was deterministic. Is it?
Not surprising: I also get this:
hash(object()) == hash(object())
Looking through the source, I think the hash for an array is
inherited from the object base class, and hence is the id() of the
array. The code above can be written longhand as
a = numarray.arange(1000)
ha = hash(a) # in this case, hash(a) == id(a)
del a
b = numarray.arange(10000)
hb = hash(b) # in this case, hash(b) == id(b)
del b
ha == hb
It's those (implicit) del statements that allow a and b to be stored
at the same location in memory, and hence to have the same id(): no
other object is created in the interpreter between when a is deleted
and when b is created.
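The guess that the hash comes from the object base class can be checked
for any class that does not define __hash__ itself. The sketch below uses
a hypothetical class Plain as a stand-in for the array type (numarray is
not assumed to be installed), and the helper name uses_identity_hash is
made up for illustration. It assumes CPython, where the inherited hash is
derived from the object's address.

```python
class Plain:
    """Hypothetical stand-in: defines neither __hash__ nor __eq__."""


def uses_identity_hash(cls):
    # True when the class still uses the hash inherited from object,
    # i.e. it has not supplied its own __hash__.
    return cls.__hash__ is object.__hash__


a = Plain()
b = Plain()

inherits = uses_identity_hash(Plain)      # Plain relies on object.__hash__
tuple_inherits = uses_identity_hash(tuple)  # tuple hashes by content instead

# Both objects are alive here, so on CPython their identity-based
# hashes are different.
live_hashes_differ = hash(a) != hash(b)
```

Note that on recent CPython versions the inherited hash is computed from
id() rather than being equal to it, so hash(a) == id(a) should not be
relied on; the reuse argument is the same either way.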
Basically, the id() of an object is guaranteed to be unique *amongst all
active objects*. It is _not_ guaranteed to be different from the id() of
objects that have been created and destroyed.
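Both halves of that guarantee can be seen with plain objects. This is
only a sketch: the function names are made up for illustration, and
whether the second pair of ids actually coincides depends on the
interpreter and its allocator, so the code reports it rather than
assuming it.

```python
def ids_of_two_live_objects():
    # Both objects exist at the same time, so their ids must differ.
    a = object()
    b = object()
    return id(a), id(b)


def ids_of_sequential_objects():
    # The first object is destroyed before the second is created,
    # so the interpreter is free to reuse its memory (and its id).
    a = object()
    first = id(a)
    del a
    b = object()
    second = id(b)
    return first, second


live_first, live_second = ids_of_two_live_objects()
seq_first, seq_second = ids_of_sequential_objects()

live_ids_differ = live_first != live_second  # guaranteed
id_was_reused = seq_first == seq_second      # allowed, not guaranteed

print("live ids differ:", live_ids_differ)
print("id reused after del:", id_was_reused)
```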
This will evaluate to False:
a = numarray.arange(1000)
b = numarray.arange(10000)
hash(a) == hash(b)
as a and b still both exist.
Since arrays are mutable, there's no good way to get a content-based hash.
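If a content-based fingerprint is wanted anyway, about the best one can
do is hash a snapshot of the data and accept that it goes stale as soon
as the array is modified. The sketch below is not numarray code: it uses
the standard-library array module as a stand-in, and content_digest is a
hypothetical helper name.

```python
import hashlib
from array import array


def content_digest(values):
    # Digest of the raw bytes of the array as they are right now.
    # This is a snapshot: it says nothing about later contents.
    return hashlib.sha256(values.tobytes()).hexdigest()


a = array("l", range(1000))
b = array("l", range(1000))

# Two distinct objects with equal contents give the same digest.
same_content = content_digest(a) == content_digest(b)

# Mutating the array changes the digest, which is why such a value
# is unsuitable as the hash of a mutable object used as a dict key.
before = content_digest(a)
a[0] = 99
after = content_digest(a)
digest_changed = before != after
```

A dictionary stores a key under the hash it had at insertion time, so a
key whose hash changes after insertion can no longer be found reliably.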
--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca