Caculate age

Wed Feb 5 04:30:12 EST 2003

On Tue, 04 Feb 2003 09:51:45 GMT, Alex Martelli <aleax at aleax.it> wrote:

>Bengt Richter wrote:
>   ...
>> """
>> Numeric types used for keys obey the normal rules for numeric comparison:
>> if two numbers compare equal (e.g., 1 and 1.0) then they can be used
>> interchangeably to index the same dictionary entry.
>> """
>> 
>> <speculation>
>> IWT this would create a slight performance hit looking for a match when
>> the hash value is not there from a previous operation (i.e., do you
>> potentially have to hash an integer, convert to float and hash that, and
>
>No, there's a constraint on hash(): two objects that compare equal, a==b, 
>MUST ensure hash(a)==hash(b) if they're both usable as dict keys.
>
That constraint is not necessary for the internal hash in the implementation
I was describing, but it would be an optimization.

>>>> hash(1)
>1
>>>> hash(1.0)
>1
>>>>
>
>> This is compared to strings, that can't have equivalent values with
>> different hashes AFAIK, unless maybe there is a unicode equivalency that
>
>NO two objects can compare equal, be both usable as dict keys,
>AND return different hashes.
>
Well, I was thinking of a possible internal use of hashing which definitely
would not hash 1 and 1.0 to the same value. I am pretty sure it's a feasible
implementation, but special-casing hashing of numeric values probably speeds
things. I was thinking of the general method that hashes an object effectively
into a uniformly distributed random number modulo some prime to use as an index
into an array hash table to look up information. In the table is then a pointer
to either a single key-value/data-value pair of references, or a linked list
of them, if there has been (random style as mentioned) hash collisions (I know
there are other ways of dealing with collisions, this is just easy to explain).

So I was musing on how to use this mechanism with numeric objects that have different
representational bit patterns, and random-hashing those and getting different array
indices. The approach was to take the normal associated key-value/data-value reference
pair and substitute a reference to the first-established numerically equivalent numeric
key-value. The problem was then how to find the latter, and I thought that could be
done by re-hashing after converting to alternate numeric representations when feasible
such as float->int or long, or int/long to float up to +-2**53.

This could work, but it would have extra overhead for new keys, so it is on reflection
more efficient to do a conversion up front within the hash algorithm to make equivalent numbers
hash consistently to equivalent hash values. The ultimate semantics of the dict implementation
would be the same, though of course the meaning of hash would be different if you just
exposed the internally used algorithm more or less directly.

Anyway, this is the kind of internal hashing strategy I was thinking about.

As you know, but which might help clarify the picture for others, equal objects hash
(via the builtin hash function) to the same hash value, but unequal objects can also
hash to equal hash values. Here are some unequal keys with equal hashes:

 >>> d = {1540938112: 'x', 1125901447518592.0: 'f', 'abcd': 'z', 5835905407L: 'y'}
 >>> map(hash,d.keys())
 [1540938112, 1540938112, 1540938112, 1540938112]

Regards,
Bengt Richter