Structures

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Mon Nov 3 21:49:25 EST 2008


On Mon, 03 Nov 2008 17:06:07 -0800, Aaron Brady wrote:

>> For all practical purposes, dicts have almost constant access time (at
>> least with any half-decent __hash__  method).
> 
> Hash tables are slick, but their hash function is their weakest link.
> 
>>>> [ hash( 2**x ) for x in range( 0, 256, 32 ) ]
> [1, 1, 1, 1, 1, 1, 1, 1]

That's not the hash function of a dict. Dicts don't have a hash function:

>>> hash({})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dict objects are unhashable


What you're showing is the hash function of ints, and you've discovered 
that there are some collisions. This is hardly surprising. There are an 
infinite number of ints, and only a finite number of values that they can 
hash to, so there have to be some collisions. It's worth looking at the 
values that collide:

>>> [ 2**x for x in range( 0, 256, 32 ) ]
[1, 4294967296L, 18446744073709551616L, 79228162514264337593543950336L, 
340282366920938463463374607431768211456L, 
1461501637330902918203684832716283019655932542976L, 
6277101735386680763835789423207666416102355444464034512896L, 
26959946667150639794667015087019630673637144422540572481103610249216L]

So if you have a dict with keys equal to those numbers, and do a lookup 
on one of them, you'll get O(N) performance instead of O(1) *for those 
keys that collide*. I think I can live with that.

If you think you can come up with a better hash function, feel free to 
subclass int.


> Such an index gives you linear-time performance in finding elements. Ha!

In any real application, all your keys are highly unlikely to collide in 
that fashion.



>  Plus, your worst-case insertion will cause a copy of the entire table. 

Explain please.


> There aren't any mappings based on balanced trees, unfortunately.

I'm not sure why you think O(log N) is better than O(1). The major 
advantage of tree structures is that, unlike hash tables, they are 
ordered, not that they don't have collisions.



-- 
Steven



More information about the Python-list mailing list