finding items that occur more than once in a list

Tue Mar 18 11:59:04 EDT 2008

Hrvoje Niksic wrote:

> This doesn't apply to Python, which implements dict storage as an
> open-addressed table and automatically (and exponentially) grows the
> table when the number of entries approaches 2/3 of the table size.
> Assuming a good hash function, filling the dict should yield amortized
> constant time for individual additions.

OK. I obviously need to look up open-addressed tables. I thought this
was just, in effect, implicit linked listing - ie it still needs a
linear search to handle collisions, it just avoids the need for
explicitly stored link fields. Perhaps I'm mistaken.

As for the growth pattern, each time you grow the table you have to
redistribute all the items previously inserted to new locations.
Resizes would get rarer as more items are added due to the exponential
growth, but every table resize would take longer too since there are
more items to move. Inserting n items still intuitively looks like
O(n^2) to me.

That said, it does remind me of that old exponential realloc trick for
array resizing. Same thing, I suppose, since a hash table is basically
an array. Maybe my math "intuition" is just wrong.