R: Re: Sorted list as an alternative to dictionary for when youonly need keys?

Mon Jul 19 18:25:36 EDT 2004

> Sad that sets are really just dicts.
Why does this make you sad? :) In Python everything is a dict. Classes are,
and they do need quick lookups. Chances are that they are pretty well
optimized, arent they?

> All bisect should be doing is taking len() and dividing it by two ( //
The implementation is what you sketched here and you can read it in
bisect.py. It's relatively fast: you use it each time you look for a
telephone number (but inserting a new number is much more expensive). Anyway
the lookup complexity is o(log(n)).

Looking into an hash table has about a constant time. Take a look, for
example, at http://www.sparknotes.com/cs/searching/hashtables/section1.html
to see how it works. Its efficiency depends on how you manage collisions and
how you hash elements.

So using sets or also keys in a dict makes your search leverage on the
collisions management  used for all the internal Python machinery, including
class attributes lookup. What you need is an hashing function for your
objects. If you look up for immutables (strings, numbers and tuples of
immutables) you already have it.

The bisection algorithm is always o(log(n)), that means that on an average
it takes only one step more each time the size of your problem doubees. It
doesn't sound too bad, but the hash lookup is o(1) - about constant time
regardless of the problem size. So, it doesn't matter how fast is your
language and how clever you are in optimizing a bisect algorithm: there will
always be a problem big enough to make bisect slower.

Take a couple of benchmark with the timeit module to convince yourself: it
will be instructive.