Balanced trees

Marko Rauhamaa marko at pacujo.net
Sun Mar 9 18:24:01 EDT 2014


Dan Stromberg <drsalists at gmail.com>:

> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
>>>> There is no O(1) hash table.
> [...]
>
> it's still amortized O(1).

So we are in perfect agreement.

Hash tables are a useful family of techniques but involve quite a bit of
cost/benefit heuristics. You can only claim O(1) if your hash table is
at least as large as the number of elements.

As for comparing them with balanced trees, I think one of the main
advantages hash tables have is better CPU cache performance. A tree
involves much more jumping around the RAM than a hash table.

Still, trees have many things going for them:

 * no need to find a good hashing function

 * no need to spend time on calculating the hash value

 * no need to find optimal hash table sizes -- no costly resizing

And of course, the main point:

 * trees are ordered

Note that most key comparisons tend to be very quick as you don't need
to traverse the whole key to locate the element. Also, it is hard to say
if going O(log n) levels deep in the tree is slower than calculating the
hash value, although, as I said, the latter operation tends to benefit
from a cache locality.

Trees are also crucial when you don't have exact matches, but for
example, when you are looking for prefix or wild-card matches.


Marko



More information about the Python-list mailing list