Balanced trees

Chris Angelico rosuav at gmail.com
Mon Mar 10 13:17:15 EDT 2014


On Tue, Mar 11, 2014 at 4:08 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Chris Angelico <rosuav at gmail.com>:
>
>> The only difference between a tree and a hash here is that the tree
>> might be able to short-cut the comparisons. But if there are a whole
>> bunch of songs with the same "song" and "user", then the tree has to
>> compare (song->song? same; user->user? same; add_time->add_time?
>> left/right) multiple times. So what you're saying is that the hash
>> table has consistency but the tree could do better or could do worse.
>> (Imagine, worst case, all one million records have the same
>> song/user/add_time and you need to do twenty comparisons involving
>> four fields. That's gotta be worse than one hashing of five fields.)
>
> You are right that in the end there probably is an equality test
> involving all fields of the key. However, the intermediate comparisons
> are often dealt with much more immediately. For example, to compare the
> words "hello" and "world", you only need to look at the first letter.

Right, which is what I meant about the consistency of a hash table.
You know, with a hash table, that you're going to need to calculate
the hash across everything. The tree lets you cheaply add
disambiguation fields, knowing they'll be used only if they're needed.

Oddly enough, the consistency in design is inverted by the
predictability under attack. If an attacker might be able to generate
hash collisions, the tree (assuming it's self-balancing) will have its
worst-case much closer to its average/typical cases, which means the
tree is more consistent. Strange how that goes, sometimes.

> Anyway, this whole debate is rather unnecessary since every developer is
> supposed to have both weapons in their arsenal.

Supposed to have? What does that mean, a language isn't ISO-compliant
unless it provides both? With a high level language like Python, using
the provided hash table will almost always cream any hand-built tree,
no matter how advantageous the data is to the tree.

ChrisA



More information about the Python-list mailing list