Balanced trees

Wed Mar 19 09:42:04 EDT 2014

On Wed, 19 Mar 2014 10:49:33 +0200, Marko Rauhamaa wrote:

> Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
> 
>> If you are in a position to randomize the data before storing it in the
>> tree, an unbalanced binary tree is a solid contender.
> 
> Applications that can assume randomly distributed data are exceedingly
> rare 

Please re-read what I wrote. I didn't say "if your data comes to you 
fully randomized". I said "if you are in a position to randomize the data 
before storing it". In other words, if you actively randomize the data 
yourself.

> and even then, you can't easily discount the possibility of
> worst-case ordering.

Of course you can. If you have a million items, then the odds that those 
million items happen to be sorted (the worst-case order) are 1 in a 
million factorial. That's a rather small number, small enough that we can 
discount it from ever happening, not in a million lifetimes of the 
Universe.

Of course, you would be right to point out that one would also like to 
avoid *nearly* worst-case ordering. Nevertheless, there are Vastly more 
ways that the data will be sufficiently randomized as to avoid degenerate 
performance than the worst-case poor performance.

> In fact, Dan himself said unbalanced trees performed so badly he
> couldn't test them satisfactorily.

You're misrepresenting what Dan said. What he actually said was that 
unbalanced trees perform second only to dicts with random data, only with 
sorted data do they perform badly. His exact words were:

"For a series of random keys, it would do quite well (probably second only
to dict), but for a series of sequential keys it would take longer than
anyone would reasonably want to wait"

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/