Balanced trees

Mon Mar 10 12:20:05 EDT 2014

On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith <roy at panix.com> wrote:
> The hash vs. tree argument can get very complicated.  For example, if
> your tree is not completely resident in memory, the cost of paging in a
> node will swamp everything else, and improving lookup speed will boil
> down to reducing the number of I/O operations you need.  An expensive
> hash plus a linear walk through a collision chain which was resident in
> a single memory block will beat traversing two nodes, each of which had
> to be paged in separately.

Indeed, which is broadly an extension of the "cache locality" argument.

I've never actually yearned for any of the advanced operations a tree
can give (over a hash table). Usually, by the time I'm looking for
that sort of thing, I really want an on-disk database - that solves
all the problems of paging (the DBMS will make sure it reads in a
minimum of index and data pages), and a good SQL database can handle
multiple indexes in a space-efficient way. Plus, what you said about
log 1,000,000? By the time you're looking at a million records, you
probably (a) need to have them on disk somewhere anyway, and (b) don't
want to read them all into RAM before you begin.

ChrisA