Balanced trees

Chris Angelico rosuav at gmail.com
Mon Mar 10 12:20:05 EDT 2014


On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith <roy at panix.com> wrote:
> The hash vs. tree argument can get very complicated.  For example, if
> your tree is not completely resident in memory, the cost of paging in a
> node will swamp everything else, and improving lookup speed will boil
> down to reducing the number of I/O operations you need.  An expensive
> hash plus a linear walk through a collision chain which was resident in
> a single memory block will beat traversing two nodes, each of which had
> to be paged in separately.

Indeed, which is broadly an extension of the "cache locality" argument.

I've never actually yearned for any of the advanced operations a tree
can give (over a hash table). Usually, by the time I'm looking for
that sort of thing, I really want an on-disk database - that solves
all the problems of paging (the DBMS will make sure it reads in a
minimum of index and data pages), and a good SQL database can handle
multiple indexes in a space-efficient way. Plus, what you said about
log 1,000,000? By the time you're looking at a million records, you
probably (a) need to have them on disk somewhere anyway, and (b) don't
want to read them all into RAM before you begin.

ChrisA



More information about the Python-list mailing list