Balanced trees

Chris Angelico rosuav at gmail.com
Mon Mar 10 12:24:31 EDT 2014


On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith <roy at panix.com> wrote:
> Looking at the Songza source, I see we have one user-defined hash
> function:
>
>     def __hash__(self):
>         return hash((self.song,
>                      self.user,
>                      self.add_time,
>                      self.delete_time,
>                      self.play_first))
>
> where those things are 2 bson.ObjectId's, 2 datetimes, and a boolean.  I
> wouldn't be surprised if that falls into the 20x node traversal bucket.
> In this case, convenience was more important than speed, so it doesn't
> matter.

The only difference between a tree and a hash here is that the tree
might be able to short-cut the comparisons. But if there are a whole
bunch of songs with the same "song" and "user", then the tree has to
compare (song->song? same; user->user? same; add_time->add_time?
left/right) multiple times. So what you're saying is that the hash
table has consistency but the tree could do better or could do worse.
(Imagine, worst case, all one million records have the same
song/user/add_time and you need to do twenty comparisons involving
four fields. That's gotta be worse than one hashing of five fields.)

ChrisA



More information about the Python-list mailing list