Python Tree datastructure comparison

Dan Stromberg drsalists at gmail.com
Tue Jun 5 20:05:14 EDT 2012


I've put together a comparison of some tree datastructures for Python, with
varied runtime and varied workload:

http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/

I hope to find time to add heaps to the article at some point, but for now,
it only covers trees and the treap.

The short version:

   1. The 2-3 tree gave incorrect results, so I eliminated it.  This may or
   may not have been my fault.
   2. The Red-black tree was performing so slowly that it was making the
   graphs hard to read, so I eliminated it.  It actually seemed to be doing
   pretty well at first, until I realized that garbage collections were
   becoming very slow with red-black trees.
   3. The top three performers were, under various workloads and with
   various runtimes: The AVL Tree, The Splay Tree and The Treap.
   4. The Treap was consistently either in first or second place.
   5. All the datastructures examined were ported to python 3 in such a way
   that they would continue to work on python 2 - I've at times taken
   liberties like allowing log(n) range's, which of course are eagerly
   evaluated in python 2.  The modules were also modified to pass pylint.

These are all trees (and the treap), for now, with dictionary-like
interfaces, that allow you to do things like look up the least (or
greatest) value in log(n) time.  The need for such a datastructure is
uncommon, but does come up from time to time - implementing a cache is
foremost in my mind.  If you just need fast lookups without any particular
ordering, you're almost always going to be better off with a hash table
(dictionary).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20120605/47d1e996/attachment.html>


More information about the Python-list mailing list