[Python-Dev] binary trees. Review obmalloc.c

Josiah Carlson jcarlson at uci.edu
Fri May 5 20:27:42 CEST 2006


"Vladimir 'Yu' Stepanov" <vys at renet.ru> wrote:
> Josiah Carlson wrote:
> > However, as I just said, people usually don't remove items from
> > just-sorted lists, they tend to iterate over them via 'for i in list:' .
> >   
> Such problem arises at creation of the list of timers.

I've never seen that particular use-case.

Please understand something.  I believe that trees can be useful in some
cases. However, I don't believe that they are generally useful
enough in Python, given that we already have key,value dictionaries and
sortable lists.  They become even less worthwhile in Python, given the
total ordering problem that I describe later in this message.


> Except for that function of binary search is absent in standard
> package Python.

Binary search exists in the standard library as the bisect module.


> Here that I have found through Google on a line " order statistic binary 
> tree ":
> 
>     http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic20/

Yes, that link does describe what an 'order statistic tree' is.  One
thing to note is that you can add order statistics to any tree wih one
more field on each tree node.  That is, you can have your AVL or
Red-Black tree, and add order statistics to it.  Allowing tree['x'] to
return the associated value for the key 'x' (the same as a dictionary),
as well as offer the ability to do tree.select(10) to get the 10th (or
11th if one counts from 0) smallest key (or key,value), or get the 'rank'
of a key with tree.rank('x').

> > Of course all of this discussion about trees in Python is fine, though
> > there is one major problem.  With minimal work, one can construct 3
> > values such that ((a < b < c) and (c < a)) is true.
> >   
> The same problem is inherent in the standard dictionary - dict (), only
> roots at it, it is others. For example:

This problem has nothing to do with dictionaries and hashing, it has to
do with the fact that there may not be a total ordering on the elements
of a sequence.

>>> 'b' < (0,) < u'a' < 'b'
True
>>> x = ['b', (0,), u'a']
>>> random.shuffle(x)
>>> x.sort()
>>> x
[u'a', 'b', (0,)]
>>> random.shuffle(x)
>>> x.sort()
>>> x
['b', (0,), u'a']
>>>


What effect does this have on trees?  Imagine, for a moment, that your
tree looked like:

    (0,)
   /    \
 'b'    u'a'

Then you added sufficient data so that your tree was rebalanced to be
something like:

    u'a'
    /   \
  (0,) (stuff)
  /
'b'

If you were searching for 'b', you would never find it, because in your
search, you would compare 'b' against u'a', and take the right branch,
where 'b' isn't.

 - Josiah



More information about the Python-Dev mailing list