Linear Time Tree Traversal Generator

Tue Sep 20 20:31:44 EDT 2016

On Tue, Sep 20, 2016 at 9:46 AM, ROGER GRAYDON CHRISTMAN <<a target=""
href="https://webmail.psu.edu/webmail/retrieve.cgi?mailbox=inbox&mid=CAO1D73Eho37guUZkM6Kk9Nu5WB43oN721G33XGNis0LhSu7xVQ%40mail%2egmail%2ecom&cmd=view&start_num=10800&limit=50&sort=0&display=4&headers=default&safe_view=1&tab=1&status=0&timestamp=20160920201523#">dvl at psu.edu</a>> wrote:
>I am trying to find a better (i.e. more efficient) way to implement a generator
>
that traverses a tree.
>
>
The current model of the code (which is also used by a textbook I am teaching
>
from does this)
>
>
   def __iter__(node):
>
         for x in iter(node._left):
>
              yield x
>
         yield node._value
>
         for x in iter(node._right)
>
              yield x
>
>
This is nice, simple, and straightforward, but has an O(n log n) running time,
>
since
>
values from the leaves of the tree have to be yielded multiple times to the top
>
of the tree.
>

Justin Walters <walters.justin01 at gmail.com> responded:

> Are the left and right attributes collections of more nodes or are they simply references to the node's position in the tree? 
> From the code provided it seems like the former is true and a node's left attribute is a reference to another node?

They are references to other nodes, which in turn have values and left and right references to more nodes, etc.
as in a particular tree.  There are no 'collections' in the sense of set or map or Python list, just recursive sub-trees.

> I don't know how flexible you are with the structure of your tree, 
but you could try taking the modified pre-order tree traversal approach.

> This article explains it in the context of a database, but the idea is the same: <https://www.sitepoint.com/hierarchical-data-database-2/> 

> Each node would then have a parent attribute as well as left and 
right attributes. The parent would be a reference to a parent node, and 
the left and right would be integers that position the element in the 
tree. 

> The upside to this is that traversal and lookup is much faster 
since you do not need to have an iterator nested in an iterator. This is
 because the top level node will always have the lowest integer as it's 
left attribute and the highest integer as it's right attribute. That 
means that you can instead have a single iterator that iterates through a
 range of integers to yield the node with the specified left and right 
values.

I am in academia, and the current unit is the binary search tree in its conventional sense, instead of the tabular version.
Indeed that table would be faster for iteration, but at the increased cost of slowing down adding and removing elements.
If I were to use any array-based model, as this database link shows, all insertions and deletions would be linear-time
operations instead of logarithmic time.  

I would contrast the time to insert N elements and then to traverse the tree once, where the N elements arrive in any order:
Tabular approach N linear-time insertions -> quadratic time plus a linear-time tree traversal afterwards.
Tree approach:   N logarithm-time insertions plus an iteration that takes a time proportional to all that -- it still wins.
Since traversal is 'rare', I don't want to penalize the common behaviors to accelerate this traversal.

But I would still like to see if there is a linear traversal possible with the help if the itertools chain

Roger Christman
instructor
Pennsylvania State University