Linear Time Tree Traversal Generator

ROGER GRAYDON CHRISTMAN dvl at psu.edu
Wed Sep 21 10:39:53 EDT 2016


Since I was asked how I got different running times for my two code
fragments, I'll try to apply them to a very simple tree:
      4
    2   6
   1 3 5 7

>    def __iter__(node):
>          for x in iter(node._left):
>               yield x
>          yield node._value
>          for x in iter(node._right)
>               yield x

Nodes 1, 3, 5, 7 each yield exactly 1 value -- total 4 yield statements
Node  2  yields  1, 2, 3                             3 yield statements
Node  6  yields  5, 6, 7                             3 yield statements
Node  4  with its for loops yields all 7 values      7 yield statements

Total number of yields is 17
4 values (approx n/2) had to be yielded 3 times (approx log n)
Hence total time is O(n log n)

>     def to_list(node,result):
>           """append node information to result"""
>           result = to_list(node._left, result)
>           result.append(node._value)
>           return to_list(node,_right, result)

Initially called at root (Node 4) with result = []
Empty result is based from Node 4 to Node 2 to Node 1 -> [1]
Node 2 appends 2, then Node 3 appends 3
These are returned to Node 4, who appends 4, etc.

Total number of append operations is 7, far less than 17

Now I suppose my new lists are also being 'sent upward'
with those return statements, but it is also very clear that
the total number of return statements executed is also 7,
far less than 17.

Hence total time is still O(n)

---

I saw one or two responses that said that it was inherently
flawed to do this as a binary tree in the first place.
Well, certainly updating the binary tree is faster than
a regular array or relational database, since it does not
require linear-time insertions and removals.

And for those who favor using dict for its very fast access,
you display your search keys in sorted order while they
are still in the dict.  The sorted() function copies all 
those out into an array and sorts the array, which
is of course, expected to be O(n log n).

Which only highlights my disappointment that my tree
traversal itself was O(n log n), unless I gave up on yield.
As a teacher I'm stuck with either
a) a data structure that can only get its maximal speed
   by not having any iterator in the normal sense
   (which in my mind sounds non-Pythonic)
b) can support iteration, but then is no better than a
   dict in any way, except for maybe some rare operation like
   "identify the minimal value"

Roger Christman
instructor
Pennsylvania State Universtiy





More information about the Python-list mailing list