Linear Time Tree Traversal Generator

Steve D'Aprano steve+python at pearwood.info
Tue Sep 20 22:34:13 EDT 2016


On Wed, 21 Sep 2016 02:46 am, ROGER GRAYDON CHRISTMAN wrote:

> I am trying to find a better (i.e. more efficient) way to implement a
> generator that traverses a tree.
> 
> The current model of the code (which is also used by a textbook I am
> teaching from does this)
> 
>    def __iter__(node):
>          for x in iter(node._left):
>               yield x
>          yield node._value
>          for x in iter(node._right)
>               yield x
> 
> This is nice, simple, and straightforward, but has an O(n log n) running
> time, since
> values from the leaves of the tree have to be yielded multiple times to
> the top of the tree.


I'm afraid I don't understand this. This is a standard binary tree inorder
traversal. Each node is visited once, and there are N nodes, so I make that
out to be O(N) not O(N log N). I'm afraid I can't parse your final clause:

    "since values from the leaves of the tree have to be yielded 
    multiple times to the top of the tree"

Each leaf is yielded once, and I don't understand what you mean by yielding
to the top of the tree.


I thought that maybe I missed something obvious, so I knocked up a quick and
dirty tree structure, monkey-patched the iter() built-in to count how many
times it was called, and ran your traversal code. Here's my code:


# ----- cut -----


class Node:
    def __init__(self, value):
        self._left = []  # Sentinel for an empty leaf.
        self._right = []
        self._value = value
    def __iter__(node):
        # By the way, you don't have to explicitly call iter() here.
        for x in iter(node._left):
            yield x
        yield node._value
        for x in iter(node._right):
            yield x


def insert(tree, value):
    if value < tree._value:
        if tree._left == []:
            # add new leaf node
            tree._left = Node(value)
        else:
            insert(tree._left, value)
    elif value > tree._value:
        if tree._right == []:
            tree._right = Node(value)
        else:
            insert(tree._right, value)


data = list(range(10000))
import random
random.shuffle(data)

tree = Node(data[0])
for value in data[1:]:
    insert(tree, value)

_iter = iter  # grab a reference to the built-in
count = 0
def iter(obj):  # Monkey-patch the built-in
    global count
    count += 1
    return _iter(obj)


assert list(tree) == sorted(data)
print(count)


# ----- cut -----


which prints 20000, as expected: for each node, iter() gets called twice.

So I don't understand where your O(N log N) behaviour is coming from.







-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.




More information about the Python-list mailing list