parsing tree from excel sheet

Sat Jan 31 04:07:55 EST 2015

alb wrote:

> > 
> > def read_tree(rows, levelnames):
> >    root = Node("#ROOT", "#ROOT")
> >    old_level = 0
> >    stack = [root]
> >    for i, row in enumerate(rows, 1):
> 
> I'm not quite sure I understand what is the stack for. As of now is a 
> list whose only element is root.

The stack is used to emulate the function call stack in a recursive 
solution. Every nested call produces a new set of local variables and the 
innermost call corresponds to the top of the stack or the end of the `stack` 
list in my code.

Given a tree

A
  A1
    A11
    A12
  A2

or in another notation

0 A
1 A1
2 A11
2 A12
1 A2

we start with a stack (with the node's children in parens)

[A()]

then look at A1 and see that the level is one deeper than that of A and 
append it to A, the current TOS (top of stack)

[A(A1)]

Next is A11 which is two levels deeper than A, so we take the last child of 
A and put it on the stack, 

[A(A1), A1()]

then add A11 as a child to the new TOS

[A(A1), A1(A11)]

Next is A12 which is one level deeper than A1, so we add it to the current 
TOS

[A(A1), A1(A11, A12)]

Next is A2 which is one level higher than A1, so we keep removing nodes from 
the stack until the current TOS is one level higher than A2. Here we only 
have to remove A1 from the stack to get

[A(A1)]

and after adding A2 to the TOS

[A(A1, A2)]

I have ommitted the children of the children so far, but the actual 
structure is now

[A(A1(A11(), A12()), A2())]

Therefore I just need to return A which contains the complete layout of the 
tree.

> >        new_level = column_index(row)
> >        node = Node(row[new_level], levelnames[new_level])
> 
> here you are getting the node based on the current row, with its level.
> 
> >        if new_level == old_level:
> >            stack[-1].append(node)
> 
> I'm not sure I understand here. Why the end of the list and not the 
> beginning?
> 
> >        elif new_level > old_level:
> >            if new_level - old_level != 1:
> >                raise ValueError
> 
> here you avoid having a node which is distant more than one level from 
> its parent.
> 
> >            stack.append(stack[-1].children[-1])
> 
> here I get a crash: IndexError: list index out of range!
> 
> >            stack[-1].append(node)
> >            old_level = new_level
> >        else:
> >            while new_level < old_level:
> >                stack.pop(-1)
> >                old_level -= 1
> >            stack[-1].append(node)
> 
> Why do I need to pop something from the stack??? Here you are saying 
> that if current row has a depth (new_level) that is smaller than the 
> previous one (old_level) I decrement by one the old_level (even if I may 
> have a bigger jump) and pop something from the stack...???
> 
> >    return root
> 
> once filled, the tree is returned. I thought the tree would have been 
> the stack, but instead is root...nice surprise.

> Why do I need to pop something from the stack???

That got me thinking, and I found that my code is indeed much more complex 
than necessary. Sorry for that ;)

As a compensation here is the simplified non-popping version of read_tree():

def read_tree(rows, levelnames):
    root = Node("#ROOT", "#ROOT")
    parents = [root] + [None] * len(levelnames)

    for row in rows:
        level = column_index(row)
        node = Node(row[level], levelnames[level])

        parents[level].append(node)
        parents[level+1] = node

    return root