parsing tree from excel sheet

Fri Jan 30 12:11:28 EST 2015

alb wrote:

> Hi Peter, I'll try to comment the code below to verify if I understood
> it correctly or missing some major parts. Comments are just below code
> with the intent to let you read the code first and my understanding
> afterwards.

Let's start with the simplest:

> Peter Otten <__peter__ at web.de> wrote:

>>    def show2(self):
>>        yield str(self)
>>        for child in self.children:
>>            yield from child.show2()
> 
> ok, this as well requires some explanation. Kinda lost again. From what
> I can naively deduce is that it is a generator that returns the str
> defined in the node as __str__ and it shows it for the whole tree.

Given a tree

A --> A1
      A2 --> A21
             A22
      A3

assume a slightly modified show2():

def append_nodes(node, nodes):
    nodes.append(node)
    for child in node.children:
        append_nodes(child, nodes)

When you invoke this with the root node in the above sample tree and an 
empty list

nodes = []
append_nodes(A, nodes)

the first thing it will do is append the root node to the nodes list

[A]

Then it iterates over A's children:

append_nodes(A1, nodes) will append A1 and return immediately because A1 
itself has not children.

[A, A1]

append_nodes(A2, nodes) will append A2 and then iterate over A2's children.
As A21 and A22 don't have any children append_nodes(A21, nodes) and 
append_nodes(A22, nodes) will just append the respective node with no 
further nested ("recursive") invocation, and thus the list is now

[A, A1, A21, A22]

Finally the append_nodes(A3, nodes) will append A3 and then return because 
it has no children, and we end up with

nodes = [A, A1, A21, A22, A3]

Now why the generator? For such a small problem it doesn't matter, for large 
datasets it is convenient that you can process the first item immmediately, 
when the following ones may not yet be available. It also becomes easier to 
implement different treatment of the items or to stop in the process:

for deer in hunt():
    kill(deer)
    if have_enough_food():
        break

for animal in hunt():
    take_photograph(animal)
    if not_enough_light():
        break

Also, you never need more than one item in memory instead of the whole list 
for many problems.

Ok, how to get from the recursive list building to yielding nodes as they 
are encountered? The basic process is always the same:

def f(items)
   items.append(3)
   items.append(6)
   for i in range(10):
       items.append(i)

items = []
f(items)
for item in items:
   print(item)

becomes

def g():
    yield 3
    yield 6
    for i in range(10):
        yield i

for item in g():
    print(items)

In Python 3.3 there was added some syntactic sugar so that you can write

def g():
    yield 3
    yield 6
    yield from range(10)

Thus

def append_nodes(node, nodes):
    nodes.append(node)
    for child in node.children:
        append_nodes(child, nodes)

becomes

def generate_nodes(node):
    yield node
    for child in node.children:
        yield from generate_nodes(child)

This looks a lot like show2() except that it's not a method and thus the 
node not called self and that the node itself is yielded rather than 
str(node). The latter makes the function a bit more flexible and is what I 
should have done in the first place.

The show() method is basically the the same, but there are varying prefixes 
before the node name. Here's a simpler variant that just adds some 
indentation. We start with generate_nodes() without the syntactic sugar. 
This is because we need a name for the nodes yielded from the nested 
generator call so that we can modify them:

def indented_nodes(node):
    yield node
    for child in node.children:
        for desc in from indented_nodes(child):
            yield desc

Now let's modify the yielded nodes:

def indented_nodes(node):
    yield [node]
    for child in node.children:
        for desc in indented_nodes(child):
            yield ["***"] + desc

How does it fare on the example tree? 

A --> A1
      A2 --> A21
             A22
      A3

The lists will have an "***" entry for every nesting level, so we get

[A]
["***", A1]
["***", A2]
["***", "***", A21]
["***", "***", A22]
["***", A3]

With "".join() we can print it nicely:

for item in indented_nodes(tree):
    print("".join(item))

But wait, "".join() only accepts strings so let's change

    yield [node]

to 
    yield [node.name] # str(node) would also work

A
***A1
***A2
******A21
******A22
***A3

>> def show2(root):
>>    for line in root.show2():
>>        print(line)

> Here we implement the functions to print a node, but I'm not sure I 
> understand why do I have to iterate if the main() iterates again over the 
> nodes.

Your example had the structure

A
 A1
   A11
   A12
 A2

and I was unsure if there could be data files that have multiple root nodes, 
e. g.

A
 A1
   A11
   A12
 A2
B
 B1
 B2

To simplify the handling of these I introduced an artificial root R

R
 A
  A1
    A11
    A12
  A2
 B
  B1
  B2

which makes all toplevel nodes in the data file children of R. In the
main() function I iterate over R's children to hide R from the user.

You can replace

        for node in tree.children:
            show_tree(node)
            print("")

in my original code with

        show_tree(tree)

to see the hidden node.

I may address the rest of your post later unless someone else does. In the 
mean time, can you please provide the data file that triggers the IndexError 
to help me with the debugging?