parsing tree from excel sheet

alb al.basili at gmail.com
Fri Jan 30 10:05:13 EST 2015


Hi Peter, I'll comment on the code below to check whether I understood
it correctly or am missing some major parts. My comments are placed
just below each piece of code, so that you can read the code first and
my understanding afterwards.

Peter Otten <__peter__ at web.de> wrote:
[]
> $ cat parse_column_tree.py
> import csv
> 
> def column_index(row):
>    for result, cell in enumerate(row, 0):
>        if cell:
>            return result
>    raise ValueError

Here you get the depth of the node in this row, i.e. the column index
of its first non-empty cell.
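In this layout, the column of the first non-empty cell is the node's depth. The rows below are made-up data, not taken from the thread:

```python
def column_index(row):
    """Return the index of the first non-empty cell in a CSV row."""
    for result, cell in enumerate(row):
        if cell:
            return result
    # A completely empty row has no node in it.
    raise ValueError("row has no non-empty cell")

print(column_index(["Animals", "", ""]))   # 0
print(column_index(["", "Mammals", ""]))   # 1
print(column_index(["", "", "Dog"]))       # 2
```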

> class Node:
>    def __init__(self, name, level):
>        self.name = name
>        self.level = level
>        self.children = []
> 
>    def append(self, child):
>        self.children.append(child)
> 
>    def __str__(self):
>        return r"\%s{%s}" % (self.level, self.name)

Up to here everything is fine: these are the basic methods of the node
object. A node is identified by its name and its level. I could add a
rule that two nodes with the same name cannot be on the same level, but
this is cosmetic.

The important extension would be to let 'name' also be a set of
attributes, i.e. a dictionary instead of a plain string. This would
allow storing more information on each node.
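A minimal sketch of that idea, assuming the extra information arrives as keyword arguments. The `attrs` dictionary and the `name` property are hypothetical additions; Peter's Node only stores `name` and `level`. Keeping `name` as a read-only shortcut means `show()` and `__str__` would keep working unchanged:

```python
class Node:
    """Tree node that keeps a dict of attributes instead of a single name.

    `attrs` is a hypothetical extension of Peter's Node, not his code.
    """
    def __init__(self, level, **attrs):
        self.level = level
        self.attrs = attrs        # e.g. {"name": "Dog", "legs": 4}
        self.children = []

    def append(self, child):
        self.children.append(child)

    @property
    def name(self):
        # Shortcut so code written for the original Node still finds a name.
        return self.attrs.get("name", "")

    def __str__(self):
        return r"\%s{%s}" % (self.level, self.name)


dog = Node("species", name="Dog", legs=4)
print(dog, dog.attrs["legs"])   # \species{Dog} 4
```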

>    def show(self):
>        yield [self.name]

Here I'm lost in translation! Why use yield in the first place?
What is this snippet used for?
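Some background on the yield question: a function containing `yield` is a generator. Calling it returns an iterator, and its body only runs as the caller asks for items, one at a time. The two toy functions below are hypothetical and not from Peter's script; they produce the same items, one by building a list and one by yielding:

```python
def lines_list(n):
    # Builds the whole result in memory, then returns it in one go.
    result = []
    for i in range(n):
        result.append("line %d" % i)
    return result

def lines_gen(n):
    # Hands back one item per request; the loop pauses at each yield.
    for i in range(n):
        yield "line %d" % i

gen = lines_gen(3)
print(next(gen))                              # line 0
print(list(gen))                              # ['line 1', 'line 2']
print(lines_list(3) == list(lines_gen(3)))    # True
```

For a recursive printer this style lets each level simply loop over its children's output and decide what to do with every row, while the caller decides whether to print, join, or collect.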


>        for i, child in enumerate(self.children):
>            lastchild = i == len(self.children)-1
>            first = True
>            for c in child.show():
>                if first:
>                    yield [r"\---> " if lastchild else "+---> "] + c
>                    first = False
>                else:
>                    yield ["      " if lastchild else "|     "] + c

Here I understand more: essentially, 'yield' hands back a list of
string pieces that the show(root) function further down joins and
prints. Yet I doubt that I grasp the true meaning of the code. These
'show' functions seem to have lots of nested iterations that I can't
quite trace. Here you loop over the children, and then again in main()...
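One way to trace it is to run show() on a tiny made-up tree. The class below is a trimmed copy of Peter's Node: the `level` argument is dropped and `append` returns the child, both only to keep the demo short. Each yielded row is a list whose leading items are the prefixes added by every ancestor, followed by the node's name:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def append(self, child):
        self.children.append(child)
        return child          # returned only to make the demo tree shorter

    def show(self):
        # First row: the node's own name, no prefix yet.
        yield [self.name]
        for i, child in enumerate(self.children):
            lastchild = i == len(self.children) - 1
            first = True
            # Every row the child produces gets one more prefix in front.
            for c in child.show():
                if first:
                    # The child's own name row gets the branch marker.
                    yield [r"\---> " if lastchild else "+---> "] + c
                    first = False
                else:
                    # Rows of deeper descendants get a vertical bar, unless
                    # this child is the last one and nothing follows below it.
                    yield ["      " if lastchild else "|     "] + c


animals = Node("Animals")
mammals = animals.append(Node("Mammals"))
mammals.append(Node("Dog"))
mammals.append(Node("Cat"))
birds = animals.append(Node("Birds"))
birds.append(Node("Owl"))

rows = list(animals.show())
print(rows[2])                 # ['|     ', '+---> ', 'Dog']
for row in rows:
    print("".join(row))
# Animals
# +---> Mammals
# |     +---> Dog
# |     \---> Cat
# \---> Birds
#       \---> Owl
```

Each recursion level adds exactly one 6-character column, which is why the nesting lines up.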

>    def show2(self):
>        yield str(self)
>        for child in self.children:
>            yield from child.show2()

OK, this one also needs some explanation; I'm lost again. What I can
naively deduce is that it is a generator that returns the string defined
by the node's __str__ method, and does so for the whole tree.
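That reading matches the code: `yield from` (Python 3.3+) re-yields every item of another generator, so show2() is a pre-order walk that yields the node first and then each child's subtree. The level names "class" and "species" below are made up; `show2_explicit` is a hypothetical equivalent written without `yield from`:

```python
class Node:
    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = []

    def __str__(self):
        return r"\%s{%s}" % (self.level, self.name)

    def show2(self):
        # Pre-order: this node first, then each child's whole subtree.
        yield str(self)
        for child in self.children:
            yield from child.show2()

    def show2_explicit(self):
        # Same generator spelled out: forward each item the child yields.
        yield str(self)
        for child in self.children:
            for line in child.show2_explicit():
                yield line


mammals = Node("Mammals", "class")
mammals.children.extend([Node("Dog", "species"), Node("Cat", "species")])
for line in mammals.show2():
    print(line)
# \class{Mammals}
# \species{Dog}
# \species{Cat}
```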

> def show(root):
>    for row in root.show():
>        print("".join(row))
> 
> def show2(root):
>    for line in root.show2():
>        print(line)

Here we define the functions that print a node and its subtree, but I'm
not sure I understand why I have to iterate here if main() iterates over
the nodes again.
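The two loops do different jobs: main() loops over the top-level nodes (the children of the invisible #ROOT node), while show() loops over the lines of one subtree. A stripped-down sketch, using a hypothetical `lines()` generator that indents by depth instead of drawing branches:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def lines(self, depth=0):
        # Walks the WHOLE subtree below this node, one line per node.
        yield "    " * depth + self.name
        for child in self.children:
            yield from child.lines(depth + 1)

def show(node):
    # Inner loop: consume the generator of ONE subtree and print it.
    for line in node.lines():
        print(line)

root = Node("#ROOT")              # placeholder container, as in read_tree
animals, plants = Node("Animals"), Node("Plants")
animals.children.append(Node("Dog"))
plants.children.append(Node("Fern"))
root.children += [animals, plants]

# Outer loop (what main() does): one call per top-level node, so "#ROOT"
# itself is never printed and each tree is followed by a blank line.
for node in root.children:
    show(node)
    print("")
# Animals
#     Dog
#
# Plants
#     Fern
#
```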

> 
> def read_tree(rows, levelnames):
>    root = Node("#ROOT", "#ROOT")
>    old_level = 0
>    stack = [root]
>    for i, row in enumerate(rows, 1):

I'm not quite sure I understand what the stack is for. At this point it
is a list whose only element is root.
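The stack holds the chain of "open" ancestors: stack[0] is always the root, and stack[-1] is the node that rows at the current level get attached to. The sketch below is Peter's read_tree with a hypothetical `trace` flag that prints the stack after every row; the rows and level names are made up:

```python
class Node:
    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = []

    def append(self, child):
        self.children.append(child)


def column_index(row):
    for result, cell in enumerate(row):
        if cell:
            return result
    raise ValueError("empty row")


def read_tree(rows, levelnames, trace=False):
    root = Node("#ROOT", "#ROOT")
    old_level = 0
    stack = [root]                       # ancestors of the current position
    for row in rows:
        new_level = column_index(row)
        node = Node(row[new_level], levelnames[new_level])
        if new_level == old_level:
            stack[-1].append(node)       # sibling of the previous row
        elif new_level > old_level:
            if new_level - old_level != 1:
                raise ValueError("row skips a level")
            stack.append(stack[-1].children[-1])   # previous row becomes parent
            stack[-1].append(node)
            old_level = new_level
        else:
            while new_level < old_level:           # climb up one level per pop
                stack.pop(-1)
                old_level -= 1
            stack[-1].append(node)
        if trace:
            print(node.name, "->", [n.name for n in stack])
    return root                          # the tree hangs off root; stack is scaffolding


rows = [["Animals", "", ""], ["", "Mammals", ""], ["", "", "Dog"],
        ["", "", "Cat"], ["", "Birds", ""], ["Plants", "", ""]]
tree = read_tree(rows, ["Kingdom", "Class", "Species"], trace=True)
# Animals -> ['#ROOT']
# Mammals -> ['#ROOT', 'Animals']
# Dog -> ['#ROOT', 'Animals', 'Mammals']
# Cat -> ['#ROOT', 'Animals', 'Mammals']
# Birds -> ['#ROOT', 'Animals']
# Plants -> ['#ROOT']
```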

>        new_level = column_index(row)
>        node = Node(row[new_level], levelnames[new_level])

Here you create the node for the current row, together with its level.

>        if new_level == old_level:
>            stack[-1].append(node)

I'm not sure I understand this. Why use the end of the list (stack[-1])
and not the beginning?
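The list is used as a stack (last in, first out), and in Python the natural "top" of a list-based stack is its end: `append()` and `pop()` at the end are fast, while inserting or removing at index 0 has to shift every other element. So stack[-1] is the most recently opened, i.e. innermost, parent, and stack[0] stays the root. A toy illustration with plain strings:

```python
stack = ["#ROOT"]
stack.append("Animals")     # push: a deeper parent goes on the end
stack.append("Mammals")
print(stack[-1])            # Mammals  (innermost open parent)
stack.pop()                 # done with Mammals' subtree
print(stack[-1])            # Animals
print(stack[0])             # #ROOT  (the beginning is always the root)
```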

>        elif new_level > old_level:
>            if new_level - old_level != 1:
>                raise ValueError

Here you prevent a node from being more than one level below its
parent.

>            stack.append(stack[-1].children[-1])

Here I get a crash: IndexError: list index out of range!
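A guess at the cause, since the CSV wasn't posted: every row is appended to stack[-1], so by the time a deeper row arrives, stack[-1] normally has at least one child. As far as I can tell from the code, the only way to reach that line with an empty `children` list is when the very first data row starts in column 1, for instance because the file has no header row and next(rows) swallowed the first top-level row. The snippet reproduces that state; `push_last_child` is a hypothetical helper that fails with a clearer message:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

# State inside read_tree() just before the first data row is handled.
root = Node("#ROOT")
stack = [root]
old_level = 0

# Hypothetical first data row whose first non-empty cell is in column 1.
new_level = 1
if new_level > old_level:
    try:
        stack.append(stack[-1].children[-1])   # root has no children yet
    except IndexError as exc:
        print("IndexError:", exc)              # IndexError: list index out of range


def push_last_child(stack):
    """Hypothetical helper: the same push as read_tree, with a clearer error."""
    parent = stack[-1]
    if not parent.children:
        raise ValueError("a row is one level below %r, which has no children yet; "
                         "does the first data row start in column 0?" % parent.name)
    stack.append(parent.children[-1])
```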

>            stack[-1].append(node)
>            old_level = new_level
>        else:
>            while new_level < old_level:
>                stack.pop(-1)
>                old_level -= 1
>            stack[-1].append(node)

Why do I need to pop something from the stack??? Here you are saying
that if the current row has a smaller depth (new_level) than the
previous one (old_level), I decrement old_level by one (even though the
jump may be bigger) and pop something from the stack...???
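The decrement sits inside a `while` loop, so it repeats until old_level equals new_level: a jump of two levels pops twice. Each pop closes a subtree whose node can no longer receive children. A minimal illustration, with plain strings standing in for Node objects and made-up names:

```python
# Stack after reading a level-2 row such as "Dog":
stack = ["#ROOT", "Animals", "Mammals"]
old_level = 2
new_level = 0            # next row, e.g. "Plants", starts in column 0

pops = 0
while new_level < old_level:   # repeats until the levels match, not just once
    stack.pop(-1)
    old_level -= 1
    pops += 1

print(pops, stack)       # 2 ['#ROOT']
```

After the loop, stack[-1] is the parent the new row belongs to, here the root.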

>    return root

Once filled, the tree is returned. I thought the tree would be the
stack, but instead it is root... nice surprise.

> 
> def main():
[strip arg parsing]

>    with open(args.infile) as f:
>        rows = csv.reader(f)
>        levelnames = next(rows) # skip header
>        tree = read_tree(rows, levelnames)

Filling the tree with the data from the CSV file.
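The header trick works because csv.reader returns an iterator: next(rows) consumes the first line, and read_tree only ever sees the data rows that remain. Below, io.StringIO with made-up contents stands in for the real file. For real files, the csv module documentation recommends opening them with newline=''.

```python
import csv
import io

# Stand-in for open(args.infile); the contents are invented for the demo.
f = io.StringIO("Kingdom,Class,Species\nAnimals,,\n,Mammals,\n,,Dog\n")
rows = csv.reader(f)
levelnames = next(rows)       # consumes the header row
print(levelnames)             # ['Kingdom', 'Class', 'Species']
data_rows = list(rows)        # what read_tree() would iterate over
print(data_rows)              # [['Animals', '', ''], ['', 'Mammals', ''], ['', '', 'Dog']]
```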

> 
>        show_tree = show2 if args.latex else show
>        for node in tree.children:
>            show_tree(node)
>            print("")

It's nice to choose show_tree based on the argument. The for loop is
now perfectly clear: it traverses each top-level node of the tree.

As I said earlier in the thread, there's a lot of food for thought here
for a newbie, but it's better to go through this sort of exercise than
through dumb tutorials that don't teach you much.

Al


