parsing tree from excel sheet
alb
al.basili at gmail.com
Fri Jan 30 10:05:13 EST 2015
Hi Peter, I'll comment on the code below to check whether I understood
it correctly or am missing some major parts. My comments sit just below
each piece of code, so that you can read the code first and my
understanding afterwards.
Peter Otten <__peter__ at web.de> wrote:
[]
> $ cat parse_column_tree.py
> import csv
>
> def column_index(row):
>     for result, cell in enumerate(row, 0):
>         if cell:
>             return result
>     raise ValueError
Here you get the depth of your first node in this row.
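To check that reading, here is a minimal sketch of the same function applied to a few made-up rows. The row contents are hypothetical; the only assumption is that csv.reader delivers each row as a list of strings, with empty strings for blank cells.

```python
def column_index(row):
    """Return the index of the first non-empty cell in row."""
    for index, cell in enumerate(row):
        if cell:  # an empty string is falsy, so blank cells are skipped
            return index
    raise ValueError("row has no non-empty cell")


# Hypothetical rows, shaped like the lists csv.reader yields.
rows = [
    ["A", "", ""],
    ["", "B", ""],
    ["", "", "C"],
]
depths = [column_index(row) for row in rows]
print(depths)
```

The column of the first filled cell is what the rest of the script treats as the depth of the node.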
> class Node:
>     def __init__(self, name, level):
>         self.name = name
>         self.level = level
>         self.children = []
>
>     def append(self, child):
>         self.children.append(child)
>
>     def __str__(self):
>         return "\%s{%s}" % (self.level, self.name)
Up to here everything is fine: this essentially defines the basic methods
of the node object. A node is uniquely identified by its name and
its level. Here I could say that two nodes with the same name cannot be
on the same level, but this is cosmetic.
The important part would be that 'Name' could also be 'Attributes', with a
dictionary instead. This would allow storing more information on each
node.
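A rough sketch of what that could look like, purely as an assumption on my side: the `attributes` parameter and the example keys ("owner", "status") are hypothetical and not part of the quoted code.

```python
class Node:
    def __init__(self, name, level, attributes=None):
        self.name = name
        self.level = level
        # Hypothetical extension: free-form extra data for each node.
        self.attributes = dict(attributes or {})
        self.children = []

    def append(self, child):
        self.children.append(child)


# Made-up example data, only to show the shape of the idea.
root = Node("Engine", "section", {"owner": "Al"})
root.append(Node("Piston", "subsection", {"status": "draft"}))
print(root.attributes["owner"], root.children[0].attributes["status"])
```

Copying the dictionary in `__init__` keeps two nodes from accidentally sharing the same attributes object.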
>     def show(self):
>         yield [self.name]
Here I'm lost in translation! Why use yield in the first place?
What is this snippet used for?
>         for i, child in enumerate(self.children):
>             lastchild = i == len(self.children)-1
>             first = True
>             for c in child.show():
>                 if first:
>                     yield ["\---> " if lastchild else "+---> "] + c
>                     first = False
>                 else:
>                     yield ["      " if lastchild else "|     "] + c
Here I understand more, essentially 'yield' returns a string that would
be used further down in the show(root) function. Yet I doubt that I
grasp the true meaning of the code. It seems those 'show' functions have
lots of iterations that I'm not quite able to trace. Here you loop over
children, as well as in the main()...
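To trace the iterations, here is a small self-contained sketch of the same method on a hypothetical four-node tree. As far as I can tell, each yield hands back a list of string pieces rather than a single string, and every parent prepends one prefix piece to the lines coming from its children. The node names are made up, the backslash is written as "\\" to keep the string literal unambiguous, and the padding strings are assumed to be six characters wide to match the arrows.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def show(self):
        # First line of this subtree: just the node's own name.
        yield [self.name]
        for i, child in enumerate(self.children):
            lastchild = i == len(self.children) - 1
            first = True
            for pieces in child.show():  # recursion: lines of the child's subtree
                if first:
                    # The child's own line gets an arrow.
                    yield ["\\---> " if lastchild else "+---> "] + pieces
                    first = False
                else:
                    # Lines below the child get a continuation bar or padding.
                    yield ["      " if lastchild else "|     "] + pieces


# Hypothetical tree: a -> (b -> c), d
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.children = [b, d]
b.children = [c]

lines = ["".join(pieces) for pieces in a.show()]
for line in lines:
    print(line)
```

The line for c passes through two levels of show(), so it collects two prefix pieces: one added by b and one added by a.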
>     def show2(self):
>         yield str(self)
>         for child in self.children:
>             yield from child.show2()
OK, this requires some explanation as well. Kinda lost again. What
I can naively deduce is that it is a generator that yields the str
defined for the node in __str__, and does so for the whole tree.
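A minimal sketch to test that deduction, assuming Python 3.3 or later (where `yield from` exists). `show2_loop` is a hypothetical name I added to spell out the same traversal with an explicit inner loop; the node names are made up.

```python
class Node:
    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = []

    def __str__(self):
        return "\\%s{%s}" % (self.level, self.name)

    def show2(self):
        yield str(self)                # this node first ...
        for child in self.children:
            yield from child.show2()   # ... then everything each child yields

    def show2_loop(self):
        # Hypothetical equivalent without "yield from".
        yield str(self)
        for child in self.children:
            for line in child.show2_loop():
                yield line


root = Node("Engine", "section")
root.children.append(Node("Piston", "subsection"))
root.children.append(Node("Valve", "subsection"))

lines = list(root.show2())
print(lines)
```

Both methods walk the tree depth first, so the parent's line always comes before the lines of its children.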
> def show(root):
>     for row in root.show():
>         print("".join(row))
>
> def show2(root):
>     for line in root.show2():
>         print(line)
Here we implement the functions to print a node, but I'm not sure I
understand why I have to iterate here if main() iterates again over the
nodes.
>
> def read_tree(rows, levelnames):
>     root = Node("#ROOT", "#ROOT")
>     old_level = 0
>     stack = [root]
>     for i, row in enumerate(rows, 1):
I'm not quite sure I understand what the stack is for. As of now it is a
list whose only element is root.
>         new_level = column_index(row)
>         node = Node(row[new_level], levelnames[new_level])
here you are getting the node based on the current row, with its level.
>         if new_level == old_level:
>             stack[-1].append(node)
I'm not sure I understand here. Why the end of the list and not the
beginning?
>         elif new_level > old_level:
>             if new_level - old_level != 1:
>                 raise ValueError
Here you reject a node that sits more than one level below
its parent.
>             stack.append(stack[-1].children[-1])
here I get a crash: IndexError: list index out of range!
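One way this line can fail, sketched under an assumption about the data that may not match my actual sheet: if the very first data row already starts in the second column (for instance because the export has an empty leading column), then root has no children yet and `stack[-1].children[-1]` indexes an empty list. The rows and level names below are made up; read_tree() follows the quoted logic.

```python
class Node:
    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = []

    def append(self, child):
        self.children.append(child)


def column_index(row):
    for index, cell in enumerate(row):
        if cell:
            return index
    raise ValueError("empty row")


def read_tree(rows, levelnames):
    # Same logic as the quoted read_tree().
    root = Node("#ROOT", "#ROOT")
    old_level = 0
    stack = [root]
    for row in rows:
        new_level = column_index(row)
        node = Node(row[new_level], levelnames[new_level])
        if new_level == old_level:
            stack[-1].append(node)
        elif new_level > old_level:
            if new_level - old_level != 1:
                raise ValueError
            stack.append(stack[-1].children[-1])  # fails if there is no previous node
            stack[-1].append(node)
            old_level = new_level
        else:
            while new_level < old_level:
                stack.pop(-1)
                old_level -= 1
            stack[-1].append(node)
    return root


levelnames = ["section", "subsection"]   # hypothetical header row
good_rows = [["A", ""], ["", "B"]]       # first row starts in column 0
bad_rows = [["", "B"]]                   # first row starts in column 1

tree = read_tree(good_rows, levelnames)
try:
    read_tree(bad_rows, levelnames)
    crashed = False
except IndexError:
    crashed = True
print(crashed)
```

If that is what happens, printing the first few rows as csv.reader delivers them should show whether the first non-empty cell is really in column 0.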
>             stack[-1].append(node)
>             old_level = new_level
>         else:
>             while new_level < old_level:
>                 stack.pop(-1)
>                 old_level -= 1
>             stack[-1].append(node)
Why do I need to pop something from the stack??? Here you are saying
that if current row has a depth (new_level) that is smaller than the
previous one (old_level) I decrement by one the old_level (even if I may
have a bigger jump) and pop something from the stack...???
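To follow the stack by hand, here is a stripped-down sketch that keeps only names instead of Node objects. `trace_parents` is a hypothetical helper, and it assumes the input never jumps more than one level deeper at a time (the case the quoted code rejects with ValueError). As far as I can see, stack[-1] is always the parent for the current level, which would explain appending to the end of the list, and the while loop pops once per level, so a bigger jump back just pops several times.

```python
def trace_parents(levels_and_names):
    """Return (name, parent_name) pairs using the same stack discipline."""
    stack = ["#ROOT"]   # path from the root down to the current parent
    last_name = None    # most recently added node
    old_level = 0
    result = []
    for level, name in levels_and_names:
        if level > old_level:
            # One level deeper: the previous node becomes the parent.
            stack.append(last_name)
            old_level = level
        else:
            # Same level: nothing to pop. Shallower: pop once per level.
            while level < old_level:
                stack.pop()
                old_level -= 1
        result.append((name, stack[-1]))
        last_name = name
    return result


# Hypothetical rows given as (depth, name); F jumps back two levels at once.
rows = [(0, "A"), (1, "B"), (2, "C"), (1, "D"), (2, "E"), (0, "F")]
pairs = trace_parents(rows)
for name, parent in pairs:
    print(name, "<-", parent)
```

The stack only records the current path, while the finished tree hangs off root through the children lists.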
>     return root
Once filled, the tree is returned. I thought the tree would have been
the stack, but instead it is root... nice surprise.
>
> def main():
[strip arg parsing]
>     with open(args.infile) as f:
>         rows = csv.reader(f)
>         levelnames = next(rows) # skip header
>         tree = read_tree(rows, levelnames)
filling the tree with the data in the csv.
>
>     show_tree = show2 if args.latex else show
>     for node in tree.children:
>         show_tree(node)
>         print("")
It's nice to define show_tree as a function of the argument. The for
loop now is more than clear, traversing each node of the tree.
As I said earlier in the thread, there's a lot of food for thought for a
newbie, but it is better to go through this sort of exercise than dumb
tutorials which don't teach you much.
Al