On text processing

Fri Mar 23 18:30:00 EDT 2007

Hi list,

I'm in a process of rewriting a bash/awk/sed script -- that grew to
big -- in python. I can rewrite it in a simple line-by-line way but
that results in ugly python code and I'm sure there is a simple
pythonic way.

The bash script processed text files of the form:

###############################
key1    value1
key2    value2
key3    value3

key4    value4
spec11  spec12   spec13   spec14
spec21  spec22   spec23   spec24
spec31  spec32   spec33   spec34

key5    value5
key6    value6

key7    value7
more11   more12   more13
more21   more22   more23

key8    value8
###################################

I guess you get the point. If a line has two entries it is a key/value
pair which should end up in a dictionary. If a key/value pair is
followed by consequtive lines with more then two entries, it is a
matrix that should end up in a list of lists (matrix) that can be
identified by the key preceeding it. The empty line after the last
line of a matrix signifies that the matrix is finished and we are back
to a key/value situation. Note that a matrix is always preceeded by a
key/value pair so that it can really be identified by the key.

Any elegant solution for this?