On text processing

Fri Mar 23 18:57:48 EDT 2007

Daniel Nogradi:
> Any elegant solution for this?

This is my first try:

ddata = {}

inside_matrix = False
with open("data.txt") as infile:
    for row in infile:
        if row.strip():  # skip blank lines
            fields = row.split()
            if len(fields) == 2:
                # A two-field line starts a new key/value record.
                inside_matrix = False
                ddata[fields[0]] = [fields[1]]
                lastkey = fields[0]
            else:
                # Any other line is a matrix row belonging to lastkey.
                if inside_matrix:
                    ddata[lastkey][1].append(fields)
                else:
                    ddata[lastkey].append([fields])
                inside_matrix = True

# This gives some output for testing only:
for k in sorted(ddata):
    print(k, ddata[k])

Input file data.txt:

key1    value1
key2    value2
key3    value3

key4    value4
spec11  spec12   spec13   spec14
spec21  spec22   spec23   spec24
spec31  spec32   spec33   spec34

key5    value5
key6    value6

key7    value7
more11   more12   more13
more21   more22   more23

key8    value8

The output:

key1 ['value1']
key2 ['value2']
key3 ['value3']
key4 ['value4', [['spec11', 'spec12', 'spec13', 'spec14'], ['spec21',
'spec22', 'spec23', 'spec24'], ['spec31', 'spec32', 'spec33',
'spec34']]]
key5 ['value5']
key6 ['value6']
key7 ['value7', [['more11', 'more12', 'more13'], ['more21', 'more22',
'more23']]]
key8 ['value8']

If there are many simple keys, you can avoid creating a
single-element list for each of them, but then you have to tell the
two cases apart on the basis of the key (whereas now the presence of
a second element is enough to distinguish them). You can also use two
different dicts to keep the two different kinds of data.
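
A rough sketch of the two-dict variant, in case it helps. The names
parse_records, simple and matrices are made up for this example, and it
reads from any iterable of lines rather than from data.txt so that it
can be tried without the file. Like the first version, it assumes that
a matrix row never has exactly two fields.

```python
def parse_records(lines):
    """Split the input into simple key/value pairs and per-key matrices."""
    simple = {}    # key -> value, for every two-field line
    matrices = {}  # key -> list of rows, only for keys followed by a matrix
    lastkey = None
    for row in lines:
        fields = row.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 2:
            simple[fields[0]] = fields[1]
            lastkey = fields[0]
        elif lastkey is not None:
            # Any other field count is treated as a matrix row of the last key.
            matrices.setdefault(lastkey, []).append(fields)
    return simple, matrices


sample = """\
key1    value1

key4    value4
spec11  spec12   spec13   spec14
spec21  spec22   spec23   spec24

key8    value8
"""
simple, matrices = parse_records(sample.splitlines())
for k in sorted(simple):
    # matrices.get(k) is None for keys that have no matrix attached.
    print(k, simple[k], matrices.get(k))
```

With this layout the simple values are plain strings, and a key has a
matrix only if it also appears in the second dict.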

Bye,
bearophile