On text processing
Daniel Nogradi
nogradi at gmail.com
Fri Mar 23 19:17:55 EDT 2007
> This is my first try:
>
> ddata = {}
>
> inside_matrix = False
> for row in file("data.txt"):
>     if row.strip():
>         fields = row.split()
>         if len(fields) == 2:
>             inside_matrix = False
>             ddata[fields[0]] = [fields[1]]
>             lastkey = fields[0]
>         else:
>             if inside_matrix:
>                 ddata[lastkey][1].append(fields)
>             else:
>                 ddata[lastkey].append([fields])
>                 inside_matrix = True
>
> # This gives some output for testing only:
> for k in sorted(ddata):
>     print k, ddata[k]
>
>
> Input file data.txt:
>
> key1 value1
> key2 value2
> key3 value3
>
> key4 value4
> spec11 spec12 spec13 spec14
> spec21 spec22 spec23 spec24
> spec31 spec32 spec33 spec34
>
> key5 value5
> key6 value6
>
> key7 value7
> more11 more12 more13
> more21 more22 more23
>
> key8 value8
>
>
> The output:
>
> key1 ['value1']
> key2 ['value2']
> key3 ['value3']
> key4 ['value4', [['spec11', 'spec12', 'spec13', 'spec14'], ['spec21',
> 'spec22', 'spec23', 'spec24'], ['spec31', 'spec32', 'spec33',
> 'spec34']]]
> key5 ['value5']
> key6 ['value6']
> key7 ['value7', [['more11', 'more12', 'more13'], ['more21', 'more22',
> 'more23']]]
> key8 ['value8']
>
>
> If there are many simple keys, you can avoid creating a single-element
> list for each of them, but then you have to tell the two cases apart
> on the basis of the key (at the moment the presence of a second element
> is what tells them apart). You can also use two different dicts to keep
> the two different kinds of data.
>
> Bye,
> bearophile
Thanks very much, it's indeed quite simple. I was lost in the
itertools documentation :)
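
For the archive, the two-dict variant suggested at the end of the quoted message might look roughly like the sketch below. It is written in Python 3 syntax rather than the Python 2 of the quoted code. The names parse_records, values and matrices are made up for illustration, and it keeps the same assumption as the quoted code: a line with exactly two fields is always a key/value pair, never a matrix row.

```python
def parse_records(lines):
    """Split the input into simple key/value pairs and key -> matrix rows."""
    values = {}    # key -> value, one entry for every two-field line
    matrices = {}  # key -> list of rows, only for keys followed by a matrix
    lastkey = None
    for row in lines:
        fields = row.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 2:
            # Assumption: exactly two fields means a new key/value pair.
            lastkey = fields[0]
            values[lastkey] = fields[1]
        elif lastkey is not None:
            # Any other non-blank line is taken as a matrix row of the last key.
            matrices.setdefault(lastkey, []).append(fields)
    return values, matrices


# Small in-memory sample in the same layout as data.txt.
sample = """\
key1 value1

key4 value4
spec11 spec12 spec13
spec21 spec22 spec23

key5 value5
"""

values, matrices = parse_records(sample.splitlines())
print(values)
print(matrices)
```

With a real file, something like parse_records(open("data.txt")) should work the same way, since the function only iterates over lines. Whether a key has a matrix is then just a membership test on matrices, so no single-element lists are needed.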