On text processing

Daniel Nogradi nogradi at gmail.com
Fri Mar 23 19:17:55 EDT 2007


> This is my first try:
>
> ddata = {}
>
> inside_matrix = False
> with open("data.txt") as f:
>     for row in f:
>         if not row.strip():
>             continue  # skip blank lines
>         fields = row.split()
>         if len(fields) == 2:
>             # key/value line: start a new entry
>             inside_matrix = False
>             ddata[fields[0]] = [fields[1]]
>             lastkey = fields[0]
>         elif inside_matrix:
>             # further matrix row for the current key
>             ddata[lastkey][1].append(fields)
>         else:
>             # first matrix row: add the matrix as a second element
>             ddata[lastkey].append([fields])
>             inside_matrix = True
>
> # This gives some output for testing only:
> for k in sorted(ddata):
>     print(k, ddata[k])
>
>
> Input file data.txt:
>
> key1    value1
> key2    value2
> key3    value3
>
> key4    value4
> spec11  spec12   spec13   spec14
> spec21  spec22   spec23   spec24
> spec31  spec32   spec33   spec34
>
> key5    value5
> key6    value6
>
> key7    value7
> more11   more12   more13
> more21   more22   more23
>
> key8    value8
>
>
> The output:
>
> key1 ['value1']
> key2 ['value2']
> key3 ['value3']
> key4 ['value4', [['spec11', 'spec12', 'spec13', 'spec14'], ['spec21',
> 'spec22', 'spec23', 'spec24'], ['spec31', 'spec32', 'spec33',
> 'spec34']]]
> key5 ['value5']
> key6 ['value6']
> key7 ['value7', [['more11', 'more12', 'more13'], ['more21', 'more22',
> 'more23']]]
> key8 ['value8']
>
>
> If there are many simple keys, you can avoid creating a single-element
> list for each of them, but then you have to tell the two cases apart
> on the basis of the key (whereas now the presence of a second element
> tells the two situations apart). You can also use two different dicts
> to keep the two kinds of data.
>
> Bye,
> bearophile

Thanks very much, it's indeed quite simple. I was lost in the
itertools documentation :)
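
For reference, the two-dict variant suggested at the end of the quoted
message could look roughly like the sketch below. It is only an
illustration, not code from the thread: the function name parse and the
dict names simple and matrices are made up here, it assumes Python 3,
and it reads from an in-memory sample instead of data.txt.

```python
def parse(lines):
    """Hypothetical helper: split the input into two dicts."""
    simple = {}    # key -> value, one entry per key/value line
    matrices = {}  # key -> list of rows, only for keys followed by a matrix
    lastkey = None
    for row in lines:
        fields = row.split()
        if not fields:
            continue  # blank line
        if len(fields) == 2:
            # key/value line
            lastkey = fields[0]
            simple[lastkey] = fields[1]
        elif lastkey is None:
            raise ValueError("matrix row before any key: %r" % row)
        else:
            # matrix row belonging to the most recent key
            matrices.setdefault(lastkey, []).append(fields)
    return simple, matrices


sample = """\
key1    value1

key4    value4
spec11  spec12   spec13   spec14
spec21  spec22   spec23   spec24

key5    value5
"""

simple, matrices = parse(sample.splitlines())
print(simple)
print(matrices)
```

With this layout a plain lookup in simple never has to unwrap a list,
and "key in matrices" tells whether a key has a matrix attached. Like
the original loop, it treats any line with exactly two fields as a
key/value pair, so a two-column matrix would be misread.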


