On text processing

Paddy paddy3118 at googlemail.com
Fri Mar 23 20:43:57 EDT 2007


On Mar 23, 10:30 pm, "Daniel Nogradi" <nogr... at gmail.com> wrote:
> Hi list,
>
> I'm in the process of rewriting a bash/awk/sed script -- that grew too
> big -- in Python. I can rewrite it in a simple line-by-line way but
> that results in ugly Python code and I'm sure there is a simple
> Pythonic way.
>
> The bash script processed text files of the form:
>
> ###############################
> key1    value1
> key2    value2
> key3    value3
>
> key4    value4
> spec11  spec12   spec13   spec14
> spec21  spec22   spec23   spec24
> spec31  spec32   spec33   spec34
>
> key5    value5
> key6    value6
>
> key7    value7
> more11   more12   more13
> more21   more22   more23
>
> key8    value8
> ###################################
>
> I guess you get the point. If a line has two entries it is a key/value
> pair which should end up in a dictionary. If a key/value pair is
> followed by consecutive lines with more than two entries, it is a
> matrix that should end up in a list of lists (matrix) that can be
> identified by the key preceding it. The empty line after the last
> line of a matrix signifies that the matrix is finished and we are back
> to a key/value situation. Note that a matrix is always preceded by a
> key/value pair so that it can really be identified by the key.
>
> Any elegant solution for this?


My solution expects correctly formatted input and parses it into two
separate dicts, one holding the key/value pairs and one holding the
matrices:


try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO        # Python 3

fileText = '''\
key1    value1
key2    value2
key3    value3

key4    value4
spec11  spec12   spec13   spec14
spec21  spec22   spec23   spec24
spec31  spec32   spec33   spec34

key5    value5
key6    value6

key7    value7
more11   more12   more13
more21   more22   more23

key8    value8
'''
infile = StringIO(fileText)

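keyvalues = {}   # every two-field line: key -> value
matrices  = {}   # key -> list of rows of the matrix that follows that key
for line in infile:
    fields = line.strip().split()
    if len(fields) == 2:
        # A key/value pair; remember the key in case a matrix follows.
        keyvalues[fields[0]] = fields[1]
        lastkey = fields[0]
    elif fields:
        # Any other non-blank line is a row of the matrix named by lastkey.
        matrices.setdefault(lastkey, []).append(fields)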

==============
Here is the sample output:

>>> from pprint import pprint as pp
>>> pp(keyvalues)
{'key1': 'value1',
 'key2': 'value2',
 'key3': 'value3',
 'key4': 'value4',
 'key5': 'value5',
 'key6': 'value6',
 'key7': 'value7',
 'key8': 'value8'}
>>> pp(matrices)
{'key4': [['spec11', 'spec12', 'spec13', 'spec14'],
          ['spec21', 'spec22', 'spec23', 'spec24'],
          ['spec31', 'spec32', 'spec33', 'spec34']],
 'key7': [['more11', 'more12', 'more13'], ['more21', 'more22', 'more23']]}
>>>
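
The loop above trusts its input: a matrix row that turns up before any
key/value line would hit an undefined lastkey, and blank lines are
skipped rather than used to mark the end of a matrix. Below is a minimal
sketch of a stricter variant, assuming Python 3. The function name
parse_blocks and the choice to raise ValueError are illustrative
assumptions, not part of the solution above.

```python
from io import StringIO


def parse_blocks(lines):
    """Return (keyvalues, matrices) parsed from an iterable of lines.

    parse_blocks is a hypothetical helper name used for illustration.
    """
    keyvalues = {}
    matrices = {}
    lastkey = None  # key of the most recent key/value line, if any
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if not fields:
            # A blank line ends any matrix in progress.
            lastkey = None
        elif len(fields) == 2:
            keyvalues[fields[0]] = fields[1]
            lastkey = fields[0]
        elif lastkey is None:
            # Assumption: a matrix row with no key directly before it is an error.
            raise ValueError("line %d: matrix row without a preceding key" % lineno)
        else:
            # Any other non-blank line is a row of the matrix named by lastkey.
            matrices.setdefault(lastkey, []).append(fields)
    return keyvalues, matrices
```

Called as parse_blocks(StringIO(fileText)), it should give the same two
dicts as shown above for the sample input; it only differs on input that
breaks the stated format.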

- Paddy.



