text file parsing (awk -> python)

Wed Nov 22 07:55:44 EST 2006

Daniel Nogradi wrote:

> I have an awk program that parses a text file which I would like to
> rewrite in python. The text file has multi-line records separated by
> empty lines and each single-line field has two subfields:
> 
> node 10
> x -1
> y 1
> 
> node 11
> x -2
> y 1
> 
> node 12
> x -3
> y 1
> 
> and this I would like to parse into a list of dictionaries like so:
> 
> mydict[0] = { 'node':10, 'x':-1, 'y':1 }
> mydict[1] = { 'node':11, 'x':-2, 'y':1 }
> mydict[2] = { 'node':12, 'x':-3, 'y':1 }
> 
> But the names of the fields (node, x, y) keep changing from file to
> file, and even their number is not fixed; sometimes it is (node, x, y, z).
> 
> What would be the simplest way to do this?

data = """node 10
x -1
y 1

node 11
x -2
y 1

node 12
x -3
y 1
"""

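# Stand-in for the built-in open() so the example runs without a real file.
def open(filename):
    try:
        from cStringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO  # Python 3
    return StringIO(data)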

# Per-field converters; fields not listed here (e.g. node) stay strings.
converters = dict(
    x=int,
    y=int
)

def name_value(line):
    name, value = line.split(None, 1)
    return name, converters.get(name, str.rstrip)(value)

if __name__ == "__main__":
    from itertools import groupby
    records = []

    for empty, record in groupby(open("records.txt"), key=str.isspace):
        if not empty:
            records.append(dict(name_value(line) for line in record))

    import pprint
    pprint.pprint(records)
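
Since the field names change from file to file, a fixed converters table may not always be practical. Below is a rough sketch of one alternative, not part of the code above: it guesses the type of each value (int first, then float, otherwise a stripped string). The helper names convert and parse_records are made up for illustration, and the sketch assumes every non-blank line has both a name and a value.

```python
from itertools import groupby


def convert(value):
    """Guess a type: try int, then float, else keep the stripped string."""
    value = value.strip()
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_records(lines):
    """Yield one dict per record; records are separated by blank lines."""
    # groupby splits the input into alternating runs of blank and
    # non-blank lines; only the non-blank runs are records.
    for blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not blank:
            pairs = (line.split(None, 1) for line in group)
            yield dict((name, convert(value)) for name, value in pairs)


# Illustrative input only: the second record has an extra field z.
sample = "node 10\nx -1\ny 1\n\nnode 11\nx -2\ny 1\nz 0.5\n"
records = list(parse_records(sample.splitlines()))
```

parse_records accepts any iterable of lines, so an open file object can be passed in directly. Guessing types this way means a value that merely looks numeric is converted too, so an explicit table like converters is probably the safer choice when some fields must stay strings.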