text file parsing (awk -> python)

Daniel Nogradi nogradi at gmail.com
Wed Nov 22 09:20:53 EST 2006


> > I have an awk program that parses a text file which I would like to
> > rewrite in python. The text file has multi-line records separated by
> > empty lines and each single-line field has two subfields:
> >
> > node 10
> > x -1
> > y 1
> >
> > node 11
> > x -2
> > y 1
> >
> > node 12
> > x -3
> > y 1
> >
> > and this I would like to parse into a list of dictionaries like so:
> >
> > mydict[0] = { 'node':10, 'x':-1, 'y':1 }
> > mydict[1] = { 'node':11, 'x':-2, 'y':1 }
> > mydict[2] = { 'node':12, 'x':-3, 'y':1 }
> >
> > But the names of the fields (node, x, y) keep changing from file to
> > file, and even their number is not fixed; sometimes it is (node, x, y, z).
> >
> > What would be the simplest way to do this?
>
> data = """node 10
> x -1
> y 1
>
> node 11
> x -2
> y 1
>
> node 12
> x -3
> y 1
> """
>
> def open(filename):
>     # Stand-in for the built-in open() so the example runs without a file.
>     from io import StringIO
>     return StringIO(data)
>
> # Fields listed here are converted; anything else stays a stripped string.
> converters = dict(
>     node=int,
>     x=int,
>     y=int
> )
>
> def name_value(line):
>     name, value = line.split(None, 1)
>     return name, converters.get(name, str.rstrip)(value)
>
> if __name__ == "__main__":
>     from itertools import groupby
>     records = []
>
>     for empty, record in groupby(open("records.txt"), key=str.isspace):
>         if not empty:
>             records.append(dict(name_value(line) for line in record))
>
>     import pprint
>     pprint.pprint(records)


Thanks very much, that's exactly what I had in mind.
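
Since the field names change from file to file, a fixed converters table
would need editing for every file. A rough sketch of one way around that,
assuming Python 3, that every non-blank line has exactly a name and a value,
and that any value that parses as an integer should become one (the names
convert and parse_records are made up for this illustration):

```python
from io import StringIO
from itertools import groupby


def convert(value):
    """Return value as an int if it parses as one, else as a stripped string."""
    value = value.strip()
    try:
        return int(value)
    except ValueError:
        return value


def parse_records(lines):
    """Turn blank-line-separated 'name value' lines into a list of dicts."""
    records = []
    # groupby splits the input into alternating runs of blank and
    # non-blank lines; each non-blank run is one record.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            fields = (line.split(None, 1) for line in group)
            records.append({name: convert(value) for name, value in fields})
    return records


if __name__ == "__main__":
    # In-memory stand-in for a real file; a file object works the same way.
    sample = StringIO("node 10\nx -1\ny 1\n\nnode 11\nx -2\ny 1\nz 5\n")
    print(parse_records(sample))
```

With this, a new field such as z is picked up without touching the code,
at the cost of not being able to keep a numeric-looking value as a string.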

Thanks again,
Daniel


