text file parsing (awk -> python)
Daniel Nogradi
nogradi at gmail.com
Wed Nov 22 09:20:53 EST 2006
> > I have an awk program that parses a text file which I would like to
> > rewrite in python. The text file has multi-line records separated by
> > empty lines and each single-line field has two subfields:
> >
> > node 10
> > x -1
> > y 1
> >
> > node 11
> > x -2
> > y 1
> >
> > node 12
> > x -3
> > y 1
> >
> > and this I would like to parse into a list of dictionaries like so:
> >
> > mydict[0] = { 'node':10, 'x':-1, 'y':1 }
> > mydict[1] = { 'node':11, 'x':-2, 'y':1 }
> > mydict[2] = { 'node':12, 'x':-3, 'y':1 }
> >
> > But the names of the fields (node, x, y) keep changing from file to
> > file, and even their number is not fixed; sometimes it is (node, x, y, z).
> >
> > What would be the simplest way to do this?
>
> data = """node 10
> x -1
> y 1
>
> node 11
> x -2
> y 1
>
> node 12
> x -3
> y 1
> """
>
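> # Stand-in for a real file: shadows the built-in open() so that the
> # example runs without records.txt being present.
> def open(filename):
>     from cStringIO import StringIO
>     return StringIO(data)
>
> # Per-field converters; fields not listed here stay strings.
> converters = dict(
>     x=int,
>     y=int
> )
>
> def name_value(line):
>     name, value = line.split(None, 1)
>     return name, converters.get(name, str.rstrip)(value)
>
> if __name__ == "__main__":
>     from itertools import groupby
>     records = []
>
>     # Blank lines separate records; groupby yields alternating runs of
>     # blank and non-blank lines.
>     for empty, record in groupby(open("records.txt"), key=str.isspace):
>         if not empty:
>             records.append(dict(name_value(line) for line in record))
>
>     import pprint
>     pprint.pprint(records)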
Thanks very much, that's exactly what I had in mind.
Thanks again,
Daniel
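
For anyone reading this in the archive with Python 3: the quoted code relies on cStringIO, which no longer exists there. Below is a minimal sketch of the same groupby approach for Python 3. It is not the code from the reply above. The names convert and parse_records are made up for illustration, and instead of a fixed converters table it assumes every value should become an int when it parses as one and stay a string otherwise, since the field names change from file to file. It also assumes every non-blank line has both a name and a value.

```python
from io import StringIO
from itertools import groupby
import pprint


def convert(value):
    """Return value as an int when possible, else as a stripped string."""
    value = value.strip()
    try:
        return int(value)
    except ValueError:
        return value


def name_value(line):
    """Split 'name value' into a (name, converted value) pair."""
    name, value = line.split(None, 1)
    return name, convert(value)


def parse_records(lines):
    """Turn blank-line-separated records into a list of dicts."""
    records = []
    # str.isspace is True for blank lines such as "\n", so groupby
    # alternates between runs of blank lines and runs of field lines.
    for blank, group in groupby(lines, key=str.isspace):
        if not blank:
            records.append(dict(name_value(line) for line in group))
    return records


# Sample data taken from the original question.
sample = """node 10
x -1
y 1

node 11
x -2
y 1

node 12
x -3
y 1
"""

records = parse_records(StringIO(sample))
pprint.pprint(records)
```

With a real file, something like parse_records(open("records.txt")) should work the same way, because iterating over a file object yields lines with their trailing newline. Extra fields such as z need no changes, as nothing in the sketch depends on the field names.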