Parsing of a file

Mike Driscoll kyosohma at gmail.com
Thu Aug 7 12:52:47 EDT 2008


On Aug 6, 4:06 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Aug 7, 6:02 am, Mike Driscoll <kyoso... at gmail.com> wrote:
>
>
>
> > On Aug 6, 1:55 pm, Tommy Grav <tg... at mac.com> wrote:
>
> > > I have a file with the format
>
> > > Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames
> > > 5 Set 1
> > > Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames
> > > 5 Set 2
> > > Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames
> > > 5 Set 3
> > > Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames
> > > 5 Set 4
> > > Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames
> > > 5 Set 5
> > > Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames
> > > 5 Set 6
> > > Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames
> > > 5 Set 7
> > > Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames
> > > 5 Set 8
> > > Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames
> > > 5 Set 9
> > > Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames
> > > 5 Set 10
>
> > > I would like to parse this file by extracting the field id, ra, dec
> > > and mjd for each line. It is not, however, certain that the width of
> > > each value of the field id, ra, dec or mjd is the same in each line.
> > > Is there a way to do this such that even if there was a line where
> > > Ra=****** and MJD=******** was swapped it would be parsed correctly?
>
> > > Cheers
> > >    Tommy
>
> > I'm sure Python can handle this. Try the PyParsing module or learn
> > Python regular expression syntax.
>
> > http://pyparsing.wikispaces.com/
>
> > You could probably do it very crudely by just iterating over each line
> > and then using the string's find() method.
>
> Perhaps you and the OP could spend some time becoming familiar with
> built-in functions and str methods. In particular, str.split is your
> friend:
>

I'm well aware of the split() method and the built-ins. However, since
this appeared to be a homework-type question and I was at work, I didn't
spend any time on the issue. The only reason I mentioned McGuire's
PyParsing module was that I had just finished reading his article
on the subject in Python Magazine and it sounded like something the OP
might find interesting.

Here's my own implementation based on what's already been done here.
I'm sure one could have some fun doing it with itertools or list
comprehensions to get really fancy.

<code>

raw_data = """
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames
5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames
5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames
5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames
5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames
5 Set 5
Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames
5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames
5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames
5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames
5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames
5 Set 10
""".splitlines()

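# Build one dict per record; keys come from the "key=value" tokens.
myList = []
for line in raw_data:
    items = line.split()
    myDict = {}
    for item in items:
        if '=' in item:
            # split on the first '=' only, in case a value contains one
            key, value = item.split('=', 1)
            myDict[key] = value
        elif item[:1].lower() == 'f' and item[-1:] == ':':
            # field id token such as "f29227:"; drop the "f" and the ":"
            myDict['id'] = item[1:-1]
    # skip blank or wrapped lines ("5 Set 1") that produced no fields
    if myDict:
        myList.append(myDict)

print(myList)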

</code>
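
For anyone curious about the list-comprehension route mentioned earlier, here is a rough sketch of the same idea. The helper name parse_line is made up for illustration, and the sample data assumes each record sits on a single line of the file (only two records are shown to keep it short). It follows the same rules as the loop above, including dropping the leading "f" from the field id.

```python
# Two sample records, assuming each record is a single line in the real file.
raw_data = """
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
""".splitlines()


def parse_line(line):
    """Return a dict of the key=value tokens on a line, plus the field id.

    Hypothetical helper, not part of the original post.
    """
    items = line.split()
    # every "key=value" token becomes one dict entry
    record = dict(item.split('=', 1) for item in items if '=' in item)
    # the field id is the token that looks like "f29227:"
    ids = [item[1:-1] for item in items
           if item[:1].lower() == 'f' and item.endswith(':')]
    if ids:
        record['id'] = ids[0]
    return record


# blank lines yield empty dicts, so filter those out
records = [rec for rec in map(parse_line, raw_data) if rec]
print(records)
```

Because each token is looked up by its key rather than by its position, a line with Ra= and MJD= swapped ends up as the same dict.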

This doesn't have any type checking or error handling, but it works
with the data provided.
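
If type checking and error handling are wanted, one possible direction is sketched below. Everything in it is an assumption layered on top of the code above: the name parse_record, the choice of which keys count as required, and the decision to convert only MJD to a float while leaving Ra and Dec as strings.

```python
# Assumed: the four values the OP asked for must all be present.
REQUIRED_KEYS = ('id', 'Ra', 'Dec', 'MJD')


def parse_record(line):
    """Parse one line into a dict, raising ValueError if it is malformed.

    Hypothetical helper, not part of the original post.
    """
    record = {}
    for item in line.split():
        if '=' in item:
            key, value = item.split('=', 1)
            record[key] = value
        elif item[:1].lower() == 'f' and item.endswith(':'):
            record['id'] = item[1:-1]
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError('missing %s in line: %r' % (', '.join(missing), line))
    # float() raises ValueError itself if the MJD is not numeric
    record['MJD'] = float(record['MJD'])
    return record


good = 'Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1'
swapped = 'Field f31448: MJD=53370.06811620 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2'
bad = 'Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 Frames 5 Set 3'  # no MJD

for line in (good, swapped, bad):
    try:
        print(parse_record(line))
    except ValueError as err:
        print('skipped: %s' % err)
```

Raising ValueError for both a missing key and a non-numeric MJD keeps the caller's handling to a single except clause; whether bad lines should be skipped or should stop the run is up to the OP.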

Mike


