Parsing of a file

John Machin sjmachin at lexicon.net
Wed Aug 6 17:06:30 EDT 2008


On Aug 7, 6:02 am, Mike Driscoll <kyoso... at gmail.com> wrote:
> On Aug 6, 1:55 pm, Tommy Grav <tg... at mac.com> wrote:
>
>
>
> > I have a file with the format
>
> > Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames
> > 5 Set 1
> > Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames
> > 5 Set 2
> > Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames
> > 5 Set 3
> > Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames
> > 5 Set 4
> > Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames
> > 5 Set 5
> > Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames
> > 5 Set 6
> > Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames
> > 5 Set 7
> > Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames
> > 5 Set 8
> > Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames
> > 5 Set 9
> > Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames
> > 5 Set 10
>
> > I would like to parse this file by extracting the field id, ra, dec
> > and mjd for each line. It is
> > not, however, certain that the width of each value of the field id,
> > ra, dec or mjd is the same
> > in each line. Is there a way to do this such that even if there was a
> > line where Ra=****** and
> > MJD=******** was swapped it would be parsed correctly?
>
> > Cheers
> >    Tommy
>
> I'm sure Python can handle this. Try the PyParsing module or learn
> Python regular expression syntax.
>
> http://pyparsing.wikispaces.com/
>
> You could probably do it very crudely by just iterating over each line
> and then using the string's find() method.
>

Perhaps you and the OP could spend some time becoming familiar with
built-in functions and str methods. In particular, str.split is your
friend:

C:\junk>type tommy_grav.py
# Look, Ma, no imports!

guff = """\
Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: MJD=53370.06811620123 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5

Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10

"""

is_angle = {
    'ra': True,
    'dec': True,
    'mjd': False,
    }

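def convert_angle(text):
    # Sexagesimal 'a:b:c' -> decimal, in the units of the leading field
    # (hours for Ra, degrees for Dec). A leading '-' applies to the whole
    # value, so '-67:30:00' gives -67.5, not -66.5.
    sign = -1.0 if text.lstrip().startswith('-') else 1.0
    whole, minutes, seconds = [abs(float(x)) for x in text.split(':')]
    return sign * ((seconds / 60. + minutes) / 60. + whole)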

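def parse_line(line):
    # Whitespace-split tokens: 'Field', 'f29227:', then key=value items in
    # any order. Tokens without exactly one '=' (Frames, 5, Set, ...) are
    # ignored by the loop below.
    t = line.split()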
    assert t[0].lower() == 'field'
    assert t[1].startswith('f')
    assert t[1].endswith(':')
    field_id = t[1].rstrip(':')
    rdict = {}
    for f in t[2:]:
        parts = f.split('=')
        if len(parts) == 2:
            key = parts[0].lower()
            value = parts[1]
            assert key not in rdict
            if is_angle[key]:
                rvalue = convert_angle(value)
            else:
                rvalue = float(value)
            rdict[key] = rvalue
    return field_id, rdict['ra'], rdict['dec'], rdict['mjd']

for line in guff.splitlines():
    line = line.strip()
    if not line:
        continue
    field_id, ra, dec, mjd = parse_line(line)
    print field_id, ra, dec, mjd


C:\junk>tommy_grav.py
f29227 20.3962611111 67.5 53370.0679769
f31448 20.4161472222 79.6621944444 53370.0681162
f31226 20.4126388889 78.4458888889 53370.0682386
f31004 20.4181333333 77.2296944444 53370.0683602
f30782 20.4310944444 76.0135 53370.0684821
f30560 20.4505055556 74.7973055556 53370.068604
f30338 20.4756527778 73.5811111111 53370.0687262
f30116 20.5060277778 72.3648888889 53370.0688489
f29894 20.5412611111 71.1486111111 53370.0689707
f29672 20.5810805556 69.9323888889 53370.0690935
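
For anyone who wants to see what str.split hands back at each step, here is a minimal sketch on a single record, the one with MJD and Ra swapped. The variable names are mine and the dict-building one-liner is just one way to collect the key=value tokens, not the only one. It runs unchanged on Python 2 or 3.

```python
# One record from the sample data, with the MJD and Ra fields swapped.
line = ("Field f31448: MJD=53370.06811620123 Dec=+79:39:43.9 "
        "Ra=20:24:58.13 Frames 5 Set 2")

tokens = line.split()             # split on any run of whitespace
field_id = tokens[1].rstrip(':')  # 'f31448:' -> 'f31448'

# Keep only tokens of the form key=value; their order no longer matters.
pairs = dict(tok.split('=', 1) for tok in tokens[2:] if '=' in tok)

print(field_id)                   # f31448
print(sorted(pairs))              # ['Dec', 'MJD', 'Ra']
print(pairs['Ra'].split(':'))     # ['20', '24', '58.13']
```

Because the values are looked up by key, swapping Ra= and MJD= on a line changes nothing in the result.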

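If you would rather go the regular expression route suggested earlier in the thread, a rough equivalent might look like the sketch below. The pattern and the name parse_line_re are my own assumptions for illustration, and the sketch returns the raw strings only; the angle conversion would still have to be applied afterwards.

```python
import re

# Assumed pattern: a word, '=', then a run of non-whitespace characters.
PAIR = re.compile(r'(\w+)=(\S+)')

def parse_line_re(line):
    """Return (field_id, dict of raw key/value strings) for one record."""
    m = re.match(r'\s*Field\s+(f\w+):', line)
    if m is None:
        raise ValueError('not a Field record: %r' % line)
    # findall yields (key, value) tuples in whatever order they appear.
    pairs = dict((k.lower(), v) for k, v in PAIR.findall(line))
    return m.group(1), pairs

fid, raw = parse_line_re(
    "Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1")
print(fid)                                             # f29227
print(raw['ra'] + ' ' + raw['dec'] + ' ' + raw['mjd'])
```

It does the same job with one import and a pattern to maintain, which is why I would reach for split first on input this regular.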
Cheers,
John



