Parsing of a file

John Machin sjmachin at lexicon.net
Wed Aug 6 17:36:01 EDT 2008


On Aug 7, 7:06 am, John Machin <sjmac... at lexicon.net> wrote:
> On Aug 7, 6:02 am, Mike Driscoll <kyoso... at gmail.com> wrote:
>
>
>
> > On Aug 6, 1:55 pm, Tommy Grav <tg... at mac.com> wrote:
>
> > > I have a file with the format
>
> > > Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
> > > Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
> > > Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
> > > Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
> > > Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
> > > Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
> > > Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
> > > Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
> > > Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
> > > Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10
>
> > > I would like to parse this file by extracting the field id, ra, dec
> > > and mjd for each line. It is
> > > not, however, certain that the width of each value of the field id,
> > > ra, dec or mjd is the same
> > > in each line. Is there a way to do this such that even if there was a
> > > line where Ra=****** and
> > > MJD=******** was swapped it would be parsed correctly?
>
> > > Cheers
> > >    Tommy
>
> > I'm sure Python can handle this. Try the PyParsing module or learn
> > Python regular expression syntax.
>
> > http://pyparsing.wikispaces.com/
>
> > You could probably do it very crudely by just iterating over each line
> > and then using the string's find() method.
>
> Perhaps you and the OP could spend some time becoming familiar with
> built-in functions and str methods. In particular, str.split is your
> friend:
>
> C:\junk>type tommy_grav.py
> # Look, Ma, no imports!
>
> guff = """\
> Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
> Field f31448: MJD=53370.06811620123 Dec=+79:39:43.9 Ra=20:24:58.13 Frames 5 Set 2
> Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
> Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
> Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
>
> Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
> Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
> Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
> Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
> Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10
>
> """
>
> is_angle = {
>     'ra': True,
>     'dec': True,
>     'mjd': False,
>     }
>
> def convert_angle(text):
>     deg, minutes, sec = map(float, text.split(':'))  # 'minutes' avoids shadowing the builtin min()
>     return (sec / 60. + minutes) / 60. + deg
>
> def parse_line(line):
>     t = line.split()
>     assert t[0].lower() == 'field'
>     assert t[1].startswith('f')
>     assert t[1].endswith(':')
>     field_id = t[1].rstrip(':')
>     rdict = {}
>     for f in t[2:]:
>         parts = f.split('=')
>         if len(parts) == 2:
>             key = parts[0].lower()
>             value = parts[1]
>             assert key not in rdict
>             if is_angle[key]:
>                 rvalue = convert_angle(value)
>             else:
>                 rvalue = float(value)
>             rdict[key] = rvalue
>     return field_id, rdict['ra'], rdict['dec'], rdict['mjd']
>
> for line in guff.splitlines():
>     line = line.strip()
>     if not line:
>         continue
>     field_id, ra, dec, mjd = parse_line(line)
>     print field_id, ra, dec, mjd
>
> C:\junk>tommy_grav.py
> f29227 20.3962611111 67.5 53370.0679769
> f31448 20.4161472222 79.6621944444 53370.0681162
> f31226 20.4126388889 78.4458888889 53370.0682386
> f31004 20.4181333333 77.2296944444 53370.0683602
> f30782 20.4310944444 76.0135 53370.0684821
> f30560 20.4505055556 74.7973055556 53370.068604
> f30338 20.4756527778 73.5811111111 53370.0687262
> f30116 20.5060277778 72.3648888889 53370.0688489
> f29894 20.5412611111 71.1486111111 53370.0689707
> f29672 20.5810805556 69.9323888889 53370.0690935
>
> Cheers,
> John

Slightly less ugly:

C:\junk>diff tommy_grav.py tommy_grav_2.py
18,23d17
< is_angle = {
<     'ra': True,
<     'dec': True,
<     'mjd': False,
<     }
<
27a22,27
> converter = {
>     'ra': convert_angle,
>     'dec': convert_angle,
>     'mjd': float,
>     }
>
41,44c41
<             if is_angle[key]:
<                 rvalue = convert_angle(value)
<             else:
<                 rvalue = float(value)
---
>             rvalue = converter[key](value)
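
For anyone reading this on Python 3, here is a rough sketch of what the script might look like with the diff above applied. It is not the code as posted. The print statement becomes a function call, str.partition replaces the split('=') / len(parts) test, and convert_angle is changed to treat a leading '-' as applying to the whole angle (an assumption; the sample data contains only positive values, and the posted version would turn "-67:30:00.0" into -66.5). Unknown keys such as a hypothetical "Exp=30" are skipped here rather than raising KeyError, which is a design choice and not something the original does.

```python
# Python 3 sketch of the converter-table version; differences from the
# posted script are noted in comments and are assumptions, not the original.

def convert_angle(text):
    """Convert 'dd:mm:ss.s' (optionally signed) to a decimal value."""
    # Assumption: a leading '-' negates the whole angle, not just the first field.
    sign = -1.0 if text.strip().startswith('-') else 1.0
    first, minutes, seconds = (abs(float(p)) for p in text.split(':'))
    return sign * (first + (minutes + seconds / 60.0) / 60.0)

# Dispatch table: each known key maps to the function that converts its value.
converter = {
    'ra': convert_angle,
    'dec': convert_angle,
    'mjd': float,
}

def parse_line(line):
    """Return (field_id, ra, dec, mjd) regardless of the key=value order."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0].lower() != 'field' or not tokens[1].endswith(':'):
        raise ValueError('not a Field line: %r' % line)
    field_id = tokens[1].rstrip(':')
    values = {}
    for token in tokens[2:]:
        key, sep, value = token.partition('=')
        key = key.lower()
        if not sep or key not in converter:
            continue  # plain words ('Frames', '5', ...) and unknown keys are skipped
        if key in values:
            raise ValueError('duplicate key %r in line: %r' % (key, line))
        values[key] = converter[key](value)
    return field_id, values['ra'], values['dec'], values['mjd']

if __name__ == '__main__':
    sample = ('Field f31448: MJD=53370.06811620123 Dec=+79:39:43.9 '
              'Ra=20:24:58.13 Frames 5 Set 2')
    print(parse_line(sample))
```

A line missing one of ra, dec or mjd still fails with KeyError at the return statement, as in the original. Note that Ra is left in the units of the input (hours, if the usual convention applies), exactly as the posted script does.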


