Parsing of a file

Henrique Dante de Almeida hdante at gmail.com
Wed Aug 6 19:00:05 EDT 2008


On Aug 6, 3:55 pm, Tommy Grav <tg... at mac.com> wrote:
> I have a file with the format
>
> Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
> Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
> Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
> Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
> Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
> Field f30560: Ra=20:27:01.82 Dec=+74:47:50.3 MJD=53370.06860400 Frames 5 Set 6
> Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
> Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
> Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
> Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10
>
> I would like to parse this file by extracting the field id, ra, dec
> and mjd for each line. It is not, however, certain that the width of
> each value of the field id, ra, dec or mjd is the same in each line.
> Is there a way to do this such that even if there was a line where
> Ra=****** and MJD=******** was swapped it would be parsed correctly?
>
> Cheers
>    Tommy

 Did you consider changing the file format in the first place, so that
you don't have to go through any contortions to parse it?

 Anyway, here is a solution with regular expressions (I'm a beginner
with re's in Python, so please correct it if it's wrong and suggest
better solutions):

import re

# Sample data, one record per line. The sixth record has Ra and Dec swapped.
s = """Field f29227: Ra=20:23:46.54 Dec=+67:30:00.0 MJD=53370.06797690 Frames 5 Set 1
Field f31448: Ra=20:24:58.13 Dec=+79:39:43.9 MJD=53370.06811620 Frames 5 Set 2
Field f31226: Ra=20:24:45.50 Dec=+78:26:45.2 MJD=53370.06823860 Frames 5 Set 3
Field f31004: Ra=20:25:05.28 Dec=+77:13:46.9 MJD=53370.06836020 Frames 5 Set 4
Field f30782: Ra=20:25:51.94 Dec=+76:00:48.6 MJD=53370.06848210 Frames 5 Set 5
Field f30560: Dec=+74:47:50.3 Ra=20:27:01.82 MJD=53370.06860400 Frames 5 Set 6
Field f30338: Ra=20:28:32.35 Dec=+73:34:52.0 MJD=53370.06872620 Frames 5 Set 7
Field f30116: Ra=20:30:21.70 Dec=+72:21:53.6 MJD=53370.06884890 Frames 5 Set 8
Field f29894: Ra=20:32:28.54 Dec=+71:08:55.0 MJD=53370.06897070 Frames 5 Set 9
Field f29672: Ra=20:34:51.89 Dec=+69:55:56.6 MJD=53370.06909350 Frames 5 Set 10"""

# Accepts Ra and Dec in either order; MJD must still come last.
# Groups: 1=field, 2=Ra, 3=Dec (Ra first) or 4=Dec, 5=Ra (Dec first), 6=MJD.
r = re.compile(r'Field (\S+): (?:Ra=(\S+) Dec=(\S+)|Dec=(\S+) Ra=(\S+)) MJD=(\S+)')
for line in s.splitlines():
    m = r.match(line)
    if m is None:
        continue  # skip lines that are not records
    field = m.group(1)
    ra = m.group(2) or m.group(5)
    dec = m.group(3) or m.group(4)
    mjd = m.group(6)
    print(field, ra, dec, mjd)
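
 The regular expression above only copes with Ra and Dec trading
places. If MJD can move as well, as the original question asks, one
possible alternative is to collect every key=value token into a
dictionary and look the values up by name. This is only a sketch: the
helper name parse_line is made up for illustration, and it assumes that
no value contains whitespace and that each key appears once per line.

```python
import re

# Hypothetical helper, not part of the original post.
def parse_line(line):
    """Return (field, ra, dec, mjd) for one record, or None if it is not a record."""
    head = re.match(r'Field (\S+):', line)
    if head is None:
        return None
    # Collect every key=value token, whatever order they appear in.
    # Assumption: values never contain whitespace.
    pairs = dict(re.findall(r'(\w+)=(\S+)', line))
    try:
        return head.group(1), pairs['Ra'], pairs['Dec'], pairs['MJD']
    except KeyError:
        return None  # a required key is missing

# A record with MJD, Dec and Ra in a different order than usual.
line = "Field f30560: MJD=53370.06860400 Dec=+74:47:50.3 Ra=20:27:01.82 Frames 5 Set 6"
print(parse_line(line))
```

 Since the values are looked up by name, the order of the Ra, Dec and
MJD tokens on the line no longer matters.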


