How do I parse this ? regexp ?

Paul McGuire ptmcg at austin.rr.com
Thu Apr 28 00:48:26 EDT 2005


Jake -

If regexps give you pause, here is a pyparsing version that, while
verbose, is fairly straightforward.  I guessed at what some of the
data fields might be, but the guesses do not affect the parsing.

Note the use of setResultsName() to give different parse fragments
names so that they are directly addressable in the results, instead of
having to count out "group 0 is the date, group 1 is the
time...".  Also, there is a commented-out parse action that would
automatically convert strings to floats during parsing.
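
For contrast, here is a rough sketch of the group-counting style mentioned above, using only the standard re module. The pattern and the shortened sample line are my own illustration, not part of the original question, and only cover the first three fields.

```python
import re

# Shortened sample: just the date, time and first real number.
line = "04242005 18:20:42-0.000002, 271.1748608"

# Positional groups: the reader has to remember which index is which.
m = re.match(r"(\d{8}) (\d{2}:\d{2}:\d{2}[+-]\d+\.\d+), (-?\d+\.\d+)", line)
date_field = m.group(1)
time_field = m.group(2)
temp_field = m.group(3)

# Named groups remove the counting, much as setResultsName() does.
named = re.match(
    r"(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}[+-]\d+\.\d+), (?P<temp>-?\d+\.\d+)",
    line,
)
print(named.group("date"))
print(named.group("time"))
print(named.group("temp"))
```

Either way the pattern gets dense quickly once the bracketed triplets are added, which is where the more verbose pyparsing grammar starts to pay off.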

Download pyparsing at http://pyparsing.sourceforge.net.

Good luck,
-- Paul


data = """04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]"""

from pyparsing import *

COMMA = Literal(",").suppress()
LBRACK = Literal("[").suppress()
RBRACK = Literal("]").suppress()

# define a two-digit integer, we'll need a lot of them
int2 = Word(nums,exact=2)
month = int2
day = int2
yr = Combine("20" + int2)
date = Combine(month + day + yr)

hr = int2
min = int2
sec = int2
tz = oneOf("+ -") + Word(nums) + "." + Word(nums)
time = Combine( hr + ":" + min + ":" + sec + tz )

realNum = Combine( Optional("-") + Word(nums) + "." + Word(nums) )
# uncomment the next line and reals will be converted from strings to
# floats during parsing
#realNum.setParseAction( lambda s,l,t: float(t[0]) )

triplet = Group( LBRACK + realNum + COMMA + realNum + COMMA + realNum +
                 RBRACK )
entry = Group( date.setResultsName("date") +
               time.setResultsName("time") + COMMA +
               realNum.setResultsName("temp") + COMMA +
               Group( triplet + COMMA + triplet + COMMA + triplet
                    ).setResultsName("coords") )

dataFormat = OneOrMore(entry)
results = dataFormat.parseString(data)

# print(...) with a single argument works in both Python 2 and Python 3
for d in results:
    print(d.date)
    print(d.time)
    print(d.temp)
    print(d.coords[0].asList())
    print(d.coords[1].asList())
    print(d.coords[2].asList())

prints:

04242005
18:20:42-0.000002
271.1748608
['-4.119873046875', '3.4332275390625', '105.062255859375']
['0.093780517578125', '0.041015625', '-0.960662841796875']
['0.01556396484375', '0.01220703125', '0.01068115234375']
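
To show what the commented-out conversion action does, here is a minimal sketch that enables it on a cut-down grammar containing a single triplet. It assumes a pyparsing release that still accepts the camelCase method names used in this post; the input string is the second triplet from the sample data.

```python
from pyparsing import Combine, Group, Literal, Optional, Word, nums

COMMA = Literal(",").suppress()
LBRACK = Literal("[").suppress()
RBRACK = Literal("]").suppress()

# Same real-number expression as in the post, with the parse action
# enabled so each matched string is replaced by a float.
realNum = Combine(Optional("-") + Word(nums) + "." + Word(nums))
realNum.setParseAction(lambda s, l, t: float(t[0]))

triplet = Group(LBRACK + realNum + COMMA + realNum + COMMA + realNum + RBRACK)

parsed = triplet.parseString("[0.093780517578125, 0.041015625, -0.960662841796875]")
values = parsed[0].asList()  # list of floats rather than strings
print(values)
```

With the action in place the list holds float objects, so the printed values lose the quotes seen in the output above.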



