convert non-delimited to delimited
Paul McGuire
ptmcg at austin.rr.com
Mon Aug 27 20:06:31 EDT 2007
On Aug 27, 12:59 pm, RyanL <ryanlaurit... at gmail.com> wrote:
> I'm a newbie! I have a non-delimited data file that I'd like to
> convert to delimited.
>
> Example...
> Line in non-delimited file:
> 0139725635999992000010100534+42050-102800FM-15+1198KAIA
>
> Should be:
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
>
> What is the best way to go about this? I've looked all over for
> examples, help, suggestions, but have not found much. CSV module
> doesn't seem to do exactly what I want. Maybe I'm just missing
> something or not using the correct terminology in my searches. Any
> assistance is greatly appreciated! Using Python 2.4
I'm guessing that these lines *aren't* fixed-length, especially those
signed integer fields. I used the patented Paul McGuire CrystalBall
module to come up with this pyparsing rendition. (OP may adjust to
suit.)
-- Paul
data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
"""to be parsed as:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA"""

from pyparsing import Word, Combine, nums, alphas
import time

def convertTimeStamp(t):
    # convert the date and time pieces to ints
    t["date"] = [int(x) for x in t.date]
    t["time"] = [int(x) for x in t.time]
    # pad with zeros to the 9-element time tuple that strftime expects
    return time.strftime("%Y-%m-%dT%H:%M",
                         tuple(t.date)+tuple(t.time)+(0,0,0,0))

# timestamp is YYYYMMDD followed by HHMM
yearMonthDay = (Word(nums,exact=4) + Word(nums,exact=2) +
                Word(nums,exact=2))
hourMinuteSecond = Word(nums,exact=2) + Word(nums,exact=2)
timestamp = ( yearMonthDay("date") + hourMinuteSecond("time") )
timestamp.setParseAction(convertTimeStamp)

# a leading sign followed by any number of digits
signedInteger = Word("+-",nums)

fieldA = Word(nums,exact=4)("A")
fieldB = Word(nums,exact=6)("B")
fieldC = Word(nums,exact=5)("C")
fieldD = timestamp("timestamp")
fieldE = Word(nums)("E")
# latitude and longitude are stored as thousandths of a degree
fieldF = signedInteger("latitude").setParseAction(lambda t : int(t[0])/1000.0)
fieldG = signedInteger("longitude").setParseAction(lambda t : int(t[0])/1000.0)
fieldH = Combine(Word(alphas,exact=2) + "-" + Word(nums,exact=2))("H")
fieldI = signedInteger("I")
fieldJ = Word(alphas)("J")

dataFields = fieldA + fieldB + fieldC + fieldD + fieldE + \
    fieldF + fieldG + fieldH + fieldI + fieldJ

res = dataFields.parseString(data)
print(res.dump())
prints:
['0139', '725635', '99999', '2000-01-01T00:53', '4',
42.049999999999997, -102.8, 'FM-15', '+1198', 'KAIA']
- A: 0139
- B: 725635
- C: 99999
- E: 4
- H: FM-15
- I: +1198
- J: KAIA
- latitude: 42.05
- longitude: -102.8
- timestamp: 2000-01-01T00:53
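
If all that is wanted is the comma-delimited line from the original question, the same guessed layout can also be written as a single regular expression using only the standard library. This is a rough sketch, not part of the pyparsing version above: the field widths are inferred from the one sample line, and the names RECORD and to_delimited are made up for illustration. Adjust the pattern if the real file turns out to differ.

```python
import re

# Field layout guessed from the single sample line (an assumption, not a spec).
RECORD = re.compile(
    r"(\d{4})"                  # field A: 4 digits
    r"(\d{6})"                  # field B: 6 digits
    r"(\d{5})"                  # field C: 5 digits
    r"(\d{4})(\d{2})(\d{2})"    # year, month, day
    r"(\d{2})(\d{2})"           # hour, minute
    r"(\d+)"                    # field E: remaining digits before the sign
    r"([+-]\d+)"                # latitude, signed, variable length
    r"([+-]\d+)"                # longitude, signed, variable length
    r"([A-Za-z]{2}-\d{2})"      # field H, e.g. FM-15
    r"([+-]\d+)"                # field I, signed, variable length
    r"([A-Za-z]+)$"             # field J: trailing letters
)

def to_delimited(line, sep=","):
    """Split one record into its fields and join them with sep."""
    m = RECORD.match(line.strip())
    if m is None:
        raise ValueError("line does not match the assumed layout: %r" % line)
    return sep.join(m.groups())

data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
print(to_delimited(data))
# prints: 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
```

Calling to_delimited on each line of the input file and writing the result out gives the delimited file. A line that does not fit the assumed layout raises ValueError rather than being split silently in the wrong place.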