convert non-delimited to delimited

Paul McGuire ptmcg at austin.rr.com
Mon Aug 27 20:06:31 EDT 2007


On Aug 27, 12:59 pm, RyanL <ryanlaurit... at gmail.com> wrote:
> I'm a newbie!  I have a non-delimited data file that I'd like to
> convert to delimited.
>
> Example...
> Line in non-delimited file:
> 0139725635999992000010100534+42050-102800FM-15+1198KAIA
>
> Should be:
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
>
> What is the best way to go about this?  I've looked all over for
> examples, help, suggestions, but have not found much.  CSV module
> doesn't seem to do exactly what I want.  Maybe I'm just missing
> something or not using the correct terminology in my searches.  Any
> assistance is greatly appreciated!  Using Python 2.4

I'm guessing that these lines *aren't* fixed-length, especially those
signed integer fields.  I used the patented Paul McGuire CrystalBall
module to come up with this pyparsing rendition.  (OP may adjust to
suit.)
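
If that guess is wrong and every field really is a fixed width, plain
slicing would be enough. A minimal sketch, assuming the widths seen in
the one sample line (WIDTHS and split_fixed are made-up names, not part
of any library):

```python
# Hypothetical fixed-width layout, inferred only from the single sample line.
WIDTHS = [4, 6, 5, 4, 2, 2, 2, 2, 1, 6, 7, 5, 5, 4]

def split_fixed(line, widths=WIDTHS):
    """Cut line into consecutive slices of the given widths."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields

data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
delimited = ",".join(split_fixed(data))
```

This stops working as soon as one of the signed fields gains or loses a
digit, which is why the parser in this post lets those fields vary in
length.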

-- Paul

data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
"""to be parsed as:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA"""

from pyparsing import *
import time
def convertTimeStamp(t):
    t["date"] = map(int,t.date)
    t["time"] = map(int,t.time)
    return time.strftime("%Y-%m-%dT%H:%M",
                          tuple(t.date)+tuple(t.time)+(0,0,0,0))

yearMonthDay = Word(nums,exact=4) + Word(nums,exact=2) + \
    Word(nums,exact=2)
hourMinuteSecond = Word(nums,exact=2) + Word(nums,exact=2)
timestamp = ( yearMonthDay("date") + hourMinuteSecond("time") )
timestamp.setParseAction(convertTimeStamp)
signedInteger = Word("+-",nums)

fieldA = Word(nums,exact=4)("A")
fieldB = Word(nums,exact=6)("B")
fieldC = Word(nums,exact=5)("C")
fieldD = timestamp("timestamp")
fieldE = Word(nums)("E")
fieldF = signedInteger("latitude").setParseAction(
    lambda t : int(t[0])/1000.0)
fieldG = signedInteger("longitude").setParseAction(
    lambda t : int(t[0])/1000.0)
fieldH = Combine(Word(alphas,exact=2) + "-" + Word(nums,exact=2))("H")
fieldI = signedInteger("I")
fieldJ = Word(alphas)("J")
dataFields = fieldA + fieldB + fieldC + fieldD + fieldE + \
    fieldF + fieldG + fieldH + fieldI + fieldJ

res = dataFields.parseString(data)
print res.dump()

prints:

['0139', '725635', '99999', '2000-01-01T00:53', '4',
42.049999999999997, -102.8, 'FM-15', '+1198', 'KAIA']
- A: 0139
- B: 725635
- C: 99999
- E: 4
- H: FM-15
- I: +1198
- J: KAIA
- latitude: 42.05
- longitude: -102.8
- timestamp: 2000-01-01T00:53
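
The dump above shows converted values rather than the comma-delimited
line the OP asked for. For that output, a standard-library regular
expression may be enough. This is only a sketch under the same guesses
about the layout as the grammar above (fixed-width leading fields,
variable-length signed integers); RECORD and to_delimited are made-up
names, and the pattern has been checked against the one sample line
only.

```python
import re

# Same guessed layout as the pyparsing grammar; adjust to suit.
RECORD = re.compile(
    r"(\d{4})"                  # field A
    r"(\d{6})"                  # field B
    r"(\d{5})"                  # field C
    r"(\d{4})(\d{2})(\d{2})"    # year, month, day
    r"(\d{2})(\d{2})"           # hour, minute
    r"(\d+)"                    # field E, runs up to the first sign
    r"([+-]\d+)"                # latitude, signed, variable length
    r"([+-]\d+)"                # longitude, signed, variable length
    r"([A-Za-z]{2}-\d{2})"      # field H, e.g. FM-15
    r"([+-]\d+)"                # field I, signed, variable length
    r"([A-Za-z]+)$"             # field J
)

def to_delimited(line, sep=","):
    """Return the fields of one record joined by sep."""
    match = RECORD.match(line.strip())
    if match is None:
        raise ValueError("line does not fit the assumed layout: %r" % line)
    return sep.join(match.groups())

data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
delimited = to_delimited(data)
```

Lines that do not fit the guessed layout raise ValueError instead of
being silently mis-split, so a bad guess shows up on the first odd
record.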



