Time script help sought!

Paul McGuire ptmcg at austin.rr._bogus_.com
Tue Jan 11 18:08:17 EST 2005


"kpp9c" <kp8 at mac.com> wrote in message
news:1105480151.431987.327080 at c13g2000cwb.googlegroups.com...
> still working on it and also fixing the input data. I think for
> simplicity and consistency's sake i will have *all* time values input
> and output as hh:mm:ss maybe that would be easier.... but i have a few
> thousand find and replaceeseseses to do now (yes i am doing them by
> hand)
>
> grr... this is hard!
>

Oh, I wasn't going to chime in on this thread, since your data looked so
well-formed that I wouldn't have recommended pyparsing. But there is enough
variability going on here that I thought I'd give it a try.  Here's a
pyparsing treatment of your problem.  It accommodates trailing comments or
none, leading hours or none on timestamps, and missing end times, and it
normalizes all times relative to the start time of the current item.
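
To make the normalization step concrete, here is a minimal sketch, separate
from the pyparsing code and written for Python 3 (the listing in this post
uses Python 2 print statements). It assumes the start times have already
been converted to seconds. The function name offsets_from_item_start and the
sample rows are hypothetical, chosen only for illustration.

```python
def offsets_from_item_start(rows):
    """Yield (item, offset_secs) for each (item, start_secs) row.

    The offset is measured from the start time of the first row seen
    for that item, so the first row of every item gets offset 0.
    """
    last_item = None
    item_start = 0
    for item, start_secs in rows:
        if item != last_item:
            # New item: remember where it starts.
            last_item = item
            item_start = start_secs
        yield item, start_secs - item_start


# Hypothetical sample rows, using start times from the data in this post:
# 9:41 -> 581 s, 10:47 -> 647 s, 11:21 -> 681 s, 24:33 -> 1473 s
rows = [("Item_3", 581), ("Item_3", 647), ("Item_3", 681), ("Item_4", 1473)]
print(list(offsets_from_item_start(rows)))
# [('Item_3', 0), ('Item_3', 66), ('Item_3', 100), ('Item_4', 0)]
```

The offsets 66 and 100 correspond to the 00:01:06 and 00:01:40 values shown
in parentheses in the sample output for Item_3.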

Most of your processing logic will end up going into the processVals()
routine.  I've included various examples of how to access the parsed tokens
by field name, plus some helper functions for converting between seconds and
hh:mm:ss or mm:ss times.
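
As a rough illustration of the time conversions, the sketch below does the
same round trip in Python 3 using divmod. It is an alternative formulation,
not the code from this post; the names to_secs and secs_to_time are
hypothetical, and it assumes non-negative whole seconds.

```python
def to_secs(tstr):
    """Convert 'mm:ss' or 'hh:mm:ss' to a number of seconds."""
    secs = 0
    for field in tstr.split(":"):
        # Each field to the right is worth 1/60 of the one before it.
        secs = secs * 60 + int(field)
    return secs


def secs_to_time(secs):
    """Format a non-negative number of seconds as 'hh:mm:ss'."""
    minutes, s = divmod(secs, 60)
    h, m = divmod(minutes, 60)
    return "%02d:%02d:%02d" % (h, m, s)


print(to_secs("8:23"))                                       # 503
print(secs_to_time(503))                                     # 00:08:23
print(secs_to_time(to_secs("01:37:38") - to_secs("37:01")))  # 01:00:37
```

Using divmod keeps the arithmetic in integers, so the result does not depend
on how the / operator behaves in a given Python version.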

-- Paul


from pyparsing import *

data = """
Item_1    TAPE_1    1    00:23    8:23

Item_2    TAPE_1    2    8:23    9:41

Item_3    TAPE_1    3    9:41    10:41
Item_3    TAPE_1    4    10:47    11:19
Item_3    TAPE_1    5    11:21    11:55
Item_3    TAPE_1    6    11:58    12:10
Item_3    TAPE_1    7    12:15    12:45    Defect in analog tape sound.
Item_3    TAPE_1    8    12:58    24:20    Defect in analog tape sound.

Item_4    TAPE_1    9    24:33
Item_4    TAPE_1    10    25:48
Item_4    TAPE_1    11    29:48
Item_4    TAPE_1    12    31:46
Item_4    TAPE_1    13    34:17        Electronic sounds.
Item_4    TAPE_1    14    35:21
Item_4    TAPE_1    15    36:06
Item_4    TAPE_1    16    37:01  01:37:38
"""

def toSecs(tstr):
    fields = tstr.split(":")
    secs = int(fields[-1])
    secs += int(fields[-2])*60
    if len(fields)>2: secs += int(fields[-3])*60*60
    return secs

def secsToTime(secs):
    s = secs % 60
    m = ((secs - s) / 60 ) % 60
    h = (secs >= 3600 and (secs - s - m*60 ) / 3600 or 0)
    return "%02d:%02d:%02d" % (h,m,s)

# globals for normalizing timestamps
lastItem = ""
itemStart = 0

# put logic here for processing various parse fields
def processVals(s,l,t):
    global lastItem,itemStart
    print t.item,t.tape,t.recnum
    if not t.item == lastItem :
        lastItem = t.item
        itemStart = toSecs(t.start)

    startSecs = toSecs(t.start)
    print secsToTime(startSecs),"(%s)" % secsToTime(startSecs-itemStart)

    if t.end:
        endSecs = toSecs(t.end)
        print secsToTime(endSecs),"(%s)" % secsToTime(endSecs-itemStart)
        print endSecs-startSecs,"elapsed seconds"
        print secsToTime(endSecs-startSecs),"elapsed time"
    else:
        print "<no end time>"
    print t.comment
    print

# define structure of a line of data - sorry about the clunkiness of the
# optional trailing fields
integer = Word(nums)
timestr = Combine(integer + ":" + integer + Optional(":" + integer))
dataline = ( Combine("Item_"+integer).setResultsName("item") +
            Combine("TAPE_"+integer).setResultsName("tape") +
            integer.setResultsName("recnum") +
            timestr.setResultsName("start") +
            Optional(~LineEnd() + timestr, default="").setResultsName("end") +
            Optional(~LineEnd() + empty + restOfLine,
                     default="-").setResultsName("comment") )

# set up parse handler that will process the actual fields
dataline.setParseAction(processVals)

# now parse the little buggers
OneOrMore(dataline).parseString(data)

This will print out:

Item_1 TAPE_1 1
00:00:23 (00:00:00)
00:08:23 (00:08:00)
480 elapsed seconds
00:08:00 elapsed time
-

Item_2 TAPE_1 2
00:08:23 (00:00:00)
00:09:41 (00:01:18)
78 elapsed seconds
00:01:18 elapsed time
-

Item_3 TAPE_1 3
00:09:41 (00:00:00)
00:10:41 (00:01:00)
60 elapsed seconds
00:01:00 elapsed time
-

Item_3 TAPE_1 4
00:10:47 (00:01:06)
00:11:19 (00:01:38)
32 elapsed seconds
00:00:32 elapsed time
-

Item_3 TAPE_1 5
00:11:21 (00:01:40)
00:11:55 (00:02:14)
34 elapsed seconds
00:00:34 elapsed time
-
...
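
For readers who want to see the line layout without pyparsing, the sketch
below approximates the same structure with the standard-library re module
and named groups. This is a swapped-in technique, not the pyparsing grammar
above, and it is written for Python 3. The names TIME, LINE_RE and
parse_line are hypothetical. It mirrors the defaults used above: an empty
string for a missing end time and "-" for a missing comment.

```python
import re

# mm:ss with an optional trailing :ss field, i.e. mm:ss or hh:mm:ss
TIME = r"\d+:\d+(?::\d+)?"

LINE_RE = re.compile(
    r"(?P<item>Item_\d+)\s+"
    + r"(?P<tape>TAPE_\d+)\s+"
    + r"(?P<recnum>\d+)\s+"
    + r"(?P<start>" + TIME + r")"
    + r"(?:[ \t]+(?P<end>" + TIME + r"))?"      # optional end time
    + r"(?:[ \t]+(?P<comment>\S.*?))?[ \t]*$"   # optional trailing comment
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["end"] = fields["end"] or ""
    fields["comment"] = fields["comment"] or "-"
    return fields


row = parse_line("Item_4    TAPE_1    13    34:17        Electronic sounds.")
print(row["start"], repr(row["end"]), row["comment"])
# 34:17 '' Electronic sounds.
```

A regular expression like this is harder to extend than the pyparsing
grammar once the data picks up more variations, which is the trade-off to
keep in mind.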




