Time script help sought!
Paul McGuire
ptmcg at austin.rr._bogus_.com
Tue Jan 11 18:08:17 EST 2005
"kpp9c" <kp8 at mac.com> wrote in message
news:1105480151.431987.327080 at c13g2000cwb.googlegroups.com...
> still working on it and also fixing the input data. I think for
> simplicity and consistency's sake i will have *all* time values input
> and output as hh:mm:ss maybe that would be easier.... but i have a few
> thousand find and replaceeseseses to do now (yes i am doing them by
> hand)
>
> grr... this is hard!
>
I wasn't going to chime in on this thread, since your data looked so
well-formed that I wouldn't have recommended pyparsing. But there is enough
variability going on here that I thought I'd give it a try. Here's a
pyparsing treatment of your problem. It accommodates trailing comments or
none, leading hours or none on timestamps, and missing end times, and it
normalizes all times relative to the item start time.
Most of your processing logic will end up going into the processVals()
routine. I've included various examples of how to access the parsed tokens
by field name, plus some helper functions for converting between seconds and
hh:mm:ss or mm:ss times.
-- Paul
from pyparsing import *
data = """
Item_1 TAPE_1 1 00:23 8:23
Item_2 TAPE_1 2 8:23 9:41
Item_3 TAPE_1 3 9:41 10:41
Item_3 TAPE_1 4 10:47 11:19
Item_3 TAPE_1 5 11:21 11:55
Item_3 TAPE_1 6 11:58 12:10
Item_3 TAPE_1 7 12:15 12:45 Defect in analog tape sound.
Item_3 TAPE_1 8 12:58 24:20 Defect in analog tape sound.
Item_4 TAPE_1 9 24:33
Item_4 TAPE_1 10 25:48
Item_4 TAPE_1 11 29:48
Item_4 TAPE_1 12 31:46
Item_4 TAPE_1 13 34:17 Electronic sounds.
Item_4 TAPE_1 14 35:21
Item_4 TAPE_1 15 36:06
Item_4 TAPE_1 16 37:01 01:37:38
"""
def toSecs(tstr):
    fields = tstr.split(":")
    secs = int(fields[-1])
    secs += int(fields[-2])*60
    if len(fields)>2:
        secs += int(fields[-3])*60*60
    return secs
def secsToTime(secs):
    # use // so the division stays integral regardless of Python version
    s = secs % 60
    m = ((secs - s) // 60 ) % 60
    h = (secs >= 3600 and (secs - s - m*60 ) // 3600 or 0)
    return "%02d:%02d:%02d" % (h,m,s)
# globals for normalizing timestamps
lastItem = ""
itemStart = 0
# put logic here for processing various parse fields
def processVals(s,l,t):
    global lastItem,itemStart
    print t.item,t.tape,t.recnum
    if not t.item == lastItem :
        lastItem = t.item
        itemStart = toSecs(t.start)
    startSecs = toSecs(t.start)
    print secsToTime(startSecs),"(%s)" % secsToTime(startSecs-itemStart)
    if t.end:
        endSecs = toSecs(t.end)
        print secsToTime(endSecs),"(%s)" % secsToTime(endSecs-itemStart)
        print endSecs-startSecs,"elapsed seconds"
        print secsToTime(endSecs-startSecs),"elapsed time"
    else:
        print "<no end time>"
    print t.comment
    print
# define structure of a line of data - sorry about the clunkiness of the
# optional trailing fields
integer = Word(nums)
timestr = Combine(integer + ":" + integer + Optional(":" + integer))
dataline = ( Combine("Item_"+integer).setResultsName("item") +
             Combine("TAPE_"+integer).setResultsName("tape") +
             integer.setResultsName("recnum") +
             timestr.setResultsName("start") +
             Optional(~LineEnd() + timestr, default="").setResultsName("end") +
             Optional(~LineEnd() + empty + restOfLine,
                      default="-").setResultsName("comment") )
# set up parse handler that will process the actual fields
dataline.setParseAction(processVals)
# now parse the little buggers
OneOrMore(dataline).parseString(data)
This will print out:
Item_1 TAPE_1 1
00:00:23 (00:00:00)
00:08:23 (00:08:00)
480 elapsed seconds
00:08:00 elapsed time
-
Item_2 TAPE_1 2
00:08:23 (00:00:00)
00:09:41 (00:01:18)
78 elapsed seconds
00:01:18 elapsed time
-
Item_3 TAPE_1 3
00:09:41 (00:00:00)
00:10:41 (00:01:00)
60 elapsed seconds
00:01:00 elapsed time
-
Item_3 TAPE_1 4
00:10:47 (00:01:06)
00:11:19 (00:01:38)
32 elapsed seconds
00:00:32 elapsed time
-
Item_3 TAPE_1 5
00:11:21 (00:01:40)
00:11:55 (00:02:14)
34 elapsed seconds
00:00:34 elapsed time
-
...
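For anyone reading this on Python 3, here is a minimal sketch of the same
time helpers and the "normalize to item start" idea, using only the standard
library and no pyparsing. The names to_secs, secs_to_time, normalize and rows
are hypothetical stand-ins for illustration, not part of the script above,
and the sketch assumes each record has already been split into
(item, start, end) with an empty string for a missing end time.

```python
def to_secs(tstr):
    """Convert "mm:ss" or "hh:mm:ss" to a number of seconds."""
    secs = 0
    for field in tstr.split(":"):
        # each further field shifts the running total up by a factor of 60
        secs = secs * 60 + int(field)
    return secs


def secs_to_time(secs):
    """Format a number of seconds as zero-padded hh:mm:ss."""
    minutes, s = divmod(secs, 60)
    h, m = divmod(minutes, 60)
    return "%02d:%02d:%02d" % (h, m, s)


def normalize(records):
    """Yield (item, start, end) with times relative to each item's first start.

    `records` is an iterable of (item, start, end) tuples; `end` may be
    an empty string (assumption: mirrors the missing end times in the data).
    """
    last_item = None
    item_start = 0
    for item, start, end in records:
        if item != last_item:
            # first record of a new item resets the reference point
            last_item = item
            item_start = to_secs(start)
        rel_start = secs_to_time(to_secs(start) - item_start)
        rel_end = secs_to_time(to_secs(end) - item_start) if end else ""
        yield item, rel_start, rel_end


# hypothetical sample rows taken from the data above
rows = [
    ("Item_3", "9:41", "10:41"),
    ("Item_3", "10:47", "11:19"),
    ("Item_4", "24:33", ""),
]
result = list(normalize(rows))
for row in result:
    print(*row)
```

divmod keeps the arithmetic in integers, so the formatting does not depend on
how the / operator behaves in a given Python version.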