database query - logic question

Tim Chase python.list at tim.thechases.com
Wed Jan 2 16:42:47 EST 2008


Israel Carr wrote:
> Thanks for anyone who takes the time to read this.  If I posted to the
> wrong list, I apologize and you can disregard.
> 
> I need help with a script to pull data from a postgres database.  I'm ok
> with the database connection just not sure how to parse the data to get
> the results I need.
> 
> I'm running Python 2.4.4. For what it's worth, once I can get my logic
> correct I'll be publishing the reports mentioned below via zope for web
> clients.
> 
> Here is a small sample of the records in the table:
> 
> name		date		time		status		
> machine1	01/01/2008	13:00:00	system ok
> machine1	01/01/2008	13:05:00	system ok
> machine1	01/01/2008	13:10:00	status1
> machine1 	01/01/2008	13:10:30	status1
> machine1	01/01/2008	13:11:00	system ok
> machine1	01/01/2008	13:16:30	status2
> machine1	01/01/2008	13:17:00	status2
> machine1	01/01/2008	13:17:30	status2
> machine1 	01/01/2008	13:18:00	status2
> machine1	01/01/2008	13:18:30	status2
> machine1	01/01/2008	13:19:00	system ok
> machine1	01/01/2008	13:24:00	status2
> machine1	01/01/2008	13:24:30	status2
> machine1    01/01/2008	13:25:00	system ok
> 
> I need to report from this data.
> The detail report needs to be something like:
> machine1	01/01/2008 13:10:00 status1	00:01:30
> machine1	01/01/2008 13:16:30 status2	00:02:30
> machine1	01/01/2008 13:24:00 status2	00:01:00

Well, just for fun of the SQL challenge, I tossed together the 
following (using sqlite3)

   SELECT name, Min(ts) as ts, next_ts, status
   FROM (
     SELECT *, (
       SELECT ts
       FROM test
       WHERE
         test.name = t.name
         AND test.ts > t.ts
         AND test.status = 'system ok'
       ORDER BY test.ts ASC
       LIMIT 1) AS next_ts
     FROM test t
     WHERE status <> 'system ok'
     ) with_next
   GROUP BY name, status, next_ts

where my table has "name", "ts" (a timestamp field combo of your 
"date" and "time" fields, and for sqlite, formatting in 
"YYYY-MM-DD mm:ss" format)

which yields rows with the machine name, the non "system ok" 
status, the timestamp of the initial event, and the timestamp of 
the subsequent "system ok" stamp.  There's a bit of an 
underdefined case where you have more than one non-OK status 
before OK gets reset:

   00:10  status1
   00:20  status1
   00:30  status2
   00:40  status ok

If this can't happen, it should work fine.  If the above can 
happen, you'll get odd overlaps in your reporting.  Since I 
couldn't find an Interval data type in sqlite, you'd just have to 
take the "ts" and "next_ts" columns and subtract them to get the 
interval you want.

> and the summary needs to be
> machine1	01/01/2008 total 'status1' time = 00:01:30
> machine1	01/01/2008 total 'status2' time = 00:03:30
> _____
> machine1	01/01/2008 total 'non-OK' time = 00:05:00 #this is the
> sum of 	status1 and status2 times

While the below doesn't track the changing of the machine, you 
can follow the basic framework given here.  I threw in a couple 
helper functions to normalize whatever data types 
("normalize_status()" and "make_timestamp()")

   NO_TIME = datetime.datetime(datetime.MINYEAR, 1, 1)
   OK = 'system ok'
   normalize_status = lambda s: s.lower()

   def log(s):
     print s
     print '=' * len(s)

   def make_timestamp(date, time):
     d = datetime.datetime(*(int(s) for s in
       date.split('-') +
       time.split(':')))
     return d

   status_tally = {}
   last_status = OK
   last_ts = NO_TIME
   log('Intervals (your first request)')
   for i, (machine, date, time, status) in enumerate(fetchall()):
     ts = make_timestamp(date, time)
     status = normalize_status(status)
     if status == OK and last_status <> OK:
       interval = ts - last_ts
       print machine, last_status, last_ts, interval
       if last_status in status_tally:
         status_tally[last_status] += interval
       else:
         status_tally[last_status] = interval
       last_status = status
     elif status <> OK and last_status == OK:
       last_ts = ts
       last_status = status

   log('Summary (your 2nd request)')
   for k,v in status_tally.iteritems():
     print k, v

   log('Grand Total (your 3rd request)')
   print sum(status_tally.values(), datetime.timedelta(0))

Thanks for the mental exercise. :)

-tkc







More information about the Python-list mailing list