Efficient counting of results

Israel Brewster israel at ravnalaska.net
Thu Oct 19 13:40:23 EDT 2017


I am working on developing a report that groups data into a two-dimensional array based on date and time. More specifically, date is grouped into categories:

day, week-to-date, month-to-date, and year-to-date

Then, for each of those categories, I need to get a count of records that fall into the following categories:

0 minutes late, 1-5 minutes late, and 6-15 minutes late

where minutes late will be calculated based on a known scheduled time and the time in the record. To further complicate things, there are actually two times in each record, so under the day, week-to-date, month-to-date etc groups, there will be two sets of "late" bins, one for each time. In table form it would look  something like this:

                    | day  |  week-to-date | month-to-date |  year-to-date  |
----------------------------------------------------------------------------------------
t1 0min        | <counts for each time group>
t1 1-5 min    | ...
t1 6-15 min  | ...
t2 0min        | ...
t2 1-5 min    | ...
t2 6-15 min  | ...

So in the extreme scenario of a record that is for the current day, it will be counted into 8 bins: once each for day, week-to-date, month-to-date and year-to-date under the proper "late" bin for the first time in the record, and once each into each of the time groups under the proper "late" bin for the second time in the record. An older record may only be counted twice, under the year-to-date group. A record with no matching schedule is discarded, as is any record that is "late" by more than 15 minutes (those are gathered into a separate report)

My initial approach was to simply make dictionaries for each "row" in the table, like so:

t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
t15 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
.
.
t25 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
t215 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}

then loop through the records, find the schedule for that record (if any, if not move on as mentioned earlier), compare t1 and t2 against the schedule, and increment the appropriate bin counts using a bunch of if statements. Functional, if ugly. But then I got to thinking: I keep hearing about all these fancy numerical analysis tools for python like pandas and numpy - could something like that help? Might there be a way to simply set up a table with "rules" for the columns and rows, and drop my records into the table, having them automatically counted into the proper bins or something? Or am I over thinking this, and the "simple", if ugly approach is best?

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------







More information about the Python-list mailing list