Efficient counting of results

Peter Otten __peter__ at web.de
Thu Oct 19 15:30:33 EDT 2017


Israel Brewster wrote:

> 
>> On Oct 19, 2017, at 10:02 AM, Stefan Ram <ram at zedat.fu-berlin.de> wrote:
>> 
>> Israel Brewster <israel at ravnalaska.net> writes:
>>> t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
>>> increment the appropriate bin counts using a bunch of if statements.
>> 
>>  I can't really completely comprehend your requirements
>>  specification, you might have perfectly described it all and
>>  it's just too complicated for me to comprehend, but I just
>>  would like to add that there are several ways to implement a
>>  "two-dimensional" matrix. You can also imagine your
>>  dictionary like this:
>> 
>> example =
>> { 'd10': 0, 'd15': 0, 'd20': 0, 'd215': 0,
>>  'w10': 0, 'w15': 0, 'w20': 0, 'w215': 0,
>>  'm10': 0, 'm15': 0, 'm20': 0, 'm215': 0,
>>  'y10': 0, 'y15': 0, 'y20': 0, 'y215': 0 }
>> 
>>  Then, when the categories are already in two variables, say,
>>  »a« (»d«, »w«, »m«, or »y«) and »b« (»10«, »15«, »20«, or
>>  »215«), you can address the appropriate bin as
> 
> Oh, I probably was a bit weak on the explanation somewhere. I'm still
> wrapping *my* head around some of the details. That's what makes it fun
> :-) If it helps, my data would look something like this:
> 
> [ (date, key, t1, t2),
>  (date, key, t1, t2)
> .
> .
> ]
> 
> Where the date and the key are what is used to determine what "on-time" is
> for the record, and thus which "late" bin to put it in. So if the date of
> the first record was today, t1 was on-time, and t2 was 5 minutes late,
> then I would need to increment ALL of the following (using your data
> structure from above):
> 
> d10, w10, m10, y10, d25, w25, m25 AND y25

Start with simpler, more generic operations. A

def group(rows):
   ...

function that expects rows of the form

(date, key, t)

can be run twice, once with 

summary1 = group((date, key, t1) for date, key, t1, t2 in rows)

and then with t2.
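
To make that concrete, here is a minimal sketch of what group() might look
like. It is only an illustration: classify() is a hypothetical stand-in for
the real "how late is this?" rule (which depends on the date and the key),
and the bin labels and sample rows are made up.

```python
from collections import Counter, defaultdict
from datetime import date


def classify(day, key, t):
    """Hypothetical stand-in: map a lateness t (in minutes) to a bin label.

    The real rule depends on the date and the key; this sketch ignores both.
    """
    if t <= 0:
        return "on-time"
    return "late"


def group(rows, classify=classify):
    """Count rows per day and lateness bin.

    rows is an iterable of (date, key, t) tuples.
    Returns a mapping {date: Counter({bin_label: count})}.
    """
    daily = defaultdict(Counter)
    for day, key, t in rows:
        daily[day][classify(day, key, t)] += 1
    return daily


# Made-up sample data in the (date, key, t1, t2) layout.
rows = [
    (date(2017, 10, 19), "A", 0, 5),
    (date(2017, 10, 19), "B", 0, 0),
    (date(2017, 10, 18), "A", 12, 0),
]

# One pass per time column, as described above.
summary1 = group((d, k, t1) for d, k, t1, t2 in rows)
summary2 = group((d, k, t2) for d, k, t1, t2 in rows)
```

Keeping the classification in a separate function means group() itself
never needs to know how many bins there are.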

Then calculate only the daily sums, as you can derive the weekly, monthly and
yearly totals by summing over the respective days (you may also get the
yearly totals by summing over the respective months, but I expect this to
be an optimisation with negligible effect).
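
A rough sketch of that summing step, assuming the daily counts are stored
as {date: Counter} and that the week starts on Monday. The helper names
total() and to_date_totals() and the sample numbers are made up for
illustration.

```python
from collections import Counter
from datetime import date, timedelta


def total(daily, start, end):
    """Sum the per-day Counters for all days with start <= day <= end."""
    result = Counter()
    for day, counts in daily.items():
        if start <= day <= end:
            result.update(counts)  # Counter.update() adds the counts
    return result


def to_date_totals(daily, today):
    """Derive daily, week-, month- and year-to-date totals from daily counts."""
    # Assumption: the week starts on Monday (weekday() == 0).
    week_start = today - timedelta(days=today.weekday())
    return {
        "daily": total(daily, today, today),
        "WTD": total(daily, week_start, today),
        "MTD": total(daily, today.replace(day=1), today),
        "YTD": total(daily, today.replace(month=1, day=1), today),
    }


# Made-up daily counts.
daily = {
    date(2017, 10, 19): Counter({"on-time": 2, "late": 1}),
    date(2017, 10, 16): Counter({"late": 1}),
    date(2017, 10, 2): Counter({"on-time": 3}),
    date(2017, 3, 7): Counter({"late": 4}),
}
totals = to_date_totals(daily, date(2017, 10, 19))
```

Because every period is just a date range over the same daily data, a
record is counted once and shows up in all four totals automatically.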

> Since this record counts not just for the current day, but also for
> week-to-date, month-to-date and year-to-date. Basically, as the time
> categories get larger, the percentage of the total records included in
> that date group also gets larger. The year-to-date group will include all
> records, grouped by lateness, the daily group will only include todays
> records.
> 
> Maybe that will help clear things up. Or not. :-)
>> 
>> example[ a + b ]+= 1
> 
> Not quite following the logic here. Sorry.
> 
>> 
>>  . (And to not have to initialized the entries to zero,
>>  class collections.defaultdict might come in handy.)
> 
> Yep, those are handy in many places. Thanks for the suggestion.
> 
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
> 
>> 
>> --
>> https://mail.python.org/mailman/listinfo/python-list
> 




