Efficient counting of results

Fri Oct 20 20:10:24 EDT 2017

On Fri, 20 Oct 2017 09:05:15 -0800, Israel Brewster wrote:

> On Oct 19, 2017, at 5:18 PM, Steve D'Aprano <steve+python at pearwood.info>
> wrote:
>> What t1 and t2 are, I have no idea. Your code there suggests that they
>> are fields in your data records, but the contents of the fields, who
>> knows?
> 
> t1 and t2 are *independent* timestamp fields. My apologies - I made the
> obviously false assumption that it was clear they were timestamps, or at
> least times based on the fact I was calculating "minutes late" based on
> them.

It wasn't clear to me whether they were timestamps, flags, or something 
else. For example, you said:

    "if the date of the first record was today, t1 was on-time,
    and t2 was 5 minutes late"

which suggested to me that the record's date was the timestamp, and that 
t1 and t2 were possibly enums or flags:

(date=Date(2017, 10, 21), key=key, t1=ON_TIME, t2=FIVE_MINUTES_LATE)

or possibly:

(date=Date(2017, 10, 21), key=key, t1=True, t2=False)

for example.

[...]
> Easily: because the record contains two DIFFERENT times. Since you want
> more concrete, we're talking departure and arrival times here. Quite
> easy to depart on-time, but arrive late, or depart late but arrive
> on-time.

Ah, the penny drops!

If you had called them "departure" and "arrival" instead of "t1" and 
"t2", it would have been significantly less mysterious.

Sometimes a well-chosen variable name is worth a thousand words of 
explanation.

>> It also contradicts your statement that it is *date* and *key* that
>> determines which late bin to use.
> 
> I never made such a statement. I said they are used to determine "WHAT
> on-time IS for the record", not WHETHER the record is on-time or not,
> and certainly not which late bin to use. To put it a different way,
> those are the key to a lookup table that tells me what T1 and T2 are
> *supposed* to be in order for *each one* to be on time.

Ah, that makes sense now. Thank you for explaining.
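
For anyone reading along, here is a minimal sketch of that lookup as I 
now understand it. The SCHEDULE table, the "run-1" key and the sample 
times are all invented for illustration; only the idea that (date, key) 
selects the expected t1 and t2 comes from the description above.

```python
from datetime import date, datetime

# Hypothetical lookup table: (date, key) -> scheduled (t1, t2).
# The key name and the times are made up for this example.
SCHEDULE = {
    (date(2017, 10, 20), "run-1"): (
        datetime(2017, 10, 20, 9, 0),    # when t1 is supposed to be
        datetime(2017, 10, 20, 9, 45),   # when t2 is supposed to be
    ),
}

def minutes_late(record_date, key, t1, t2):
    """Return (t1 lateness, t2 lateness) in minutes; negative means early."""
    due1, due2 = SCHEDULE[(record_date, key)]
    return (
        (t1 - due1).total_seconds() / 60,
        (t2 - due2).total_seconds() / 60,
    )
```

The two values are computed independently, so a record can be on time 
for t1 and late for t2, or the other way around.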

[...]
>> Rather, it seems that date and key are irrelevant and can be ignored,
>> it is only t1 and t2 which determine which late bins to update.
> 
> Except that then we have no way to know what t1 and t2 *should* be.

Yes, that makes sense now. Your example of the driver runs really helped 
clarify what you are computing.

> You
> apparently made the assumption that t1 and t2 should always be some
> fixed value.

I tried to interpret your requirements as best I could from your 
description. Sorry that I failed so badly.

[...]
> Perhaps a better approach to explaining is to pose the question the
> report is trying to answer:

That would have been helpful.

[...]
> As Stefan Ram pointed out, there is nothing wrong with the solution I
> have: simply using if statements around the calculated lateness of t1
> and t2 to increment the appropriate counters. I was just thinking there
> might be tools to make the job easier/cleaner/more efficient. From the
> responses I have gotten, it would seem that that is likely not the case,
> so I'll just say "thank you all for your time", and let the matter rest.

No problem. Sorry I couldn't be more helpful, and I'm glad you have a 
working solution.
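
For the archives: the if-statement approach can also be expressed with 
bisect and collections.Counter from the standard library. This is only a 
sketch, not a claim that it beats your version. The bin edges and labels 
are hypothetical, since I don't know the report's real thresholds.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical bin edges, in minutes late. Anything <= 0 counts as on time.
EDGES = [0, 5, 15]
LABELS = ["on time", "up to 5 late", "up to 15 late", "over 15 late"]

def late_bin(minutes):
    """Map a lateness in minutes to the label of the bin it falls in."""
    return LABELS[bisect_left(EDGES, minutes)]

def count_lateness(pairs):
    """Tally t1 and t2 lateness separately.

    `pairs` is an iterable of (t1_minutes_late, t2_minutes_late).
    """
    t1_counts, t2_counts = Counter(), Counter()
    for late1, late2 in pairs:
        t1_counts[late_bin(late1)] += 1
        t2_counts[late_bin(late2)] += 1
    return t1_counts, t2_counts
```

Changing the bins then means editing EDGES and LABELS rather than a 
chain of if/elif tests, which may or may not be worth it for a handful 
of bins.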

-- 
Steven D'Aprano