tricky time interval billing problem

Steve Holden steve at holdenweb.com
Mon Dec 5 22:29:30 EST 2005


pruebauno at latinmail.com wrote:
> I am currently working on a tricky problem at work. I googled around a
> bit, but "time intervals" did not come up with anything useful.
> Although I have some rough idea of how I could solve it, I still would
> value some input.
> 
> I have information of (It has only couple dozen entries.)
> ServiceNum, DollarCost
> 
> and a input database table in the form (several GBytes):
> ClientNumber (CN), BeginDate(BD), EndDate(ED),
> ServiceNumber_Purchased(SN)
> --Date intervals are always [closed, closed]
> 
> The output is:
> ClientNumber(CN), BeginDate(BD), Enddate(ED), DollarCost(DC)
> 
> With the following requirements:
> 1) The input dates can be overlapping dates.
> The output has to be non-overlapping "broken up" dates
> 
> Example: (assuming SN 1=$10 and SN 2=$20)
> input (CN, BD, ED ,SN):
> 10, 1/1/2001,1/1/2005,1
> 10, 1/1/2002,1/1/2004,2
> 
> should result in:
> output (CN, BD, ED, DC):
> 10, 1/1/2001, 12/31/2001, $10
> 10, 1/1/2002, 1/1/2004, $30
> 10, 1/2,2004, 1/1/2005, $10
> 
> 2) if the same service is purchased twice for an interval it should be
> billed only once
> Example:
> input:
> 10, 1/1/2001, 1/1/2005, 1
> 10, 1/1/2004, 1/1/2007, 1
> 
> output:
> 10, 1/1/2001, 1/1/2007, $10
> 
> Here are my thoughts about a solution so far:
> 
> 1. fetch the input data sorted by clientNumber, Begindate, Enddate and
> ServiceNumber
> 
> 2. read the row, store as temporary good interval, then read another
> row
> 
> 3. if new begin-date is bigger than previous good interval end-date (or
> previously saved end-date), output previous good interval, start new
> good interval
> 
> 4. else create new good interval entry with good interval begin-date
> and current row begin-date, store good interval end-date into a list
> with bisect module or something (so we always compare against smallest
> end-date).
> 
> 5. Use "bitwise or" on a service bitfield to add services and caluate
> the total
> 
> As you can see my algorithm is a bit scetchy right now, but at this
> point I am more interested to determine if the bisect module would be
> the best way to approach the date interval problem or if I should use
> some other data structure instead (stack, queue, set,...). I am working
> on a Python proof of concept now. It is likely that the company will
> want a C++ version of it later (yikes).
> 
> nes
> 
First of all, you need to use ordering to ensure that the database gives 
you the most convenient order for processing, as this will make your 
computations much easier. So I'd suggest sorting by clientNumber, 
ServiceNumber, Begindate and Enddate. That way you can consider each 
service separately.

Then all you need to do is accumulate the entries where the clientNumber 
and ServiceNumber are the same and the start date of the second is less 
than or equal to the end date of the first, repeatedly until either 
there's no date overlap or a new service and/or client is started.

This shouldn't need any intermediate storage of results: if the row 
you've just read can be used to extend the current range then extend it, 
otherwise emit the current range and replace it with the new row.

Or is there something I've failed to understand about how you want to 
process the data?

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/




More information about the Python-list mailing list