tricky time interval billing problem

pruebauno at latinmail.com pruebauno at latinmail.com
Mon Dec 5 15:30:38 EST 2005


I am currently working on a tricky problem at work. I googled around a
bit, but "time intervals" did not come up with anything useful.
Although I have some rough idea of how I could solve it, I still would
value some input.

I have information of (It has only couple dozen entries.)
ServiceNum, DollarCost

and a input database table in the form (several GBytes):
ClientNumber (CN), BeginDate(BD), EndDate(ED),
ServiceNumber_Purchased(SN)
--Date intervals are always [closed, closed]

The output is:
ClientNumber(CN), BeginDate(BD), Enddate(ED), DollarCost(DC)

With the following requirements:
1) The input dates can be overlapping dates.
The output has to be non-overlapping "broken up" dates

Example: (assuming SN 1=$10 and SN 2=$20)
input (CN, BD, ED ,SN):
10, 1/1/2001,1/1/2005,1
10, 1/1/2002,1/1/2004,2

should result in:
output (CN, BD, ED, DC):
10, 1/1/2001, 12/31/2001, $10
10, 1/1/2002, 1/1/2004, $30
10, 1/2,2004, 1/1/2005, $10

2) if the same service is purchased twice for an interval it should be
billed only once
Example:
input:
10, 1/1/2001, 1/1/2005, 1
10, 1/1/2004, 1/1/2007, 1

output:
10, 1/1/2001, 1/1/2007, $10

Here are my thoughts about a solution so far:

1. fetch the input data sorted by clientNumber, Begindate, Enddate and
ServiceNumber

2. read the row, store as temporary good interval, then read another
row

3. if new begin-date is bigger than previous good interval end-date (or
previously saved end-date), output previous good interval, start new
good interval

4. else create new good interval entry with good interval begin-date
and current row begin-date, store good interval end-date into a list
with bisect module or something (so we always compare against smallest
end-date).

5. Use "bitwise or" on a service bitfield to add services and caluate
the total

As you can see my algorithm is a bit scetchy right now, but at this
point I am more interested to determine if the bisect module would be
the best way to approach the date interval problem or if I should use
some other data structure instead (stack, queue, set,...). I am working
on a Python proof of concept now. It is likely that the company will
want a C++ version of it later (yikes).

nes




More information about the Python-list mailing list