complaints about no replies last week

Tue Mar 31 16:07:57 EDT 2009

pruebauno at latinmail.com writes:
[...]
> Well since I attracted a couple people's attention I will describe the
> problem in more detail. Describing the problem properly is probably as
> hard as solving it, so excuse me if I struggle a bit.
>
> The problem is for a health insurance company and involves the period
> of time a person is covered. Most insurance companies allow not only
> for the main member to be insured but his family: the spouse and the
> dependents (children). This additional coverage costs extra but less
> than a full new insurance. So for example if Alice buys an insurance
> worth at 100 dollars a month, she can insure her husband Bob for an
> additional 50 dollars. Under certain circumstances Alice may go off
> the insurance and only Bob stays. In that case the price goes back to
> 100 dollars or maybe there is a deal for 80 or something like that. In
> other words the cost of the insurance is dependent on the combination
> of family members that participate in it. Additionally not only do we
> have different family compositions but also different insurance
> products. So you can get medical, dental and vision insurance.
>
> All that data is stored in a database that is not very tidy and looks
> something like this:
>
> First Day of Coverage, Last Day of Coverage, Relationship, Product
> 5/3/2005, 5/3/2005, D, M
> 9/10/2005, 10/10/2005, S, V
> 3/15/2005, 7/15/2005, M, M
> 3/1/2005, 6/1/2005, S, M
> 5/15/2005, 7/20/2005, D, D
> 9/10/2005, 1/1/2140, M, V
> 2/1/2005, 5/3/2005, M, M
>
> Where
> Relationship: M=Main Member, S=Spouse, D=Dependent
> Product: M=Medical, D=Dental, V=Vision
>
> As far as I know at the present time there are no deals based on
> combination of products purchased so we will handle each product
> independently. What I want is a simple algorithm that allows me to
> calculate something like this out of it (hopefully I didn’t make a
> mistake):
>
> Medical:
> 2/1/2005, 2/28/2005, M
> 3/1/2005, 5/2/2005, M&S
> 5/3/2005, 5/3/2005, M&S&D
> 5/4/2005, 6/1/2005, M&S
> 6/2/2005, 7/15/2005, M
>
> Dental:
> 5/15/2005, 7/20/2005, D
>
> Vision:
> 9/10/2005, 10/10/2005, M&S
> 10/11/2005, 1/1/2140, M

OK the approach I describe in my previous email works fine for this
particular problem. I implement it below - the function walk_ivals is at
the heart of it, I've made it as simple as possible and it's pretty
clear that it is O(nlogn).

The function that actually sorts the data is union(), and it's just a
call to walk_ivals with callback the function acc() which is constant
time, therefore union() itself has the same complexity as walk_ivals.

There are no comments - I don't have the time to add any, sorry!

--------------------------------------------------
import datetime
from collections import defaultdict

def walk_ivals(ivals, callback, endvals=(-1, 1)):
    endpoints = [(x, data) for ival, data in ivals for x in zip(ival, endvals)]
    endpoints.sort()
    for (endpoint, endval), data in endpoints:
        callback(endpoint, endval, data)

def union(ivals):
    timelines = defaultdict(list)
    mult = defaultdict(lambda: defaultdict(int))
    def acc(date, step, data):
        rel, prod = data
        old_mult = mult[prod][rel]
        mult[prod][rel] -= step
        if not(old_mult and mult[prod][rel]):
            rels = [rel for rel, m in mult[prod].iteritems() if m]
            if timelines[prod] and timelines[prod][-1][0] == date:
                timelines[prod][-1][1] = rels
            else:
                timelines[prod].append([date, rels])
    walk_ivals(ivals, acc)
    return timelines

test_data = """5/3/2005, 5/3/2005, D, M
9/10/2005, 10/10/2005, S, V
3/15/2005, 7/15/2005, M, M
3/1/2005, 6/1/2005, S, M
5/15/2005, 7/20/2005, D, D
9/10/2005, 1/1/2140, M, V
2/1/2005, 5/3/2005, M, M"""

def parse_date(date_string):
    month, day, year = map(int, date_string.split('/'))
    return datetime.date(year, month, day)

def parse_data(data_string):
    data = []
    for line in data_string.split("\n"):
        start, end, rel, prod = line.split(",")
        start, end = map(parse_date, (start + datetime.timedelta(1), end))
        rel, prod = rel.strip(), prod.strip()
        data.append([(start, end), (rel, prod)])
    return data

def test():
    ivals = parse_data(test_data)
    timelines = union(ivals)
    for prod, timeline in timelines.iteritems():
        print "-"*20
        print "Product", prod
        for date, covers in timeline:
            print date, ' & '.join(covers)

if __name__ == '__main__':
   test()      
--------------------------------------------------

Here is what it outputs:

marigold:junk arno$ python intervals2.py 
--------------------
Product M
2005-02-01 M
2005-03-01 S & M
2005-05-03 S & M & D
2005-05-04 S & M
2005-06-02 M
2005-07-16 
--------------------
Product D
2005-05-15 D
2005-07-21 
--------------------
Product V
2005-09-10 S & M
2005-10-11 M
2140-01-02 

--------------------------------------------------

The date output is slightly different from yours - I didn't realise you
had time intervals and now I don't have the time to change it, although
it's only a cosmetic change (the end-date of each interval is the
start-date of the next one minus 1 day).

HTH

PS: warning - I have done some minor editing of the code after pasting
it, so it might break (although it should be fine).

-- 
Arnaud