complaints about no replies last week

Arnaud Delobelle arnodel at googlemail.com
Tue Mar 31 02:56:06 EDT 2009



Arnaud Delobelle wrote:

> pruebauno at latinmail.com writes:
> [...]
> > I myself asked about how to write a library to efficiently do union
> > and intersection of sets containing time intervals some time ago on
> > this list and got little to no answers. It is a tricky problem. Since
> > I was getting paid I got an O(n*n) solution working. People on this
> > list on the other hand do not get paid and answer whatever strikes
> > their fancy. Sometimes the question is hard or confusing and nobody is
> > motivated enough to answer.
>
> I wasn't around when you posted this I guess. Do you mean intervals sets
> on the (real) number line such as:
>
>       1-3, 6-10 meaning all numbers between 1 and 3 and all numbers
>       between 6 and 10.
>
> In this case I think you can achieve union and intersection in O(nlogn)
> where n is the total number of intervals in the interval sets to unify
> or intersect. There is an implementation below. I have chosen a very
> simple data structure for interval sets: an interval set is the list of
> its endpoints. E.g.
>
>     1-3, 6-10 is the list [1, 3, 6, 10]
>
> This means that I can't specify whether an interval is closed or open.
> So in the implementation below all intervals are assumed to be open.
> The method could be made to work for any kind of intervals with the same
> complexity, there would just be a few more LOC.  I'm focusing on the
> principle - here it is:
>
>
> --------------------------------------------------
> # Implementation of union and intersection of interval sets.
>
> from itertools import *
>
> def combine(threshold, intsets):
>     endpoints = sorted(chain(*imap(izip, intsets, repeat(cycle([1,-1])))))
>     height = 0
>     compound = []
>     for x, step in endpoints:
>         old_height = height
>         height += step
>         if max(height, old_height) == threshold:
>             compound.append(x)
>     return compound
>
> def union(*intsets):
>     return combine(1, intsets)
>
> def intersection(*intsets):
>     return combine(len(intsets), intsets)
>
> # tests
>
> def pretty(a):
>     a = iter(a)
>     return ', '.join("%s-%s" % (a, b) for a, b in izip(a, a))
>
> tests = [
>     ([1, 5, 10, 15], [3, 11, 13, 20]),
>     ([2, 4, 6, 8], [4, 7, 10, 11]),
>     ([0, 11], [5, 10, 15, 25], [7, 12, 13, 15]),
>     ]
>
> for intsets in tests:
>     print "sets: ", "; ".join(imap(pretty, intsets))
>     print "union: ", pretty(union(*intsets))
>     print "intersection: ", pretty(intersection(*intsets))
>     print "-"*20
> --------------------------------------------------
>
> Is this what you were looking for?
>
> --
> Arnaud

I realised after posting last night that I must be

(1) solving the wrong problem
(2) solving it badly

- My implementation of the combine() function above is O(nlogn)
(because of the sorted() call) whereas it could be O(n) by iterating
over the interval in the parallel manner, hence (2).  This would make
union() and intersection() O(n).

- As the problem was solved by the OP in O(n^2) I must be solving the
wrong problem (1).

I apologise for this.

However it was a nice and compact implementation IMHO :)

--
Arnaud



More information about the Python-list mailing list