merging intervals repeatedly

Magdoll magdoll at gmail.com
Tue Mar 11 18:41:24 EDT 2008


Hi,

I have to read through a file that will give me a bunch of intervals.
My ultimate goal is to produce a final set of intervals such that not
two intervals overlap by more than N, where N is a predetermined
length.

For example, I could read through this input:
(1,10), (3,15), (20,30),(29,40),(51,65),(62,100),(50,66)

btw, the input is not guaranteed to be in any sorted order.

say N = 5, so the final set should be
(1,15), (20, 30), (29, 40), (50, 100)

Is there already some existing code in Python that I can easily take
advantage of to produce this? Right now I've written my own simple
solution, which is just to maintain a list of the intervals. I can use
the Interval module, but it doesn't really affect much. I read one
interval from the input file at a time, and use bisect to insert it in
order. The problem comes with merging, which sometimes can be
cascading.

ex:
read (51,65) ==> put (51,65) in list
read (62,100) ==> put (62,100) in list (overlap only be 4 <= N)
read (50,66) ==> merge with (51,65) to become (50,66) ==> now can
merge with (62,100)

Of course I can check for cascading merges by pivoting around the
position where the insertion would've taken place...but just
wondering, is there some other nice way to do this? I also intuitively
don't sense a need for using trees, unless someone's already written
it, the interface is easy to use, and it won't result in more insert/
delete/structure changes that nullifies its beauty...(also I'm not
using the output list for searching)


Thanks in advance.




More information about the Python-list mailing list