mapping subintervals

Matteo mahall at ncsa.uiuc.edu
Wed Jun 13 17:08:12 EDT 2007


On Jun 13, 2:32 pm, Lee Sander <lesa... at gmail.com> wrote:
> hi,
> I have the following problem which is turning out to be non-trivial. I
> realize that this is not
> exactly a python problem but more of an algorithm problem -- but I
> post it here because
> I want to implement this in python.
>
> I want to write a code that given an interval (integer tuple:
> start,stop) will find which other
> interval it matches to. Here is an example,
> suppose I have the following NON-OVERLAPPING intervals
>
> 10 - 21   ==> 'a1'
> 34 - 51   ==> 'a2'
> 77 - 101 ==> 'a3'
> etc
>
> So, suppose I'm give an interval such as (42,50), I should go through
> the list and map it to "a2" etc
> if the region is a subset of an exiting interval.  If the query
> interval does not exist in the table or
> maps to two or more intervals (eg, (45, 80)) the program return a
> None.
...snip...
> Many thanks
> Lee

OK - I'm going to assume your intervals are inclusive (i.e. 34-51
contains both 34 and 51).

If your intervals are all really all non-overlapping, one thing you
can try is to put all the endpoints in a single list, and sort it.
Then, you can use the bisect module to search for intervals, which
will give you a logarithmic time algorithm.

Here, I'm going to assume you just need the index of the containing
interval. If you really need a name (i.e. 'a1' or 'a2'), you can use a
list of names, and index into that.

I hope those assumptions are valid! if so, the following should work:

from bisect import bisect

# assume initial intervals are sorted
intervals=[(10,21),(34,51), (77,101)]
# get a sorted list of endpoints
endpts=sorted([a for a,b in intervals]+[b for a,b in intervals])

def test_interval(ivl,endpts):
  # Find where the left endpoint of ivl would lie in the list
  # i.e. the index of the first element greater than ivl[0]
  l_idx=bisect(endpts,ivl[0])
  # If l_idx is even, then it lies between two intervals, and thus
  # is not contained in any interval. Otherwise it returns the index
  if l_idx % 2 == 0: return None
  # if l_idx is out of bounds (i.e. ==len(endpts)), then ivl is
  # not contained in any interval (too large)
  if l_idx==len(endpts): return None
  # Now, all we need to do is check that the right endpoint of ivl is
  # less than or equal to the corresponding endpoint in the list.
  # Happily, that endpoint is at index l_idx
  if ivl[1]<=endpts[l_idx]:
    # So return the index of the interval:
    return (l_idx-1)/2
  else:
    return None


Then, from a shell:
>>> print tst.test_interval((0,13),endpts)
None
>>> print tst.test_interval((15,21),endpts)
0
>>> print tst.test_interval((35,40),endpts)
1
>>> print tst.test_interval((40,80),endpts)
None
>>> print tst.test_interval((109,200),endpts)
None




More information about the Python-list mailing list