efficient interval containment lookup

Per Freem perfreem at yahoo.com
Mon Jan 12 17:49:43 EST 2009


thanks for your replies -- a few clarifications and questions. the
is_within operation is containment, i.e. (a,b) is within (c,d) iff a
>= c and b <= d. Note that I am not looking for intervals that
overlap... this is why interval trees seem to me to not be relevant,
as the overlapping interval problem is way harder than what I am
trying to do. Please correct me if I'm wrong on this...

Scott Daniels, I was hoping you could elaborate on your comment about
bisect. I am trying to use it as follows: I try to grid my space
(since my intervals have an upper and lower bound) into segments (e.g.
of 100) and then I take these "bins" and put them into a bisect list,
so that it is sorted. Then when a new interval comes in, I try to
place it within one of those bins.  But this is getting messy: I don't
know if I should place it there by its beginning number or end
number.  Also, if I have an interval that overlaps my boundaries --
i.e. (900, 1010) when my first interval is (0, 1000), I may miss some
items from listB when i make my count.  Is there an elegant solution
to this?  Gridding like you said seemed straight forward but now it
seems complicated..

I'd like to add that this is *not* a homework problem, by the way.

On Jan 12, 4:05 pm, Robert Kern <robert.k... at gmail.com> wrote:
> [Apologies for piggybacking, but I think GMane had a hiccup today and missed the
> original post]
>
> [Somebody wrote]:
>
> >> suppose I have two lists of intervals, one significantly larger than
> >> the other.
> >> For example listA = [(10, 30), (5, 25), (100, 200), ...] might contain
> >> thousands
> >> of elements while listB (of the same form) might contain hundreds of
> >> thousands
> >> or millions of elements.
> >> I want to count how many intervals in listB are contained within every
> >> listA. For example, if listA = [(10, 30), (600, 800)] and listB =
> >> [(20, 25), (12, 18)] is the input, then the output should be that (10,
> >> 30) has 2 intervals from listB contained within it, while (600, 800)
> >> has 0. (Elements of listB can be contained within many intervals in
> >> listA, not just one.)
>
> Interval trees.
>
> http://en.wikipedia.org/wiki/Interval_tree
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
>   that is made terrible by our own mad attempt to interpret it as though it had
>   an underlying truth."
>    -- Umberto Eco




More information about the Python-list mailing list