Overlap in python

Bearophile bearophileHUGS at lycos.com
Tue Aug 4 16:54:16 EDT 2009


Jay Bird:
> I've been trying to figure out a simple algorithm on how to combine a
> list of parts that have 1D locations that overlap into a non-
> overlapping list.  For example, here would be my input:
>
> part name   location
> a                  5-9
> b                  7-10
> c                  3-6
> d                  15-20
> e                  18-23
>
> And here is what I need for an output:
> part name   location
> c.a.b            3-10
> d.e               15-23

That's sometimes known as "The Set Union Problem on Intervals".
Knowing the name of a problem is sometimes half the work :-) People
have written papers on this.

This is a O(N^2) force-brute algorithm, you can try it on your data:

data = """part name   location
a                  5-9
b                  7-10
c                  3-6
d                  15-20
e                  18-23"""

pairs = map(str.split, data.splitlines()[1:])
triples = [map(int, i.split("-")) + [name] for (name, i) in pairs]

overlap = lambda x1,y1, x2,y2: y1 >= x2 and x1 <= y2

results = []
for (x1,y1,name) in triples:
    for i, res in enumerate(results):
        if overlap(x1,y1, res[0],res[1]):
            results[i].append(name)
            results[i][0] = min(x1, res[0])
            results[i][1] = max(y1, res[1])
            break
    else:
        results.append([x1, y1, name])

print results
# Output: [[3, 10, 'a', 'b', 'c'], [15, 23, 'd', 'e']]

If you have have a lot of data then it will be too much slow, and you
have to use a more complex algorithm.

If your intervals are many, they are small, and they are spread in a
not too much large range of possible values, you can create an integer
array of indexes, and you can "paint" intervals on it.

Bye,
bearophile



More information about the Python-list mailing list