ANN: equivalence 0.1

George Sakkis george.sakkis at gmail.com
Mon Jun 2 09:30:46 EDT 2008


On Mon, Jun 2, 2008 at 6:31 AM, Giuseppe Ottaviano <giuott at gmail.com> wrote:
>
> On Jun 1, 2008, at 6:16 PM, George Sakkis wrote:
>
>> Equivalence is a class that can be used to maintain a partition of
>> objects into equivalence sets, making sure that the equivalence
>> properties (reflexivity, symmetry, transitivity) are preserved. Two
>> objects x and y are considered equivalent either implicitly (through a
>> key function) or explicitly by calling merge(x,y).
>
> I think this library would be very useful, and I like the interface (in
> particular the idea of the projecting function),

Thanks for trying it out!

> but I think the algorithm
> is less than optimal. It can show quadratic behavior in some cases:
>
> $ python -m timeit -s 'import equivalence' -s 'eq =
> equivalence.Equivalence()' 'for i in xrange(10000): eq.merge(i, i+1)'
> 10 loops, best of 3: 57.6 msec per loop
>
> $ python -m timeit -s 'import equivalence' -s 'eq =
> equivalence.Equivalence()' 'for i in xrange(10000): eq.merge(i+1, i)'
> 10 loops, best of 3: 2.59 sec per loop

Interesting.. it took me a while to figure out why the second case is
so much slower and you're right, it is indeed quadratic. I don't know
how likely would such pathological cases be in practice, given that
the preferred way to merge a batch of objects is
eq.merge(*xrange(10001)), which is more than 3 times faster than the
non-pathologic first case (and which I optimized even further to avoid
attribute lookups within the loop so it's more like 5 times faster
now). Also the batch version in this case remains linear even if you
merge backwards, eq.merge(*xrange(10000,-1,-1)), or in any order for
that matter.

> Have you considered using the Union-Find algorithm, which would be (almost)
> linear in all cases?

I am familiar with it and I will certainly consider it for the next
version; for now I was primarily interested in functionality (API) and
correctness.

Thanks,
George



More information about the Python-list mailing list