ANN: equivalence 0.1

Giuseppe Ottaviano giuott at gmail.com
Mon Jun 2 10:05:01 EDT 2008


>
> Interesting.. it took me a while to figure out why the second case is
> so much slower and you're right, it is indeed quadratic. I don't know
> how likely would such pathological cases be in practice, given that
> the preferred way to merge a batch of objects is
> eq.merge(*xrange(10001)), which is more than 3 times faster than the
> non-pathologic first case (and which I optimized even further to avoid
> attribute lookups within the loop so it's more like 5 times faster
> now). Also the batch version in this case remains linear even if you
> merge backwards, eq.merge(*xrange(10000,-1,-1)), or in any order for
> that matter.

The example just showed what could happen if the merges are done in  
pathological order, it is not about batch merging. I think that  
pathological cases like this indeed show up in real cases: many  
algorithms of near duplicate elimination and clustering reduce to  
finding connected components of a graph whose edges are given as a  
stream, so you can't control their order.
With this implementation, every time a component sized N is given a  
second (or following) argument to merge, you pay Omega(N).

> I am familiar with it and I will certainly consider it for the next
> version; for now I was primarily interested in functionality (API) and
> correctness.

Good :) 



More information about the Python-list mailing list