[Pythonmac-SIG] Sets and Speed

Bob Ippolito bob at redivi.com
Mon May 30 18:03:30 CEST 2005


On May 30, 2005, at 7:44 AM, Yair Benita wrote:

> My research involves genomic research and the use of sets (recently
> introduced in version 2.4) makes my life easier in a lot of ways.  
> However, I
> noticed that working with large set slows the script to unbearable  
> speed.

Sets were actually introduced in Python 2.3, in the sets module, but  
were introduced as a built-in in Python 2.4.

> Below I compare two simple scripts, one makes use of a list and the  
> other
> makes use of a set. The difference of 20 seconds may not be much,  
> but let me
> tell you that this difference grows exponentially. When my sets  
> reach more
> that 100,000 elements, a union command is painfully slow. True, a  
> union
> command of a set may be much more complicated than using a list but  
> the time
> difference is simply too big for me.
>
> Any thoughts/suggestions?

Yeah, don't write code like that.  If you don't want exponential  
time, don't write algorithms that will take exponential time :)  Use  
the right algorithm.

Your two examples aren't equivalent.  It has exponential time because  
union takes two sets to return a third.  In your example, you are  
uselessly creating two new sets (and a single-element list) on every  
iteration: one set with one item, and one set with N items.  What you  
really want to be using is the add or update method, which mutates  
the set in-place.  The list example is nowhere near equivalent.  To  
compare apples to apples the list example should be:

if i not in x:
     x = x + [i]

and that would be much slower than either of your examples!

On my 1ghz G4 laptop:

Your set example: 87.921s
Your list example: 51.542s
My set example (using x.add(i)): 0.095s

> I suppose that the obvious solution is to work with lists most of  
> the time
> and use the sets only when I really need them. But then, what's the  
> point of
> having those set?

What's the point of having Python if you can write faster code in  
assembly?

Where did you learn to use lists as a set anyway?  The canonical  
pre-2.3 example was to use a dict as a set (since the keys are a set).

-bob



More information about the Pythonmac-SIG mailing list