[Pythonmac-SIG] Sets and Speed

Yair Benita y.benita at wanadoo.nl
Mon May 30 23:48:04 CEST 2005


Dear Bob, your answer makes me feel very stupid and I think you're  
even angry at me for appearing stupid, or maybe for hinting that  
python was not good enough. Don't get me wrong, I love python, and  
will not switch to assembly any time soon....
Of course, I don't use that kind of code it was only an example to  
make a point that the union command was very slow when the sets get  
big. I guess it wasn't such a good example.

I usually combine 2 sets using union but one set is much smaller than  
the other. It is better to do it using "add", as suggested, and that  
is my solution. I didn't realize that using "add" will not create  
redundancy, as you said these are like keys in a dictionary so no  
duplicates can occur. I suppose "union" is worth doing when I have  
two big sets to be combined.

I will be more careful when I post next time. promise.

Yair

On May 30, 2005, at 6:03 , Bob Ippolito wrote:

>
> On May 30, 2005, at 7:44 AM, Yair Benita wrote:
>
>
>> My research involves genomic research and the use of sets (recently
>> introduced in version 2.4) makes my life easier in a lot of ways.  
>> However, I
>> noticed that working with large set slows the script to unbearable  
>> speed.
>>
>
> Sets were actually introduced in Python 2.3, in the sets module,  
> but were introduced as a built-in in Python 2.4.
>
>
>> Below I compare two simple scripts, one makes use of a list and  
>> the other
>> makes use of a set. The difference of 20 seconds may not be much,  
>> but let me
>> tell you that this difference grows exponentially. When my sets  
>> reach more
>> that 100,000 elements, a union command is painfully slow. True, a  
>> union
>> command of a set may be much more complicated than using a list  
>> but the time
>> difference is simply too big for me.
>>
>> Any thoughts/suggestions?
>>
>
> Yeah, don't write code like that.  If you don't want exponential  
> time, don't write algorithms that will take exponential time :)   
> Use the right algorithm.
>
> Your two examples aren't equivalent.  It has exponential time  
> because union takes two sets to return a third.  In your example,  
> you are uselessly creating two new sets (and a single-element list)  
> on every iteration: one set with one item, and one set with N  
> items.  What you really want to be using is the add or update  
> method, which mutates the set in-place.  The list example is  
> nowhere near equivalent.  To compare apples to apples the list  
> example should be:
>
> if i not in x:
>     x = x + [i]
>
> and that would be much slower than either of your examples!
>
> On my 1ghz G4 laptop:
>
> Your set example: 87.921s
> Your list example: 51.542s
> My set example (using x.add(i)): 0.095s
>
>
>> I suppose that the obvious solution is to work with lists most of  
>> the time
>> and use the sets only when I really need them. But then, what's  
>> the point of
>> having those set?
>>
>
> What's the point of having Python if you can write faster code in  
> assembly?
>
> Where did you learn to use lists as a set anyway?  The canonical  
> pre-2.3 example was to use a dict as a set (since the keys are a set).
>
> -bob
>
>



More information about the Pythonmac-SIG mailing list