[Python-ideas] Support multiplication for sets

Steven D'Aprano steve at pearwood.info
Sat Oct 8 04:13:58 CEST 2011


Guido van Rossum wrote:
> On Fri, Oct 7, 2011 at 5:23 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> (2) Cartesian product is potentially very expensive. The Cartesian product
>> of a moderate-sized set and another moderate-sized set could turn out to be
>> a HUGE set.
>>
>> This is not a fatal objection, since other operations in Python are
>> potentially expensive:
>>
>> alist*10000000
>>
>> but at least it looks expensive. You're multiplying by a big number, of
>> course you're going to require a lot of memory. But set multiplication can
>> very easily creep up on you:
>>
>> aset*bset
>>
>> will have size len(aset)*len(bset) which may be huge even if neither set on
>> their own is. Better to keep it as a lazy iterator rather than try to
>> generate a potentially huge set in one go.
> 
> I'm not defending the Cartesian product proposal, but this argument is
> just silly. What if the first example was written
> 
> alist * n
> 
> ? Does that look expensive?


I didn't say it was a *good* argument <wink>

I already acknowledged that there are expensive operations in Python, 
and some of them are done by operators. Perhaps I'm just over-sensitive 
to the risk of large Cartesian products, having locked up my desktop 
with a foolish list(product(seta, setb)) in exactly the circumstances I 
described above: both sets were moderate sizes, and it never dawned on 
me until my PC ground to a halt that the product would be so much bigger.

(I blame myself for this error: I should know better than to carelessly 
pass an iterator to list without thinking, which is exactly what I did.)

In my experience, most uses of list multiplication look something like this:

[0]*len(arg)

which is not huge except in the extreme case that arg is already huge. 
But the typical use of set multiplication is surely going to be 
something like:

arg1*arg2

which may be huge even if neither of the args are. I don't think 
Cartesian product is important enough, or fundamental enough, to justify 
  making it easier to inadvertently generate a huge set by mistake. That 
was all I tried to say.



-- 
Steven



More information about the Python-ideas mailing list