[Python-ideas] Fast sum() for non-numbers - why so much worries?

Sergey sergemp at mail.ru
Fri Jul 12 07:16:16 CEST 2013


On Jul 10, 2013 Andrew Barnert wrote:

> But you haven't found a workable solution for all of the other
> types people have brought up--or even for a single one of them. So
> that's a serious misrepresentation.

Strings, tuples (2 options), lists (3 options) and your cons-lists
(2 options). What other types have I missed?

Most of those has nothing to do with my suggestion, it's just that
you (or not just you) asked how one can optimize XXX for O(N) sum,
so I found some ways.

>> It's just instead of discussing what is the best way to fix a slowness,
>> I'm spending most time trying to convince people that slowness should
>> be fixed.
>> — sum is slow for lists, let's fix that!
>> — you shouldn't use sum...
>> — why can't I use sum?
>> — because it's slow
> 
> But that's not why you shouldn't use sum.

No, That's the MAIN reason! There's one more, about "+" (and sum)
being confusing for concatenation, and I can understand that, however
it's very weak as long "+" is used for joining sequences. But the
main reason that most people, including you, are pushing is that
sum() is slow. The whole our "you can't make it fast for" thread
is the evidence. Even now:

> Besides, being fast for list and tuple but slow for other
> collection types would be an attractive nuisance.

You're pushing "being slow" as a key argument. If due to some
technical details sum() was fast from the very beginning then nobody
would say that you shouldn't use it. Same as nobody is saying that
you should use reduce() instead of sum() for numbers because sum()
works only for __add__ while reduce works for every operator possible.

People are free to use sum() to __add__ numbers, and use reduce() e.g.
to __xor__ them. Nobody says "you shouldn't use sum() for numbers
because there're operators that don't work with it". Nobody says
that sum() is "attractive nuisance" compared to reduce(). For the
same reason IF sum would be fast for list, but slow for set, people
would use sum() where it's good and won't use it where it's bad.
There's nothing unusual or wrong here.

min() is good for list of tuples, right? max() is also good for list
of tuples? Even any() is good for list of tuples (weird, but good).
Then why sum() is bad for list of tuples? Because it's SLOW!

So, yes, sum() being slow is the main reason. Everything else is
just a "bonus", additional arguments that are supposed to convince
your opponent even more. But speed is the key.

That's why I'm saying so often: if this is the main reason then let's
fix it.

> Your only response to that has been to claim that it can be fast
> for every possible collection type, but it can't. You haven't
> gotten it to work. And there are good reasons to believe it's not
> just hard, but impossible.

Sorry, but can you please quote what response are you referencing?

>> I haven't thought that somebody can truly believe that something should
>> be slow, and will find one excuse after another instead of just fixing
>> the slowness.
>
> Calling recv(1) over and over on a socket is slow. We could fix
> that by adding an implicit buffer to all socket objects. Can you
> believe that someone might object to that patch?

Not sure what you mean, but sockets do have an internal buffer,
its default size is configurable with `sysctl`.

PS: maybe for 10 years people got used to the slow sum() and related
arguments so much that they don't want to lose them? ;)

—— 


More information about the Python-ideas mailing list