[Python-ideas] Fast sum() for non-numbers

Andrew Barnert abarnert at yahoo.com
Tue Jul 9 19:11:06 CEST 2013


On Jul 9, 2013, at 6:42, Sergey <sergemp at mail.ru> wrote:

> On Jul 8, 2013 Andrew Barnert wrote:
> 
>> I'm -1 on adding special-casing for tuples that would not be
>> available for any other immutable type.
> 
> Ok, let's be honest, I don't like that special case too. :(
> But when I had two options:
> 
> 1. Make sum faster for everything BUT tuples and write in a manual:
>    ...
>    sum() is fast for all built-in types except `tuple`. For tuples
>    you have to manually convert it to list, i.e. instead of:
>        sum(list_of_tuples, tuple())
>    you have to write:
>        tuple(sum(map(list,list_of_tuples),[]))
>    or
>        tuple(itertools.chain.from_iterable(list_of_tuples))
>    ...
> 
> 2. Implement just one (!) special case for the only type in python
>   needing it and write:
>    ...
>    sum() is fast for all built-in types!
>    ...
> 
> I chose #2. Tuple is one of the most frequently used types in python,
> and it's the only type that needs such a special case.
> 
> Until someone writes a better solution: Practicality beats purity.
> That was my motivation.

#3 is for the docs to say what they currently say--sum is fast for numbers--but change the "not strings" to "not sequences", and maybe add a note saying _how_ to concatenate sequences (usually join for strings and chain for everything else).
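For concreteness, the doc note could show something like this (a sketch; the variable names are just for illustration):

```python
import itertools

# Concatenating strings: use str.join, not sum().
words = ["spam", "eggs", "ham"]
joined = "".join(words)  # 'spameggsham'

# Concatenating other sequences: use itertools.chain.
list_of_tuples = [(1, 2), (3,), (4, 5)]
flat = tuple(itertools.chain.from_iterable(list_of_tuples))  # (1, 2, 3, 4, 5)

# Both are O(N) in the total length, unlike repeated __add__,
# which rebuilds the accumulated result on every step.
```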

This makes sense with or without an __iadd__ patch (as long as __iadd__ is useful for some number-like types--as I said before, I expect it might be, but I don't actually know).

>> No, you can't. You can do something different, but only by
>> modifying the C source to sum.
>> [...]
>>> I can't make __iadd__ faster, because tuple has
>>> no __iadd__, however I can make a faster __add__.
>> 
>> No, you can't make tuple.__add__ faster either. (If you can,
>> please submit a patch, because that would be useful completely
>> independent of sum.)
> 
> Theoretically it's possible to rewrite a tuple type to internally use
> list type for storage, and additionally have a `localcount` variable
> saying how many items from that list belong to this tuple. Then
> __add__ for such tuple would just create a new tuple with exactly
> same internal list (with incremented ref.count) extended. This way,
> technically, __add__ would modify all the tuples around, but due to
> internal `localcount` variable you won't notice that.

I was going to point out the side effects of such a change, but someone beat me to it.
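For the record, here is roughly what that shared-storage design looks like as a pure-Python sketch (`SharedTuple`, `_localcount`, and the copy-on-conflict rule are all illustrative, not anything from an actual patch):

```python
class SharedTuple:
    """Illustrative sketch: a tuple backed by a shared list.

    __add__ extends the shared storage in place when this tuple is
    the current "tip" of the storage; otherwise it has to copy.
    """

    def __init__(self, items=(), _storage=None, _localcount=None):
        if _storage is None:
            _storage = list(items)
            _localcount = len(_storage)
        self._storage = _storage
        self._localcount = _localcount  # how many items belong to us

    def __add__(self, other):
        if self._localcount == len(self._storage):
            # Fast path: we are the tip, extend the shared list in place.
            self._storage.extend(other)
            return SharedTuple(_storage=self._storage,
                               _localcount=len(self._storage))
        # Slow path: the storage was already extended past us; copy.
        return SharedTuple(tuple(self) + tuple(other))

    def __iter__(self):
        return iter(self._storage[:self._localcount])


a = SharedTuple((1, 2))
b = a + (3,)  # fast: extends a's storage in place
c = a + (4,)  # slow: a is no longer the tip, so this must copy
```

Note the side effects this exposes: creating `b` silently grows the storage that `a` holds alive, and the very next add off `a` already falls back to copying.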

> Would you like such a patch instead? Would you want to write it? ;)
> 
> It's just this patch only optimizes add, which is ONLY needed for
> many sequential adds, i.e. for sum(), so I thought that it would be
> MUCH easier to add it to sum instead.

And it's even easier to add neither.

>>> Are there any other types (except strings, that are blocked anyway)?
>> 
>> Yes. Every single immutable type.
> 
> Which is just one type — tuple. There's no other summable standard
> types in python having O(N) __add__, is it?

Does nobody ever use types from the stdlib, third-party libs, or their own code in your world? Builtin types are not generally magical. A function that works well for all builtin types, but not types in the stdlib, is a recipe for confusion.

>> So, if you're suggesting that sum can be fast for anything
>> reasonable, you're just wrong.
> 
> I suggested two ways how to do that. First, one can use the approach
> above, i.e. use mutable type internally. Second, for internal cpython
> types we have a simpler option to implement optimization directly
> in sum(). And there may be many others, specific to the types in
> question.

Using a mutable type internally will almost always have side effects, or at least complicate the implementation. What you're suggesting is that theoretically, for some different language that placed a very strange and sometimes hard to meet requirement on all types, sum could be fast for all types. That doesn't mean sum can be fast for all Python types, because Python doesn't, and shouldn't, have such a requirement.

And again, what would be the benefit? That sum could become the obvious way to do concatenation instead of just summing? That's not even desirable, much less worth bending over backward for.

>>> We're talking about python, and discussing use of sum() in python
>>> for such lists in particular.
>> 
>> No. You insisted that every collection type is O(N) summable with
>> your design. Others argued that this isn't true. You asked for an
>> example. I offered cons lists.
> 
> I don't remember saying that every collection type in a world is
> O(N) summable, but ok. Would you agree that all summable built-in
> collection types of python become O(N) summable with "my design"?

Yes, but if it's impossible to add a new collection type that works like tuple, that makes python worse, not better.

> I.e. they were not O(N) summable before my patch, but they are O(N)
> after it. Then why don't you like the patch?
> 
> Because somewhere in the world there could theoretically exist some
> other types that are still not O(N) summable?

No, because all over the world there already actually exist such types, and because it's trivial--and useful--to create a new one.

> Maybe, then we (or
> their authors) will deal with them later (and I tried to show you
> the options for your single linked list example).

And you only came up with options that destroy vital features of the type, making it useless for most people who would want to use it.

But, more importantly, it is very simple to design and implement new collection types in Python today. Adding a new requirement that's hard to meet--one you haven't been able to pull off for the first example you were given--implies that it would no longer be easy.

>> If you agree that your design is not a general solution for all
>> sequences, then all of your misunderstandings about cons lists are
>> a red herring, and we can drop them.
> 
> I guess you misunderstand "my design" (or whatever you call that).

Your design is the design explicitly described in your email: sum uses __iadd__, and has special casing for at least one type.

The argument against this is that it makes an existing mild attractive nuisance much worse, by implying that sum is actually fast for concatenation in general. Your counter was that sum can be actually fast for concatenation in general. That's not true. If you're now saying that it can only be fast for certain types, then you need a new justification for why it isn't an attractive nuisance.

The one you seem to be implying is "everybody expects non-builtin types to be less usable than builtins", which is wrong.

> Let's put it like that. Currently:
>  The only way to make a type O(N) summable is to implement fast
>  __add__ for it.
> So I suggested:
>  It is often easier to implement fast __iadd__ than fast __add__.
>  So let's change sum so that it took advantage of __iadd__ if it
>  exists.

So far, so good--but again, it would really help your case to find number-like types that this is true for. I already suggested some (np.matrix, for example).
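For concreteness, the __iadd__ proposal amounts to something like this pure-Python sketch (the real change would be in the C implementation of sum):

```python
import copy


def iadd_sum(iterable, start=0):
    """Sketch of a sum() that exploits in-place addition.

    Copies `start` first so the caller's object is never mutated,
    then uses `+=`, which Python routes to __iadd__ when the type
    defines it and falls back to __add__ otherwise.
    """
    total = copy.copy(start)
    for item in iterable:
        total += item
    return total
```

For a type with an O(1) amortized __iadd__ (like list), this makes the whole loop O(N) instead of O(N**2); for plain numbers it behaves exactly like today's sum.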

> Then someone said:
>  You still cannot make sum fast for everything, i.e. for tuples
> I understood that as:
>  If you already changing sum() you should make it fast for
>  tuples too, so that we could say "sum() is fast now".

Then you misunderstood it. Tuples were offered as one example, out of many, that won't be fast. Solving that one example by adding a special case in the C code doesn't help the general problem unless you're prepared to do the same for every other example, which is impossible. 

> So I replied:
>  Yes, that patch does not meet "sum() is fast now" goal, because
>  there's one more type `tuple` that is still slow. So, if we want
>  to make sum fast for all built-in types, we must make it fast for
>  tuples too. Here's a small patch, that just adds a special case
>  for tuples, as that is the only type that needs it. This patch
>  can be also extended to other types, e.g. lists and strings.
> 
> Yes, authors of custom types won't have that simple option. But we
> have it

Who is this "we" here? Most users of Python use custom types. That's inherent in an OO language. The stdlib is full of custom types, and so are most third party libs and most applications.

> , so why not use it, if it's MUCH easier than alternatives?

The obvious alternative--just not doing it--is much easier.

> 
>>> It's just you said:
>> [...]
>> First, that wasn't me; please try to keep your attributions straight.
> 
> Oops, sorry, my mistake.
> 
>> But I agree with the sentiment. There is an efficient way to
>> concatenate a lot of cons lists (basically, just fold backward
>> instead of forward), but sum is not it.
> 
> Hm... If you implement fast __radd__ instead of __add__ sum would
> use that, wouldn't it? Is that the easy way you were looking for?

First, how do you propose that sum find out whether adding or radding is faster for a given type?

More importantly: that wouldn't actually do anything in this case. The key to the optimization is doing the sequence of adds in reverse order, not flipping each add.

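A tiny cons-list sketch (hypothetical `cons`/`concat` helpers, not anyone's actual code) shows why the direction of the fold, not the direction of each add, is what matters:

```python
from functools import reduce

nil = None  # the empty cons list


def cons(head, tail):
    return (head, tail)


def from_list(xs):
    result = nil
    for x in reversed(xs):
        result = cons(x, result)
    return result


def to_list(c):
    out = []
    while c is not nil:
        out.append(c[0])
        c = c[1]
    return out


def concat(a, b):
    """O(len(a)): copies a's cells and shares b unchanged."""
    if a is nil:
        return b
    return cons(a[0], concat(a[1], b))


lists = [from_list([1, 2]), from_list([3]), from_list([4, 5])]

# Left fold (what sum does): the accumulated prefix is re-copied on
# every step -- O(N**2) overall. Swapping the operands of each add
# (i.e. __radd__) doesn't change this at all.
left = reduce(concat, lists)

# Right fold: each input list is copied exactly once -- O(N) overall.
right = reduce(lambda acc, xs: concat(xs, acc), reversed(lists))
```

Both folds produce the same result; only the right fold avoids re-copying the growing result, and sum has no way to know it should fold that way.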
>> So, whoever said that is right—encouraging people to treat sum()
>> as the preferred way to concatenate large amounts of data is a bad
>> idea.
> 
> Then, what way you suggest to be preferred? For example how would you
> do that in py4k? I guess you would prefer sum() to reject lists and
> tuples as it now rejects strings and use some other function for them
> instead? Or what? What is YOUR preferred way?

I've already answered this, but I'll repeat it.

sum is not the obvious way to concatenate sequences today, and my preferred way is for that to stay true. So, I'm:

* +1 on sum using __iadd__ if it helps actual sums, -0 if it only helps list concatenation.
* -1 on special casing tuple.
* -1 on changing the docs to imply that sum should be used for sequences.
* +0 on changing the docs to say "not sequences" instead of strings, and maybe even expand on it by showing how to concatenate sequences properly.
* -0 on explicitly rejecting sequences or non-numbers or whatever in sum (largely because it's too hard to determine--and explain--what it should try to reject, but also because I don't think it's a common enough problem to be worth a change).
* +0 on moving chain.from_iterable to builtins and renaming it.

In other words, I don't think summing tuples is enough of an attractive nuisance that it's worth bending over backward to prevent it--but that doesn't mean we should bend over backward to improve something people shouldn't be doing, especially since that would make summing many other types into an attractive nuisance that doesn't exist today.

>> Agreed. That makes the APIs a little more complicated (you need to
>> a list and a list::iterator instead of just a node), but that's not
>> a big deal. And, with (list, list::iterator) being equivalent to a
>> node, it leads to exactly the same issues as you started with in
>> having just a node type.
> 
> We have 'list' and 'listiterator', 'tuple' and 'tupleiterator', 'set'
> and 'setiterator'. Nothing unusual here. And no issues about them.

But they aren't the same kind of thing at all. I don't want to explain the differences between what C++ calls iterators and what Python calls iterators unless it's necessary. But briefly, a std::list::iterator is a mutable reference to a node. Exposing that type means--as I've already said twice--that you end up with exactly the same problems you have exposing the node directly. If you don't understand why that's true, that's fine, but please stop ignoring it completely.

>> Yes, deque is a great tool, but it's not the same tool as a linked
>> list, and doesn't support the same algorithms.
> 
> Not all of them, but some. I.e. if you used your cons-lists as queue
> or stack, then deque is a good replacement. 

Well, yes, but a dynamic array like Python's list is also a perfectly good stack. So what?

I honestly can't tell at this point whether you're being deliberately obtuse, or just don't understand the basics of why we have different data structures in the first place.

>>> That wasn't saying "just make it do something different". That was
>>> saying "you can have linked lists in python, that are O(N) summable".
>> 
>> Which is exactly the point you were arguing against. If you now
>> agree with everyone else, fine. There are types that can be
>> efficiently concatenated, but not with sum. That's why everyone
>> else thinks you shouldn't encourage people to use sum for general
>> concatenation.
> 
> Really, I don't understand that point. Are you saying, that sum()
> must remain slow for FREQUENTLY USED standard types just because
> there MAY BE some other types for which it would still be slow?

You're twisting the emphasis drastically, but basically yes.

Today, sum is not the best way to concatenate sequences. Making it work better for some sequences but not others would mean it's still not the best way to concatenate sequences, but it would _appear_ to be. That's the very definition of an attractive nuisance.
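The size of the trap is easy to model by counting element copies rather than timing anything (a rough model, not a benchmark):

```python
def copies_for_sum(chunks):
    """Elements copied by left-to-right __add__ concatenation:
    each add rebuilds the entire accumulated result."""
    total = copied = 0
    for chunk in chunks:
        copied += total + len(chunk)
        total += len(chunk)
    return copied


def copies_for_chain(chunks):
    """Elements copied by an itertools.chain-style pass: each
    element is touched exactly once."""
    return sum(len(chunk) for chunk in chunks)


chunks = [[0] * 10] * 1000  # 1000 lists of 10 elements
```

Here `copies_for_chain(chunks)` is 10,000 while `copies_for_sum(chunks)` is about 5 million--a 500x blowup that only gets worse as the input grows.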

>> Using a global variable (or a class attribute, which is the
>> same thing) means that sum isn't reentrant, or thread-safe, or
>> generator-safe.
> 
> Is it now? No? Then what changes?

Yes, it is now. So that's what changes.

Again, this is something general and basic--operations that use global variables are not reentrant--and I can't tell whether you're being deliberately obtuse or whether you really don't understand that.
