[Python-ideas] Fast sum() for non-numbers

Sergey sergemp at mail.ru
Tue Jul 9 15:42:35 CEST 2013


On Jul 8, 2013 Andrew Barnert wrote:

> I'm -1 on adding special-casing for tuples that would not be
> available for any other immutable type. 

Ok, let's be honest, I don't like that special case either. :(
But I was faced with two options:

1. Make sum faster for everything BUT tuples and write in a manual:
    ...
    sum() is fast for all built-in types except `tuple`. For tuples
    you have to convert them to lists manually, i.e. instead of:
        sum(list_of_tuples, tuple())
    you have to write:
        tuple(sum(map(list,list_of_tuples),[]))
    or
        tuple(itertools.chain.from_iterable(list_of_tuples))
    ...

2. Implement just one (!) special case for the only type in python
   needing it and write:
    ...
    sum() is fast for all built-in types!
    ...

I chose #2. Tuple is one of the most frequently used types in Python,
and it's the only type that needs such a special case.

Until someone writes a better solution: Practicality beats purity.
That was my motivation.

> No, you can't. You can do something different, but only by
> modifying the C source to sum.
> [...]
>> I can't make __iadd__ faster, because tuple has
>> no __iadd__, however I can make a faster __add__.
> 
> No, you can't make tuple.__add__ faster either. (If you can,
> please submit a patch, because that would be useful completely
> independent of sum.)

Theoretically it's possible to rewrite the tuple type to internally
use a list for storage, plus a `localcount` variable saying how many
items of that list belong to this tuple. Then __add__ for such a
tuple would just create a new tuple sharing the same internal list
(with an incremented refcount), extended in place. Technically,
__add__ would then modify all the tuples sharing that list, but
thanks to the `localcount` variable you would never notice.
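A minimal pure-Python sketch of that idea (FastTuple, `_storage` and
`_count` are all my names for illustration; the real thing would
live in C):

```python
class FastTuple:
    """Toy copy-on-write tuple: many instances share one list."""
    def __init__(self, items=(), _storage=None, _count=None):
        if _storage is None:
            _storage = list(items)
            _count = len(_storage)
        self._storage = _storage   # list shared between tuples
        self._count = _count       # how many items belong to us

    def __len__(self):
        return self._count

    def __iter__(self):
        return iter(self._storage[:self._count])

    def __add__(self, other):
        n = len(other)
        if self._count == len(self._storage):
            # We own the end of the shared list: extend it in place,
            # amortized O(len(other)).
            storage = self._storage
            storage.extend(other)
        else:
            # The shared list was already extended past us: copy out.
            storage = self._storage[:self._count] + list(other)
        return FastTuple(_storage=storage, _count=self._count + n)
```

With this, sum() over FastTuples starting from `FastTuple()` runs in
amortized O(total), because every intermediate result owns the end
of its shared list and so always hits the extend-in-place branch.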

Would you like such a patch instead? Would you want to write it? ;)

The thing is, such a patch only optimizes __add__, which matters
only for many sequential adds, i.e. for sum(), so I thought it would
be MUCH easier to add the optimization to sum() instead.

>> Are there any other types (except strings, that are blocked anyway)?
> 
> Yes. Every single immutable type.

Which is just one type: tuple. There are no other summable standard
types in Python with an O(N) __add__, are there?

> blist.btuple has the same problem.

Does it? I took just a brief look at its source and it seems that it
already uses blist internally, so it can implement a fast __add__ on
its own (i.e. using the idea I described above).

> You wanted sets to be addable? Then frozenset would have this
> problem.

But they're not addable, so still not a problem. :)

> Strings obviously have this problem, and any third-party immutable
> string type will as well.

And strings are blocked by sum(), so no performance problems for them.
(Third-party immutable strings?)

> So, if you're suggesting that sum can be fast for anything
> reasonable, you're just wrong.

I suggested two ways to do that. First, one can use the approach
above, i.e. use a mutable type internally. Second, for built-in
CPython types we have a simpler option: implement the optimization
directly in sum(). And there may be many others, specific to the
types in question.

>> We're talking about python, and discussing use of sum() in python
>> for such lists in particular.
> 
> No. You insisted that every collection type is O(N) summable with
> your design. Others argued that this isn't true. You asked for an
> example. I offered cons lists.

I don't remember saying that every collection type in the world is
O(N) summable, but ok. Would you agree that all summable built-in
collection types of Python become O(N) summable with "my design"?

I.e. they were not O(N) summable before my patch, but they are O(N)
after it. Then why don't you like the patch?

Because somewhere in the world there could theoretically exist some
other types that are still not O(N) summable? Maybe; then we (or
their authors) will deal with them later (and I tried to show you
the options for your singly linked list example). After all, they
were not O(N) summable before this patch anyway. But they MAY BECOME
O(N) summable after it.

> If you agree that your design is not a general solution for all
> sequences, then all of your misunderstandings about cons lists are
> a red herring, and we can drop them.

I guess you misunderstand "my design" (or whatever you call it).
Let me put it like this. Currently:
  The only way to make a type O(N) summable is to implement a fast
  __add__ for it.
So I suggested:
  It is often easier to implement a fast __iadd__ than a fast
  __add__. So let's change sum() so that it takes advantage of
  __iadd__ when it exists.
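To make that concrete, here is a rough pure-Python model of the
proposed behavior (the actual patch is in C; `fast_sum` is just my
name for the sketch):

```python
import copy

def fast_sum(iterable, start=0):
    """Pure-Python sketch of a sum() that prefers in-place __iadd__."""
    if isinstance(start, str):
        raise TypeError("sum() can't sum strings [use ''.join(seq) instead]")
    acc = start
    if hasattr(type(acc), '__iadd__'):
        # Copy once so the caller's start object is never mutated,
        # then accumulate in place: amortized O(len(item)) per step.
        acc = copy.copy(acc)
    for item in iterable:
        acc += item  # uses __iadd__ when the type has one, else __add__
    return acc
```

With this, summing lists is O(total) instead of O(N**2), while
summing numbers behaves exactly as before.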

Then someone said:
  You still cannot make sum fast for everything, i.e. for tuples
I understood that as:
  If you're already changing sum() you should make it fast for
  tuples too, so that we could say "sum() is fast now".
So I replied:
  Yes, that patch does not meet "sum() is fast now" goal, because
  there's one more type `tuple` that is still slow. So, if we want
  to make sum fast for all built-in types, we must make it fast for
  tuples too. Here's a small patch, that just adds a special case
  for tuples, as that is the only type that needs it. This patch
  can be also extended to other types, e.g. lists and strings.

Yes, authors of custom types won't have that simple option. But we
have it, so why not use it, if it's MUCH easier than the alternatives?

>>It's just you said:
> [...]
> First, that wasn't me; please try to keep your attributions straight.

Oops, sorry, my mistake.

> But I agree with the sentiment. There is an efficient way to
> concatenate a lot of cons lists (basically, just fold backward
> instead of forward), but sum is not it.

Hm... If you implement a fast __radd__ instead of __add__, sum()
would use that, wouldn't it? Is that the easy way you were looking
for?
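The "fold backward" concatenation from the quote can be sketched
with a toy cons list (all names below are mine, not from any real
library):

```python
def cons(head, tail):
    return (head, tail)          # a cons cell is just a pair

def from_iterable(items):
    node = None                  # None marks the empty list
    for x in reversed(list(items)):
        node = cons(x, node)
    return node

def to_list(node):
    out = []
    while node is not None:
        out.append(node[0])
        node = node[1]
    return out

def concat_all(lists):
    # Fold backward: walk the inputs right-to-left and prepend each
    # element exactly once, so the total cost is O(sum of lengths).
    result = None
    for lst in reversed(lists):
        for x in reversed(to_list(lst)):
            result = cons(x, result)
    return result
```

A left-to-right sum() would instead rebuild the growing prefix on
every step, which is where the quadratic cost comes from.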

> So, whoever said that is right—encouraging people to treat sum()
> as the preferred way to concatenate large amounts of data is a bad
> idea.

Then what way do you suggest as preferred? For example, how would
you do it in py4k? I guess you would prefer sum() to reject lists
and tuples as it now rejects strings, and to use some other function
for them instead? Or what? What is YOUR preferred way?

> Agreed. That makes the APIs a little more complicated (you need
> a list and a list::iterator instead of just a node), but that's not
> a big deal. And, with (list, list::iterator) being equivalent to a
> node, it leads to exactly the same issues as you started with in
> having just a node type.

We have 'list' and 'listiterator', 'tuple' and 'tupleiterator', 'set'
and 'setiterator'. Nothing unusual here. And no issues about them.

> Yes, deque is a great tool, but it's not the same tool as a linked
> list, and doesn't support the same algorithms.

Not all of them, but some. I.e. if you used your cons lists as a
queue or a stack, then deque is a good replacement.
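For those two uses, deque gives O(1) operations at both ends (lists
only manage that at the right end):

```python
from collections import deque

d = deque([1, 2, 3])
d.append(4)          # push (stack) / enqueue (queue): O(1)
top = d.pop()        # stack pop from the right: O(1)
front = d.popleft()  # queue dequeue from the left: O(1); O(n) on a list
```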

>> That wasn't saying "just make it do something different". That was
>> saying "you can have linked lists in python, that are O(N) summable".
> 
> Which is exactly the point you were arguing against. If you now
> agree with everyone else, fine. There are types that can be
> efficiently concatenated, but not with sum. That's why everyone
> else thinks you shouldn't encourage people to use sum for general
> concatenation.

Really, I don't understand that point. Are you saying, that sum()
must remain slow for FREQUENTLY USED standard types just because
there MAY BE some other types for which it would still be slow?

> Using a global variable (or a class attribute, which is the
> same thing) means that sum isn't reentrant, or thread-safe, or
> generator-safe.

Is sum() reentrant or thread-safe now? No? Then what changes?

(BTW, it's your __iadd__ that becomes non-thread-safe, not sum)

PS: Talking about all collections in the world, theoretically it may
be possible to extend my "special case" to all collection types in
the world, but there's a small issue with that idea that I don't know
how to handle...
