[Python-ideas] Fast sum() for non-numbers

Sergey sergemp at mail.ru
Mon Jul 8 22:22:34 CEST 2013


On Jul 5, 2013 Andrew Barnert wrote:

>> Yes, you can! You just need to implement a fast __iadd__ or __add__
>> for your type to make it O(n) summable.
> 
> Well, in Python 3.3, or even 2.3, you just need to implement a
> fast __add__. So, if that's an acceptable answer, then we don't need
> to do anything.
[...]
>> And you can always do that, can't you?
> 
> No. You can't implement a fast __iadd__ for tuple.

Well... Yes, I can! I can't make __iadd__ faster, because tuple has
no __iadd__, however I can make a faster __add__. But as long as sum()
is the only (?) function suffering from this problem it was easier to
do that optimization in sum() instead.

> If that's going to be extendable to other types

Are there any other types (except strings, that are blocked anyway)?

Looks like tuple is the only built-in type having no fast __iadd__,
and sum() is the only function suffering from that. So, to make sum()
"fast for everything by default" we only need to make sum use __iadd__
and write a special case for sum+tuples.

>>   To make advantage of faster __iadd__ implementation sum() uses it
>>   if possible. sum() has linear time complexity for built-in types
>>   and all types providing constant-time `__iadd__` or `__add__`.  

> Note that integer addition isn't actually constant, it's linear on
> the word size of the integers. Which means sum isn't linear for
> integers.

That still makes sum O(N) with N being total number of bytes in all
integers. Or how do you suggest to explain sum() complexity in docs?
That text seems fair enough for me. Something like:
    sum() has complexity O(N)*C where N is the total number of
    elements and C is complexity of single __iadd__ operation or
    __add__ operation if __iadd__ is not supported.
looks too cumbersome.

>> Personally I don't think such implementation would be very useful.
>
> Given that nearly every program every written in Lisp and its
> successors makes heavy use of such an implementation, I submit
> that your intuition may be wrong.

They don't have much choice. And they're not using sum().

We're talking about python, and discussing use of sum() in python for
such lists in particular. It's just you said:

 > I'm hostile to any attempt to encourage people to treat sum() as
 > the preferred way to concatenate large amounts of data, because
 > that will surely lead them into bad habits and will end with them
 > trying to sum() a lot of tuples or linked lists or something and
 > getting O(n**2) performance.

Which implies that using such implementation with sum could lead to
O(N**2) performance. But it could not, because such implementation
is not addable. So, no problem.

To have a problem you must modify your implementation. And if you're
changing it anyway, you have the power to solve this problem too.

Explicitly writing about sum() complexity in documentation makes sure
that you're aware of possible slowness.

>> (It's certainly unusual for python's “batteries included” philosophy).
> 
> What does "batteries included" have to do with excluding data types?

Excluding data types? What do you mean?

I wasn't sure what you meant when you said about problems with linked
lists, so I was thinking that you mean something like "if one day
linked lists get their way into python standard libraries and people
will try using sum() on them..." That's why I said that such
implementation would be too skimped.

>> What I would call a linked-list is a separate type where your nodes
>> are just its internal representation.
> 
> If you make the lists and nodes separate types, and the nodes
> private, you have to create yet a third type, like the list::iterator
> type in C++

If there would be a need for it, why not? Or maybe I won't need it
(then I get something like collections.deque, a good tool by the way).

>> I don't like the idea that `a` implicitly changes when I change `b`.
> 
> Then you don't like linked lists. Saying "There is a way to make
> it work, just make it do something different, which is O(N) instead
> of O(1) and won't work with the algorithms you want to use" is not an
> answer.

That wasn't saying "just make it do something different". That was
saying "you can have linked lists in python, that are O(N) summable".

> No, you can make one operation fast, at the expense of making
> every other operation slow. That's not a good tradeoff.

In that implementation I'm making many operations fast. Basically,
I make ALL inplace-operations fast. :) As a bonus I get a guarantee
that I won't implicitly modify another variable while modifying this
one. For someone it's not a good tradeoff, for others it is.

> And again, you've also ignored the fact that, performance aside,
> it _breaks every algorithm people use mutable linked lists for_.

Those algos don't use sum(). And lists they use are not summable.
To get a problem you need different lists. So it does not matter.

>> But you CAN have a fast __iadd__ even for your simple a.next case!
>> You only need to initialize `tail` before calling sum() and update
>> it inside __iadd__
> 
> Initialize tail where exactly? In a global variable?

Wherever you want. In a global variable, or in a class variable near
'next'.

-- 


More information about the Python-ideas mailing list