[Python-ideas] Fast sum() for non-numbers

Sergey sergemp at mail.ru
Thu Jul 4 11:54:19 CEST 2013


On Jul 04, 2013 Steven D'Aprano wrote:

This message is long, so here's its short summary:
* Unfortunately list.extend does not look like the obvious way, and
  it's slower than the alternatives.
* The original patch reduces memory use, not increases it; there can
  be no MemoryError because of it.
* sum() can *always* be fast! (patch and tests)
* summing linked lists can be O(n), where n is the number of lists to add
* using __add__/__iadd__ for containers is a useful feature

> How about the "Obvious Way" to concatenate lists?
> new = []
> for x in seq:
>      new.extend(x)

It's 200% slower than the patched sum(), 50-100% slower than both
itertools variants, and only 25% faster than a list comprehension. [1]
It's not even mentioned among the most popular answers to the
list-flattening question:
  http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
Slower, less known, many more characters to type, and, hey, it's not
even a one-liner! :) What makes you think this is the "Obvious Way"?

> -0 on the general idea, -1 on the specific implementation. I'd rather have 
> sum of lists be slow than risk sum of numbers raise MemoryError.

You must be misunderstanding something, or maybe I've explained it
poorly. Numbers take a different code path in sum() that my patch does
not touch. But even if it did, my patch never makes a copy of the
original list; it can only reduce the amount of memory used, not
increase it. There was an alternative idea (which I never implemented)
of making a copy of the `start` argument, but not of the list.
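
For illustration, here's a rough pure-Python sketch of that
alternative (the names are mine, not the patch's): copy `start` once
up front, so the in-place concatenation below can never mutate the
caller's object:

  import copy

  def sum_copying_start(seq, start):
      acc = copy.copy(start)  # protect the caller's object
      for item in seq:
          acc += item  # in-place concatenation where supported
      return acc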

> sum simply cannot *always* be fast. E.g. summing tuples will still
> be slow even with your suggestion.

Yes, it can! That's the point of the original idea!

The original patch [2] optimizes lists, because that was easy to do.
But nothing stops us from optimizing other types as well. For example,
this patch [3] optimizes lists, tuples and strings.

Basically it works by accumulating the temporary result in a list,
then converting it to a tuple or string in one shot at the end if
needed.

Using this patch you get an O(n) sum() for lists, tuples and strings
[4], and combining it with the original patch you get the fastest
sum() possible.
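
In pure Python the strategy looks roughly like this (a sketch of the
idea, not the actual C code from the patch):

  def fast_sum_seq(seqs, start):
      # list.extend is amortized O(1) per element, so the whole
      # loop is O(n) in the total size of the output.
      acc = list(start)
      for seq in seqs:
          acc.extend(seq)
      # Convert back to the start type in one shot if needed.
      if isinstance(start, tuple):
          return tuple(acc)
      if isinstance(start, str):
          return ''.join(acc)
      return acc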

Even being O(n) for strings, it's still slower than ''.join(), but
now only by a constant factor. I can't beat ''.join() because of
function call overhead: internally join() converts the sequence into
a tuple, saving a lot of calls at the cost of a lot of extra memory.
That's why:
  ''.join('' for _ in xrange(100000000))
eats ~1GB of RAM before giving you an empty string.

> I can't gather any enthusiasm for optimizing sum to speed up
> concatenation.

No problem, just tell me which approach you think is best and I'll
try implementing it.

> I'm hostile to any attempt to encourage people to treat sum() as
> the preferred way to concatenate large amounts of data, because
> that will surely lead them into bad habits and will end with them
> trying to sum() a lot of tuples or linked lists or something and
> getting O(n**2) performance.

Why not just make sum() O(n) for everything? I've already done that
for lists, tuples and strings. As for linked lists, the whole point
of a linked list is fast insertion, so any decent implementation
should store pointers to both its head and its tail, implement an
O(1) __iadd__ using the tail pointer, and thus fall under my first
patch. There's not much sense in a singly linked list that has no
O(1) __iadd__.
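
A minimal sketch of such a list (my own illustration, not part of
any patch):

  class _Node(object):
      def __init__(self, value):
          self.value = value
          self.next = None

  class LinkedList(object):
      def __init__(self, iterable=()):
          self.head = self.tail = None
          for value in iterable:
              self.append(value)

      def append(self, value):
          node = _Node(value)
          if self.tail is None:
              self.head = self.tail = node
          else:
              self.tail.next = node
              self.tail = node

      def __iadd__(self, other):
          # O(1): just relink the tail, no per-node copying.
          # Note: `other` shares its nodes with `self` afterwards.
          if other.head is not None:
              if self.tail is None:
                  self.head = other.head
              else:
                  self.tail.next = other.head
              self.tail = other.tail
          return self

With this, a += b is O(1), so summing n such lists stays O(n).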

> Yes. If Python used & for concatenation, we wouldn't have to worry
> about sum(lists or strings or tuples) being quadratic, because people
> wouldn't call sum on lists, strings or tuples.

Heh. If Python had no sum() we wouldn't have to worry about people
using it. If Python had no lists we wouldn't have to worry about
people concatenating them. If there were no Python we wouldn't have
to worry at all. But the world would be a poorer place without all
these great things...

Seriously, I miss __add__ for set(). I needed it when I had a
dictionary like {x: set(...), ...} and wanted the union of all the
values in it. I wanted to use sum(dict.values()), which would have
been easy and obvious, but I couldn't, because set() does not support
__add__. So I had to write a loop of a few lines instead of a few
characters. :(
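
To illustrate with a toy dictionary, here's that loop, and the
union() one-liner that does exist (though hardly as obvious as sum()
would be):

  d = {'a': set([1, 2]), 'b': set([2, 3])}

  # The loop I had to write:
  result = set()
  for s in d.values():
      result |= s

  # The less obvious one-liner:
  result = set().union(*d.values())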

-- 

[1] Python 2.7.5 with patch, testing list.extend():
$ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])"
1000 loops, best of 3: 531 usec per loop

$ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "[i for l in x for i in l]"
1000 loops, best of 3: 1.94 msec per loop

$ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools import chain" "list(chain.from_iterable(x))"
1000 loops, best of 3: 820 usec per loop

$ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools import chain" "list(chain(*x))"
1000 loops, best of 3: 1.03 msec per loop

$ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "l = []" "for i in x: l.extend(i)"
1000 loops, best of 3: 1.53 msec per loop

[2] http://bugs.python.org/file30705/fastsum.patch

[3] http://bugs.python.org/file30769/fastsum-special.patch
    http://bugs.python.org/issue18305#msg192281

[4] Python 2.7.5, testing new fastsum-special.patch:
=== Lists ===
No patch:
  $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])"
  10 loops, best of 3: 885 msec per loop
fastsum.patch:
  $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])"
  1000 loops, best of 3: 524 usec per loop
fastsum-special.patch:
  $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])"
  1000 loops, best of 3: 298 usec per loop
Result: 3000 times faster.

=== Tuples ===
No patch:
  $ ./python -mtimeit --setup="x=[(1,2,3)]*10000" "sum(x,())"
  10 loops, best of 3: 585 msec per loop
fastsum.patch:
  $ ./python -mtimeit --setup="x=[(1,2,3)]*10000" "sum(x,())"
  10 loops, best of 3: 585 msec per loop
fastsum-special.patch:
  $ ./python -mtimeit --setup="x=[(1,2,3)]*10000" "sum(x,())"
  1000 loops, best of 3: 536 usec per loop
Result: 1000 times faster.

=== Strings ===
No patch (just string check removed):
  $ ./python -mtimeit --setup="x=['abc']*100000" "sum(x,'')"
  10 loops, best of 3: 1.52 sec per loop
fastsum.patch (+ string check removed):
  $ ./python -mtimeit --setup="x=['abc']*100000" "sum(x,'')"
  10 loops, best of 3: 1.52 sec per loop
fastsum-special.patch:
  $ ./python -mtimeit --setup="x=['abc']*100000" "sum(x,'')"
  10 loops, best of 3: 27.8 msec per loop
join:
  $ ./python -mtimeit --setup="x=['abc']*100000" "''.join(x)"
  1000 loops, best of 3: 1.66 msec per loop
Result: 50 times faster, but still slower than join by a constant factor.

