[pypy-issue] Issue #1852: sum(arrays,[]) is slow (pypy/pypy)

Anton Dubrau issues-reply at bitbucket.org
Thu Aug 28 05:09:21 CEST 2014


New issue 1852: sum(arrays,[]) is slow
https://bitbucket.org/pypy/pypy/issue/1852/sum-arrays-is-slow

Anton Dubrau:

sum(strings,"") is very slow (unnecessarily so), but at least there's the "".join(strings) function.
when appending a list of arrays together in a similar fashion, there's no join method.

Looping over an array and growing a result array using extend is much faster, but still kind of slow (it appears it's superlinear).

The fastest way seems to be to pre-allocate a result array and copy the values into that, but it's kind of cumbersome.

I feel that sum(arrays,[]) (and incidentally sum(strings,"")) is idiomatic, and pypy should specialize on it. It can also easily avoid re-allocations in theory, compared to using extend.
Clearly, the time difference between the different approaches is unreasonable:

some profiling:

def sumExtend(arrays):
    result = []
    for a in arrays:
        result.extend(a)
    return result

def sumPreallocate(arrays):
    length = sum(len(a) for a in arrays)
    result = [0]*length
    index = 0;
    for a in arrays:
        result[index:index+len(a)] = a
        index += len(a)
    return result

>> arrays = [[i%10]*i for i in range(10000)]
>> %time result = sum(arrays,[])
cancelled after two and half hours ... may take days?

>> %time _ = sumPreallocate(arrays)
Wall time: 588 ms

>> %time _ = sumExtend(arrays)
Wall time: 1min 42s




More information about the pypy-issue mailing list