[SciPy-user] Dealing with Large Data Sets

Damian Eads eads at soe.ucsc.edu
Sun May 11 03:38:50 EDT 2008


Anne Archibald wrote:
> 2008/5/10 Damian Eads <eads at soe.ucsc.edu>:
>> Damian Eads wrote:
>>
>>> which perform the operations in an in-place fashion. If data.sum(axis =
>>> 2) is large, preallocate an array to store the sum,
>>>
>>>    # for summing over columns
>>>    sum_result = numpy.zeros(data.shape[0:2])
>> I meant to include
>>
>>    data **= 2
>>    np.sum(data, axis=2, out=sum_result)
>>
>> which does an in-place, element-wise exponentiate, sums over the
>> columns, and stores the result in sum_result.
> 
> What is the advantage to preallocating the result rather than letting
> sum() do the allocation?

If the computation is repeated millions of times and the sum array is 
large (hundreds of MB), then it is certainly advantageous to allocate 
the sum array once rather than re-allocating it for each computation.
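To make that concrete, here is a minimal sketch of the pattern under 
discussion (the array sizes and loop count are placeholders for 
illustration; the real use case involves much larger arrays and many 
more iterations):

```python
import numpy as np

# Stand-in for the large 3-D data array from the thread.
data = np.random.rand(4, 5, 6)

# Preallocate the output buffer once, outside the loop.
sum_result = np.zeros(data.shape[0:2])

for _ in range(3):  # stands in for the millions of repetitions
    data **= 2                            # in-place element-wise square
    np.sum(data, axis=2, out=sum_result)  # reuse the preallocated buffer
```

Because `out=sum_result` writes the result into the existing buffer, no 
new array of shape `data.shape[0:2]` is allocated on each iteration.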

Damian


