[Numpy-discussion] List comprehension and loops performances with NumPy arrays

Sun Oct 8 08:16:53 EDT 2017

On Sat, 7 Oct 2017 at 16.59, Nicholas Nadeau <nicholas.nadeau at gmail.com>
wrote:

> Hi Andrea!
>
> Check out the following SO answers for similar contexts:
> - https://stackoverflow.com/questions/22108488/are-list-comprehensions-and-functional-functions-faster-than-for-loops
> - https://stackoverflow.com/questions/30245397/why-is-list-comprehension-so-faster
>
> To better visualize the issue, I made an IPython gist (simplifying the code
> a bit): https://gist.github.com/nnadeau/3deb6f18d028009a4495590cfbbfaa40
>
> From a quick look at the disassembled code (I'm not an expert, so correct
> me if I'm wrong), the list comprehension has much less overhead than
> looping over the pre-allocated array and storing each result into its
> slice.
>

Thank you Nicholas, I suspected that the approach of using list
comprehensions was close to unbeatable, thanks for the analysis!
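
One way to check this is to time the 500 calls to do_something on their
own and compare that with the two full versions. The sketch below is
written for Python 3 (range instead of xrange) and is not the gist linked
above; only_calls and best_ms are hypothetical helper names introduced
here, and the Item class repeats the toy class from the original post
(minus the unused sv). My assumption, to be verified on your own machine,
is that the calls themselves dominate and the list-versus-preallocated
difference is a small remainder.

```python
import timeit
import numpy


class Item(object):
    """Toy class from the original post (the unused `sv` is dropped)."""

    def __init__(self, name):
        self.name = name
        self.values = numpy.random.rand(8, 1)

    def do_something(self):
        array = numpy.empty((8, ))
        f = numpy.dot(0.5*numpy.ones((8, )), self.values)[0]
        array.fill(f)
        return array


items = [Item('item_%d' % (i + 1)) for i in range(500)]


def only_calls():
    # Work shared by both versions: 500 method calls, results discarded.
    for item in items:
        item.do_something()


def comprehension():
    # Build a list of 500 rows, convert once, then transpose to (8, 500).
    return numpy.asarray([item.do_something() for item in items]).T


def empty_plus_loop():
    # Preallocate the transposed shape and fill one column per item.
    output = numpy.empty((8, len(items)))
    for i, item in enumerate(items):
        output[:, i] = item.do_something()
    return output


def best_ms(func, repeat=20):
    # Hypothetical helper: best of `repeat` single runs, in milliseconds.
    return min(timeit.repeat(func, number=1, repeat=repeat)) * 1000.0


timings = {}
for func in (only_calls, comprehension, empty_plus_loop):
    timings[func.__name__] = best_ms(func)
    print('%-16s %0.2f ms' % (func.__name__, timings[func.__name__]))
```

Both functions return the same (8, 500) array, so whatever gap remains
between them is the cost of building the list and calling asarray versus
doing 500 column assignments.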

Andrea.

> Cheers,
>
>
>
> --
> Nicholas Nadeau, P.Eng., AVS
>
> On 7 October 2017 at 05:56, Andrea Gavana <andrea.gavana at gmail.com> wrote:
>
>> Apologies, correct timeit code this time (I had gotten the wrong shape
>> for the output matrix in the loop case):
>>
>> if __name__ == '__main__':
>>
>>     repeat = 1000
>>     items = [Item('item_%d'%(i+1)) for i in xrange(500)]
>>
>>     output = numpy.asarray([item.do_something() for item in items]).T
>>     statements = ['''
>>                   output = numpy.asarray([item.do_something() for item in items]).T
>>                   ''',
>>                   '''
>>                   output = numpy.empty((8, 500))
>>                   for i, item in enumerate(items):
>>                       output[:, i] = item.do_something()
>>                   ''']
>>
>>     methods = ['List Comprehension', 'Empty plus Loop   ']
>>
>>     setup  = 'from __main__ import numpy, items'
>>
>>     for stmnt, method in zip(statements, methods):
>>
>>         elapsed = timeit.repeat(stmnt, setup=setup, number=1, repeat=repeat)
>>         minv, maxv, meanv = min(elapsed), max(elapsed), numpy.mean(elapsed)
>>         elapsed.sort()
>>         best_of_3 = numpy.mean(elapsed[0:3])
>>         result = numpy.asarray((minv, maxv, meanv, best_of_3))*1000  # seconds -> ms
>>
>>         print method, ': MIN: %0.2f ms , MAX: %0.2f ms , MEAN: %0.2f ms , BEST OF 3: %0.2f ms'%tuple(result.tolist())
>>
>>
>> Results are the same as before...
>>
>>
>>
>> On 7 October 2017 at 11:52, Andrea Gavana <andrea.gavana at gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>>     I have this little snippet of code:
>>>
>>> import timeit
>>> import numpy
>>>
>>> class Item(object):
>>>
>>>     def __init__(self, name):
>>>
>>>         self.name = name
>>>         self.values = numpy.random.rand(8, 1)
>>>
>>>     def do_something(self):
>>>
>>>         sv = self.values.sum(axis=0)
>>>         array = numpy.empty((8, ))
>>>         f = numpy.dot(0.5*numpy.ones((8, )), self.values)[0]
>>>         array.fill(f)
>>>         return array
>>>
>>>
>>> In my real application, the method do_something does a bit more than
>>> that, but I believe the snippet is enough to start playing with it. What I
>>> have is a list of (on average) 500-1,000 instances of the Item class, and
>>> I am trying to collect the output of do_something for each of them in a
>>> single, big 2D numpy array.
>>>
>>> My current approach is to use list comprehension like this:
>>>
>>> output = numpy.asarray([item.do_something() for item in items]).T
>>>
>>> (Note: I always need the transpose of that 2D array).
>>>
>>> But then I thought: why not preallocate the output array and use a
>>> simple loop:
>>>
>>> output = numpy.empty((500, 8))
>>> for i, item in enumerate(items):
>>>     output[i, :] = item.do_something()
>>>
>>>
>>> I was expecting this version to be marginally faster, as the previous
>>> one has to call asarray and then transpose the matrix, but I was in for a
>>> surprise:
>>>
>>> if __name__ == '__main__':
>>>
>>>     repeat = 1000
>>>     items = [Item('item_%d'%(i+1)) for i in xrange(500)]
>>>
>>>     statements = ['''
>>>                   output = numpy.asarray([item.do_something() for item in items]).T
>>>                   ''',
>>>                   '''
>>>                   output = numpy.empty((500, 8))
>>>                   for i, item in enumerate(items):
>>>                       output[i, :] = item.do_something()
>>>                   ''']
>>>
>>>     methods = ['List Comprehension', 'Empty plus Loop   ']
>>>
>>>     setup  = 'from __main__ import numpy, items'
>>>
>>>     for stmnt, method in zip(statements, methods):
>>>
>>>         elapsed = timeit.repeat(stmnt, setup=setup, number=1, repeat=repeat)
>>>         minv, maxv, meanv = min(elapsed), max(elapsed), numpy.mean(elapsed)
>>>         elapsed.sort()
>>>         best_of_3 = numpy.mean(elapsed[0:3])
>>>         result = numpy.asarray((minv, maxv, meanv, best_of_3))*1000  # seconds -> ms
>>>
>>>         print method, ': MIN: %0.2f ms , MAX: %0.2f ms , MEAN: %0.2f ms , BEST OF 3: %0.2f ms'%tuple(result.tolist())
>>>
>>>
>>> I get this:
>>>
>>> List Comprehension : MIN: 7.32 ms , MAX: 9.13 ms , MEAN: 7.85 ms , BEST OF 3: 7.33 ms
>>> Empty plus Loop    : MIN: 7.99 ms , MAX: 9.57 ms , MEAN: 8.31 ms , BEST OF 3: 8.01 ms
>>>
>>>
>>> Now, I know that list comprehensions are renowned for being insanely
>>> fast, but I thought that doing asarray plus transpose would more than
>>> cancel out their advantage, especially since the list comprehension is
>>> used to call a method, not to do some simple arithmetic inside it...
>>>
>>> I guess I am missing something obvious here... oh, and if anyone has
>>> suggestions about how to improve my crappy code (performance wise), please
>>> feel free to add your thoughts.
>>>
>>> Thank you.
>>>
>>> Andrea.
>>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>>