[Numpy-discussion] Optimizing mean(axis=0) on a 3D array

Sat Aug 26 06:26:32 EDT 2006

Martin Spacek wrote:
> Hello,
>
> I'm a bit ignorant of optimization in numpy.
>
> I have a movie with 65535 32x32 frames stored in a 3D array of uint8 
> with shape (65535, 32, 32). I load it from an open file f like this:
>
>  >>> import numpy as np
>  >>> data = np.fromfile(f, np.uint8, count=65535*32*32)
>  >>> data = data.reshape(65535, 32, 32)
>
> I'm picking several thousand frames more or less randomly from 
> throughout the movie and finding the mean frame over those frames:
>
>  >>> meanframe = data[frameis].mean(axis=0)
>
> frameis is a 1D array of frame indices with no repeated values in it. If 
> it has say 4000 indices in it, then the above line takes about 0.5 sec 
> to complete on my system. I'm doing this for a large number of different 
> frameis, some of which can have many more indices in them. All this 
> takes many minutes to complete, so I'm looking for ways to speed it up.
>
> If I divide it into 2 steps:
>
>  >>> temp = data[frameis]
>  >>> meanframe = temp.mean(axis=0)
>
> and time it, I find the first step takes about 0.2 sec, and the second 
> takes about 0.3 sec. So it's not just the mean() step, but also the 
> indexing step that's taking some time.
>   

If frameis is 1-D, then you should be able to use

temp = data.take(frameis, axis=0)

for the first step. This can be quite a bit faster than indexing with 
an array (and is a big reason why take is still around). There are 
several reasons for this, one of which is that index checking is done 
over the entire index list when using indexing.
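
As a minimal sketch of the suggestion above: the snippet below checks that 
take along axis 0 selects the same frames as indexing with frameis, then 
computes the mean frame. The random data, the smaller frame count, and the 
names rng, temp_index and temp_take are assumptions for illustration only; 
the original movie is read from a file. It says nothing about speed, which 
depends on the NumPy version and machine and is best measured with timeit 
on the real data.

```python
import numpy as np

# Synthetic stand-in for the movie (assumption: random uint8 data and
# far fewer frames than the 65535 in the original question).
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(1000, 32, 32), dtype=np.uint8)

# 1-D array of frame indices with no repeated values, as in the question.
frameis = rng.choice(data.shape[0], size=400, replace=False)

temp_index = data[frameis]              # indexing with an integer array
temp_take = data.take(frameis, axis=0)  # the take() form suggested above

# Both forms select the same frames in the same order.
assert np.array_equal(temp_index, temp_take)

# Mean over the selected frames; the result has shape (32, 32).
meanframe = temp_take.mean(axis=0)
```

Since both forms return the same array, swapping one for the other only 
changes how long the selection step takes, not the resulting meanframe.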

-Travis