[Numpy-discussion] Optimizing mean(axis=0) on a 3D array
Travis Oliphant
oliphant.travis at ieee.org
Sat Aug 26 06:26:32 EDT 2006
Martin Spacek wrote:
> Hello,
>
> I'm a bit ignorant of optimization in numpy.
>
> I have a movie with 65535 32x32 frames stored in a 3D array of uint8
> with shape (65535, 32, 32). I load it from an open file f like this:
>
> >>> import numpy as np
> >>> data = np.fromfile(f, np.uint8, count=65535*32*32)
> >>> data = data.reshape(65535, 32, 32)
>
> I'm picking several thousand frames more or less randomly from
> throughout the movie and finding the mean frame over those frames:
>
> >>> meanframe = data[frameis].mean(axis=0)
>
> frameis is a 1D array of frame indices with no repeated values in it. If
> it has say 4000 indices in it, then the above line takes about 0.5 sec
> to complete on my system. I'm doing this for a large number of different
> frameis, some of which can have many more indices in them. All this
> takes many minutes to complete, so I'm looking for ways to speed it up.
>
> If I divide it into 2 steps:
>
> >>> temp = data[frameis]
> >>> meanframe = temp.mean(axis=0)
>
> and time it, I find the first step takes about 0.2 sec, and the second
> takes about 0.3 sec. So it's not just the mean() step, but also the
> indexing step that's taking some time.
>
If frameis is 1-D, then you should be able to use

    temp = data.take(frameis, axis=0)

for the first step in place of data[frameis]. This can be quite a bit
faster (and is a big reason why take is still around). There are several
reasons for this, one of which is that index checking is done over the
entire list when using indexing.
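
The suggestion above can be tried out with a short script. This is only a sketch under stated assumptions: random data stands in for the movie file, and the names rng, by_index, by_take, t_index and t_take are made up for illustration. It checks that take() and indexing give the same frames and times both; the relative speed will depend on the NumPy version and the machine, so it is worth measuring rather than assuming.

```python
import timeit

import numpy as np

# Assumption: random uint8 data stands in for the movie read from file f.
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(65535, 32, 32), dtype=np.uint8)

# 4000 distinct frame indices, as in the original question.
frameis = rng.choice(data.shape[0], size=4000, replace=False)

# Both forms select the same frames along the first axis.
by_index = data[frameis]
by_take = data.take(frameis, axis=0)
assert np.array_equal(by_index, by_take)

# Second step from the question: mean frame over the selected frames.
meanframe = by_take.mean(axis=0)

# Time only the selection step; results vary by NumPy version and hardware.
t_index = timeit.timeit(lambda: data[frameis], number=20)
t_take = timeit.timeit(lambda: data.take(frameis, axis=0), number=20)
print("data[frameis]:        %.4f s per call" % (t_index / 20))
print("data.take(frameis, 0): %.4f s per call" % (t_take / 20))
```

If take() does come out faster on a given setup, the two-step version from the question becomes temp = data.take(frameis, axis=0) followed by temp.mean(axis=0).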
-Travis