[Numpy-discussion] Invalid value encountered: how to prevent numpy.where from doing this?

Nathaniel Smith njs at pobox.com
Sat Jan 5 17:20:11 EST 2013


On Sat, Jan 5, 2013 at 10:07 PM, Eric Emsellem <eric.emsellem at eso.org> wrote:
> Thanks!
>
> This makes sense of course. And yes the operation I am trying to do is
> rather complicated so I need to rely on a prior selection.
>
> Now I would need to optimise this for large arrays, as the code goes
> through these lines many, many times.
>
> When I have to operate on the two different parts of the array, I guess
> just using the following is the fastest way (as you indicated):
>
> result = np.empty_like(data)
> mask = (data == 0)
> result[mask] = 0.0
> result[~mask] = 1.0/data[~mask]
>
> But if I only need to do this on one side of the selection, I guess I
> would just do:
>
> result = np.empty_like(data)
> mask = (data != 0)
> result[mask] = 1.0 / data[mask]

Note that np.empty_like returns an array full of uninitialized memory,
so those arbitrary values will be left anywhere that mask == False.
This may or may not be a problem for you.
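
If you need defined values everywhere, a minimal sketch of the same
idea with np.zeros_like swapped in, so the data == 0 entries come out
as 0.0 instead of garbage:

import numpy as np

result = np.zeros_like(data)      # initialized to 0.0 everywhere
mask = (data != 0)
result[mask] = 1.0 / data[mask]   # entries where data == 0 stay 0.0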

> I have tried using three version of "mask = " with the rest of the code
> being the same:
>
> 1- mask = where(data != 0)
> 2- mask = np.where(data != 0)
> 3- mask = (data != 0)
>
> and it looks like #3 is the fastest, then #2 (20% slower), then #1
> (50% slower than #3).
>
> I am not sure why, but does that make sense? Or is there an even
> faster way (for large data arrays and complicated operations)?

Yes, these should all do the same thing, and the ordering makes sense:
calling a function is slower than not calling one, and whatever the
plain 'where' resolves to in your namespace is slower (for numpy
arrays) than numpy's own 'where'.
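
For reference, this is roughly what #2 and #3 hand back; indexing works
with either, but the bare boolean comparison skips both the function
call and the construction of integer index arrays:

import numpy as np

data = np.array([0.0, 2.0, 0.0, 4.0])
idx = np.where(data != 0)   # tuple of integer index arrays: (array([1, 3]),)
mask = (data != 0)          # boolean mask: [False  True False  True]
# data[idx] and data[mask] select the same elements.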

Once you can count on numpy 1.7, using the new where= argument to
ufuncs should be the fastest way to do this (since it avoids making
temporary arrays entirely).
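
A minimal sketch of that approach (assuming numpy >= 1.7; note that the
output array still has to be initialized, since entries where the
where= condition is False are left untouched):

import numpy as np

mask = (data != 0)
result = np.zeros_like(data)                  # untouched entries stay 0.0
np.divide(1.0, data, out=result, where=mask)  # writes 1/data only where mask is True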

-n


