[Numpy-discussion] An alternative to vectorize that lets you access the array?

Sun Jul 12 10:01:32 EDT 2020

On Sun, 2020-07-12 at 16:00 +0300, Ram Rachum wrote:
> Hi everyone,
> 
> Here's a problem I've been dealing with. I wonder whether NumPy has a tool
> that will help me, or whether this could be a useful feature request.
> 
> In the upcoming EuroPython 2020, I'll do a talk about live-coding a music
> synthesizer. It's going to be a fun talk, I'll use the sounddevice
> <https://github.com/spatialaudio/python-sounddevice/> module to make a
> program that plays music. Do attend, or watch it on YouTube when it's
> out :)
> 

Sounds like a fun talk :).

> There's a part in my talk that I could make simpler, and thus shave 3-4
> minutes of cumbersome explanations. These 3-4 minutes matter a great deal
> to me. But for that I need to do something with NumPy and I don't know
> whether it's possible or not.
> 
> 
> The sounddevice library takes an ndarray of sound data and plays it.
> Currently I use `vectorize` to produce that array:
> 
>     output_array = np.vectorize(f, otypes='d')(input_array)
> 
> And I'd like to replace it with this code, which is supposed to give the
> same output:
> 
>     output_array = np.ndarray(input_array.shape, dtype='d')

Maybe use `np.empty(input_array.shape, dtype="d")` instead.
`np.ndarray` works but is pretty low-level, and I would usually avoid
it for array creation.

>     for i, item in enumerate(input_array):
>         output_array[i] = f(item)
> 

OK, one hack that you can try is to replace `item` with `item.item()`.
That converts the NumPy scalar to a Python scalar, which is quite a
lot more lightweight and faster.  It might also give PyPy more chance
to optimize `f`, I suppose.
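
A minimal sketch of the loop with both suggestions applied (`np.empty`
for allocation, `item.item()` for the scalar conversion).  The `f` here
is a hypothetical stand-in, since the real function is not shown in the
thread:

```python
import math

import numpy as np


def f(x):
    # Hypothetical per-sample function; the real `f` is not shown.
    return math.sin(x)


input_array = np.linspace(0.0, 1.0, 8)

# Allocate the output without initializing it; every slot is filled below.
output_array = np.empty(input_array.shape, dtype="d")

for i, item in enumerate(input_array):
    # item is a NumPy scalar (np.float64); .item() gives a plain Python float.
    output_array[i] = f(item.item())
```

Whether this helps on PyPy would have to be measured.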

> The reason I want the second version is that I can then have sounddevice
> start playing `output_array` in a separate thread, while it's being
> calculated. (Yes, I know about the GIL, I believe that sounddevice releases
> it.)

`np.vectorize` will definitely not release the GIL.  This loop may
release it in between iterations (I am not sure), but it also adds
quite a bit of overhead compared to `vectorize`.  The best thing, of
course, would be if you could rewrite `f` to accept an array.
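
As an illustration of what "accepting an array" could look like, here
is a sketch that assumes `f` computes a plain sine tone.  That is only
a guess for the sake of the example; the names `f_array`, `SAMPLE_RATE`
and `frequency` are made up here:

```python
import math

import numpy as np

SAMPLE_RATE = 44100  # samples per second (assumed value)


def f(t, frequency=440.0):
    # Scalar version: one sample at time t (in seconds).
    return math.sin(2.0 * math.pi * frequency * t)


def f_array(t, frequency=440.0):
    # Array version: the same formula applied to all samples at once.
    return np.sin(2.0 * np.pi * frequency * t)


# One second of sample times.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
output_array = f_array(t)
```

The array version does its work inside NumPy rather than calling a
Python function once per sample.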

> Unfortunately, the for loop is very slow, even when I'm not processing the
> data on a separate thread. I benchmarked it on both CPython and PyPy3, which
> is my target platform. On CPython it's 3 times slower than vectorize, and
> on PyPy3 it's 67 times slower than vectorize! That's despite the fact that
> the NumPy documentation says "The `vectorize` function is provided
> primarily for convenience, not for performance. The implementation is
> essentially a `for` loop."

PyPy is nice because it makes NumPy just work.  Unfortunately, that also
adds some overhead, so at least some slowdown is probably expected.  I
am not sure why it is so large.
I would not be surprised if a list comprehension is not much faster,
especially on PyPy (assuming you cannot modify `f` to work with
arrays).
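
For reference, the list-comprehension variant could look like the
sketch below.  `f` is again a hypothetical stand-in, and no timing
claim is made here; it would need to be benchmarked on the target
interpreter:

```python
import math

import numpy as np


def f(x):
    # Hypothetical per-sample function; the real `f` is not shown.
    return math.sin(x)


input_array = np.linspace(0.0, 1.0, 8)

# tolist() converts the whole array to Python floats in one call, so the
# comprehension never touches NumPy scalars.
output_array = np.array([f(x) for x in input_array.tolist()], dtype="d")
```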

> So here are a few questions:
> 
> 1. Is there something like `vectorize`, except you get to access the output
> array before it's finished? If not, what do you think about adding that as
> an option to `vectorize`?

vectorize should allow an `out=` argument to pass in the output array,
would that help you?  So you can access it, but I am not sure how that
will help you.  Although you could create a big result array and then
access chunks of it:

   final_arr = np.empty(...)
   newly_written = slice(0, 1000)
   run_calculation(final_arr[newly_written])

where newly_written is defined by the input chunk you got, I suppose.
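
A slightly fuller sketch of that chunking idea, under the assumption
that the calculation can be applied to a whole chunk at once.  The
names `fill_in_chunks` and `f_array` are made up for this example:

```python
import numpy as np


def f_array(chunk):
    # Hypothetical array-aware calculation; stands in for the real one.
    return np.sin(chunk)


def fill_in_chunks(input_array, chunk_size=1000):
    """Fill one big result array chunk by chunk.

    Yields (final_arr, newly_written) after each chunk, so a consumer
    can read final_arr[newly_written] while later chunks are pending.
    """
    final_arr = np.empty(input_array.shape, dtype="d")
    for start in range(0, len(input_array), chunk_size):
        newly_written = slice(start, start + chunk_size)
        # Slicing gives a view, so this writes into final_arr directly.
        final_arr[newly_written] = f_array(input_array[newly_written])
        yield final_arr, newly_written


input_array = np.linspace(0.0, 1.0, 2500)
for final_arr, newly_written in fill_in_chunks(input_array):
    ready = final_arr[newly_written]  # this part is safe to read now
```

Only the slices that have already been yielded hold valid data; the
rest of `final_arr` is uninitialized until its chunk is written.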

> 
> 2. Is there a more efficient way of writing the `for` loop I've written
> above? Or any other kind of solution to my 

As said, the main thing would be to modify `f` in whatever way
possible.  For that it would be useful to know what `f` does exactly.
Maybe you can move `f` to Cython or numba, or maybe write in a way that
works on arrays...

> 
> Thanks for your help,
> Ram Rachum.
