[Numpy-discussion] Bug in numpy all() function

Robert Kern robert.kern at gmail.com
Wed Feb 6 15:51:29 EST 2008


Anne Archibald wrote:
> On 06/02/2008, Robert Kern <robert.kern at gmail.com> wrote:
> 
>>> I guess the all function doesn't know about generators?
>> Yup. It works on arrays and things it can turn into arrays by calling the C API
>> equivalent of numpy.asarray(). There's a ton of magic and special cases in
>> asarray() in order to interpret nested Python sequences as arrays. That magic
>> works fairly well when we have sequences with known lengths; it fails utterly
>> when given an arbitrary iterator of unknown length. So we punt. Unfortunately,
>> what happens then is that asarray() sees an object that it can't interpret as a
>> sequence to turn into a real array, so it makes a rank-0 array with the iterator
>> object as the value. This evaluates to True.
>>
>> It's possible that asarray() should raise an exception for generators, but it
>> would be a special case. We wouldn't be able to test for arbitrary iterables.
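
To make that concrete, here is a minimal demonstration of the failure mode
(exact reprs vary across versions, but the shape of the problem is the same):

   import numpy as np

   gen = (x > 10 for x in range(5))   # every element is False

   a = np.asarray(gen)
   print(a.shape, a.dtype)            # () object -- rank-0, wrapping the generator

   # all() reduces that rank-0 array to the truth value of the generator
   # object itself, which is always True:
   print(np.all(x > 10 for x in range(5)))   # True, despite all-False elements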
> 
> Would it be possible for asarray() to pull out the first element from
> the iterable, make an array out of it, then assume that all other
> values out of the iterable will have the same shape (raising an error,
> of course, when they don't)? I guess this has high foot-shooting
> potential, but is it that much worse than numpy's shape-guessing
> generally?

I'm skeptical. For me, it comes down to this: if you provide code that 
implements this safely and efficiently without making a confusing API, I'm more 
than happy to consider it for inclusion. But I'm not going to spend time trying 
to write that code myself.
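
To be concrete about what that would involve, here is a rough Python-level
sketch (array_from_iter is a hypothetical helper, not anything in numpy; a
real implementation would have to live in the C asarray machinery):

   import numpy as np

   def array_from_iter(iterable):
       # Hypothetical: peek at the first element to fix the element shape,
       # then check every later element against it.
       it = iter(iterable)
       try:
           first = np.asarray(next(it))
       except StopIteration:
           raise ValueError("cannot infer a shape from an empty iterable")
       rows = [first]
       for item in it:
           a = np.asarray(item)
           if a.shape != first.shape:
               raise ValueError("element shape %r does not match %r"
                                % (a.shape, first.shape))
           rows.append(a)
       # Still buffers everything in a Python list before the final copy.
       return np.array(rows)

Note that this still accumulates every element in a Python list before the
final copy, which runs straight into the memory-traffic problem discussed
below.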

> It would be handy to be able to use an iterable to fill an array, so
> that you'd never need to store the values in anything else first:
> 
> a = N.array((sin(N.pi*x/n) for x in xrange(n)))

If n is large enough that storage matters,

   a = N.sin(N.linspace(0, N.pi, n))

is always faster, more memory-efficient, and more readable. Remember that the 
array will have to be dynamically resized as we go through the iterator. The 
memory movement is going to wipe out much of the benefit of having an iterator 
in the first place.

For 1D arrays, remember that we have numpy.fromiter() already, so we can test this.


In [39]: import numpy as np

In [40]: from math import sin

In [41]: n = 10

In [42]: %timeit np.fromiter((sin(np.pi*x/n) for x in xrange(n)), float)
100000 loops, best of 3: 11.5 µs per loop

In [43]: %timeit np.sin(np.linspace(0, np.pi, n))
10000 loops, best of 3: 26.1 µs per loop

In [44]: n = 100

In [45]: %timeit np.fromiter((sin(np.pi*x/n) for x in xrange(n)), float)
10000 loops, best of 3: 84 µs per loop

In [46]: %timeit np.sin(np.linspace(0, np.pi, n))
10000 loops, best of 3: 32.3 µs per loop

In [47]: n = 1000

In [48]: %timeit np.fromiter((sin(np.pi*x/n) for x in xrange(n)), float)
1000 loops, best of 3: 794 µs per loop

In [49]: %timeit np.sin(np.linspace(0, np.pi, n))
10000 loops, best of 3: 91.8 µs per loop


So, for n=10, the generator wins, but is n=10 really the case that you want to 
use a generator for?
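
As an aside, fromiter() accepts a count argument, so when the length is known
up front the output buffer can be allocated once and the resizing avoided.
The per-element Python calls to sin() still dominate, though, so it doesn't
change the conclusion:

   import numpy as np
   from math import sin

   n = 1000
   # count=n lets fromiter allocate the result once instead of growing it.
   a = np.fromiter((sin(np.pi * x / n) for x in range(n)), float, count=n)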

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco


