[Numpy-discussion] Bug in numpy all() function
Robert Kern
robert.kern at gmail.com
Wed Feb 6 15:51:29 EST 2008
Anne Archibald wrote:
> On 06/02/2008, Robert Kern <robert.kern at gmail.com> wrote:
>
>>> I guess the all function doesn't know about generators?
>> Yup. It works on arrays and things it can turn into arrays by calling the C API
>> equivalent of numpy.asarray(). There's a ton of magic and special cases in
>> asarray() in order to interpret nested Python sequences as arrays. That magic
>> works fairly well when we have sequences with known lengths; it fails utterly
>> when given an arbitrary iterator of unknown length. So we punt. Unfortunately,
>> what happens then is that asarray() sees an object that it can't interpret as a
>> sequence to turn into a real array, so it makes a rank-0 array with the iterator
>> object as the value. This evaluates to True.
>>
>> It's possible that asarray() should raise an exception for generators, but it
>> would be a special case. We wouldn't be able to test for arbitrary iterables.
>
> Would it be possible for asarray() to pull out the first element from
> the iterable, make an array out of it, then assume that all other
> values out of the iterable will have the same shape (raising an error,
> of course, when they aren't)? I guess this has high foot-shooting
> potential, but is it that much worse than numpy's shape-guessing
> generally?
I'm skeptical. Personally, it comes down to this: if you provide code that
implements this safely and efficiently without making a confusing API, I'm more
than happy to consider it for inclusion. But I'm not going to spend time trying
to write the code.
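A quick sketch of both points above, in modern Python 3 syntax (range rather than xrange): asarray() punting on a generator, and why a generator-only special case would not cover arbitrary iterators. The CountUp class is a hypothetical example, not part of numpy, and exact behavior may vary across NumPy versions:

```python
import types
import numpy as np

gen = (x * x for x in range(5))

# asarray() cannot interpret a generator as a sequence, so it punts:
# the result is a rank-0 object array holding the generator itself.
a = np.asarray(gen)
assert a.shape == ()
assert a.dtype == object

# A 0-d object array's truth value is the truth value of its single
# element, and generator objects are always truthy -- hence the bug:
# all() comes out True even though every yielded value is False.
assert bool(np.all(x > 10 for x in range(5)))

# Generators could be special-cased, since they have a concrete type...
assert isinstance(gen, types.GeneratorType)

# ...but an arbitrary iterator is just an object with __iter__/__next__
# (CountUp is a hypothetical example), so a generator-only check would
# not catch it, and there is no cheap general test for iterables.
class CountUp:
    def __init__(self, n):
        self.i, self.n = 0, n
    def __iter__(self):
        return self
    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i

assert not isinstance(CountUp(3), types.GeneratorType)
assert list(CountUp(3)) == [1, 2, 3]
```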
> It would be handy to be able to use an iterable to fill an array, so
> that you'd never need to store the values in anything else first:
>
> a = N.array((sin(N.pi*x/n) for x in xrange(n)))
If n is large enough that storage matters,
a = N.sin(N.linspace(0, N.pi, n))
is always faster, more memory efficient, and more readable. Remember that the
array will have to be dynamically resized as we go through the iterator. The
memory movement is going to wipe out much of the benefit of having an iterator
in the first place.
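As an aside, numpy.fromiter() takes an optional count argument; when the length is known up front, passing it lets fromiter preallocate the output and skip the dynamic resizing entirely (a small sketch in Python 3 syntax):

```python
import numpy as np
from math import sin, pi

n = 1000
# count=n tells fromiter how much to preallocate, so no reallocation
# or copying happens while the generator is consumed.
a = np.fromiter((sin(pi * x / n) for x in range(n)), dtype=float, count=n)
assert a.shape == (n,)

# Same values as the vectorized form over the same grid
# (endpoint=False matches x/n for x = 0 .. n-1).
assert np.allclose(a, np.sin(np.linspace(0, pi, n, endpoint=False)))
```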
For 1D arrays, remember that we have numpy.fromiter() already, so we can test this.
In [39]: import numpy as np
In [40]: from math import sin
In [41]: n = 10
In [42]: %timeit np.fromiter((sin(np.pi*x/n) for x in xrange(n)), float)
100000 loops, best of 3: 11.5 µs per loop
In [43]: %timeit np.sin(np.linspace(0, np.pi, n))
10000 loops, best of 3: 26.1 µs per loop
In [44]: n = 100
In [45]: %timeit np.fromiter((sin(np.pi*x/n) for x in xrange(n)), float)
10000 loops, best of 3: 84 µs per loop
In [46]: %timeit np.sin(np.linspace(0, np.pi, n))
10000 loops, best of 3: 32.3 µs per loop
In [47]: n = 1000
In [48]: %timeit np.fromiter((sin(np.pi*x/n) for x in xrange(n)), float)
1000 loops, best of 3: 794 µs per loop
In [49]: %timeit np.sin(np.linspace(0, np.pi, n))
10000 loops, best of 3: 91.8 µs per loop
So, for n=10, the generator wins, but is n=10 really the case that you want to
use a generator for?
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco