[Numpy-discussion] creation of ndarray with dtype=np.object : bug?

Jaime Fernández del Río jaime.frio at gmail.com
Wed Dec 3 06:17:35 EST 2014


On Wed, Dec 3, 2014 at 2:21 AM, Emanuele Olivetti <emanuele at relativita.com>
wrote:

> On 12/03/2014 04:32 AM, Ryan Nelson wrote:
> > Emanuele,
> >
> > This doesn't address your question directly. However, I wonder if you
> > could approach this problem in a different way to get what you want.
> >
> > First of all, create an "index" array and then just vstack all of your
> > arrays at once (see the sketch after the quoted text below).
> >
> >
>
> Ryan,
>
> Thank you for your solution. Indeed it works. But it seems to me
> that manually creating an index and re-implementing slicing
> should be a last resort. NumPy is *great* and provides excellent
> slicing and assembling tools. For some reason that I don't fully
> understand, when dtype=np.object the ndarray constructor
> tries to be "smart" and produces unexpected results that cannot
> be controlled.
>
> Another simple example:
> ---
> import numpy as np
> from numpy.random import rand, randint
> n_arrays = 4
> shape0_min = 2
> shape0_max = 4
> for a in range(30):
>      list_of_arrays = [rand(randint(shape0_min, shape0_max), 3)
>                        for i in range(n_arrays)]
>      array_of_arrays = np.array(list_of_arrays, dtype=np.object)
>      print("shape: %s" % (array_of_arrays.shape,))
> ---
> the usual output is:
> shape: (4,)
> but from time to time, when the randomly generated arrays happen to have
> the same shape, you get:
> shape: (4, 2, 3)
> which may crash your code at runtime.
>
> To NumPy developers: is there a specific reason for
> np.array(..., dtype=np.object) to be "smart" instead of just
> assembling an array with the provided objects?
>
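
For reference, the index-plus-vstack approach Ryan suggests might look
something like this minimal sketch (the variable names are illustrative,
not taken from his post):

>>> import numpy as np
>>> arrays = [np.random.rand(n, 3) for n in (2, 3, 2, 3)]
>>> stacked = np.vstack(arrays)                  # one (10, 3) array
>>> index = np.cumsum([len(a) for a in arrays])  # row offset after each piece
>>> pieces = np.split(stacked, index[:-1])       # recover the original pieces
>>> [p.shape for p in pieces]
[(2, 3), (3, 3), (2, 3), (3, 3)]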

The safe way to create 1D object arrays from a list is to preallocate
them: the array's shape is then fixed up front, so NumPy never tries to
infer one from the list items. Something like this:

>>> a = [np.random.rand(2, 3), np.random.rand(2, 3)]
>>> b = np.empty(len(a), dtype=object)
>>> b[:] = a
>>> b
array([ array([[ 0.124382  ,  0.04489531,  0.93864908],
       [ 0.77204758,  0.63094413,  0.55823578]]),
       array([[ 0.80151723,  0.33147467,  0.40491018],
       [ 0.09905844,  0.90254708,  0.69911945]])], dtype=object)

It's only a tad more verbose than your current code, and you can always
wrap it in a helper function if you find 2 lines of code to be too many.
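
Such a helper might look like this minimal sketch (the name
`to_object_array` is just illustrative):

>>> def to_object_array(items):
...     """Return a 1D object array holding `items`, sidestepping shape inference."""
...     out = np.empty(len(items), dtype=object)
...     out[:] = items
...     return out
...
>>> to_object_array([np.random.rand(2, 3), np.random.rand(2, 3)]).shape
(2,)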

As to why np.array tries to be smart, keep in mind that object arrays have
applications other than holding stacked sequences. The following code
computes the 98th Fibonacci number, F(98), as the top-left entry of the
99th power of the Fibonacci matrix, using the matrix form of the recursion
(http://en.wikipedia.org/wiki/Fibonacci_number#Matrix_form),
numpy's linear algebra capabilities, and Python's arbitrary precision ints:

>>> a = np.array([[0, 1], [1, 1]], dtype=object)
>>> np.linalg.matrix_power(a, 99)[0, 0]
135301852344706746049L

Trying to do this with any other dtype would produce either wrong results
due to overflow:

>>> a = np.array([[0, 1], [1, 1]])
>>> np.linalg.matrix_power(a, 99)[0, 0]
-90618175

or lost precision:

>>> a = np.array([[0, 1], [1, 1]], dtype=np.double)
>>> np.linalg.matrix_power(a, 99)[0, 0]
1.3530185234470674e+20

Jaime
-- 
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him in his
plans for world domination.