[Numpy-discussion] question about creating numpy arrays

Thu May 20 12:07:24 EDT 2010

On 05/20/2010 10:53 AM, Darren Dale wrote:
> [sorry, my last got cut off]
>
> On Thu, May 20, 2010 at 11:37 AM, Darren Dale<dsdale24 at gmail.com>  wrote:
>    
>> On Thu, May 20, 2010 at 10:44 AM, Benjamin Root<ben.root at ou.edu>  wrote:
>>      
>>>> I gave two counterexamples of why.
>>>>          
>>> The examples you gave aren't counterexamples.  See below...
>>>        
>> I'm not interested in arguing over semantics. I've discovered an issue
>> with how numpy deals with lists of objects that derive from ndarray,
>> and am concerned about the implications for classes that extend
>> ndarray.
>>
>>      
>>> On Wed, May 19, 2010 at 7:06 PM, Darren Dale<dsdale24 at gmail.com>  wrote:
>>>        
>>>> On Wed, May 19, 2010 at 4:19 PM,<josef.pktd at gmail.com>  wrote:
>>>>          
>>>>> On Wed, May 19, 2010 at 4:08 PM, Darren Dale<dsdale24 at gmail.com>  wrote:
>>>>>            
>>>>>> I have a question about creation of numpy arrays from a list of
>>>>>> objects, which bears on the Quantities project and also on masked
>>>>>> arrays:
>>>>>>
>>>>>>              
>>>>>>>>> import quantities as pq
>>>>>>>>> import numpy as np
>>>>>>>>> a, b = 12*pq.m, 1*pq.s
>>>>>>>>> np.array([a, b])
>>>>>>>>>                    
>>>>>> array([ 12.,   1.])
>>>>>>
>>>>>> Why doesn't that create an object array? Similarly:
>>>>>>
>>>>>>              
>>>
>>> Consider the use case of a person creating a 1-D numpy array:
>>>   >  np.array([12.0, 1.0])
>>> array([ 12.,  1.])
>>>
>>> How is python supposed to tell the difference between
>>>   >  np.array([a, b])
>>> and
>>>   >  np.array([12.0, 1.0])
>>> ?
>>>
>>> It can't, and there are plenty of times when one wants to explicitly
>>> initialize a small numpy array with a few discrete variables.
>>>
>>>
>>>        
>>>>          
>>>>>>>>> m = np.ma.array([1], mask=[True])
>>>>>>>>> m
>>>>>>>>>                    
>>>>>> masked_array(data = [--],
>>>>>>              mask = [ True],
>>>>>>        fill_value = 999999)
>>>>>>
>>>>>>              
>>>>>>>>> np.array([m])
>>>>>>>>>                    
>>>>>> array([[1]])
>>>>>>
>>>>>>              
>>> Again, this is expected behavior.  Numpy saw an array of an array,
>>> therefore, it produced a 2-D array. Consider the following:
>>>
>>>   >  np.array([[12, 4, 1], [32, 51, 9]])
>>>
>>> I, as a user, expect numpy to create a 2-D array (2 rows, 3 columns) from
>>> that array of arrays.
>>>
>>>        
>>>>          
>>>>>> This has broader implications than just creating arrays, for example:
>>>>>>
>>>>>>              
>>>>>>>>> np.sum([m, m])
>>>>>>>>>                    
>>>>>> 2
>>>>>>              
>>>>>>>>> np.sum([a, b])
>>>>>>>>>                    
>>>>>> 13.0
>>>>>>
>>>>>>              
>>>
>>> If you wanted sums from each object, there are some better (i.e., more
>>> clear) ways to go about it.  If you have a predetermined number of
>>> numpy-compatible objects, say a, b, c, then you can explicitly call the sum
>>> for each one:
>>>   >  a_sum = np.sum(a)
>>>   >  b_sum = np.sum(b)
>>>   >  c_sum = np.sum(c)
>>>
>>> Which I think communicates the programmer's intention better than (for a
>>> numpy array, x, composed of a, b, c):
>>>   >  object_sums = np.sum(x)   # <--- As a numpy user, I would expect a scalar out of this, not an array
>>>
>>> If you have an arbitrary number of objects (which is what I suspect you
>>> have), then one could easily produce an array of sums (for a list, x, of
>>> numpy-compatible objects) like so:
>>>   >  object_sums = [np.sum(anObject) for anObject in x]
>>>
>>> Performance-wise, it should be no more or less efficient than having numpy
>>> somehow produce an array of sums from a single call to sum.
>>> Readability-wise, it makes more sense because when you are treating objects
>>> separately, a *list* of them is more intuitive than a numpy.array, which is
>>> more-or-less treated as a single mathematical entity.
>>>
>>> I hope that addresses your concerns.
>>>        
>> I appreciate the response, but you are arguing that it is not a
>> problem, and I'm certain that it is. It may not be numpy
>>      
> It may not be numpy's problem, I can accept that. But it is definitely
> a problem for quantities. I'm trying to determine just how big a
> problem it is. I had hoped that one day quantities might become a part
> of numpy or scipy, but this appears to be a fundamental issue and it
> makes me doubt that inclusion would be appropriate.
>
> Thank you for the suggestion about calling the sum method instead of
> numpy's function. That is a reasonable workaround.
>
> Darren
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>    
Hi,
np.array is an array-creation function: it takes an array_like input 
and it *will* try to convert that input into an array. 
(This also occurs when you give np.array a masked array as an input.) 
This is a 'feature', especially when you don't use the dtype argument, 
and it applies to any numpy function that takes array_like inputs.
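
A minimal sketch of that conversion, using only a masked array (the 
variable names are mine and purely illustrative). It assumes current 
numpy behaviour: np.array on a list returns a plain ndarray and drops 
the mask, np.asanyarray passes an existing subclass instance through, 
and an object array has to be filled one element at a time if it is 
meant to hold the original objects.

```python
import numpy as np

m = np.ma.array([1], mask=[True])

# A list is array_like, so np.array converts it; the mask is lost.
plain = np.array([m])
print(type(plain).__name__, plain.shape)

# np.asanyarray leaves an ndarray subclass instance as the subclass.
kept = np.asanyarray(m)
print(type(kept).__name__)

# To hold the objects themselves, allocate an object array and fill it
# one element at a time (integer indexing stores the object as-is).
boxed = np.empty(2, dtype=object)
boxed[0] = m
boxed[1] = m
print(boxed.shape, type(boxed[0]).__name__)
```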

I do not know quantities, but you either have to get the user to use 
the appropriate quantities functions or let it remain 'user beware' 
when they do not use the appropriate functions. In the longer term you 
have to get numpy to 'do the right thing' with quantities objects.
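
As a rough sketch of the 'user beware' case, again with masked arrays 
rather than quantities (the names are illustrative, not from either 
library's documentation): np.sum on a list converts the list to a plain 
array first, so the masks are ignored, while summing each object 
separately goes through the object's own sum method.

```python
import numpy as np

m = np.ma.array([1, 2, 3], mask=[False, True, False])

# The list is converted to a plain 2-D array first, so the masked
# value 2 is counted in both rows.
total_from_list = np.sum([m, m])

# Summing each object on its own uses MaskedArray.sum, which skips
# masked entries.
per_object = [obj.sum() for obj in [m, m]]

print(total_from_list, per_object)
```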

Bruce