[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

Wed Jul 7 09:52:37 EDT 2010

On 07/06/2010 01:09 PM, Gael Varoquaux wrote:
> Just to give a data point, my research group and I would be very excited
> at the idea of having Fernando's data arrays in Numpy. We can't offer to
> maintain it, because we are already fairly involved in machine learning
> and neuroimaging specific code, but we would be able to rely on it more
> in our packages, and we love it!
>
> Gaël
>
> On Mon, Jul 05, 2010 at 11:31:02PM -0500, Jonathan March wrote:
>    
>>     Fernando Perez proposed a NumPy enhancement, an ndarray with named axes,
>>     prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew
>>     Brett, Kilian Koepsell and Stefan van der Walt.
>>      
>    
>>     At SciPy 2010 on July 1, Fernando convened a BOF (Birds of a Feather)
>>     discussion of this proposal.
>>      
>    
>>     The notes from this BOF can be found at:
>>     [1]http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
>>     (linked from the Plans section of [2]http://projects.scipy.org/numpy )
>>      
>    
>>     HELP NEEDED: Fernando does not have the resources to drive the project
>>     beyond this prototype, which already does what he needs. If this is to go
>>     anywhere, it needs people to do the work. Please step forward.
>>      
>    
>> References
>>      
>    
>>     1. http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
>>     2. http://projects.scipy.org/numpy
>>      
This is very interesting work, especially if it can be used to extend or
replace the current record arrays (and perhaps structured arrays). If it
cannot, then you really need to make a case for yet another data
structure. Otherwise we will end up with all these unnecessary and
incompatible hybrids rather than a single option; competition is not
good here. I really dislike the current impasse with numpy's Matrix class
and do not wish this to happen again. However, I am not saying that you
cannot create another scikit, rather that there has to be some
consideration if it is to go back into numpy/scipy.

As per Wes's reply in this thread, I really do think that a set of
specific behaviors expected of this new data structure needs to
be agreed upon. Speed should not be an issue until the basic
functionality is covered. I think there are at least the following
concerns that people need to agree on:

1) Indexing, especially as it relates to slicing and broadcasting.
2) Joining data structures: what to do when all data structures have
the same 'metadata' (axes, labels, dtypes) and when any of these
differ. Also, do you allow union (so the result includes all axes,
labels, etc. present in any of the data structures) or intersection
(keep only the axes and labels in common) operations?
3) How do you expect basic mathematical operations to work? For example,
what does A + 1 mean if A has different data types such as strings?
4) How should this interact with the rest of numpy?
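To make question 2 concrete, here is a minimal pure-Python sketch of the two joining choices for a single labelled axis. The function names `align` and `add_aligned`, the `how` parameter, and the use of NaN to fill missing entries are all hypothetical choices for illustration; this is not how DataArray or any proposed NumPy API is known to behave.

```python
import math


def align(labels_a, values_a, labels_b, values_b, how="inner"):
    """Align two 1-D labelled sequences on their tick labels (hypothetical helper).

    how="inner": keep only labels present in both operands (intersection).
    how="outer": keep labels present in either operand (union); a label
                 missing from one operand gets NaN for that operand.
    """
    a = dict(zip(labels_a, values_a))
    b = dict(zip(labels_b, values_b))
    if how == "inner":
        # Intersection, preserving the label order of the first operand.
        keys = [k for k in labels_a if k in b]
    elif how == "outer":
        # Union: first operand's labels, then labels only found in the second.
        keys = list(labels_a) + [k for k in labels_b if k not in a]
    else:
        raise ValueError(f"unknown join type: {how!r}")
    return keys, [a.get(k, math.nan) for k in keys], [b.get(k, math.nan) for k in keys]


def add_aligned(labels_a, values_a, labels_b, values_b, how="inner"):
    """Element-wise addition after aligning both operands by label."""
    keys, va, vb = align(labels_a, values_a, labels_b, values_b, how)
    return keys, [x + y for x, y in zip(va, vb)]


if __name__ == "__main__":
    la, va = ["x", "y", "z"], [1.0, 2.0, 3.0]
    lb, vb = ["y", "z", "w"], [10.0, 20.0, 30.0]
    print(add_aligned(la, va, lb, vb, how="inner"))  # (['y', 'z'], [12.0, 23.0])
    print(add_aligned(la, va, lb, vb, how="outer"))  # (['x', 'y', 'z', 'w'], [nan, 12.0, 23.0, nan])
```

The intersection silently drops data, while the union keeps every label but has to invent a fill value; neither choice is obviously right, which is why it needs to be agreed on up front.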

Bruce