[Numpy-discussion] Using ndarray for 2-dimensional, heterogeneous data

N. Volbers mithrandir42 at web.de
Thu Feb 9 22:12:02 EST 2006


N. Volbers wrote:

> Hello everyone,
>
> I am re-thinking the design of my evaluation software, but I am not 
> quite sure whether I am making the right decision, so let me state my problem:
>
> I am writing a simple evaluation program to read scientific (ASCII) 
> data and plot it both via gnuplot and matplotlib. The data is 
> typically very simple: numbers arranged in columns. Before numpy I was 
> using Numeric arrays to store this data in a list of 1-dimensional 
> arrays, e.g.:
>
> a =  [ array([1,2,3,4]), array([2.3,17.2,19.1,22.2]) ]
>
> This layout made it very easy to add, remove or rearrange columns, 
> because these were simple list operations. It also had the nice side effect 
> of allowing different data types for different columns. However, row 
> access was hard, and I had to write my own iterator object for it.
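A minimal sketch of the list-of-columns layout described above. It uses numpy's np.array in place of the old Numeric array (an assumption made so the sketch runs today), with the column values from the example. Column changes are plain list operations, while row access needs an explicit iterator such as zip:

```python
import numpy as np

# One 1-D array per column; each column keeps its own dtype.
a = [np.array([1, 2, 3, 4]), np.array([2.3, 17.2, 19.1, 22.2])]

# Column operations are ordinary list operations.
a.append(a[0] * 10)      # add a derived integer column
a[0], a[1] = a[1], a[0]  # swap the first two columns
del a[2]                 # remove the derived column again

# Row access needs an explicit iterator; zip pairs up the i-th entries
# of every column into one tuple per row.
rows = list(zip(*a))
```

zip here stands in for the custom iterator object mentioned in the post; it is one possible choice, not the original code.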
>
> When I read about heterogeneous arrays in numpy, I started a new 
> implementation which would store the same data as above like this:
>
> b = numpy.array( [(1,2,3,4), (2.3,17.2,19.1,22.2)], 
> dtype={'names':['col1','col2'], 'formats': ['i2','f4']})
>
Sorry, of course I meant:

   b = numpy.array([(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
                   dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})

> Row operations are much easier now, because I can use numpy's 
> intrinsic capabilities. However, column operations (adding, removing or 
> rearranging columns) require creating a new array based on the old one.
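The trade-off can be sketched as follows, assuming the corrected b from above. Row operations such as picking a record or sorting by a field work directly on b. Adding or reordering columns means building a new dtype and copying each field into a fresh array. add_column and select_columns are hypothetical helpers written for illustration, not numpy API (numpy does ship numpy.lib.recfunctions for this kind of job):

```python
import numpy as np

b = np.array([(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
             dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})

# Row operations: each element of b is one record.
second_row = b[1]                               # a single record
by_col2_desc = np.sort(b, order='col2')[::-1]   # rows sorted by col2, descending

def add_column(arr, name, values):
    """Hypothetical helper: copy arr into a new array with one extra field."""
    values = np.asarray(values)
    fields = [(n, arr.dtype.fields[n][0]) for n in arr.dtype.names]
    out = np.empty(arr.shape, dtype=fields + [(name, values.dtype)])
    for n in arr.dtype.names:
        out[n] = arr[n]
    out[name] = values
    return out

def select_columns(arr, names):
    """Hypothetical helper: copy only the given fields, in the given order."""
    out = np.empty(arr.shape, dtype=[(n, arr.dtype.fields[n][0]) for n in names])
    for n in names:
        out[n] = arr[n]
    return out

wider = add_column(b, 'col3', b['col1'] * 2)
reordered = select_columns(wider, ['col3', 'col1'])
```

Both helpers return new arrays and leave b untouched, which is exactly the copying cost the paragraph above describes.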
>
> Now I am wondering whether such an array has other drawbacks that 
> I am not aware of. For example, is it possible to mask values in such an array?
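A minimal sketch of masking, assuming a numpy version whose numpy.ma module accepts structured arrays (current releases do). The mask then mirrors the record layout, with one boolean per field in every row. Which value gets masked here is an arbitrary choice for illustration:

```python
import numpy as np

b = np.array([(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
             dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})

# One boolean per field and row; here only col2 of the first row is masked.
mb = np.ma.array(b, mask=[(False, True), (False, False),
                          (False, False), (False, False)])

col2 = mb['col2']          # a 1-D MaskedArray carrying that field's mask
col2_mean = col2.mean()    # the masked entry is ignored
```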
>
> And is it slower to get a certain column by using b['col1'] than it 
> would be using a homogeneous array c and the notation c[:,0]?
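One way to look at this question: both b['col1'] and c[:,0] return views rather than copies, so extracting the column itself should cost about the same; what differs is the step in bytes between consecutive elements. A sketch checking this, where c is an assumed float64 array of shape (4, 2) holding the same numbers:

```python
import numpy as np

b = np.array([(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
             dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})
c = np.array([[1, 2.3], [2, 17.2], [3, 19.1], [4, 22.2]])  # homogeneous float64

col_b = b['col1']   # field access: view, step = size of one record (2 + 4 bytes)
col_c = c[:, 0]     # column slice: view, step = size of one row (2 * 8 bytes)

# Neither operation copies data; both are strided views into the parent.
print(np.shares_memory(col_b, b), np.shares_memory(col_c, c))
print(col_b.strides, col_c.strides)  # bytes between consecutive elements
```

Any speed difference is therefore more likely to show up in arithmetic on non-contiguous columns than in the lookup itself; timing both variants with timeit on real data would settle it.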
>
> Does anyone else use such a data layout and can report on problems 
> with it?

The mathematical operations I want to use will be limited to operations 
acting on columns, e.g. creating a new column as b['col1'] + b['col2'], 
and similar. I am, of course, aware of the basic difference in slicing: 
in a heterogeneous array each row is considered a single item, so slicing 
works differently than for a two-dimensional homogeneous array.
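A short sketch of both points, assuming the corrected b from above and a hypothetical homogeneous float64 array c with the same numbers for comparison. Column arithmetic works field by field, while indexing b yields whole records instead of single numbers:

```python
import numpy as np

b = np.array([(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
             dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})

# Column arithmetic: int16 + float32 yields a new float32 column.
new_col = b['col1'] + b['col2']

# Slicing: b is 1-D and each item is a whole record ...
first = b[0]        # one record holding both fields
middle = b[1:3]     # structured array with 2 records
# ... whereas a homogeneous 2-D array indexes rows and columns separately.
c = np.array([[1, 2.3], [2, 17.2], [3, 19.1], [4, 22.2]])
c_first = c[0]      # 1-D float array with 2 elements
```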

Niklas.




