[Numpy-discussion] Using ndarray for 2-dimensional, heterogeneous data
N. Volbers
mithrandir42 at web.de
Thu Feb 9 22:12:02 EST 2006
N. Volbers wrote:
> Hello everyone,
>
> I am re-thinking the design of my evaluation software, but I am not
> quite sure if I am making the right decision, so let me state my problem:
>
> I am writing a simple evaluation program to read scientific (ASCII)
> data and plot it both via gnuplot and matplotlib. The data is
> typically very simple: numbers arranged in columns. Before numpy I was
> using Numeric arrays to store this data in a list of 1-dimensional
> arrays, e.g.:
>
> a = [ array([1,2,3,4]), array([2.3,17.2,19.1,22.2]) ]
>
> This layout made it very easy to add, remove or rearrange columns,
> because these were simple list operations. It also had the nice effect
> of allowing different data types for different columns. However, row
> access was hard and I had to use my own iterator object to do so.
>
> When I read about heterogeneous arrays in numpy I started a new
> implementation which would store the same data as above like this:
>
> b = numpy.array( [(1,2,3,4), (2.3,17.2,19.1,22.2)],
> dtype={'names':['col1','col2'], 'formats': ['i2','f4']})
>
Sorry, I meant, of course:
b = numpy.array( [(1,2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
    dtype={'names':['col1','col2'], 'formats': ['i2','f4']})
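To make the layout concrete, here is a minimal sketch of how rows and columns of such an array can be accessed. It assumes a current NumPy imported as np; the names first_row, col1 and n_rows are only illustrative and not part of any existing program.

```python
import numpy as np

# Same data as in the corrected example: one tuple per row.
b = np.array(
    [(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
    dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']},
)

first_row = b[0]       # a single record with the fields col1 and col2
col1 = b['col1']       # 1-dimensional int16 array, one entry per row
n_rows = b.shape[0]    # the array itself is 1-dimensional

# Row iteration works directly, without a custom iterator object.
for row in b:
    print(row['col1'], row['col2'])
```

Each row is one item of the array, so len(b) gives the number of rows and not the number of columns.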
> Row operations are much easier now, because I can use numpy's
> intrinsic capabilities. However, column operations require creating a
> new array based on the old one.
>
> Now I am wondering if the use of such an array has more drawbacks that
> I am not aware of. E.g. is it possible to mask values in such an array?
>
> And is it slower to get a certain column by using b['col1'] than it
> would using a homogeneous array c and the notation c[:,0]?
>
> Does anyone else use such a data layout and can report on problems
> with it?
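The questions above can be explored with a short sketch. It assumes a current NumPy imported as np, and the names wider, col_view, selected and masked_col2 are hypothetical. It shows one possible way to add a column by building a new array, that field access returns a view, and two simple forms of masking; it makes no claim about relative speed.

```python
import numpy as np

b = np.array(
    [(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
    dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']},
)

# Adding a column: build a wider dtype, allocate a new array,
# then copy the existing fields one by one.
new_dtype = np.dtype(b.dtype.descr + [('col3', 'f4')])
wider = np.empty(b.shape, dtype=new_dtype)
for name in b.dtype.names:
    wider[name] = b[name]
wider['col3'] = 0.0

# Field access returns a view: writing through it changes the parent.
# A copy is used here so that b itself stays untouched.
c = b.copy()
col_view = c['col1']
col_view[0] = 10

# Selecting whole rows with a boolean mask built from one column.
selected = b[b['col2'] > 17.0]

# Masking the values of a single column with numpy.ma.
masked_col2 = np.ma.masked_where(b['col2'] > 20.0, b['col2'])
```

Because b['col1'] is a view, getting a column does not copy the data; whether it is slower than c[:,0] on a homogeneous array would have to be measured.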
The mathematical operations I want to use will be limited to operations
acting on columns, e.g. creating a new column = b['col1'] + b['col2']
and such. Of course, I am aware of the basic difference that slicing
works differently with a heterogeneous array, because
each row is considered a single item.
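A small sketch of the column arithmetic and of the slicing difference, again assuming a current NumPy imported as np; the names total, first_two and two_d_indexing_works are only for illustration.

```python
import numpy as np

b = np.array(
    [(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
    dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']},
)

# Column arithmetic works on the field arrays like on any other array.
total = b['col1'] + b['col2']

# Slicing selects rows, because each row is a single item.
first_two = b[:2]

# Two-dimensional indexing is not available on the 1-dimensional array.
try:
    b[:, 0]
except IndexError:
    two_d_indexing_works = False
else:
    two_d_indexing_works = True
```

The sum is an ordinary homogeneous array, so it still has to be copied into a new heterogeneous array if it should become a column of b.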
Niklas.