[Numpy-discussion] Behavior from a change in dtype?

Skipper Seabold jsseabold at gmail.com
Thu Sep 10 09:48:08 EDT 2009


On Tue, Sep 8, 2009 at 12:53 PM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:
> Skipper Seabold wrote:
>> Hmm, okay, well I came across this in trying to create a recarray like
>> data2 below, so I guess I should just combine the two questions.
>
> key to understanding this is to understand what is going on under the
> hood in numpy. Travis O. gave a nice intro in an Enthought webcast a few
> months ago -- I'm not sure if those are recorded and up on the web, but
> it's worth a look. It was also discussed in the advanced numpy tutorial
> at SciPy this year -- and that is up on the web:
>
> http://www.archive.org/details/scipy09_advancedTutorialDay1_1
>

Thanks.  I wasn't able to watch the Enthought webcasts on Linux, but
I've seen a few of the video tutorials.  What a great resource.  I'm
really glad this came together.

>
> Anyway, here is my minimal attempt to clarify:
>
>> import numpy as np
>>
>> data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])
>
> here we are using a standard array constructor -- it will look at the
> data you are passing in (a mixture of python floats and ints), and
> decide that they can best be represented by a numpy array of float64s.
>
> numpy arrays are essentially a pointer to a block of memory, and a bunch
> of attributes that describe how the bytes pointed to are to be
> interpreted. In this case, they are 9 C doubles, representing a 3x3
> array of doubles.
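
A quick check of what that constructor actually decides, for anyone following
along (these attributes are the same on any machine for a default C-ordered
array):

import numpy as np

data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])
print(data.dtype)    # float64 -- the python ints are upcast to match the floats
print(data.shape)    # (3, 3)
print(data.nbytes)   # 72 -- nine 8-byte C doubles in one contiguous block
print(data.strides)  # (24, 8) -- row-major strides over that block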
>
>> dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
>
> (NOTE: I'm on a big-endian machine, so I've used:
> dt = np.dtype([('var1', '>f8'), ('var2', '>i8'), ('var3', '>i8')])
> )
>
> This is a data type descriptor that is analogous to a C struct,
> containing a float64 and two int64s.
>
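
Just to make the struct analogy concrete, on my (little-endian) machine:

import numpy as np

dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
print(dt.itemsize)        # 24 -- one float64 plus two int64s, packed like a C struct
print(dt.names)           # ('var1', 'var2', 'var3')
print(dt.fields['var2'])  # (dtype('int64'), 8) -- the field's type and byte offset
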
>> # Doesn't work, raises TypeError: expected a readable buffer object
>> data2 = data2.view(np.recarray)
>> data2.astype(dt)
>
> I don't understand that error either, but recarrays are about adding
> the ability to access parts of a structured array by name, but you still
> need the dtype to specify the types and names. This does seem to work
> (though it may not give the results you expect):
>
> In [19]: data2 = data.copy()
> In [20]: data2 = data2.view(np.recarray)
> In [21]: data2 = data2.view(dtype=dt)
>
> or, indeed in the opposite order:
>
> In [24]: data2 = data.copy()
> In [25]: data2 = data2.view(dtype=dt)
> In [26]: data2 = data2.view(np.recarray)
>
>
> So you've done two operations: one is to change the dtype -- the
> interpretation of the bytes in the data buffer -- and one is to make this
> a recarray, which allows you to access the "fields" by name:
>
> In [31]: data2['var1']
> Out[31]:
> array([[ 10.75],
>        [ 10.39],
>        [ 18.18]])
>
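
And the recarray half of it is what buys the attribute access, e.g. (little-endian
'<' codes here, since that's my machine):

import numpy as np

dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])
data2 = data.copy().view(dtype=dt).view(np.recarray)
print(data2.var1)      # attribute access, equivalent to data2['var1']
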
>> # Works without error (?) with unexpected result
>> data3 = data3.view(np.recarray)
>> data3.dtype = dt
>
> that all depends on what you expect! I used "view" above, 'cause I think
> there is less magic, though it's the same thing. I suppose changing the
> dtype in place like that is a tiny bit more efficient -- if you use
> .view() , you are creating a new array pointing to the same data, rather
> than changing the array in place.
>
> But anyway, the dtype describes how the bytes in the memory block are to
> be interpreted, changing it by assigning the attribute or using .view()
> changes the interpretation, but does not change the bytes themselves at
> all, so in this case, you are taking the 8 bytes representing a float64
> of value: 1.0, and interpreting those bytes as an 8 byte int -- which is
> going to give you garbage, essentially.
>
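
That "garbage" is easy to see directly (on my little-endian machine):

import numpy as np

x = np.array([1.0])
print(x.view('<i8'))   # [4607182418800017408] -- the IEEE-754 bit pattern of 1.0
                       # read back as an int64
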
>> # One correct (though IMHO unintuitive) way
>> data = np.rec.fromarrays(data.swapaxes(1,0), dtype=dt)
>
> This is using the np.rec.fromarrays constructor to build a new record
> array with the dtype you want; the data is being converted and copied, so
> it won't change the original at all:
>
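
Right -- and just to convince myself what that line is doing:

import numpy as np

dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])
rec = np.rec.fromarrays(data.swapaxes(1, 0), dtype=dt)
print(rec.var2)      # [1 0 0] -- real int64 values, because each column was converted
print(data.dtype)    # float64 -- the original array is untouched
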
> So the question remains -- is there a way to convert the floats in
> "data" to ints in place?
>

Ah, ok.  I understand roughly the above.  But, yes, this is my question.

>
> This seems to work:
> In [78]: data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])
>
> In [79]: data[:,1:3] = data[:,1:3].astype('>i8').view(dtype='>f8')
>
> In [80]: data.dtype = dt
>
> It is making a copy of the integer data in the process -- but I think that
> is required, as you are changing the value, not just the interpretation
> of the bytes. I suppose we could have an "astype_inplace" method, but
> that would only work if the two types were the same size, and I'm not
> sure it's a common enough use to be worth it.
>
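
For the record, the round trip checks out on my (little-endian) machine; the
.view(dt) at the end does the same reinterpretation as the data.dtype = dt
assignment above:

import numpy as np

dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])
# store int64 bit patterns into the float64 buffer, columns 1 and 2 only
data[:, 1:3] = data[:, 1:3].astype('<i8').view(dtype='<f8')
rec = data.view(dt)            # reinterpret each 24-byte row as one struct
print(rec['var2'].ravel())     # [1 0 0]
print(rec['var1'].ravel())     # the float column is unchanged: 10.75, 10.39, 18.18
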
> What is your real use case? I suspect that what you really should do
> here is define your dtype first, then create the array of data:
>

I have a function that eventually appends an ndarray of floats that
are 0 to 1 to a recarray, and I ran into it trying to debug.  Then I
was just curious about the modification in place.
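
In case it helps anyone searching the archives later, one way to tack a float
column onto a structured/rec array is numpy.lib.recfunctions.append_fields --
the field name 'prob' and the values below are made up for illustration:

import numpy as np
import numpy.lib.recfunctions as rfn

dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
rec = np.array([(10.75, 1, 1), (10.39, 0, 1), (18.18, 0, 1)], dtype=dt)
probs = np.array([0.25, 0.5, 0.75])   # the floats between 0 and 1

# append_fields copies everything into a new array that has the extra field
rec2 = rfn.append_fields(rec, 'prob', probs, usemask=False, asrecarray=True)
print(rec2.dtype.names)   # ('var1', 'var2', 'var3', 'prob')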

> data = np.array([(10.75, 1, 1), (10.39, 0, 1), (18.18, 0, 1)], dtype=dt)
>
> which does require that you use tuples, rather than lists to hold the
> "structs".
>
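
That does work nicely -- the fields come out typed as intended:

import numpy as np

dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
data = np.array([(10.75, 1, 1), (10.39, 0, 1), (18.18, 0, 1)], dtype=dt)
print(data['var2'])                   # [1 0 0] -- genuine int64s, no reinterpretation
print(data.view(np.recarray).var1)    # attribute access after a recarray view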

Ah yes, I have had a bit of trouble extending that same function to
structured arrays, but that's another thread if I can't figure it out.

Thanks for the help.

Cheers,

Skipper


