[Numpy-discussion] Views of a different dtype

Thu Jan 29 11:33:52 EST 2015

Hi Jamie,

I'm not sure whether to reply here or on github!

I have a comment on the condition "There must be the same total number
of objects before and after".

I originally looked into this to solve
https://github.com/numpy/numpy/issues/3256, which involves taking a view
of a subset of the fields of a structured array. Such a view causes the
final number of objects to be smaller than the original number.

For example, say you have an array with three fields a,b,c, and you only
want a and c. Numpy currently does the equivalent of this (in
_index_fields in numpy/core/_internals.py):

    >>> from numpy import array, dtype
    >>> a = array([(1,2,3), (4,5,6)],
    ...           dtype([('a', 'i8'), ('b', 'i8'), ('c', 'i8')]))
    >>> a.view(dtype({'names': ['x', 'y'],
    ...               'formats': ['i8', 'i8'],
    ...               'offsets': [0, 16]}))
    array([(1, 3), (4, 6)],
          dtype={'names':['x','y'], 'formats':['<i8','<i8'],
                 'offsets':[0,16], 'itemsize':24})

It works, although the repr is a bit wordy. However, it currently does
not work if we have 'O' instead of 'i8', since we can't take views of
object arrays. If this were an object array, the resulting view would
have only two objects, not the original three.

This does mean that the view can't be "undone" since the view object no
longer "knows" there is an object at offset 8. But that does not seem
bad to me compared to the usefulness of such a view.

I think to fix this, you only need to remove the test for this case in
dtype_view_is_safe, and then return all(o in old_offsets for o in
new_offsets). (And btw, I like the much cleaner logic you have in that
section!)
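
To make the check concrete, here is a minimal sketch of what I mean. The
helper names object_offsets and view_keeps_objects_aligned are made up
for illustration and are not the functions in the gist; the sketch
assumes both dtypes have the same itemsize and it only tests the
object-offset condition, nothing else (for instance, it does not check
whether a non-object field of the new dtype overlaps an object pointer
of the old one).

```python
import math
import numpy as np


def object_offsets(dt):
    """Byte offsets of every object pointer inside one item of `dt`.

    Hypothetical helper: walks structured fields and subarrays recursively.
    """
    dt = np.dtype(dt)
    if dt.fields is not None:
        offsets = set()
        for name in dt.names:
            # fields maps name -> (dtype, offset) or (dtype, offset, title)
            field_dt, field_offset = dt.fields[name][:2]
            offsets |= {field_offset + o for o in object_offsets(field_dt)}
        return offsets
    if dt.subdtype is not None:
        base, shape = dt.subdtype
        inner = object_offsets(base)
        count = math.prod(shape)
        return {i * base.itemsize + o for i in range(count) for o in inner}
    return {0} if dt.kind == 'O' else set()


def view_keeps_objects_aligned(old, new):
    """True if every object in `new` sits where `old` also has an object.

    Assumption: old and new have the same itemsize.
    """
    old_offsets = object_offsets(old)
    new_offsets = object_offsets(new)
    return all(o in old_offsets for o in new_offsets)


p = np.dtype('O').itemsize  # pointer size, platform dependent
abc = np.dtype([('a', 'O'), ('b', 'O'), ('c', 'O')])
ac = np.dtype({'names': ['x', 'y'], 'formats': ['O', 'O'],
               'offsets': [0, 2 * p], 'itemsize': 3 * p})
```

With these definitions the subset view from abc to ac passes the check,
while the reverse direction fails, which is the "can't be undone"
situation described above.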

Allan

On 01/28/2015 07:56 PM, Jaime Fernández del Río wrote:
> Hi all,
> 
> There has been some recent discussion about the limitations that
> numpy imposes on taking views of an array with a different dtype.
> 
> As of right now, you can basically only take a view of an array if it
> has no Python objects and neither the old nor the new dtype is
> structured. Furthermore, the array has to be either C or Fortran contiguous.
> 
> This seems to be way too strict, but the potential for disaster if we
> get the loosening of the restrictions wrong is big, so it should be
> handled with care.
> 
> Allan Haldane and I have been looking into this separately and
> discussing some of the details over at github, and we both think that
> the only true limitation that has to be imposed is that the offsets of
> Python objects within the new and old dtypes remain compatible. I have
> expanded Allan's work from here:
> 
> https://github.com/ahaldane/numpy/commit/e9ca367
> 
> to make it as flexible as I have been able. An implementation of the
> algorithm in Python, with a few tests, can be found here:
> 
> https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py
> 
> I would appreciate getting some eyes on it for correctness, and to make
> sure that it won't break with some weird dtype.
> 
> I am also trying to figure out what the ground rules for stride and
> shape conversions when taking a view with a different dtype should be. I
> submitted a PR (gh-5508) a couple of days ago working on that, but I am
> not so sure that the logic is completely sound. Again, to get more eyes
> on it, I am going to reproduce my thoughts here in the hope of getting
> some feedback.
> 
> The objective would be to always allow a view of a different dtype
> (given that the dtypes be compatible as described above) to be taken if:
> 
>   * The itemsize of the dtype doesn't change.
>   * The itemsize changes, but the array being viewed is the result of
>     slicing and transposing axes of a contiguous array, and it is still
>     contiguous, defined as stride == dtype.itemsize, along its
>     smallest-strided dimension, and the itemsize of the newtype exactly
>     divides the size of that dimension.
>   * Ideally taking a view should be a reversible process, i.e. if
>     oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype)
>     should give you back a view of arr with the same original shape,
>     strides and dtype.
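
For the simplest case, a C-contiguous array, the reversibility in the
last bullet can already be checked today. This is only an illustration
with a made-up array, not part of the proposal:

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# itemsize 8 -> 16: the last (contiguous) axis shrinks from 4 to 2
as_complex = arr.view(np.complex128)

# itemsize 16 -> 8: the last axis grows back from 2 to 4
round_trip = as_complex.view(np.float64)
```

Here round_trip has the same shape, strides and dtype as arr and shares
its memory, which is the behavior the more general rules should preserve.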
> 
> This last point can get tricky if the minimal stride dimension has size
> 1, as there could be several of those, e.g.:
> 
>     >>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1)
>     >>> a.flags.contiguous
>     False
>     >>> a.shape
>     (3, 1, 2)
>     >>> # the stride of the size 1 dimension could be anything, ignore it!
>     >>> a.strides
>     (32, 8, 8)
> 
>     >>> b = a.view(complex)  # this fails right now, but should work
>     >>> b.flags.contiguous
>     False
>     >>> b.shape
>     (3, 1, 1)
>     >>> # the stride of the size 1 dimensions could be anything, ignore them!
>     >>> b.strides
>     (32, 16, 16)
> 
>     >>> # which of the two size 1 dimensions should we expand?
>     >>> c = b.view(float)
> 
> 
> "In the face of ambiguity refuse the temptation to guess" dictates that
> last view should raise an error, unless we agree and document some
> default. Any thoughts?
> 
> Then there is the endless complication one could get into with arrays
> created with as_strided. I'm not smart enough to figure out when those
> could and could not work, but am willing to take up the discussion
> again if someone wiser is interested.
> 
> With all these in mind, my proposal for the new behavior is that taking
> a view of an array with a different dtype would require:
> 
>  1. That the newtype and oldtype be compatible, as defined by the
>     algorithm checking object offsets linked above.
>  2. If newtype.itemsize == oldtype.itemsize no more checks are needed,
>     make it happen!
>  3. If the array is C/Fortran contiguous, check that the size in bytes
>     of the last/first dimension is evenly divided by newtype.itemsize.
>     If it does, go for it.
>  4. For non-contiguous arrays:
>      1. Ignoring dimensions of size 1, check that no stride is smaller
>         than either oldtype.itemsize or newtype.itemsize. If any is
>         found this is an as_strided product, sorry, can't do it!
>      2. Ignoring dimensions of size 1, find a contiguous dimension, i.e.
>         stride == oldtype.itemsize
>          1. If found, check that it is the only one with that stride,
>             that it is the minimal stride, and that the size in bytes of
>             that dimension is evenly divided by newtype.itemsize.
>          2. If none is found, check if there is a size 1 dimension that
>             is also unique (unless we agree on a default, as mentioned
>             above) and that newtype.itemsize evenly divides
>             oldtype.itemsize.
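
Rules 2 to 4 can be sketched in pure Python on shape and stride tuples.
This is one possible reading, not the code in gh-5508, and the names
plan_view and _contiguous_axis are hypothetical. Assumptions: the dtype
compatibility of rule 1 is checked elsewhere; the array has at least one
dimension, no zero-sized dimensions and no negative strides; an array
that is both C and Fortran contiguous is treated as C; and rule 4.1 is
read as "no stride below oldtype.itemsize, and no stride below
newtype.itemsize on any axis other than the one being resized".

```python
def _contiguous_axis(shape, strides, itemsize):
    """Axis to resize for a contiguous array, or None if not contiguous."""
    def walk(axes):
        expected = itemsize
        for ax in axes:
            # strides of size 1 dimensions are meaningless, skip them
            if shape[ax] != 1 and strides[ax] != expected:
                return False
            expected *= shape[ax]
        return True

    n = len(shape)
    if walk(range(n - 1, -1, -1)):   # C order: resize the last axis
        return n - 1
    if walk(range(n)):               # Fortran order: resize the first axis
        return 0
    return None


def plan_view(shape, strides, old_size, new_size):
    """Return (new_shape, new_strides), or raise ValueError."""
    if new_size == old_size:                              # rule 2
        return tuple(shape), tuple(strides)

    axis = _contiguous_axis(shape, strides, old_size)     # rule 3
    if axis is None:                                      # rule 4
        real = [i for i, n in enumerate(shape) if n != 1]
        if any(strides[i] < old_size for i in real):      # rule 4.1
            raise ValueError("overlapping items, as_strided product?")
        found = [i for i in real if strides[i] == old_size]
        if len(found) > 1:
            raise ValueError("several axes have stride == itemsize")
        if found:                                         # rule 4.2.1
            axis = found[0]
        else:                                             # rule 4.2.2
            ones = [i for i, n in enumerate(shape) if n == 1]
            if len(ones) != 1 or old_size % new_size:
                raise ValueError("no unambiguous axis to expand")
            axis = ones[0]
        if any(strides[i] < new_size for i in real if i != axis):
            raise ValueError("new items would overlap on another axis")

    nbytes = shape[axis] * old_size
    if nbytes % new_size:
        raise ValueError("new itemsize does not divide the axis size")
    new_shape, new_strides = list(shape), list(strides)
    new_shape[axis] = nbytes // new_size
    new_strides[axis] = new_size
    return tuple(new_shape), tuple(new_strides)
```

With the example from earlier in the message, plan_view((3, 1, 2),
(32, 8, 8), 8, 16) resizes the last axis and gives shape (3, 1, 1),
while going back with plan_view((3, 1, 1), (32, 16, 16), 16, 8) raises,
because there are two size 1 dimensions to choose from.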
> 
> Apologies for the long, dense content, but any thoughts or comments are
> very welcome.
> 
> Jaime
> 
> -- 
> (\__/)
> ( O.o)
> ( > <) This is Bunny. Copy Bunny into your signature and help him with
> his plans for world domination.
> 