[Numpy-discussion] Return item rather than scalar for user defined types

Tue Aug 26 16:27:48 EDT 2014

Hello All,

Yesterday I opened PR #4889 <https://github.com/numpy/numpy/pull/4998> to
solve a problem I have been having w.r.t. xdress, and Nathaniel asked me
to bring the issue up here. The PR itself is quite small (6 lines?) and is
easy to review.

The opening text of my PR is pasted below because I believe it is a
pretty good description of the issue. But briefly, elements of user-defined
dtypes pulled out of an array do not behave idiomatically because you get a
numpy scalar rather than a more representative Python object. User-defined
dtypes are typically more complex, and possibly more stateful, than the
builtin dtypes, so I believe that it makes much more sense to get the actual
Python representation back, a la the getitem() function.

In fact, I think that this case also applies to the object dtype. However,
changing that usage would likely break downstream code and would be
inconsistent with how other builtin types are returned. In future major
versions of numpy it would be ideal if the dtypes themselves could flag how
they wished to be returned - either as a scalar or as the Python item.

Thoughts?

Be Well
Anthony

This updates what is effectively the __getitem__() method. For arrays whose
dtype is a user-defined type, you receive the return value of that dtype's
getitem() rather than a numpy scalar of the dtype. This allows the custom
type to present a single Python API as well as an associated dtype. It also
prevents users from having to subclass ndarray to get the appropriate
behaviour.

For example, suppose that we have a dtype representing a C++
std::vector<int> and then we had a numpy array of this dtype. From Python,
it might look like

>>> arr
array([array([0, 0, 0, 0, 0], dtype=int32),
       array([0, 1, 2, 3, 4], dtype=int32),
       array([0, 2, 4, 6, 8], dtype=int32)], dtype='xd_vector_int')

Without this PR, you'd have to do the following to access the most deeply
nested elements:

>>> arr.item(2)[4]
8

This is because you cannot index a scalar:

>>> arr[2][4]
IndexError: invalid index to scalar variable

With this PR, the idiomatic expression is now allowable because arr[2] is
the associated Python type:

>>> arr[2][4]
8
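
The difference between the two behaviours above can be modelled in plain
Python. This is only a toy sketch: ToyScalar, ToyArray, and the user_defined
and return_item flags are hypothetical names invented for illustration, and
none of this is the actual code in the PR or in numpy.

```python
# Toy model only: none of these classes or flags exist in numpy.

class ToyScalar:
    """Stands in for a numpy scalar: wraps a value but cannot be indexed."""

    def __init__(self, value):
        self.value = value

    def __getitem__(self, index):
        raise IndexError("invalid index to scalar variable")


class ToyArray:
    """Stands in for an ndarray whose elements all share one dtype."""

    def __init__(self, items, user_defined, return_item):
        self._items = list(items)
        self._user_defined = user_defined  # is the dtype user defined?
        self._return_item = return_item    # True models the proposed behaviour

    def item(self, index):
        # Always hands back the plain Python object, like ndarray.item().
        return self._items[index]

    def __getitem__(self, index):
        # Proposed behaviour: user-defined dtypes return the Python item.
        if self._user_defined and self._return_item:
            return self.item(index)
        # Otherwise the element is wrapped in a scalar, as before.
        return ToyScalar(self._items[index])


rows = [[0, 0, 0, 0, 0], [0, 1, 2, 3, 4], [0, 2, 4, 6, 8]]
old = ToyArray(rows, user_defined=True, return_item=False)
new = ToyArray(rows, user_defined=True, return_item=True)

print(old.item(2)[4])  # 8
try:
    old[2][4]
except IndexError as exc:
    print("IndexError:", exc)
print(new[2][4])  # 8
```

In this model the only thing that changes is the branch in __getitem__();
item() behaves the same in both cases.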

This change is a pretty big deal for xdress <http://xdress.org/>, which
creates many custom dtypes and provides a Python interface to them. See
xdress/xdress#265 <https://github.com/xdress/xdress/pull/265> for what
prompted this.

Thanks for considering!