[Numpy-discussion] Proposal: NEP 41 -- First step towards a new Datatype System

Ryan May rmay31 at gmail.com
Tue Mar 17 19:34:07 EDT 2020


On Tue, Mar 17, 2020 at 4:35 PM Chris Meyer <cmeyer1969 at gmail.com> wrote:

> > On Mar 17, 2020, at 1:02 PM, Sebastian Berg <sebastian at sipsolutions.net>
> wrote:
> >
> > in the spirit of trying to keep this moving, can I assume that the main
> > reason for little discussion is that the actual changes proposed are
> > not very far reaching as of now?  Or is the reason that this is a
> > fairly complex topic that you need more time to think about it?
> > If it is the latter, is there some way I can help with it?  I tried to
> > minimize how much is part of this initial NEP.
>
> One reason for not responding is that it seems a lot of discussion of this
> has already taken place and this NEP is presented more as a conclusion
> summary rather than a discussion point.
>
> I implement scientific imaging software and overall this NEP looks useful.
>
> My only caveat is that I don’t think tracking physical units should be a
> primary use case. Units are fundamentally different than data types, even
> though there are libraries out there that treat them more like data types.
>

I strongly disagree. Right now, you need to implement a custom container to
handle units, which makes it exceedingly difficult to interact properly
with other array-like objects, such as those from dask, pandas, and xarray.
Handling units is completely orthogonal to handling slicing operations,
data access, etc., so having to implement a container is overkill. Unit
information describes the type of each element within an array, including
how operations between individual elements work. This sounds exactly like
a dtype to me.
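
As a rough illustration of a dtype that already carries a unit, NumPy's
existing timedelta64 can serve as a precedent. This is only a sketch of the
general idea, not what NEP 41 proposes for physical units; the variable
names are made up for the example.

```python
import numpy as np

# timedelta64 keeps its unit in the dtype itself, not in a wrapper container.
seconds = np.array([1, 2], dtype="timedelta64[s]")
millis = np.array([500, 500], dtype="timedelta64[ms]")

# Element-wise addition is defined by the dtype: the result uses the finer unit.
total = seconds + millis

# np.datetime_data reports the (unit, count) pair stored on the dtype.
unit, count = np.datetime_data(total.dtype)
print(total.dtype, unit, count)
```

Slicing and indexing on `total` work like any other ndarray, because the
unit handling lives entirely in the dtype.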


> For instance, it makes sense to have the same physical unit but with
> different storage types. For instance, data with nanometer physical units
> can be stored as a float32 or as an int16 and be equally useful.
>

Yes, you would have the unit tracking as a mixin that would allow different
storage types, absolutely.


> In addition, a unit is something that is mutated by the operation. For
> instance, reducing a 2D image with physical units by a factor of two in
> each dimension produces a different unit scaling (1km/pixel goes to
> 2km/pixel); whereas cropping the center half does not (1km/pixel stays as
> 1km/pixel).
>

I'm not sure what your point is. Dtypes can change for some operations
(np.sqrt(np.arange(5)) produces a float array) while staying the same for
others (e.g. addition).
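
A quick check of that behaviour, as a minimal sketch (the exact integer
width of np.arange depends on the platform, so the example only compares
dtype kinds and equality):

```python
import numpy as np

ints = np.arange(5)      # integer dtype (width is platform dependent)
roots = np.sqrt(ints)    # sqrt maps integers to a floating-point dtype
sums = ints + ints       # addition keeps the integer dtype

print(ints.dtype, roots.dtype, sums.dtype)
```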


> Finally, units may be different for each axis in multidimensional data.
> For instance, we want a float32 array with two dimensions with the units on
> one dimension being time and the other dimension being spatial. (3 seconds
> x 50 nm).
>

The units for an array describe the elements *within* the array; they have
nothing to do with the dimensions. So for an array of image data, e.g.
brightness temperatures, you would have physical units (e.g. Kelvin). You
would have separate arrays of coordinates describing the extent of the data
along the relevant dimensions, and each of these coordinate arrays would
have its own physical quantity information.
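
A minimal sketch of that layout, using plain NumPy arrays and string labels
in place of real unit support. All names and values here are hypothetical
and only illustrate which array each unit belongs to:

```python
import numpy as np

# Data array: brightness temperatures on a (time, position) grid.
# The unit describes the elements, not the axes.
brightness = np.full((3, 4), 250.0, dtype=np.float32)
brightness_unit = "K"

# One coordinate array per dimension, each with its own unit.
time_coord = np.arange(3, dtype=np.float32)   # 0, 1, 2
time_unit = "s"
x_coord = np.linspace(0.0, 150.0, 4)          # 0, 50, 100, 150
x_unit = "nm"

# Each coordinate array matches the length of the axis it describes.
print(brightness.shape, time_coord.shape, x_coord.shape)
```

Libraries such as xarray organize data and coordinates along these lines;
with unit-aware dtypes, the string labels above would move into the dtype
of each array.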

Ryan

-- 
Ryan May