[Numpy-discussion] Accepting NEP 42 — New and extensible DTypes

Thu Oct 8 08:51:16 EDT 2020

Hi all,

after another thorough revision of NEP 42 (many thanks to Ben!), I
propose accepting the NEP, with the note that details are expected to
change.

I am always happy to clarify and review the document based on feedback,
but I feel the important technical points should be very clear and
settled.
Exposing all of the proposed API may need additional detailed API
discussion. My focus is still on the big-picture design choices that
the NEP makes, which we need in order to move forward and settle the
implementation internal to NumPy, although I am happy to discuss the
details!

The title of the NEP is:

     NEP 42 — New and extensible DTypes

And available at:

     https://numpy.org/neps/nep-0042-new-dtypes.html

While enabling new user-defined DTypes is the main goal, the main work
is the internal restructure of NumPy's own DTypes necessary to allow
that.

I have pasted the "Abstract" and "Motivation and scope" sections below,
which give a good overview of the issues we are trying to address.
They are followed by the "Usage and impact" section, which gives a
big-picture overview of the design.
I will refer to the full NEP for more detailed technical decisions and
explanations.

Cheers,

Sebastian

PS: In some places NEP 42 references NEP 43, for which I hope to merge
the draft soon. The current status is here:

     https://github.com/numpy/numpy/pull/16723

However, this should mainly be of interest to those wishing to go into
more technical details.

******************************************************************************
Abstract
******************************************************************************

NumPy's dtype architecture is monolithic -- each dtype is an instance of a
single class. There's no principled way to expand it for new dtypes, and the
code is difficult to read and maintain.

As :ref:`NEP 41 <NEP41>` explains, we are proposing a new architecture that is
modular and open to user additions. dtypes will derive from a new ``DType``
class serving as the extension point for new types. ``np.dtype("float64")``
will return an instance of a ``Float64`` class, a subclass of the root class
``np.dtype``.
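
To make the intended class relationship concrete, here is a small pure-Python
toy model. It is only a sketch and does not use NumPy: the names ``dtype``,
``DTypeMeta``, ``Float64`` and ``Int64`` mirror the text, while the registry
and the ``alias`` class keyword are assumptions invented for this illustration.

```python
class DTypeMeta(type):
    """Metaclass of all DType classes (each DType class is an instance of it)."""


class dtype(metaclass=DTypeMeta):
    """Toy stand-in for the root class ``np.dtype``; not NumPy's real class."""

    _registry = {}  # hypothetical name -> DType class lookup

    def __init_subclass__(cls, alias=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if alias is not None:
            dtype._registry[alias] = cls

    def __new__(cls, name=None):
        # Calling the root class with a name dispatches to the subclass
        # registered under that name, so the result is a subclass instance.
        if cls is dtype:
            if name is None:
                raise TypeError("a dtype name is required")
            cls = dtype._registry[name]
        return super().__new__(cls)


class Float64(dtype, alias="float64"):
    itemsize = 8


class Int64(dtype, alias="int64"):
    itemsize = 8


f8 = dtype("float64")
print(type(f8).__name__, isinstance(f8, dtype))
```

In this model the instance returned for ``"float64"`` has ``Float64`` as its
type while still passing ``isinstance`` checks against the root class.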

This NEP is one of two that lay out the design and API of this new
architecture. This NEP addresses dtype implementation; NEP 43 addresses
universal functions.

.. note::

    Details of the private and external APIs may change to reflect user
    comments and implementation constraints. The underlying principles and
    choices should not change significantly.

******************************************************************************
Motivation and scope
******************************************************************************

Our goal is to allow user code to create fully featured dtypes for a broad
variety of uses, from physical units (such as meters) to domain-specific
representations of geometric objects. :ref:`NEP 41 <NEP41>` describes a number
of these new dtypes and their benefits.

Any design supporting dtypes must consider:

- How shape and dtype are determined when an array is created
- How array elements are stored and accessed
- The rules for casting dtypes to other dtypes

In addition:

- We want dtypes to comprise a class hierarchy open to new types and to
  subhierarchies, as motivated in :ref:`NEP 41 <NEP41>`.

And to provide this,

- We need to define a user API.

All these are the subjects of this NEP.

- The class hierarchy, its relation to the Python scalar types, and its
  important attributes are described in `nep42_DType class`_.

- The functionality that will support dtype casting is described in
  `Casting`_.

- The implementation of item access and storage, and the way shape and dtype
  are determined when creating an array, are described in
  :ref:`nep42_array_coercion`.

- The functionality for users to define their own DTypes is described in
  `Public C-API`_.

The API here and in NEP 43 is entirely on the C side. A Python-side version
will be proposed in a future NEP. A future Python API is expected to be
similar, but provide a more convenient API to reuse the functionality of
existing DTypes. It could also provide shorthands to create structured DTypes
similar to Python's
`dataclasses <https://docs.python.org/3.8/library/dataclasses.html>`_.

******************************************************************************
Usage and impact
******************************************************************************

We believe the few structures in this section are sufficient to consolidate
NumPy's present functionality and also to support complex user-defined DTypes.

The rest of the NEP fills in details and provides support for the claim.

Again, though Python is used for illustration, the implementation is a
C API only; a future NEP will tackle the Python API.

After implementing this NEP, creating a DType will be possible by implementing
the DType base class outlined below, which is further described in
`nep42_DType class`_:

    class DType(np.dtype):
        type : type        # Python scalar type
        parametric : bool  # (may be indicated by superclass)

        @property
        def canonical(self) -> bool:
            raise NotImplementedError

        def ensure_canonical(self : DType) -> DType:
            raise NotImplementedError
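
As a rough illustration of how ``canonical`` and ``ensure_canonical`` could
behave, the following pure-Python sketch uses byte order as the property that
makes an instance non-canonical. That choice, the ``Int32`` class, and its
``byteorder`` argument are assumptions made for this example; the outline
above does not prescribe them.

```python
import sys

# Byte-order character of the machine running this code.
NATIVE = "<" if sys.byteorder == "little" else ">"


class Int32:
    """Toy descriptor carrying a byte order: '=' (native), '<' or '>'."""

    def __init__(self, byteorder="="):
        if byteorder not in ("=", "<", ">"):
            raise ValueError(f"unknown byte order {byteorder!r}")
        self.byteorder = byteorder

    @property
    def canonical(self) -> bool:
        # Assumption: only the native byte order counts as canonical.
        return self.byteorder in ("=", NATIVE)

    def ensure_canonical(self) -> "Int32":
        # Return self when already canonical, otherwise a canonical copy.
        return self if self.canonical else Int32("=")


swapped = Int32(">" if NATIVE == "<" else "<")
print(swapped.canonical, swapped.ensure_canonical().canonical)
```

Returning ``self`` for an already canonical instance keeps the call cheap in
the common case.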

For casting, a large part of the functionality is provided by the "methods"
stored in ``_castingimpl``:

        @classmethod
        def common_dtype(cls : DTypeMeta, other : DTypeMeta) -> DTypeMeta:
            raise NotImplementedError

        def common_instance(self : DType, other : DType) -> DType:
            raise NotImplementedError

        # A mapping of "methods" each detailing how to cast to another DType
        # (further specified at the end of the section)
        _castingimpl = {}
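
The split between ``common_dtype`` (which works on classes) and
``common_instance`` (which works on instances of one class) can be sketched in
plain Python. This is a toy under stated assumptions: the ``promote`` helper,
the ``String`` class with a ``length`` parameter, and the use of
``NotImplemented`` to defer to the other operand are all hypothetical and
chosen only for illustration.

```python
class DType:
    @classmethod
    def common_dtype(cls, other):
        return NotImplemented  # by default, defer to the other class

    def common_instance(self, other):
        return self  # non-parametric: all instances are equivalent


class Int64(DType):
    @classmethod
    def common_dtype(cls, other):
        return Int64 if other is Int64 else NotImplemented


class Float64(DType):
    @classmethod
    def common_dtype(cls, other):
        return Float64 if other in (Float64, Int64) else NotImplemented


class String(DType):
    def __init__(self, length):
        self.length = length

    @classmethod
    def common_dtype(cls, other):
        return String if other is String else NotImplemented

    def common_instance(self, other):
        # Parametric: the common instance must hold the longer string.
        return String(max(self.length, other.length))


def promote(a, b):
    """Hypothetical helper: ask ``a`` first, then ``b``."""
    result = a.common_dtype(b)
    if result is NotImplemented:
        result = b.common_dtype(a)
    if result is NotImplemented:
        raise TypeError(f"no common DType for {a.__name__} and {b.__name__}")
    return result


print(promote(Int64, Float64).__name__)
print(String(3).common_instance(String(7)).length)
```

The class-level step picks the DType class; the instance-level step then
settles parameters such as the string length.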

For array-coercion, also part of casting:

        def __dtype_setitem__(self, item_pointer, value):
            raise NotImplementedError

        def __dtype_getitem__(self, item_pointer, base_obj) -> object:
            raise NotImplementedError

        @classmethod
        def __discover_descr_from_pyobject__(cls, obj : object) -> DType:
            raise NotImplementedError

        # initially private:
        @classmethod
        def _known_scalar_type(cls, obj : object) -> bool:
            raise NotImplementedError
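
A minimal pure-Python sketch of these array-coercion hooks follows. It assumes
that ``item_pointer`` can be modelled as a ``(buffer, offset)`` pair into a
``bytearray``; the ``coerce`` helper and the ``itemsize`` attribute are
likewise hypothetical. The real hooks are C-level and operate on raw memory.

```python
import struct


class Float64:
    type = float   # associated Python scalar type
    itemsize = 8   # bytes per element (assumed attribute)

    def __dtype_setitem__(self, item_pointer, value):
        buf, offset = item_pointer
        struct.pack_into("=d", buf, offset, value)  # store as native double

    def __dtype_getitem__(self, item_pointer, base_obj=None) -> object:
        buf, offset = item_pointer
        return struct.unpack_from("=d", buf, offset)[0]

    @classmethod
    def __discover_descr_from_pyobject__(cls, obj):
        # Non-parametric: every scalar maps to the same descriptor.
        return cls()

    @classmethod
    def _known_scalar_type(cls, obj) -> bool:
        return isinstance(obj, float)


def coerce(values, dtype_cls):
    """Hypothetical helper: discover a descriptor, then fill a buffer."""
    descr = dtype_cls.__discover_descr_from_pyobject__(values[0])
    buf = bytearray(descr.itemsize * len(values))
    for i, value in enumerate(values):
        descr.__dtype_setitem__((buf, i * descr.itemsize), value)
    return descr, buf


descr, buf = coerce([1.5, -2.0, 3.25], Float64)
items = [descr.__dtype_getitem__((buf, i * 8)) for i in range(3)]
print(items)
```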

Another element of the casting implementation is the ``CastingImpl``:

    casting = Union["safe", "same_kind", "unsafe"]

    class CastingImpl:
        # Object describing and performing the cast
        casting : casting

        def resolve_descriptors(self, input : Tuple[DType]) -> (casting, Tuple[DType]):
            raise NotImplementedError

        # initially private:
        def _get_loop(...) -> lowlevel_C_loop:
            raise NotImplementedError

which describes the casting from one DType to another. In
NEP 43 this ``CastingImpl`` object is used unchanged to
support universal functions.
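
To show how ``_castingimpl`` and ``CastingImpl.resolve_descriptors`` might fit
together, here is a toy pure-Python sketch. The ``SimpleCast`` class, the
``can_cast`` helper, and the ordering of casting levels are assumptions made
for this example; only the two directions registered below are modelled.

```python
# Assumed ordering: a cast is allowed if its level is at most the requested one.
CASTING_ORDER = {"safe": 0, "same_kind": 1, "unsafe": 2}


class CastingImpl:
    casting = "unsafe"

    def resolve_descriptors(self, input):
        raise NotImplementedError


class SimpleCast(CastingImpl):
    """Hypothetical cast between non-parametric DTypes."""

    def __init__(self, casting):
        self.casting = casting

    def resolve_descriptors(self, input):
        # Nothing to adjust for non-parametric DTypes: echo the descriptors.
        return self.casting, tuple(input)


class Int64:
    _castingimpl = {}


class Float64:
    _castingimpl = {}


# Each DType maps target DType classes to the object describing that cast.
Int64._castingimpl = {Float64: SimpleCast("safe")}
Float64._castingimpl = {Int64: SimpleCast("unsafe")}


def can_cast(from_descr, to_descr, casting="safe"):
    impl = type(from_descr)._castingimpl.get(type(to_descr))
    if impl is None:
        return False  # no cast registered in this toy model
    level, _ = impl.resolve_descriptors((from_descr, to_descr))
    return CASTING_ORDER[level] <= CASTING_ORDER[casting]


print(can_cast(Int64(), Float64()))
print(can_cast(Float64(), Int64()))
print(can_cast(Float64(), Int64(), casting="unsafe"))
```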
