[Numpy-discussion] Attribute hiding APIs for PyArrayObject

Charles R Harris charlesr.harris at gmail.com
Wed Oct 31 19:00:52 EDT 2018


On Wed, Oct 31, 2018 at 3:59 PM Allan Haldane <allanhaldane at gmail.com>
wrote:

> On 10/30/18 5:04 AM, Matti Picus wrote:
> > TL;DR - should we revert the attribute-hiding constructs in
> > ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject?
> >
> >
> > Background
> >
> >
> > NumPy 1.8 deprecated direct access to PyArrayObject fields. It made
> > PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields
> > structure
> > https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659
> > with a comment about moving this to a private header. In order to access
> > the fields, users are supposed to use PyArray_FIELDNAME functions, like
> > PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time
> > that numpy might move away from a C-struct based underlying data
> > structure. Other changes were also made to enum names, but those are
> > relatively painless to find-and-replace.
> >
> >
> > NumPy has a mechanism to manage deprecating APIs: C users define
> > NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and
> > can then access the API "as if" they were using NumPy 1.8. Users who do
> > not define NPY_NO_DEPRECATED_API get a warning when compiling, and
> > default to the pre-1.8 API (aliasing of PyArrayObject to
> > PyArrayObject_fields and direct access to the C struct fields). This is
> > convenient for downstream users, both since the new API does not provide
> > much added value, and it is much easier to write a->nd than
> > PyArray_NDIM(a). For instance, pandas uses direct assignment to the data
> > field for fast json parsing
> > https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203
> > via chunks. Working around the new API in pandas would require more
> > engineering. Also, for example, cython has a mechanism to transpile
> > python code into C, mapping slow python attribute lookup to fast C
> > struct field access
> > https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types
> >
> >
> >
> > In a parallel but not really related universe, cython recently upgraded
> > the object mapping so that we can quiet the annoying "size changed"
> > runtime warning https://github.com/numpy/numpy/issues/11788 without
> > requiring warning filters, but that requires updating the numpy.pxd file
> > provided with cython, and it was proposed that NumPy actually vendor its
> > own file rather than depending on the cython one
> > (https://github.com/numpy/numpy/issues/11803).
> >
> >
> > The problem
> >
> >
> > We have now made further changes to our API. In NumPy 1.14 we changed
> > UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate
> > PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning
> > when NPY_NO_DEPRECATED_API is not defined is annoying. The new API
> > cannot be supported by cython without some deep surgery
> > (https://github.com/cython/cython/pull/2640). When I tried dogfooding an
> > updated numpy.pxd for the only cython code in NumPy, mtrand.pyx, I came
> > across some of these issues (https://github.com/numpy/numpy/pull/12284).
> > Forcing the new API will require downstream users to refactor code or
> > re-engineer constructs, as in the pandas example above.
>
> I haven't understood the cython issue, but just want to mention that for
> optimization purposes it's nice to be able to modify the fields, like in
> the pandas/json example above.
>
> In particular, PyArray_ConcatenateArrays uses some tricks which
> temporarily clobber the data pointer and shape of an array to
> concatenate arrays efficiently. It seems fairly safe to me. These tricks
> would be nice to re-use in a C port of the new block code we merged
> recently.
>
> Those optimizations aren't possible if only using the opaque PyArrayObject.
>
>
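
For readers who have not looked at that code, the kind of trick being
described can be sketched roughly as below. This is a simplified
illustration, not NumPy's actual implementation: demo_array, demo_assign
and demo_concatenate are hypothetical names, and the array is 1-D only.
The output array's data pointer and length are temporarily overwritten
so it acts as a sliding window over its own buffer, then restored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an array object with directly accessible
 * fields; this is NOT NumPy's PyArrayObject_fields. */
typedef struct {
    double *data;   /* pointer to the first element */
    size_t  len;    /* number of elements (1-D only, for brevity) */
} demo_array;

/* Copy src into dst; both must have the same length. Stands in for a
 * generic "assign one array to another" routine. */
static void demo_assign(demo_array *dst, const demo_array *src)
{
    assert(dst->len == src->len);
    memcpy(dst->data, src->data, src->len * sizeof(double));
}

/* Concatenate n source arrays into out, whose buffer must already hold
 * exactly the combined length. No per-source view object is allocated:
 * out's own fields are clobbered and then restored. */
static void demo_concatenate(demo_array *out, const demo_array *srcs, size_t n)
{
    double *saved_data = out->data;
    size_t  saved_len  = out->len;
    size_t  total = 0;

    for (size_t i = 0; i < n; i++) {
        total += srcs[i].len;
    }
    assert(total == saved_len);

    for (size_t i = 0; i < n; i++) {
        out->len = srcs[i].len;      /* clobber the shape */
        demo_assign(out, &srcs[i]);  /* fill the current window */
        out->data += srcs[i].len;    /* slide the window forward */
    }

    /* Restore the fields so callers see the full array again. */
    out->data = saved_data;
    out->len  = saved_len;
}
```

The point of the sketch is that both the clobbering and the restore need
write access to the fields, which read-only accessors do not give you.
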
It's OK for numpy internals to directly access the structures, as
presumably they will be updated if anything changes. Maybe it would be
useful for Cython to have a flag like Py_LIMITED_API?
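
To make the idea concrete, here is a minimal sketch of what such a flag
could look like in a header. All names (DEMO_LIMITED_API, demo_array,
demo_ndim, demo_data, demo_new, demo_free) are made up for illustration;
this is neither NumPy's nor Cython's API. With the flag defined, client
code sees only an incomplete type and must go through accessors; without
it, the public type is the struct itself and a->nd works directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opt-in flag, analogous in spirit to Py_LIMITED_API.
 * Remove this define to expose the fields to client code. */
#define DEMO_LIMITED_API

/* Full layout; in a real project this would sit in a private header. */
typedef struct {
    double *data;
    int     nd;
} demo_array_fields;

#ifdef DEMO_LIMITED_API
/* Clients get an incomplete type: a->nd does not compile. */
typedef struct demo_array_opaque demo_array;
#else
/* Legacy behaviour: the public type is the field struct itself. */
typedef demo_array_fields demo_array;
#endif

/* Accessors work under either setting by casting to the real layout. */
static inline int demo_ndim(const demo_array *a)
{
    assert(a != NULL);
    return ((const demo_array_fields *)a)->nd;
}

static inline double *demo_data(const demo_array *a)
{
    assert(a != NULL);
    return ((const demo_array_fields *)a)->data;
}

/* Allocate the real struct and hand it out as the public type.
 * Returns NULL if allocation fails. */
static demo_array *demo_new(int nd, double *data)
{
    demo_array_fields *f = malloc(sizeof *f);
    if (f == NULL) {
        return NULL;
    }
    f->data = data;
    f->nd = nd;
    return (demo_array *)f;
}

static void demo_free(demo_array *a)
{
    free(a);
}
```

Code that only calls demo_ndim(a) and demo_data(a) compiles the same way
whether or not the flag is set, so downstream projects could opt in one
at a time.
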

Chuck