[Numpy-discussion] draft NEP for breaking ufunc ABI in a controlled way

Nathaniel Smith njs at pobox.com
Thu Sep 24 03:20:23 EDT 2015


On Tue, Sep 22, 2015 at 7:57 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Hi,
>
> This e-mail is an attempt at proposing an API to solve Numba's needs.

Thanks!

> Attribute access
> ----------------
>
> int PyUFunc_Nin(PyUFuncObject *)
>
>   Replaces ufunc->nin.
>
> int PyUFunc_Nout(PyUFuncObject *)
>
>   Replaces ufunc->nout.
>
> int PyUFunc_Nargs(PyUFuncObject *)
>
>   Replaces ufunc->nargs.
>
> PyObject *PyUFunc_Name(PyUFuncObject *)
>
>   Replaces ufunc->name, returns a unicode object.
>   (alternative: return a const char *)

These all seem trivially supportable going forward.
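
To spell out why I think so, here is a minimal self-contained sketch. Note
that `ToyUFunc` and the `ToyUFunc_*` functions are hypothetical stand-ins
invented for this example, not NumPy's real struct or API. The only point
is that a caller who goes through functions never depends on the field
layout, so the layout can change later without breaking anyone.

```c
#include <assert.h>

/* Hypothetical stand-in for the ufunc struct; NOT NumPy's real layout. */
typedef struct {
    int nin;           /* number of inputs */
    int nout;          /* number of outputs */
    const char *name;  /* borrowed, NUL-terminated name */
} ToyUFunc;

/* Accessors: the only thing external code is supposed to touch. */
static int ToyUFunc_Nin(const ToyUFunc *u)
{
    assert(u != 0);
    return u->nin;
}

static int ToyUFunc_Nout(const ToyUFunc *u)
{
    assert(u != 0);
    return u->nout;
}

/* nargs is derived here rather than stored, which callers cannot observe. */
static int ToyUFunc_Nargs(const ToyUFunc *u)
{
    assert(u != 0);
    return u->nin + u->nout;
}

/* Uses the "const char *" alternative from the proposal. */
static const char *ToyUFunc_Name(const ToyUFunc *u)
{
    assert(u != 0);
    return u->name;
}
```

In this toy version nargs is computed instead of stored, which is the kind
of internal change that direct field access would rule out.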

> For introspection, the following would be nice too:
>
> int PyUFunc_Identity(PyUFuncObject *)
>
>   Replaces ufunc->identity.

Hmm, I can imagine cases where we might want to change how this works.
(E.g. if np.dot were a ufunc then the existing identity settings
wouldn't work very well... and I have some vague memory that there
might already be some delicate code in a few places because of
difficulties in defining "zero" and "one" for arbitrary dtypes.)

> const char *PyUFunc_Signature(PyUFuncObject *, int i)
>
>   Gives a pointer to the types of the i'th signature.
>   (equivalent today to &ufunc->types[i * ufunc->nargs])

I assume the 'i' part isn't actually interesting here (since there's
no longer any parallel vector of function pointers accessible), and
the high-level semantics that you're looking for are "please give me
the set of signatures that have a loop defined"?

[Edit: Also, see the discussion below about integer type pointers. The
consequences here are that we can certainly provide an operation like
this, but if we do then we might be abandoning it in a few releases
(e.g. it might start telling you about only a subset of defined
signatures). So can you expand a bit on what you mean by "would be
nice" above?]
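
To make the two readings concrete, here is a small sketch under stated
assumptions: `ToySigTable`, `ToySig_Get` and `ToySig_HasLoop` are
hypothetical names made up for this example, and the flat array of type
codes only mimics the "i * nargs" indexing quoted above. `ToySig_Get` is
the index-based form; `ToySig_HasLoop` is the "is there a loop for this
signature?" form I am guessing you actually want.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy table: nsigs signatures, each nargs type codes long,
 * stored back to back in one flat array. */
typedef struct {
    int nargs;
    int nsigs;
    const char *types;
} ToySigTable;

/* Index-based access: pointer to the i'th signature, NULL if out of range. */
static const char *ToySig_Get(const ToySigTable *t, int i)
{
    assert(t != NULL);
    if (i < 0 || i >= t->nsigs) {
        return NULL;
    }
    return &t->types[(size_t)i * (size_t)t->nargs];
}

/* Query-based access: 1 if a loop with exactly this signature exists. */
static int ToySig_HasLoop(const ToySigTable *t, const char *sig)
{
    assert(t != NULL && sig != NULL);
    for (int i = 0; i < t->nsigs; i++) {
        const char *cand = ToySig_Get(t, i);
        int same = 1;
        for (int j = 0; j < t->nargs; j++) {
            if (cand[j] != sig[j]) {
                same = 0;
                break;
            }
        }
        if (same) {
            return 1;
        }
    }
    return 0;
}
```

The query form says nothing about how signatures are stored or ordered, so
it would be the easier of the two to keep working if the storage changes.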

> Lifetime control
> ----------------
>
> PyObject *PyUFunc_SetObject(PyUFuncObject *, PyObject *)
>
>   Sets the ufunc's "object" to the given object.  The object has no
>   special semantics except that it is DECREF'ed when the ufunc is
>   deallocated (this is today's ufunc->obj).  The DECREF should happen
>   only after the ufunc has accessed any internal resources (since the
>   DECREF could deallocate some of those resources).

I understand why you need a "base" object like this for individual
loops, but if ufuncs start managing the ufunc-level memory buffers
internally, then is this still useful? I guess I'm curious to see an
example.

> PyObject *PyUFunc_GetObject(PyUFuncObject *)
>
>   Return the ufunc's current "object".

Oh, are you planning to actually use this to attach some arbitrary
metadata, not just attach deallocation callbacks?
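
For reference, this is how I read the lifetime rule, as a toy sketch.
Everything here is hypothetical (`ToyObject`, `OwnedUFunc` and their
functions are invented, and the manual refcount only imitates DECREF). I am
also assuming that setting the object takes over the caller's reference and
releases the previous object, which the proposal does not spell out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; freed_flag lets a caller observe the free. */
typedef struct {
    int refcount;
    int *freed_flag;
} ToyObject;

static ToyObject *ToyObject_New(int *freed_flag)
{
    ToyObject *o = malloc(sizeof(*o));
    if (o != NULL) {
        o->refcount = 1;
        o->freed_flag = freed_flag;
    }
    return o;
}

static void ToyObject_DecRef(ToyObject *o)
{
    if (o != NULL && --o->refcount == 0) {
        if (o->freed_flag != NULL) {
            *o->freed_flag = 1;
        }
        free(o);
    }
}

/* Hypothetical ufunc with one internal buffer and one attached object. */
typedef struct {
    ToyObject *obj;
    double *scratch;
} OwnedUFunc;

/* Assumption: takes over the caller's reference, drops the old object. */
static void OwnedUFunc_SetObject(OwnedUFunc *u, ToyObject *obj)
{
    assert(u != NULL);
    ToyObject *old = u->obj;
    u->obj = obj;
    ToyObject_DecRef(old);
}

static void OwnedUFunc_Dealloc(OwnedUFunc *u)
{
    assert(u != NULL);
    free(u->scratch);       /* finish with internal resources first */
    u->scratch = NULL;
    ToyObject *obj = u->obj;
    u->obj = NULL;
    ToyObject_DecRef(obj);  /* last step: may free what obj kept alive */
}
```

If the ufunc owns its buffers itself, as `scratch` does here, the attached
object is left with nothing to keep alive, which is why I am asking.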

> Loop registration
> -----------------
>
> int PyUFunc_RegisterLoopForSignature(
>     PyUFuncObject* ufunc,
>     PyUFuncGenericFunction function, int *arg_types,
>     void *data, PyObject *obj)
>
>   Register a loop implementation for the given arg_types (built-in
>   types, presumably). This either appends the loop to the types and
>   functions array (reallocating it if necessary), or replaces an
>   existing one with the same signature.
>
>   A copy of arg_types is done, such that the caller does not have to
>   manage its lifetime. The optional "PyObject *obj" is an object which
>   gets DECREF'ed when the loop is relinquished (for example when the
>   ufunc is destroyed, or when the loop gets replaced with another by
>   calling this function again).
>
>
> I cannot say I'm 100% sure this is sufficient, but this seems like it should
> cover our current needs.
>
> Note this is a minimal proposal. For example, Numpy could instead decide
> to pass and return all argument types as PyArray_Descr pointers rather
> than raw integers, and that would probably work for us too.

Hmm, that's an interesting and tricky point, actually -- I think the
way it will work eventually is that signatures will be specified in
terms of "dtypetypes" (i.e., subclasses of dtype, rather than ints
*or* instances of dtype = PyArray_Descrs). But I guess that's just a
challenge we'll have to think about when implementing this stuff --
either it means that the new ufunc API will have to wait a bit for
more of the new dtype machinery to be ready, or we'll have to
temporarily bridge the gap with a loop registration API that takes
new-style loop callbacks but uses int signatures (and then later turn
it into a thin wrapper around the final API).
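
For the int-signature version, the append-or-replace behaviour you describe
could look roughly like the sketch below. This is a toy model under stated
assumptions, not NumPy code: `ToyRegistry`, `ToyLoop`, `ToyLoopFunc` and the
functions are hypothetical, the loop callback type is a placeholder, and the
optional "obj" argument with its DECREF is left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*ToyLoopFunc)(void);  /* placeholder for the real loop type */

typedef struct {
    int *arg_types;    /* owned copy, nargs entries */
    ToyLoopFunc func;
    void *data;
} ToyLoop;

typedef struct {
    int nargs;         /* assumed > 0 */
    int nloops;
    int capacity;
    ToyLoop *loops;
} ToyRegistry;

/* Register a loop: replace on matching signature, else append.
 * Returns 0 on success, -1 on allocation failure. */
static int ToyRegistry_Register(ToyRegistry *r, ToyLoopFunc func,
                                const int *arg_types, void *data)
{
    assert(r != NULL && arg_types != NULL && r->nargs > 0);
    size_t sig_bytes = (size_t)r->nargs * sizeof(int);

    for (int i = 0; i < r->nloops; i++) {
        if (memcmp(r->loops[i].arg_types, arg_types, sig_bytes) == 0) {
            r->loops[i].func = func;   /* same signature: replace in place */
            r->loops[i].data = data;
            return 0;
        }
    }

    if (r->nloops == r->capacity) {    /* grow the array if it is full */
        int new_cap = r->capacity ? 2 * r->capacity : 4;
        ToyLoop *grown = realloc(r->loops, (size_t)new_cap * sizeof(ToyLoop));
        if (grown == NULL) {
            return -1;
        }
        r->loops = grown;
        r->capacity = new_cap;
    }

    int *copy = malloc(sig_bytes);     /* caller keeps ownership of arg_types */
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, arg_types, sig_bytes);
    r->loops[r->nloops].arg_types = copy;
    r->loops[r->nloops].func = func;
    r->loops[r->nloops].data = data;
    r->nloops++;
    return 0;
}

/* Release every owned signature copy and the loop array itself. */
static void ToyRegistry_Clear(ToyRegistry *r)
{
    assert(r != NULL);
    for (int i = 0; i < r->nloops; i++) {
        free(r->loops[i].arg_types);
    }
    free(r->loops);
    r->loops = NULL;
    r->nloops = 0;
    r->capacity = 0;
}
```

Because the registry copies the signature and hides the array, swapping the
int codes for something richer later would only touch the comparison and
the copy, not the callers.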

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


