[Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Tue Jun 26 16:35:05 EDT 2012


On 06/26/2012 05:02 PM, Travis Oliphant wrote:
>>>
>>> (I have not read the whole cython discussion yet)
>>
>> So here's the summary. It's rather complicated but also incredibly neat
>> :-) And technical details can be hidden behind a tight API.
>
> Could you provide a bit more context for this list? I think this is an important technology concept. I'd like to understand better how well it jibes with Numba-produced APIs and how we can make use of it in NumPy.
>
> Where exactly would this be used in the NumPy API?  What would it replace?

Right. I thought I did that :-) I realize I might sometimes be too 
brief; part of the "problem" is that I'm used to Cython development, 
where I can start a sentence and Mark Florisson or Robert Bradshaw can 
finish it.

I'll try to step through how PyArray_DIMS could work under a refactored 
API, as seen from a C client. To do this I'll gloss over some of the 
finer points and make a premature decision here and there. Almost none 
of the types or functions below exist yet; I'll assume we implement 
them (I do have a good start on the reference implementation).

We'll add a new C-level slot called "numpy:SHAPE" to the ndarray type, 
and hook PyArray_DIMS up to use this slot.

Inside NumPy
------------

The PyArray_Type (?) definition changes from being a PyTypeObject to a 
PyExtensibleTypeObject, and PyExtensibleType_Ready is called instead of 
PyType_Ready. This builds the perfect-hash lookup table and so on; I'll 
omit the details.
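
Just to make that concrete, here is a rough sketch of what the 
extensible type object *could* look like; the layout and all names here 
are my own illustration, not anything decided (PyCustomSlot is the 
{interned key, flags, funcptr} triple shown under "The caller" below):

typedef struct {
    PyTypeObject  etp_base;    /* the ordinary type object comes first */
    uint64_t      etp_hash_r;  /* per-type hash parameter chosen by
                                  PyExtensibleType_Ready */
    uint8_t       etp_log2len; /* the table has (1 << etp_log2len) entries */
    PyCustomSlot *etp_table;   /* the perfect-hash table of custom slots */
} PyExtensibleTypeObject;

PyExtensibleType_Ready would then do what PyType_Ready does, plus intern 
the registered slot keys (e.g. "numpy:SHAPE") and pick etp_hash_r and 
etp_log2len so that every registered key lands in its own bucket.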

The caller
----------

First we need some macros and module-initialization setup (part of the 
NumPy include files):

/* lower 64 bits of the md5 of "numpy:SHAPE" */
#define NPY_SHAPE_SLOT_PREHASH 0xa8cf70dc5f598f40ULL
/* holds the interned "numpy:SHAPE" string */
static char *_Npy_interned_numpy_SHAPE;

Then initialize interned key in import_array():

... import_array(...)
{
    ...
    PyCustomSlotsInternerContext interner = PyCustomSlots_GetInterner();
    _Npy_interned_numpy_SHAPE =
        PyCustomSlots_InternLiteral(interner, "numpy:SHAPE");
    ...
}

Then, let's get rid of that PyArrayObject (in the *API*; of course 
there's still some struct representing the NumPy array internally, but 
its layout is no longer exposed anywhere). That means always using 
PyObject, just like the Python API does, e.g., PyDict_GetItem takes a 
PyObject even if it must be a dict. But for backwards compatibility, 
let's throw in:

typedef PyObject PyArrayObject;

Now, change PyArray_Check a bit (likely/unlikely indicate branch hints, 
e.g. __builtin_expect in gcc; a possible definition is sketched after 
the struct below). Some context:

typedef struct {
   char *interned_key;
   uintptr_t flags;
   void *funcptr;
} PyCustomSlot;
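
likely/unlikely themselves are not part of the proposal; the usual 
idiom would do:

#ifdef __GNUC__
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (x)
#  define unlikely(x) (x)
#endif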

Then:

static inline int PyArray_Check(PyObject *arr) {
    /* "It is an array if it has the numpy:SHAPE slot."
       This is a bad choice of test, but keeps the example simple... */
    if (likely(PyCustomSlots_Check(arr->ob_type))) {
        PyCustomSlot *slot;
        slot = PyCustomSlots_Find(arr->ob_type,
            NPY_SHAPE_SLOT_PREHASH, _Npy_interned_numpy_SHAPE);
        if (likely(slot != NULL)) return 1;
    }
    return 0;
}

Finally, we can write our new PyArray_DIMS:

static inline npy_intp *PyArray_DIMS(PyObject *arr) {
    PyCustomSlot *slot = PyCustomSlots_FindAssumePresent(arr->ob_type,
        NPY_SHAPE_SLOT_PREHASH);
    return ((npy_intp *(*)(PyObject *))slot->funcptr)(arr);
}

What goes on here is:

  - PyCustomSlots_Check checks whether the metaclass (i.e., 
Py_TYPE(arr->ob_type)) is PyExtensibleType_Type, which is a class we 
agree upon by SEP

  - PyCustomSlots_Find takes the prehash of the key, which, through the 
parametrized hash function, gives the position in the hash table. At 
that position in the PyCustomSlot array one either finds the element 
(by comparing the interned key by pointer value), or the element is not 
in the table, so there are no loops or branch misses. (A toy sketch of 
both Find variants follows this list.)

  - Finally, inside PyArray_DIMS we assume that PyArray_Check has 
already been called. Thus, since we know the slot is in the table, we 
can skip even the check and shave off a nanosecond.
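
For concreteness, here is that toy sketch of the two lookup functions, 
assuming the illustrative PyExtensibleTypeObject layout from the 
"Inside NumPy" section above; the multiplier constant and all names are 
mine, not taken from any spec:

static inline PyCustomSlot *
PyCustomSlots_Find(PyTypeObject *tp, uint64_t prehash, char *interned_key)
{
    PyExtensibleTypeObject *etp = (PyExtensibleTypeObject *)tp;
    /* parametrized hash: mix the prehash with the per-type parameter;
       Ready() chose etp_hash_r/etp_log2len so that all keys of this
       type get distinct buckets */
    size_t i = (size_t)(((prehash ^ etp->etp_hash_r) *
                 0x9e3779b97f4a7c15ULL) >> (64 - etp->etp_log2len));
    PyCustomSlot *slot = &etp->etp_table[i];
    /* interned keys are unique pointers, so one compare decides hit/miss */
    return (slot->interned_key == interned_key) ? slot : NULL;
}

static inline PyCustomSlot *
PyCustomSlots_FindAssumePresent(PyTypeObject *tp, uint64_t prehash)
{
    PyExtensibleTypeObject *etp = (PyExtensibleTypeObject *)tp;
    /* same bucket computation, but the caller (e.g. PyArray_Check) has
       already established that the key is present */
    size_t i = (size_t)(((prehash ^ etp->etp_hash_r) *
                 0x9e3779b97f4a7c15ULL) >> (64 - etp->etp_log2len));
    return &etp->etp_table[i];
}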

What is replaced
----------------

Largely the macros and existing function pointers imported by 
import_array. However, some of the functions (in particular constructors 
etc.) would work just like before. Only OOP "methods" change their 
behaviour.
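
So a caller's source does not change at all; as a hedged illustration 
(using the definitions above, with a made-up helper name), this keeps 
compiling as-is, and only what the two calls expand to differs:

static npy_intp
first_dim_or_minus_one(PyObject *obj)
{
    if (!PyArray_Check(obj))       /* slot-table lookup, as shown above */
        return -1;
    return PyArray_DIMS(obj)[0];   /* routed through the "numpy:SHAPE" slot */
}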

Compared to the macros, there should be a ~4-7 ns penalty per call on my 
computer (1.9 GHz). However, compared to making PyArray_SHAPE a function 
going through the import_array function table, the added cost is only a 
couple of ns.

>> Me and Robert have talked a lot about this and will move forward with it
>> for Cython. Obviously I don't expect others than me to pick it up for
>> NumPy, so we'll see... I'll write up a specification document sometime
>> over the next couple of weeks, as we need that even if only for Cython.
>
> We will look forward to what you come up with.

Will keep you posted,

Dag


