[Numpy-discussion] Getting C-function pointers from Python to C

Tue Apr 10 09:10:04 EDT 2012

On 04/10/2012 03:00 PM, Nathaniel Smith wrote:
> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant <travis at continuum.io> wrote:
>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>>>
>>>> ...isn't this an operation that will be performed once per compiled
>>>> function? Is the overhead of the easy, robust method (calling ctypes.cast)
>>>> actually measurable as compared to, you know, running an optimizing
>>>> compiler?
>>>>
>>>> Yes, there can be significant overhead. The compiler is run once and
>>>> creates the function. This function is then potentially used many, many
>>>> times. Also, it is entirely conceivable that the "build" step happens at
>>>> a separate "compilation" time, and Numba actually loads a pre-compiled
>>>> version of the function from disk which it then uses at run-time.
>>>>
>>>> I have been playing with a version of this using scipy.integrate, and
>>>> unfortunately the overhead of ctypes.cast is rather significant, to the
>>>> point of making the code path that uses these function pointers useless,
>>>> whereas without the ctypes.cast overhead the speedup is 3-5x.
>>>
>>> Ah, I was assuming that you'd do the cast once outside of the inner
>>> loop (at the same time you did type compatibility checking and so
>>> forth).
>>>
>>>> In general, I think NumPy will need its own simple function-pointer object
>>>> to use when handing over raw function pointers between Python and C. SciPy
>>>> can then re-use this object, which also has a useful C-API for things like
>>>> signature checking. I have seen that ctypes is nice but very slow and
>>>> without a compelling C-API.
>>>
>>> Sounds reasonable to me. Probably nicer than violating ctypes's
>>> abstraction boundary, and with no real downsides.
>>>
>>>> The kind of new C-level cfuncptr object I imagine has attributes:
>>>>
>>>> void *func_ptr;
>>>> char *signature;  /* a string, something like 'dd->d' to indicate a function
>>>>                      that takes two doubles and returns a double */
>>>
>>> This looks like it's setting us up for trouble later. We already have
>>> a robust mechanism for describing types -- dtypes. We should use that
>>> instead of inventing Yet Another baby type system. We'll need to
>>> convert between this representation and dtypes anyway if you want to
>>> use these pointers for ufunc loops... and if we just use dtypes from
>>> the start, we'll avoid having to break the API the first time someone
>>> wants to pass a struct or array or something.
>>
>> For some of the things we'd like to do with Cython down the line,
>> something very fast like what Travis describes is exactly what we need;
>> specifically, if you have Cython code like
>>
>> cdef double f(func):
>>      return func(3.4)
>>
>> that may NOT be called in a loop.
>>
>> But I do agree that this sounds like overkill for NumPy+numba at the moment;
>> certainly for scipy.integrate, where you can amortize over N function
>> samples. But Travis perhaps has a use case I didn't think of.
>
> It sounds sort of like you're disagreeing with me but I can't tell
> about what, so maybe I was unclear :-).
>
> All I was saying was that a list-of-dtype-objects was probably a
> better way to write down a function signature than some ad-hoc string
> language. In both cases you'd do some type-compatibility-checking up
> front and then use C calling afterwards, and I don't see why
> type-checking would be faster or slower for one representation than
> the other. (Certainly one wouldn't have to support all possible dtypes
> up front, the point is just that they give us more room to grow
> later.)

My point was that with Cython you'd get cases where there is no
"up-front"; you have to check-and-call as essentially one operation. The
Cython code above would result in something like this:

if (strcmp("dd->d", signature) == 0) {
    /* guess on signature and have fast C dispatch for exact match */
}
else {
    /* fall back to calling as Python object */
}

The strcmp would probably be inlined and unrolled, but you get the idea.
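
To make the check-and-call idea concrete, here is a minimal, self-contained
sketch in C. It is only an illustration under stated assumptions, not a
proposed API: the names cfuncptr, generic_fn, dd_to_d, call_if_dd_to_d and
add are all hypothetical. It also stores the pointer as a generic function
pointer rather than the void * mentioned earlier in the thread, because ISO C
only guarantees round-tripping between function pointer types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type-erased function pointer (assumption: not void *,
   so that the casts below stay within ISO C). */
typedef void (*generic_fn)(void);

/* Hypothetical object with the two fields discussed in the thread. */
typedef struct {
    generic_fn func_ptr;     /* the raw C function, type-erased */
    const char *signature;   /* e.g. "dd->d": two doubles in, one double out */
} cfuncptr;

/* Concrete C type that the signature string "dd->d" stands for. */
typedef double (*dd_to_d)(double, double);

/* Check and call as one operation.
   Returns 1 and stores the result in *out on an exact signature match.
   Returns 0 otherwise, so the caller can fall back to the slow path
   (calling the object as a generic Python callable). */
static int call_if_dd_to_d(const cfuncptr *f, double a, double b, double *out)
{
    assert(f != NULL && out != NULL);
    if (f->func_ptr != NULL && f->signature != NULL
            && strcmp("dd->d", f->signature) == 0) {
        /* Exact match: cast back to the real type and call directly. */
        *out = ((dd_to_d)f->func_ptr)(a, b);
        return 1;
    }
    return 0;
}

/* Example callee matching "dd->d". */
static double add(double a, double b)
{
    return a + b;
}
```

A caller would build something like cfuncptr fp = { (generic_fn)add, "dd->d" }
and try call_if_dd_to_d(&fp, x, y, &result) first, taking the Python-object
path only when it returns 0. The cost on the fast path is one short string
comparison plus an indirect call.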

With LLVM available, and if Cython started to use it, we could generate 
more such branches on the fly, making it more attractive.

Dag