[Python-Dev] C-level duck typing

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed May 16 16:59:16 CEST 2012


On 05/16/2012 02:47 PM, Mark Shannon wrote:
> Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 16.05.2012 12:48:
>>> On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:
>>>>> Agreed in general, but in this case, it's really not that easy. A C
>>>>> function call involves a certain overhead all by itself, so calling
>>>>> into the C-API multiple times may be substantially more costly than,
>>>>> say, calling through a function pointer once and then running over a
>>>>> returned C array comparing numbers. And definitely way more costly
>>>>> than running over an array that the type struct points to directly.
>>>>> We are not talking about hundreds of entries here, just a few. A
>>>>> linear scan in 64-bit steps over something like a hundred bytes in
>>>>> the L1 cache should hardly be measurable.
>>>> I give up, then. I fail to understand the problem. Apparently, you want
>>>> to do something with the value you get from this lookup operation, but
>>>> that something won't involve function calls (or else the function call
>>>> overhead for the lookup wouldn't be relevant).
>>> In our specific case the value would be an offset added to the
>>> PyObject*, and there we would find a pointer to a C function (together
>>> with a 64-bit signature), and calling that C function (after checking
>>> the 64-bit signature) is our final objective.
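(To make that concrete: here is a rough simulation of the lookup in Python pseudocode rather than C. In the real thing the table would be a C array of (64-bit id, function pointer) pairs found at an offset from the PyObject*; every name below is made up for illustration.)

```python
# Rough Python simulation of the proposed C-level lookup.
# In C, "sig_id" would be a 64-bit id computed from the C signature
# string, and "lookup" a linear scan over a small C array reachable
# at a fixed offset from the PyObject*. All names are invented here.
import hashlib

def sig_id(signature_string):
    """Hash a C signature string like 'd->d' down to 64 bits."""
    digest = hashlib.blake2b(signature_string.encode(),
                             digest_size=8).digest()
    return int.from_bytes(digest, "little")

class CCallable:
    """Stands in for an extension type whose instances carry a small
    table of (signature id, C function) entries."""
    def __init__(self, entries):
        # entries: list of (signature string, function)
        self._table = [(sig_id(sig), func) for sig, func in entries]

    def lookup(self, signature_string):
        """The fast path: a linear scan over a few 64-bit ids,
        comparable to scanning ~100 bytes in L1 cache."""
        wanted = sig_id(signature_string)
        for sid, func in self._table:
            if sid == wanted:
                return func
        return None  # caller falls back to a normal Python call

square = CCallable([("d->d", lambda x: x * x)])
f = square.lookup("d->d")
assert f is not None and f(3.0) == 9.0
assert square.lookup("i->i") is None  # no matching signature
```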
>>
>> I think the use case hasn't been communicated all that clearly yet. Let's
>> give it another try.
>>
>> Imagine we have two sides, one that provides a callable and the other
>> side that wants to call it. Both sides are implemented in C, so the
>> callee has a C signature and the caller has the arguments available as
>> C data types. The signature may or may not match the argument types
>> exactly (float vs. double, int vs. long, ...), because the caller and
>> the callee know nothing about each other initially; they just happen
>> to appear in the same program at runtime. All they know is that they
>> could call each other through Python space, but that would require
>> data conversion, tuple packing, calling, tuple unpacking, data
>> unpacking, and then potentially the same thing on the way back. They
>> want to avoid that overhead.
>>
>> Now, the caller needs to figure out if the callee has a compatible
>> signature. The callee may provide more than one signature (i.e. more
>> than one C call entry point), perhaps because it is implemented to
>> deal with different input data types efficiently, or perhaps because
>> it can efficiently convert them to its expected input. So, there is a
>> signature on the caller side, given by the argument types it holds,
>> and a couple of signatures on the callee side that can accept
>> different C data input. Then the caller needs to find out which
>> signatures there are and match them against what it can efficiently
>> call. It may even be a JIT compiler that can generate an efficient
>> call signature on the fly, given a suitable signature on the callee
>> side.
>>
>> An example for this is an algorithm that evaluates a user-provided
>> function on a large NumPy array. The caller knows what array type it
>> is operating on, and the user-provided function may be designed to
>> efficiently operate on arrays of int, float and double entries.
>
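(Sketching Stefan's example in Python pseudocode, where the real thing would be C: the callee exposes several specializations, the caller scans the table once for its element type and falls back to a generic Python-level call otherwise. The signature strings and helper names below are made up.)

```python
# Hypothetical sketch: a user-provided function exposes several
# specializations keyed by a signature string; the array-evaluating
# caller does the lookup once, then runs the hot loop, falling back
# to a generic (tuple-packing) call when nothing matches.

def make_callee():
    # Three specializations, as in the int/float/double example.
    return {
        "i->i": lambda x: x * x,
        "f->f": lambda x: x * x,
        "d->d": lambda x: x * x,
    }

def evaluate(callee_table, values, element_sig, generic_call):
    func = callee_table.get(element_sig)   # the one-time lookup
    if func is None:
        func = generic_call                # slow Python-space path
    return [func(v) for v in values]       # the hot loop

table = make_callee()
out = evaluate(table, [1.0, 2.0, 3.0], "d->d",
               generic_call=lambda x: x * x)
assert out == [1.0, 4.0, 9.0]
# Unknown signature: falls back to the generic call path.
assert evaluate(table, [1, 2], "q->q",
                generic_call=lambda x: x + 1) == [2, 3]
```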
> Given that use case, can I suggest the following:
>
> Separate the discovery of the function from its use.
> By this I mean first lookup the function (outside of the loop)
> then use the function (inside the loop).

We would obviously do that when we can. But Cython is a compiler/code 
translator, and we don't control use cases. You can easily make up use 
cases (= Cython code people write) where you can't easily separate 
the two.

For instance, the Sage project has hundreds of thousands of lines of 
object-oriented Cython code (NOT just array-oriented, but also graphs 
and trees and stuff), which is all based on Cython's own fast vtable 
dispatches a la C++. They might want to clean up their code and use 
more generic callback objects in some places.
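To make that concrete, here is roughly the shape of such code, sketched in Python rather than Cython/C (the classes and signature string are made up): when the callable itself varies per iteration, there is no single function pointer to hoist out of the loop, so the per-call lookup has to be cheap.

```python
# Contrived illustration of why the lookup cannot always be hoisted:
# heterogeneous node objects, each potentially exposing a C-level
# entry point, processed in one loop.

class FastNode:
    def lookup(self, sig):
        # Has a matching C-level specialization for "i->i".
        return (lambda x: x * 2) if sig == "i->i" else None

class SlowNode:
    def lookup(self, sig):
        return None          # no C-level entry; generic call needed
    def __call__(self, x):
        return x * 2         # the tuple-packing slow path

def total(nodes):
    acc = 0
    for node in nodes:
        func = node.lookup("i->i")   # per-iteration lookup
        if func is None:
            func = node              # fall back to a Python call
        acc += func(1)
    return acc

assert total([FastNode(), SlowNode(), FastNode()]) == 6
```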

Other users currently pass around C pointers for callback functions, and 
we'd like to tell them "pass around these nicer Python callables 
instead; honestly, the penalty is only 2 ns per call". And that should 
hold *regardless* of how they use them, without making them ensure the 
calls happen in a loop where we can statically pull out the function 
pointer acquisition. Saying "this is only non-sluggish if you do x, y, 
z" puts users off.

I'm not asking you to consider the details of all that. Just to allow 
some kind of high-performance extensibility of PyTypeObject, so that we 
can *stop* bothering python-dev with specific requirements from our 
parallel universe of nearly-all-Cython-and-Fortran-and-C++ codebases :-)

Dag
