[Python-Dev] C-level duck typing

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed May 16 22:59:51 CEST 2012


On 05/16/2012 10:24 PM, Robert Bradshaw wrote:
> On Wed, May 16, 2012 at 11:33 AM, "Martin v. Löwis"<martin at v.loewis.de>  wrote:
>>> Does this use case make sense to everyone?
>>>
>>> The reason why we are discussing this on python-dev is that we are looking
>>> for a general way to expose these C level signatures within the Python
>>> ecosystem. And Dag's idea was to expose them as part of the type object,
>>> basically as an addition to the current Python level tp_call() slot.
>>
>> The use case makes sense, yet there is already a long-standing solution
>> for exposing APIs and function pointers: capsule objects.
>>
>> If you want to avoid dictionary lookups on the server side, implement
>> tp_getattro, comparing addresses of interned strings.
>
> Yes, that's an idea worth looking at. The point about implementing
> tp_getattro to avoid dictionary lookup overhead is a good one, worth
> trying at least. One drawback is that this approach does require the
> GIL (as does _PyType_Lookup).
>
> Regarding the C function being faster than the dictionary lookup (or
> at least fast enough that the lookup cost is noticeable), yes, this
> happens all the time. For example, one might be solving differential
> equations where the "user input" is essentially a set of (usually
> simple) functions of the form double f(double), plus their
> derivatives.
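
For concreteness, the tp_getattro idea might look roughly like the
sketch below. The attribute name "__nativecall__", the struct layout
and the capsule name are placeholders made up for illustration, not
part of any spec:

/* Sketch only -- nothing here is an agreed interface. */
#include <Python.h>

typedef struct {
    double (*funcptr)(double);   /* the C-level entry point */
    const char *signature;       /* e.g. "d->d" */
} NativeCallInfo;

/* A hypothetical callable extension type carrying such an entry. */
typedef struct {
    PyObject_HEAD
    NativeCallInfo info;
} FastFuncObject;

static PyObject *interned_name;  /* interned "__nativecall__" */

static PyObject *
FastFunc_getattro(PyObject *obj, PyObject *name)
{
    /* Attribute names used in compiled code are interned, so a single
       pointer comparison replaces the dictionary lookup. */
    if (name == interned_name) {
        FastFuncObject *self = (FastFuncObject *)obj;
        return PyCapsule_New(&self->info, "NativeCallInfo", NULL);
    }
    /* Everything else goes through the normal machinery. */
    return PyObject_GenericGetAttr(obj, name);
}

/* During module init, something like:
       interned_name = PyUnicode_InternFromString("__nativecall__");
       FastFuncType.tp_getattro = FastFunc_getattro;
*/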

To underline how performance-critical this is for us, perhaps a full
Cython example is useful.

The following Cython code is a real-world use case, simplified a little
but not too contrived in the essentials. For instance, undergraduate
engineering students might pick up Cython just to play with simple
scalar functions like this.

from numpy import sin
# Assume sin is a Python callable, and that NumPy decides to support
# our spec so that a "double (*sinfuncptr)(double)" can also be
# obtained from it.

# Our mission: avoid having the user manually import "sin" from C,
# but allow just using the NumPy object and still be fast.

# define a function to integrate
cpdef double f(double x):
    return sin(x * x)  # guess on the signature and use "fastcall"!

# the integrator
def integrate(func, double a, double b, int n):
    cdef int i  # typed loop variable so the loop itself stays in C
    cdef double s = 0
    cdef double dx = (b - a) / n
    for i in range(n):
        # This is also a fastcall, but it can be cached so it doesn't
        # matter...
        s += func(a + i * dx)
    return s * dx

integrate(f, 0, 1, 1000000)

There are two problems here:

  - The "sin" global can be reassigned (monkey-patched) between each 
call to "f", no way for "f" to know. Even "sin" could do the 
reassignment. So you'd need to check for reassignment to do caching...

  - The fastcall inside "f" is separated from the loop in
"integrate", and since "f" is often in another module, we cannot rely
on whole-program static analysis.

Both problems disappear if the lookup itself is cheap enough to be
negligible, since then no caching is needed in the first place.

Some rough numbers:

  - The overhead of the tp_flags hack is about 2 ns (a metaclass gives
something similar; the problem there is more how to synchronize that
metaclass across multiple third-party libraries). A rough sketch of
this tp_flags check is included at the end of this mail.

  - A dict lookup is about 20 ns

  - The sin function itself is about 35 ns, and "f" is probably only
2-3 ns; there could easily be multiple such functions, defined in
different modules and chained together to build up a formula.
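
Here is the promised rough sketch of the tp_flags check, seen from the
caller side. The flag bit, the extended type struct and the signature
string are all hypothetical -- this is meant to illustrate the
approach, not to propose a concrete interface:

/* Sketch only -- flag bit, struct layout and the "d->d" encoding are
   invented for illustration. */
#include <Python.h>
#include <string.h>

/* A tp_flags bit claimed here purely for illustration. */
#define Py_TPFLAGS_HAS_NATIVECALL (1UL << 22)

typedef struct {
    const char *signature;   /* e.g. "d->d" for double (*)(double) */
    void *funcptr;           /* the C entry point */
} NativeCallEntry;

/* Types opting in embed a NULL-terminated table after the regular
   heap type struct. */
typedef struct {
    PyHeapTypeObject base;
    NativeCallEntry *entries;
} NativeCallTypeObject;

/* Caller side: roughly what Cython would emit for "sin(x * x)". */
static double
call_double_double(PyObject *callable, double x)
{
    PyTypeObject *tp = Py_TYPE(callable);
    PyObject *res;
    double r;

    if (tp->tp_flags & Py_TPFLAGS_HAS_NATIVECALL) {
        NativeCallEntry *e = ((NativeCallTypeObject *)tp)->entries;
        for (; e->signature != NULL; e++) {
            if (strcmp(e->signature, "d->d") == 0)
                /* Fast path: call straight through the C pointer. */
                return ((double (*)(double))e->funcptr)(x);
        }
    }
    /* Slow path: an ordinary Python-level call; error handling is
       elided (PyFloat_AsDouble returns -1.0 with an exception set). */
    res = PyObject_CallFunction(callable, "d", x);
    r = (res != NULL) ? PyFloat_AsDouble(res) : -1.0;
    Py_XDECREF(res);
    return r;
}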

Dag

