[Cython] CEP1000: Native dispatch through callables

Fri Apr 13 14:48:54 CEST 2012

On 13 April 2012 12:59, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> On 04/13/2012 01:38 PM, Stefan Behnel wrote:
>>
>> Robert Bradshaw, 13.04.2012 12:17:
>>>
>>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote:
>>>>
>>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote:
>>>>>
>>>>> Have you given any thought as to what happens if __call__ is
>>>>> re-assigned for an object (or subclass of an object) supporting this
>>>>> interface? Or is this out of scope?
>>>>
>>>>
>>>> Out-of-scope, I'd say. Though you can always write an object that
>>>> detects if you assign to __call__...
>>
>>
>> +1 for out of scope. This is a pure C level feature.
>>
>>
>>>>> Minor nit: I don't think should_dereference is worth branching on, if
>>>>> one wants to save the allocation one can still use a variable-sized
>>>>> type and point to oneself. Yes, that's an extra dereference, but the
>>>>> memory is already likely close and it greatly simplifies the logic.
>>>>> But I could be wrong here.
>>>>
>>>>
>>>>
>>>> Those minor nits are exactly what I seek; since Travis will have the
>>>> first implementation in numba<->SciPy, I just want to make sure that
>>>> what he does will work efficiently with Cython.
>>>
>>>
>>> +1
>>>
>>> I have to admit building/invoking these var-arg-sized __nativecall__
>>> records seems painful. Here's another suggestion:
>>>
>>> struct {
>>>     void* pointer;
>>>     size_t signature; // compressed binary representation, 95% coverage
>
>
> Once you start passing around functions that take memory view slices as
> arguments, that 95% estimate will be off I think.
>

It kind of depends on which argument types and how many arguments you
will allow, and whether or not collisions would be fine (which would
imply ID comparison + strcmp()).

>>>     char* long_signature; // used if signature is not representable in
>>>                           // a size_t, as indicated by signature = 0
>>> } record;
>>>
>>> These char* could optionally be allocated at the end of the record*
>>> for optimal locality. We could even dispense with the binary
>>> signature, but having that option allows us to avoid strcmp for stuff
>>> like d)d and ffi)f.
>>
>>
>> Assuming we use literals and a const char* for the signature, the C
>> compiler would cut down the number of signature strings automatically for
>> us. And a pointer comparison is the same as a size_t comparison.
>
>
> I'll go one further: Intern Python bytes objects. It's just a PyObject*, but
> it's *required* (or just strongly encouraged) to have gone through
>
> sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig)
>
> Obviously in a PEP you'd have a C-API function for such interning
> (completely standalone utility). Performance of interning operation itself
> doesn't matter...
>
> Unless CPython has interning features itself, like in Java? Was that present
> back in the day and then ripped out?
>
> Requiring interning is somewhat less elegant in one way, but it makes a lot
> of other stuff much simpler.
>
> That gives us
>
> struct {
>    void *pointer;
>    PyBytesObject *signature;
> } record;
>
> and then you allocate a NULL-terminated array of these for all the
> overloads.
>
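
To make the interning idea concrete, here is a minimal sketch in plain
Python. The names _interned_db, intern_signature and find_overload are
hypothetical stand-ins (there is no real _nativecall module); the point is
only that once every signature has gone through the registry, matching an
overload is an identity check instead of a byte-by-byte comparison.

```python
# Hypothetical stand-in for the proposed interning registry (not a real module).
_interned_db = {}


def intern_signature(sig: bytes) -> bytes:
    """Return the canonical bytes object for an equal-valued signature."""
    return _interned_db.setdefault(sig, sig)


def find_overload(records, wanted):
    """Scan (pointer, signature) records for an interned signature.

    Both the stored signatures and 'wanted' are assumed to be interned,
    so an identity test ('is') stands in for the C pointer comparison.
    """
    for pointer, sig in records:
        if sig is wanted:
            return pointer
    return None
```

In C the "is" test would be a comparison of two PyObject* values, so the
cost per overload is the same as comparing two size_t fields.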

Interesting. What I like about size_t is that it could define a
deterministic ordering, which means specializations could be stored in
a binary search tree in array form. Cython would precompute the size_t
for the specialization it needs (and maybe account for promotions as
well).
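
A rough sketch of what I mean, in Python rather than C for brevity. It
assumes integer signature IDs and reads "binary search tree in array form"
as a sorted array searched by bisection; build_table and lookup are
made-up names, and the pointers are just placeholder integers.

```python
from bisect import bisect_left


def build_table(specializations):
    """Turn {signature_id: pointer} into two parallel arrays sorted by ID."""
    ids = sorted(specializations)
    pointers = [specializations[i] for i in ids]
    return ids, pointers


def lookup(table, wanted_id):
    """Binary-search the sorted IDs; return the pointer, or None if absent."""
    ids, pointers = table
    pos = bisect_left(ids, wanted_id)
    if pos < len(ids) and ids[pos] == wanted_id:
        return pointers[pos]
    return None
```

The caller would precompute wanted_id once per call site, so the only
per-call work is the O(log n) search over the callee's specializations.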

>>
>> That would only apply at a per-module level, though, so it would require
>> an indirection for the signature IDs. But it would avoid a global registry.
>>
>> Another idea would be to set the signature ID field to 0 at the beginning
>> and call a C-API function to let the current runtime assign an ID > 0,
>> unique for the currently running application. Then every user would only
>> have to parse the signature once to adapt to the respective ID and could
>> otherwise branch based on it directly.
>>
>> For Cython, we could generate a static ID variable for each typed call
>> that we found in the sources. When encountering a C signature on a callable,
>> either a) the ID variable is still empty (initial case), then we parse the
>> signature to see if it matches the expected signature. If it does, we
>> assign the corresponding ID to the static ID variable and issue a direct
>> call. If b) the ID field is already set (normal case), we compare the
>> signature IDs directly and issue a C call if they match. If the IDs do not
>> match, we issue a normal Python call.
>>
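
The per-call-site scheme described above could look roughly like this. It
is a Python model only, under several assumptions: assign_id stands in for
the proposed C-API function, "parsing the signature" is reduced to a string
comparison, and NativeCallable and CallSite are hypothetical names. The
tags "native" and "python" just mark which path was taken.

```python
import itertools

# Hypothetical runtime registry handing out application-wide IDs > 0.
_next_id = itertools.count(1)
_ids = {}


def assign_id(signature):
    """Return the runtime-assigned ID for a signature, allocating on first use."""
    if signature not in _ids:
        _ids[signature] = next(_next_id)
    return _ids[signature]


class NativeCallable:
    """Models a callable that exposes a C signature and a native entry point."""

    def __init__(self, signature, native):
        self.signature = signature
        self.sig_id = 0  # 0 means "no ID assigned yet"
        self.native = native

    def __call__(self, *args):
        return ("python", self.native(*args))  # the slow, generic path


class CallSite:
    """Models one typed call in the sources with its static ID variable."""

    def __init__(self, expected_signature):
        self.expected = expected_signature
        self.cached_id = 0  # empty until the first matching callee is seen

    def call(self, obj, *args):
        if obj.sig_id == 0:
            obj.sig_id = assign_id(obj.signature)
        if self.cached_id == 0:
            # Initial case: do the expensive signature check once.
            if obj.signature == self.expected:
                self.cached_id = obj.sig_id
                return ("native", obj.native(*args))
        elif obj.sig_id == self.cached_id:
            # Normal case: a single integer comparison.
            return ("native", obj.native(*args))
        return obj(*args)  # IDs differ: normal Python call
```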
>>
>>>> Right... if we do some work to synchronize the types for Cython modules
>>>> generated by the same version of Cython, we're left with 3-4 types for
>>>> Cython, right? Then a couple for numba and one for f2py; so on the order
>>>> of 10?
>>>
>>>
>>> No, I think each closure is its own type.
>>
>>
>> And that even applies to fused functions, right? They'd have one closure
>> for each type combination.
>>
>>
>>>> An alternative is to do something funny in the type object to get across
>>>> the offset-in-object information (abusing the docstring, or introducing
>>>> our own flag which means that the type object has an additional
>>>> non-standard field at the end).
>>>
>>>
>>> It's a hack, but the flag + non-standard field idea might just work...
>>
>>
>> Plus, it wouldn't have to stay a non-standard field. If it's accepted into
>> CPython 3.4, we could safely use it in all existing versions of CPython.
>
>
> Sounds good. Perhaps just find a single "extended" flag bit, then add a new
> flag field in our payload, in case we need to extend the type object yet
> again later and run out of unused flag bits (TBD: figure out how many unused
> flag bits there are).
>
> Dag
>

Maybe it would be a good idea if there was a third project that
defined this functionality in header files which projects could
include (or in case of Cython directly inject into the generated C
files). E.g. a function to check for the native interface, and a
function that given an array of signature strings and function
pointers builds the ABI information (and computes the ID), and one
that given an ID and signature string finds the right specialization.
The project should also expose a simple type system for the types we
care about, and be able to generate signature strings and IDs for
signatures.

An optimization for the common case would be to only look at the first
entry in the ABI information directly and compare that for the
non-overloaded case, and otherwise do a logarithmic lookup, with a
final fallback to calling through the Python layer.
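
As a sketch of that lookup order (again Python standing in for C, and
assuming the ABI information is a list of integer IDs sorted ascending with
a parallel list of entry points; dispatch is a made-up name):

```python
from bisect import bisect_left


def dispatch(ids, pointers, wanted_id, python_fallback):
    """Pick an entry point for wanted_id, or the Python-level fallback.

    ids must be sorted ascending; pointers[i] belongs to ids[i].
    """
    # Fast path: the non-overloaded case has exactly one entry to check.
    if ids and ids[0] == wanted_id:
        return pointers[0]
    # Overloaded case: logarithmic search over the remaining entries.
    if len(ids) > 1:
        pos = bisect_left(ids, wanted_id, 1)
        if pos < len(ids) and ids[pos] == wanted_id:
            return pointers[pos]
    # No matching specialization: go through the Python layer.
    return python_fallback
```

For a callable with a single specialization this costs one comparison,
which is the case I would expect to dominate.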