[Cython] New early-binding concept [was: CEP1000]

Vitja Makarov vitja.makarov at gmail.com
Thu Apr 19 10:35:55 CEST 2012


2012/4/19 Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no>:
> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>>
>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>>
>>> from numpy import sqrt, sin
>>>
>>> cdef double f(double x):
>>>     return sqrt(x * x) # or sin(x * x)
>>>
>>> Of course, here one could get the pointer in the module at import time.
>>
>>
>> That optimisation would actually be very worthwhile all by itself. I mean,
>> we know what signatures we need for globally imported functions throughout
>> the module, so we can reduce the call to a single jump through a function
>> pointer (although likely with a preceding NULL check, which the branch
>> prediction would be happy to give us for free). At least as long as sqrt is
>> not being reassigned, but that should hit the 99% case.
>>
>>
>>> However, here:
>>>
>>> from numpy import sqrt
>
>
> Correction: "import numpy as np"
>
>>>
>>> cdef double f(double x):
>>>     return np.sqrt(x * x) # or np.sin(x * x)
>>>
>>> the __getattr__ on np sure is larger than any effect we discuss.
>>
>>
>> Yes, that would have to stay a .pxd case, I guess.
>
>
> How about this mini-CEP:
>
> Modules are allowed to specify __nomonkey__ (or __const__, or
> __notreassigned__), a list of strings naming module-level variables where
> "we don't hold you responsible if you assume no monkey-patching of these".
>
> When doing "import numpy as np", then (assuming "np" is never reassigned in
> the module), at import time we check all names looked up from it in
> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'",
> i.e. the "np." is just a namespace mechanism.
>
> Needs a bit more work, it ignores the possibility that others could
> monkey-patch "np" in the Cython module.
>
> Problem with .pxd is that currently you need to pick one overload (np.sqrt
> works for n-dimensional arrays too, or takes a list and returns an array).
> And even after adding 3-4 language features to Cython to make this work,
> you're stuck with having to reimplement parts of NumPy in the pxd files just
> so that you can early bind from Cython.
>

Sorry, I'm a bit late.

When should __nomonkey__ be checked: at compile time or at import time?
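
For illustration, a minimal pure-Python sketch of what an import-time check could look like. The helper name early_bind and the stand-in module fake_np are hypothetical; only the __nomonkey__ attribute comes from the proposal quoted above, and nothing here is existing Cython behaviour.

```python
import types


def early_bind(module, names):
    """Return {name: object} for the names the module lists in __nomonkey__.

    Names that are not listed are left out, so the caller keeps doing
    a normal attribute lookup for them on every call.
    """
    declared = set(getattr(module, "__nomonkey__", ()))
    return {name: getattr(module, name) for name in names if name in declared}


# Hypothetical stand-in for a module such as numpy.
fake_np = types.ModuleType("fake_np")
fake_np.sqrt = lambda x: x ** 0.5
fake_np.sin = lambda x: x
fake_np.__nomonkey__ = ["sqrt"]

# Done once, at import time of the importing module.
bound = early_bind(fake_np, ["sqrt", "sin"])

# Later monkey-patching of fake_np.sqrt is not seen through `bound`,
# which is exactly what __nomonkey__ declares acceptable.
fake_np.sqrt = lambda x: -1.0
```

A compile-time check would instead require the compiler to import or inspect the target module while building, which is the trade-off behind the question.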

It seems to me that the compiler must guess the function signature at
compile time and then check it at runtime.

What if an integer signature is guessed for sqrt() based on the argument
type, as in sqrt(16)? Should this call fall back to PyObject_Call(), or
cast the integer to a double at some point?

I've tried to implement a trivial approach for CyFunction. Trivial means
that the function accepts PyObjects as arguments and returns a PyObject,
so the trivial signature is just one integer: 1 + len(args). If the
signature matches, the C function is called directly; otherwise
PyObject_Call() is used. I didn't succeed because of the argument
cloning problems we discussed before.
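
A pure-Python sketch of that dispatch rule, to make the idea concrete. ObjFunc and call are hypothetical names; the real attempt was in C on CyFunction and is not reproduced here. Reading the leading 1 as a slot for the return value is an assumption.

```python
class ObjFunc:
    """Hypothetical all-objects callable carrying a trivial signature."""

    def __init__(self, direct, nargs):
        self.direct = direct         # stands in for the direct C function
        self.signature = 1 + nargs   # assumed: return value + positional args

    def __call__(self, *args, **kwargs):
        # Generic entry point, playing the role of PyObject_Call().
        return self.direct(*args, **kwargs)


def call(func, *args):
    """Call directly on a signature match, generically otherwise."""
    if getattr(func, "signature", None) == 1 + len(args):
        return "direct", func.direct(*args)
    return "generic", func(*args)


add = ObjFunc(lambda a, b: a + b, nargs=2)

r_direct = call(add, 1, 2)          # signature 3 matches 1 + 2 arguments
r_generic = call(len, [1, 2, 3])    # builtin has no signature attribute
```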

About dict lookups: it's possible to speed up a dict lookup by a constant
key if we have access to the dict's internal implementation. I've
implemented it for module-level lookups here:

https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9

And it gave a 4x speedup on a dummy test:

def foo():
    cdef int i, r = 0
    o = foo
    for i in range(10000000):
        if o is foo:
            r += 1

%timeit foo()
1 loops, best of 3: 229 ms per loop

%timeit foo_optimized()
10 loops, best of 3: 54.1 ms per loop
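
The commit works on the dict internals from C, and the sketch below does not reproduce it. It only illustrates the general idea of reusing the result of a lookup by a fixed key until the namespace changes. VersionedDict and CachedLookup are hypothetical names, and the version counter is an assumption of this example.

```python
class VersionedDict(dict):
    """dict that bumps a counter on item assignment and deletion.

    Only __setitem__ and __delitem__ are tracked; update(), pop() and
    similar methods would need the same treatment in a complete version.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class CachedLookup:
    """Remember namespace[key] and redo the lookup only after a change."""

    def __init__(self, namespace, key):
        self.namespace = namespace
        self.key = key
        self.version = -1   # never equal to a real version, forces first lookup
        self.value = None
        self.misses = 0     # number of real dict lookups performed

    def get(self):
        if self.version != self.namespace.version:
            self.value = self.namespace[self.key]
            self.version = self.namespace.version
            self.misses += 1
        return self.value


ns = VersionedDict(foo=1)
cached = CachedLookup(ns, "foo")
first = cached.get()
second = cached.get()           # served from the cache
misses_before = cached.misses
ns["foo"] = 2                   # rebinding invalidates the cache
third = cached.get()
```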

-- 
vitja.

