[Cython] New early-binding concept [was: CEP1000]

Vitja Makarov vitja.makarov at gmail.com
Thu Apr 19 13:20:38 CEST 2012


2012/4/19 Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no>:
> On 04/19/2012 10:35 AM, Vitja Makarov wrote:
>>
>> 2012/4/19 Dag Sverre Seljebotn<d.s.seljebotn at astro.uio.no>:
>>>
>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>>>>
>>>>
>>>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>>>>
>>>>>
>>>>> from numpy import sqrt, sin
>>>>>
>>>>> cdef double f(double x):
>>>>>     return sqrt(x * x) # or sin(x * x)
>>>>>
>>>>> Of course, here one could get the pointer in the module at import time.
>>>>
>>>>
>>>>
>>>> That optimisation would actually be very worthwhile all by itself. I
>>>> mean, we know what signatures we need for globally imported functions
>>>> throughout the module, so we can reduce the call to a single jump
>>>> through a function pointer (although likely with a preceding NULL
>>>> check, which branch prediction would be happy to give us for free).
>>>> At least as long as sqrt is not being reassigned, but that should hit
>>>> the 99% case.
>>>>
>>>>
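Concretely, I'd picture the generated code behaving like this rough
Cython sketch (the pointer slot, its NULL convention, and the
import-time fill-in are all hypothetical here):

from numpy import sqrt

# Module-scope slot; module init would fill it with numpy's C-level
# double -> double sqrt if one is exposed, and leave it NULL otherwise
# (hypothetical mechanism along the lines of CEP 1000).
cdef double (*sqrt_d)(double)

cdef double f(double x):
    if sqrt_d != NULL:            # cheap, well-predicted branch
        return sqrt_d(x * x)      # single jump through a function pointer
    return sqrt(x * x)            # generic Python call as fallback
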
>>>>> However, here:
>>>>>
>>>>> from numpy import sqrt
>>>
>>>
>>>
>>> Correction: "import numpy as np"
>>>
>>>>>
>>>>> cdef double f(double x):
>>>>>     return np.sqrt(x * x) # or np.sin(x * x)
>>>>>
>>>>> the cost of the __getattr__ on np is surely larger than any effect
>>>>> we discuss.
>>>>
>>>>
>>>>
>>>> Yes, that would have to stay a .pxd case, I guess.
>>>
>>>
>>>
>>> How about this mini-CEP:
>>>
>>> Modules are allowed to specify __nomonkey__ (or __const__, or
>>> __notreassigned__), a list of strings naming module-level variables where
>>> "we don't hold you responsible if you assume no monkey-patching of
>>> these".
>>>
>>> When doing "import numpy as np" (assuming "np" is never reassigned in
>>> the module), at import time we check each name looked up from it
>>> against __nomonkey__, and if it is listed we treat it like "from numpy
>>> import sqrt as 'np.sqrt'", i.e. the "np." is just a namespace
>>> mechanism.
>>>
>>> This needs a bit more work; it ignores the possibility that others
>>> could monkey-patch "np" inside the Cython module itself.
>>>
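For concreteness, the library side of this mini-CEP might be nothing
more than a declaration (entirely hypothetical; no library ships this
today):

# Hypothetical: at the top of numpy/__init__.py, the library promises
# that these module-level names are never rebound after import.
__nomonkey__ = ["sqrt", "sin", "cos"]

With that promise, "import numpy as np" followed by np.sqrt(...) could
be bound at import time exactly like "from numpy import sqrt".
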
>>> The problem with .pxd is that currently you need to pick one overload
>>> (np.sqrt works for n-dimensional arrays too, or takes a list and
>>> returns an array). And even after adding 3-4 language features to
>>> Cython to make this work, you're stuck with having to reimplement
>>> parts of NumPy in the pxd files just so that you can early-bind from
>>> Cython.
>>>
>>
>> Sorry, I'm a bit late.
>>
>> When should __nomonkey__ be checked: at compile time or at import time?
>
>
> At import time. At compile time we generate one (potential) function
> pointer per call signature we might try. At import time we fill them in,
> if the names are in __nomonkey__, using CEP 1000. At call time we wrap
> the pointer-is-non-NULL check in likely(), since the cost of the
> dict-lookup fallback is so large anyway.
>
>
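To spell that out, the import-time fill-in might look roughly like this,
continuing the sketch above (cep1000_lookup and the "d->d" signature
string are invented names for the CEP 1000 signature probe):

cdef extern from "cep1000.h":        # hypothetical header/API
    void *cep1000_lookup(object func, char *signature)

cdef double (*sqrt_d)(double)        # the generated per-signature slot

from numpy import sqrt
# Probe once at module init: non-NULL if sqrt exposes a
# double -> double entry point, NULL otherwise.
sqrt_d = <double (*)(double)> cep1000_lookup(sqrt, b"d->d")
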
>> It seems to me that the compiler must guess the function signature at
>> compile time and then check it at runtime.
>
>
> Yes. Just like Fortran 77, where you don't declare functions, you just
> call them. At least with Cython it'll just go slower if you guess wrong;
> we won't get a crash :-)
>
> If you want to help the compiler along explicitly, you would instead do
> something like
>
> cdef double (*sqrt_double)(double)
> from numpy import sqrt
>
>
>> What if an integer signature is guessed for sqrt() based on the argument
>> type, e.g. sqrt(16)? Should this call fall back to PyObject_Call(), or
>> cast the integer to a double at some point?
>
>
> a) np.sqrt could export functions for all basic types (this is how NumPy
> currently works under the hood anyway)
>
> b) It doesn't help here, but I also imagine Cython doing a 3-step or 4-step
> call down the line:
>
> - Direct call using the types given.
> - Promote all scalars to 64 bit, try again.
> [- (Optional, if an FFI library or LLVM is available): parse the
> function's signature string and build the call dynamically using an FFI
> library]
> - Python call
>
> Without an FFI library, I think giving the user a speedup if he/she writes
> sqrt(3.) rather than sqrt(3) is fine...
>
> c) An optimize-for-host-libraries compilation flag could of course just
> probe for the signatures, similar to profile-guided optimization.
>
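As a sketch, that step-wise cascade for a call like sqrt(x) could be
generated along these lines (cep1000_lookup is the same invented probe
as above; in real generated code the pointers would be cached at import
time rather than looked up per call):

cdef extern from "cep1000.h":        # hypothetical header/API
    void *cep1000_lookup(object func, char *signature)

cdef double call_sqrt(object sqrt_obj, double x):
    cdef double (*fp)(double)
    # Step 1: direct call using the types given.
    fp = <double (*)(double)> cep1000_lookup(sqrt_obj, b"d->d")
    if fp != NULL:
        return fp(x)
    # Step 2: promote scalars to 64 bit and retry (a no-op for a double).
    # Step 3 (optional, FFI/LLVM): build the call dynamically.
    # Step 4: plain Python call.
    return sqrt_obj(x)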

Ok, np.sqrt() supports different signatures; how would Cython know
which C function to use?

>
>> I've tried to implement a trivial approach for CyFunction. Trivial means
>> that the function accepts PyObjects as arguments and returns a PyObject,
>> so the trivial signature is just one integer: 1 + len(args). If the
>> signatures match, the C function is called directly; otherwise
>> PyObject_Call() is used. I didn't succeed because of the argument
>> cloning problems we discussed before.
>>
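In other words, the caller side would have looked something like this
loose sketch (not the actual patch; the two helpers stand in for
C-level accessors on the CyFunction that don't exist as such):

cdef object call_trivial(func, tuple args):
    # "Trivial" signature: a single integer, 1 + number of arguments,
    # meaning all-PyObject in, PyObject out.
    cdef Py_ssize_t sig = 1 + len(args)
    if cyfunction_signature(func) == sig:          # hypothetical accessor
        return cyfunction_direct_call(func, args)  # hypothetical direct C call
    return func(*args)                             # generic PyObject_Call path
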
>> About dict lookups: it's possible to speed up lookups by a constant
>> key if we have access to the dict's internal implementation. I've
>> implemented it for module-level lookups here:
>>
>>
>> https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9
>>
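The idea, roughly (a conceptual sketch only, not the code in that
commit, which reaches into the dict's internal entry table directly):
cache, per constant key, the value found last time together with a
cheap guard, and only fall back to a real dict lookup when the guard
fails:

from cpython.dict cimport PyDict_GetItem
from cpython.ref cimport PyObject

# One static cache slot per constant key in the generated module code.
cdef PyObject *cached_value = NULL
cdef Py_ssize_t cached_size = -1   # crude guard; the real patch checks
                                   # the dict's internal table instead,
                                   # which also catches same-size mutations

cdef object lookup_const(dict d, str key):
    global cached_value, cached_size
    if cached_value != NULL and len(d) == cached_size:
        return <object> cached_value       # fast path: no hashing, no probing
    cached_value = PyDict_GetItem(d, key)  # borrowed reference
    cached_size = len(d)
    if cached_value == NULL:
        raise KeyError(key)
    return <object> cached_value
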
>> And it gave a 4x speedup on a dummy test:
>>
>> def foo():
>>     cdef int i, r = 0
>>     o = foo
>>     for i in range(10000000):
>>         if o is foo:    # each 'foo' read here is a module-level dict lookup
>>             r += 1
>>
>> %timeit foo()
>> 1 loops, best of 3: 229 ms per loop
>>
>> %timeit foo_optimized()
>> 10 loops, best of 3: 54.1 ms per loop
>>
>
> Cool! Am I right that that translates to 5.4 ns per iteration? That's
> pretty good. (What CPU did you use?)
>

Yes, the test was run on an Intel i3 at 3.2 GHz.

> Still, a call through a function pointer bound at import time appears to
> be roughly 1 ns, so a full sqrt call bound the way I proposed would be
> 2-3 ns; an extra 5.4 ns on top of that is still relatively large.
>
> But it does mean that __nomonkey__, even if not completely invalid, is
> perhaps not exactly high priority. For a JIT that consulted it at compile
> time, the gain would be higher, though.
>
>


-- 
vitja.

