[Cython] New early-binding concept [was: CEP1000]

Fri Apr 20 08:58:18 CEST 2012

On Thu, Apr 19, 2012 at 11:49 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 04/20/2012 08:21 AM, Stefan Behnel wrote:
>>
>> Robert Bradshaw, 20.04.2012 02:52:
>>>
>>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote:
>>>>
>>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote:
>>>>>
>>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>>>>>>
>>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>>>>>>
>>>>>>>
>>>>>>> from numpy import sqrt, sin
>>>>>>>
>>>>>>> cdef double f(double x):
>>>>>>>     return sqrt(x * x) # or sin(x * x)
>>>>>>>
>>>>>>> Of course, here one could get the pointer in the module at import
>>>>>>> time.
>>>>>>
>>>>>>
>>>>>> That optimisation would actually be very worthwhile all by itself. I mean,
>>>>>> we know what signatures we need for globally imported functions throughout
>>>>>> the module, so we can reduce the call to a single jump through a function
>>>>>> pointer (although likely with a preceding NULL check, which the branch
>>>>>> prediction would be happy to give us for free). At least as long as sqrt
>>>>>> is not being reassigned, but that should hit the 99% case.
>>>>>>
>>>>>>> However, here:
>>>>>>>
>>>>>>> from numpy import sqrt
>>>>>
>>>>> Correction: "import numpy as np"
>>>>>>>
>>>>>>>
>>>>>>> cdef double f(double x):
>>>>>>>     return np.sqrt(x * x) # or np.sin(x * x)
>>>>>>>
>>>>>>> the __getattr__ on np sure is larger than any effect we discuss.
>>>>>>
>>>>>>
>>>>>> Yes, that would have to stay a .pxd case, I guess.
>>>>>
>>>>>
>>>>> How about this mini-CEP:
>>>>>
>>>>> Modules are allowed to specify __nomonkey__ (or __const__, or
>>>>> __notreassigned__), a list of strings naming module-level variables where
>>>>> "we don't hold you responsible if you assume no monkey-patching of these".
>>>>>
>>>>> When doing "import numpy as np", then (assuming "np" is never reassigned in
>>>>> the module), at import time we check all names looked up from it in
>>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'",
>>>>> i.e. the "np." is just a namespace mechanism.
>>>>
>>>>
>>>> I like the idea. I think this could be generalized to a 'final'
>>>> keyword, that could also enable optimizations for cdef class
>>>> attributes. So you'd say
>>>>
>>>> cdef final object np
>>>> import numpy as np
>>>>
>>>> For class attributes this would tell the compiler that it will not be
>>>> rebound, which means you could check if attributes are initialized in
>>>> the initializer, or just pull such checks (as well as bounds checks),
>>>> at least for memoryviews, out of loops, without worrying whether it
>>>> will be reassigned in the meantime.
>>>
>>>
>>> final is a nice way to describe this. If we were to introduce a new
>>> keyword, static might do as well.
>>>
>>> It seems more natural to do this in the numpy.pxd file (perhaps it
>>> could just be declared as a final object) and that would allow us to
>>> not worry about re-assignment. Cython could then try to keep that
>>> contract for any modules it compiles. (This is, however, a bit more
>>> restrictive, though one can always cimport and import modules under
>>> different names.)
>>
>>
>> However, it's actually not the module that's "final" in this regard but the
>> functions it exports - *they* do not change and neither do their C
>> signatures. So the "final" modifier should stick to the functions (possibly
>> declared at the "cdef extern" line), which would then allow us to resolve
>> and cache the C function pointers at import time.

Yes, I was thinking about decorating the functions, not the module.

> Are there any advantages to getting this information at compile time rather
> than import time?
>
> If you got the full signature it would be a different matter (for type
> inference etc.); you could essentially do something like
>
> cdef final double sin(double)
> cdef final float sin(float)
> cdef final double cos(double)
>
> ...and you would know types at compile-time, and get pointers for those at
> import time.
>
>
>>
>> That mimics the case of the current "final" classes and methods, where we
>> take off the method pointers at compile time. And inside of numpy.pxd is
>> the perfect place to declare this, not as part of the import.
>
>
> However,
>
> a) a __finals__ in the NumPy Python module is something the NumPy project
> can maintain, and which can be different on different releases etc. (OK,
> NumPy is special because it is so high profile, but any other library)
>
> b) a __finals__ is something PyPy, Numba, etc. could benefit from as well
>
> Of course, one doesn't exclude the other. And if a library implements
> CEP1000 + provides __finals__, it would be trivial to run a pxd generator on
> it.

This seems rather orthogonal to the CEP 1000 proposal; there are lots
of optimizations that could be done by knowing a member of an object
will not be re-assigned.
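
To make the import-time check concrete, here is a rough pure-Python sketch.
It is only an illustration under stated assumptions: __finals__ is the name
proposed in this thread and not an existing convention, and bind_finals and
the stand-in module fake_numpy are hypothetical. Names the module declares
final are resolved once; everything else is left to normal attribute lookup.

```python
import types


def bind_finals(module, names):
    """Resolve once the names that `module` promises not to reassign.

    Hypothetical helper: returns {name: object} for every requested name
    listed in the module's (proposed, non-standard) __finals__ attribute.
    Names not listed there are omitted and must be looked up at call time.
    """
    declared = set(getattr(module, "__finals__", ()))
    return {name: getattr(module, name) for name in names if name in declared}


# Stand-in for a library module; a real one would set __finals__ itself.
fake_numpy = types.ModuleType("fake_numpy")
fake_numpy.sqrt = lambda x: x ** 0.5
fake_numpy.sin = lambda x: x          # placeholder, deliberately not final
fake_numpy.__finals__ = ["sqrt"]      # only sqrt is promised to be stable

# Done once, at "import time" of the using module.
early_bound = bind_finals(fake_numpy, ["sqrt", "sin"])
```

Only sqrt ends up in early_bound here; sin would still go through the
ordinary attribute lookup on the module each time it is called.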

One can currently write

cdef np_sin
from numpy import sin as np_sin

which would accomplish the same thing, right?
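
For what it's worth, the binding behaviour behind that question can be seen
in plain Python. This is a minimal sketch, not Cython: it uses the stdlib
math module as a stand-in for numpy so it runs anywhere, and f_late / f_early
are made-up names. A name bound by "from ... import ... as" keeps pointing at
the original object even if the module attribute is patched afterwards.

```python
import math
from math import sin as math_sin  # bound once, when this module is imported


def f_late(x):
    # Late binding: looks up the "sin" attribute on the module at every call.
    return math.sin(x * x)


def f_early(x):
    # Early binding: uses the object captured by the import above.
    return math_sin(x * x)


original_sin = math.sin
math.sin = lambda x: 0.0              # monkey-patch the module attribute
try:
    late_result = f_late(1.0)         # goes through the patched attribute
    early_result = f_early(1.0)       # still calls the original function
finally:
    math.sin = original_sin           # always undo the patch
```

The early-bound name ignores the patch, which is exactly the assumption a
"final" declaration would let the compiler make without being told via a
separate import.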

- Robert