[Cython] New early-binding concept [was: CEP1000]

Robert Bradshaw robertwb at gmail.com
Fri Apr 20 09:02:59 CEST 2012


On Thu, Apr 19, 2012 at 11:55 PM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 04/20/2012 08:49 AM, Dag Sverre Seljebotn wrote:
>>
>> On 04/20/2012 08:21 AM, Stefan Behnel wrote:
>>>
>>> Robert Bradshaw, 20.04.2012 02:52:
>>>>
>>>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote:
>>>>>
>>>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote:
>>>>>>
>>>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>>>>>>>
>>>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>>>>>>>
>>>>>>>>
>>>>>>>> from numpy import sqrt, sin
>>>>>>>>
>>>>>>>> cdef double f(double x):
>>>>>>>>     return sqrt(x * x) # or sin(x * x)
>>>>>>>>
>>>>>>>> Of course, here one could get the pointer in the module at import
>>>>>>>> time.
>>>>>>>
>>>>>>>
>>>>>>> That optimisation would actually be very worthwhile all by itself.
>>>>>>> I mean,
>>>>>>> we know what signatures we need for globally imported functions
>>>>>>> throughout
>>>>>>> the module, so we can reduce the call to a single jump through a
>>>>>>> function
>>>>>>> pointer (although likely with a preceding NULL check, which the
>>>>>>> branch
>>>>>>> prediction would be happy to give us for free). At least as long
>>>>>>> as sqrt
>>>>>>> is not being reassigned, but that should hit the 99% case.
>>>>>>>
>>>>>>>> However, here:
>>>>>>>>
>>>>>>>> from numpy import sqrt
>>>>>>
>>>>>> Correction: "import numpy as np"
>>>>>>>>
>>>>>>>>
>>>>>>>> cdef double f(double x):
>>>>>>>>     return np.sqrt(x * x) # or np.sin(x * x)
>>>>>>>>
>>>>>>>> the __getattr__ on np sure is larger than any effect we discuss.
>>>>>>>
>>>>>>>
>>>>>>> Yes, that would have to stay a .pxd case, I guess.
>>>>>>
>>>>>>
>>>>>> How about this mini-CEP:
>>>>>>
>>>>>> Modules are allowed to specify __nomonkey__ (or __const__, or
>>>>>> __notreassigned__), a list of strings naming module-level variables
>>>>>> where
>>>>>> "we don't hold you responsible if you assume no monkey-patching of
>>>>>> these".
>>>>>>
>>>>>> When doing "import numpy as np", then (assuming "np" is never
>>>>>> reassigned in
>>>>>> the module), at import time we check all names looked up from it in
>>>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as
>>>>>> 'np.sqrt'",
>>>>>> i.e. the "np." is just a namespace mechanism.
>>>>>
>>>>>
>>>>> I like the idea. I think this could be generalized to a 'final'
>>>>> keyword, that could also enable optimizations for cdef class
>>>>> attributes. So you'd say
>>>>>
>>>>> cdef final object np
>>>>> import numpy as np
>>>>>
>>>>> For class attributes this would tell the compiler that it will not be
>>>>> rebound, which means you could check if attributes are initialized in
>>>>> the initializer, or just pull such checks (as well as bounds checks),
>>>>> at least for memoryviews, out of loops, without worrying whether it
>>>>> will be reassigned in the meantime.
>>>>
>>>>
>>>> final is a nice way to describe this. If we were to introduce a new
>>>> keyword, static might do as well.
>>>>
>>>> It seems more natural to do this in the numpy.pxd file (perhaps it
>>>> could just be declared as a final object) and that would allow us to
>>>> not worry about re-assignment. Cython could then try to keep that
>>>> contract for any modules it compiles. (This is, however, a bit more
>>>> restrictive, though one can always cimport and import modules under
>>>> different names.)
>>>
>>>
>>> However, it's actually not the module that's "final" in this regard
>>> but the
>>> functions it exports - *they* do not change and neither do their C
>>> signatures. So the "final" modifier should stick to the functions
>>> (possibly
>>> declared at the "cdef extern" line), which would then allow us to resolve
>>> and cache the C function pointers at import time.
>>
>>
>> Are there any advantages at getting this information at compile time
>> rather than import time?
>>
>> If you got the full signature it would be a different matter (for type
>> inference etc.); you could essentially do something like
>>
>> cdef final double sin(double)
>> cdef final float sin(float)
>> cdef final double cos(double)
>
>
> In fact, "final" is sort of implied whenever a pxd is implied. The mere act
> of providing a pxd means you expect early binding to happen. So I think this
> boils down to simply allowing to resolve ABIs declared in pxd files through
> CEP 1000 instead of assuming it is a Cython module:
>
> cdef double sin(double)
> cdef double cos(double)
>
> We could first look for the Cython ABI at import time, and if that isn't
> there, fall back to CEP 1000. And in time, deprecate the Cython ABI in
> favour of CEP 1000 (and follow-up CEPs to make it complete enough).

Makes sense.
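
To make the lookup order concrete, here is a rough Python sketch of what
import-time resolution with a fallback might look like. It is only an
illustration: the table attribute names (_cython_abi_table, _cep1000_table),
the string signatures, and the resolve() helper are hypothetical
placeholders, not actual Cython or CEP 1000 interfaces, and plain Python
callables stand in for C function pointers.

```python
import math
import types

# Hypothetical attribute names, tried in the order proposed above:
# the Cython ABI first, then CEP 1000.
LOOKUP_ORDER = (
    ("cython-abi", "_cython_abi_table"),
    ("cep1000", "_cep1000_table"),
)


def resolve(module, name, signature):
    """Return (source, target) for `name` with the given signature.

    Falls back to an ordinary attribute lookup ("late") when neither
    table offers a matching entry.
    """
    for source, attr in LOOKUP_ORDER:
        table = getattr(module, attr, None)
        if table is not None and (name, signature) in table:
            return source, table[(name, signature)]
    return "late", getattr(module, name)


def make_module(**attrs):
    """Build a throwaway module object for the demonstration."""
    mod = types.ModuleType("fake")
    for key, value in attrs.items():
        setattr(mod, key, value)
    return mod


SIG = "double(double)"
cython_mod = make_module(sin=math.sin,
                         _cython_abi_table={("sin", SIG): math.sin})
cep_mod = make_module(sin=math.sin,
                      _cep1000_table={("sin", SIG): math.sin})
plain_mod = make_module(sin=math.sin)

for mod in (cython_mod, cep_mod, plain_mod):
    source, target = resolve(mod, "sin", SIG)
    print(source, target(0.0))
```

The point is only the ordering: a module that offers both tables is
resolved through the first one, and a module that offers neither still
works through the normal late-bound lookup.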

> The __nomonkey__ was something else, a proposal about a pxd-less approach.
> We can do both.

If __nomonkey__ is inspected at runtime, then the calling module would
have to opportunistically guess what might be in that list at compile
time, and still generate the lookup code just in case. I guess this
idea doesn't seem very fleshed out yet; its advantages, caveats, and
semantics are still quite fuzzy.
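
For illustration, the runtime variant could behave roughly like the Python
sketch below. This is an assumption about how it might work, not a worked-out
design: the bind() helper and the fake module are invented for the example,
and only the __nomonkey__ name comes from the proposal. Note that the
fallback branch (a fresh lookup on every call) has to exist regardless,
since the caller cannot know at compile time what the list will contain.

```python
import math
import types


def bind(module, name):
    """Return a zero-argument getter for `module.name`.

    If the module lists `name` in __nomonkey__, the attribute is read
    once, here, and the getter keeps returning that object (early
    binding). Otherwise the getter re-reads the attribute on every
    call (late binding), so monkey-patching stays visible.
    """
    if name in getattr(module, "__nomonkey__", ()):
        target = getattr(module, name)
        return lambda: target
    return lambda: getattr(module, name)


# A stand-in for a library module that promises not to rebind `sqrt`.
fake_np = types.ModuleType("fake_np")
fake_np.sqrt = math.sqrt
fake_np.sin = math.sin
fake_np.__nomonkey__ = ["sqrt"]

get_sqrt = bind(fake_np, "sqrt")  # early-bound
get_sin = bind(fake_np, "sin")    # late-bound

# Monkey-patch both names after binding.
fake_np.sqrt = lambda x: -1.0
fake_np.sin = lambda x: -1.0

print(get_sqrt()(4.0))  # still the original math.sqrt
print(get_sin()(0.0))   # sees the patched function
```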

