[Cython] Fwd: Re: [cython-users] checking for "None" in nogil function

Tue May 8 01:13:33 CEST 2012

On Mon, May 7, 2012 at 8:48 AM, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 05/07/2012 03:04 PM, Stefan Behnel wrote:
>>
>> Dag Sverre Seljebotn, 07.05.2012 13:48:
>>
>>> Here you go:
>>>
>>> def foo(np.ndarray[double] a, np.ndarray[double] out=None):
>>>     if out is None:
>>>         out = np.empty_like(a)
>>
>>
>> Ah, right - output arguments. Hadn't thought of those.
>>
>> Still, since you pass None explicitly as a default argument, this code
>> wouldn't be impacted by disallowing None for buffers by default. That case
>> is already handled specially in the compiler. But a better default would
>> prevent the *first* argument from being None.
>>
>> So, basically, it would do the right thing straight away in your case and
>> generate safer and more efficient code for it, whereas now you have to
>> test
>> 'a' for being None explicitly and Cython won't understand that hint due to
>> insufficient static analysis. At least, since my last commit you can make
>> Cython do the same thing by declaring it "not None".
>
>
> Yes, thanks!
>
>
>>>>> It's really no different from cdef classes.
>>>>
>>>>
>>>> I find it at least a bit more surprising because a buffer unpacking
>>>> argument is a rather strong hint that you expect something that supports
>>>> this protocol. The fact that you type your function argument with it
>>>> hints
>>>> at the intention to properly unpack it on entry. I'm sure there are lots
>>>> of
>>>> users who were or will be surprised when they realise that that doesn't
>>>> exclude None values.
>>>
>>>
>>> Whereas I think there would be more users surprised by the opposite.
>>
>>
>> We've had enough complaints from users about None being allowed for typed
>> arguments already to consider it at least a gotcha of the language.
>>
>> The main reason we didn't change this behaviour back then was that it
>> would
>> clearly break user code and we thought we could do without that. That's
>> different from considering it "right" and "good".
>>
>>
>>>>>> And I remember that we wanted to change the default settings for
>>>>>> extension
>>>>>> type arguments from "or None" to "not None" years ago but never
>>>>>> actually
>>>>>> did it.
>>>>>
>>>>>
>>>>> I remember that there was such a debate, but I certainly don't remember
>>>>> that this was the conclusion :-)
>>>>
>>>>
>>>> Maybe not, yes.
>>>>
>>>>
>>>>> I didn't agree with that view then and
>>>>> I don't now. I don't remember what Robert's view was...
>>>>>
>>>>> As far as I can remember (which might be biased towards my personal
>>>>> view), the conclusion was that we left the current semantics in place,
>>>>> relying on better control flow analysis to make None-checks cheaper,
>>>>> and
>>>>> when those are cheap enough, make the nonecheck directive default to
>>>>> True
>>>>
>>>>
>>>> At least for buffer arguments, it silently corrupts data or segfaults in
>>>> the current state of affairs, as you pointed out. Not exactly ideal.
>>>
>>>
>>> No different than writing to a field in a cdef class...
>>
>>
>> Hmm, aren't those None checked? At least cdef method calls are AFAIR.
>
>
> Not at all. That's my whole point -- currently, the rule for None in Cython
> is "it's your responsibility to never do a native operation on None".
>
> I don't like that either, but that's just inherited from Pyrex (and many
> projects would get speed regressions etc.).
>
> I'm not against changing that to "we safely None-check", if done nicely --
> it's just that that should be done everywhere at once.
>
> In current master (and as far back as I can remember), this code:
>
> cdef class A:
>    cdef int field
>    cdef int method(self):
>        print self.field
> def f():
>    cdef A a = None
>    a.field = 3
>    a.method()
>
> Turns into:
>
>  __pyx_v_a = ((struct __pyx_obj_5test2_A *)Py_None);
>  __pyx_v_a->field = 3;
>  ((struct __pyx_vtabstruct_5test2_A *)
> __pyx_v_a->__pyx_vtab)->method(__pyx_v_a);
>
>
>
>
>> I think we should really get back to the habit of making code safe first
>> and fast afterwards.
>
>
> Nobody has argued otherwise for some time (since the cdivision thread I
> believe), this is all about Pyrex legacy. Guess part of the story is that
> there's lots of performance-sensitive code in SAGE using cdef classes which
> was written in Pyrex before Cython was around...

I think there's a difference between making a new feature fast instead
of safe, and introducing a (significant) performance regression to add
safety to existing code. Also, the proposed change of "or None" is
backwards incompatible, and the eventual solution (as far as I
understand it) is to switch back to allowing None (for consistency
everywhere else they occur) once we have cheap non checks in place.

We can't get around the fact that cdef classes might be None, due to
attributes (which must be initialized to something initially). Doing a
None check on every buffer access in a loop falls into the significant
performance regression, but ideally we could pull it out.

- Robert