[Cython] Fwd: Re: [cython-users] checking for "None" in nogil function

Mon May 7 15:04:18 CEST 2012

Dag Sverre Seljebotn, 07.05.2012 13:48:
> On 05/07/2012 01:10 PM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 07.05.2012 12:40:
>>> On 05/07/2012 11:17 AM, Stefan Behnel wrote:
>>>> Dag Sverre Seljebotn, 07.05.2012 10:44:
>>>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote:
>>>>>> I wonder why a memory view should be allowed to be None in the first
>>>>>> place.
>>>>>> Buffer arguments aren't (because they get unpacked on entry), so why
>>>>>> should memory views?
>>>>>
>>>>> ? At least when I implemented it, buffers get unpacked but the case of a
>>>>> None buffer is treated specially, and you're fully allowed (and
>>>>> segfault if you [] it).
>>>>
>>>> Hmm, ok, maybe I just got confused by the code then.
>>>>
>>>> I think the docs should state that buffer arguments are best used together
>>>> with the "not None" declaration then.
>>>
>>> I use them with "=None" default values all the time... then do a
>>> None-check manually.
>>
>> Interesting. Could you given an example? What's the advantage over letting
>> Cython raise an error for you? And, since you are using it as a default
>> argument, why would someone want to call your code entirely without a
>> buffer argument?
> 
> Here you go:
> 
> def foo(np.ndarray[double] a, np.ndarray[double] out=None):
>     if out is None:
>         out = np.empty_like(a)

Ah, right - output arguments. Hadn't thought of those.

Still, since you pass None explicitly as a default argument, this code
wouldn't be impacted by disallowing None for buffers by default. That case
is already handled specially in the compiler. But a better default would
prevent the *first* argument from being None.

So, basically, it would do the right thing straight away in your case and
generate safer and more efficient code for it, whereas now you have to test
'a' for being None explicitly and Cython won't understand that hint due to
insufficient static analysis. At least, since my last commit you can make
Cython do the same thing by declaring it "not None".

>>> It's really no different from cdef classes.
>>
>> I find it at least a bit more surprising because a buffer unpacking
>> argument is a rather strong hint that you expect something that supports
>> this protocol. The fact that you type your function argument with it hints
>> at the intention to properly unpack it on entry. I'm sure there are lots of
>> users who were or will be surprised when they realise that that doesn't
>> exclude None values.
> 
> Whereas I think there would be more users surprised by the opposite.

We've had enough complaints from users about None being allowed for typed
arguments already to consider it at least a gotcha of the language.

The main reason we didn't change this behaviour back then was that it would
clearly break user code and we thought we could do without that. That's
different from considering it "right" and "good".

>>>> And I remember that we wanted to change the default settings for extension
>>>> type arguments from "or None" to "not None" years ago but never actually
>>>> did it.
>>>
>>> I remember that there was such a debate, but I certainly don't remember
>>> that this was the conclusion :-)
>>
>> Maybe not, yes.
>>
>>
>>> I didn't agree with that view then and
>>> I don't now. I don't remember what Robert's view was...
>>>
>>> As far as I can remember (which might be biased towards my personal
>>> view), the conclusion was that we left the current semantics in place,
>>> relying on better control flow analysis to make None-checks cheaper, and
>>> when those are cheap enough, make the nonecheck directive default to
>>> True
>>
>> At least for buffer arguments, it silently corrupts data or segfaults in
>> the current state of affairs, as you pointed out. Not exactly ideal.
> 
> No different than writing to a field in a cdef class...

Hmm, aren't those None checked? At least cdef method calls are AFAIR.

I think we should really get back to the habit of making code safe first
and fast afterwards.

>> That's another reason why I see a difference between the behaviour of
>> extension types and that of buffer arguments. Buffer indexing is also way
>> more performance critical than the average method call or attribute access
>> on a cdef class.
> 
> Perhaps, but that's a bit hand-wavy to turn into a principle of language
> design? "This is performance critical, so therefore we suddenly invert the
> normal rule"?

Since when is the "normal rule" to consider performance so important that
we prefer a crash over raising an exception? That's the current state of
buffer arguments, after all, so we already inverted the "normal rule", IMHO.

> I just think we should be consistent, not have more special rules for
> buffers than we need to.

Agreed. So, would you accept that we add a None check to every buffer
indexing access now and try to eliminate them over time (or with user
interaction)?

> The intention all the time was that "np.ndarray[double]" is just a
> glorified "np.ndarray". People expect it to behave like an optimized
> "np.ndarray". If "np.ndarray" can be None, why can't "np.ndarray[double]"?

Because it uses syntax that is expected to unpack the buffer. If that
buffer doesn't exist, I'd expect an error. It's like using interfaces: I
want something here that implements the buffer interface. If it doesn't -
reject it.

Besides, I hope you are aware that your argumentation stands on the (IMHO
questionable) fact that "np.ndarray" by itself can be None by default. If
np.ndarray should not be be allowed to be None by default, why should
np.ndarray[double]? That argument works in both ways.

> BTW, with the coming of memoryviews, me and Mark talked about just
> deprecating the "mytype[...]" meaning buffers, and rather treat it as
> np.ndarray, array.array etc. being some sort of "template types". That is,
> we disallow "object[int]" and require some special declarations in the
> relevant pxd files.

Hmm, yes, it's unfortunate that we have two different types of syntax now,
one that declares the item type before the brackets and one that declares
it afterwards.

>>> (Java is sort of prior art that this can indeed be done?).
>>
>> Java was designed to have a JIT compiler underneath which handles external
>> parameters, and its compilers are way smarter than Cython. I agree that
>> there is still a lot we can do based on better static analysis, but there
>> will always be limits.
> 
> Any static analysis will be able to get you to the point of "not None" if
> the user has a manual test.

Sure. It will at least expect a fatal error at the first operation that
won't work with a None value and know that it can't be None afterwards.

> And the Python way is often to just spell
> things out rather than brevity; I think an explicit if-test is much more
> newbie friendly than "not None", "or None", etc.

... with a good default being even more pythonic. ;)

Stefan