[Cython] Fwd: Re: [cython-users] checking for "None" in nogil function

Mon May 7 13:48:18 CEST 2012

On 05/07/2012 01:10 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 07.05.2012 12:40:
>> moving to dev list
>
> Makes sense.
>
>> On 05/07/2012 11:17 AM, Stefan Behnel wrote:
>>> Dag Sverre Seljebotn, 07.05.2012 10:44:
>>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote:
>>>>> I wonder why a memory view should be allowed to be None in the first
>>>>> place.
>>>>> Buffer arguments aren't (because they get unpacked on entry), so why
>>>>> should memory views?
>>>>
>>>> ? At least when I implemented it, buffers get unpacked but the case of a
>>>> None buffer is treated specially, and you're fully allowed (and segfault if
>>>> you [] it).
>>>
>>> Hmm, ok, maybe I just got confused by the code then.
>>>
>>> I think the docs should state that buffer arguments are best used together
>>> with the "not None" declaration then.
>
> ... which made me realise that that wasn't even supported. I can't believe
> no-one ever reported that as a bug...
>
> https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b
>
> It's still not supported for memory views.
>
> BTW, is there a reason why we shouldn't allow a "not None" declaration for
> cdef functions? Obviously, the caller would have to do the check in that
> case. Hmm, maybe it's not that important, because None checks are best done
> at entry points from user code, which usually means Python code. It seems
> like "not None" is not supported on cpdef functions, though.
>
>
>> I use them with "=None" default values all the time... then do a
>> None-check manually.
>
> Interesting. Could you give an example? What's the advantage over letting
> Cython raise an error for you? And, since you are using it as a default
> argument, why would someone want to call your code entirely without a
> buffer argument?

Here you go:

import numpy as np
cimport numpy as np

def foo(np.ndarray[double] a, np.ndarray[double] out=None):
    if out is None:
        out = np.empty_like(a)
    # compute result in out
    return out

The pattern of handing in the memory area to write to is one of the 
fundamental basics of numerical computing; you often just can't 
implement an algorithm if the called function returns the result in a 
newly-allocated array. I can explain why that is in detail, but I'd 
rather you just trusted the testimony of somebody doing numerical 
computation...

It's just a convenience, but often (in particular when testing) it's 
incredibly handy not to have to bother with allocating the output 
array.
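The out=None convenience can be sketched in plain Python, as a stand-in for the Cython/NumPy version above. This assumes `array.array('d', ...)` as the buffer type, and the function name `scale` is hypothetical. A caller who cares about allocation passes one preallocated buffer and reuses it; a caller who doesn't (e.g. in a test) just omits `out`.

```python
from array import array


def scale(a, factor, out=None):
    """Store factor * a in `out` and return it (hypothetical example).

    If `out` is None, a new output buffer is allocated (the convenience
    path); otherwise the caller's preallocated buffer is filled in place.
    """
    if out is None:
        # Convenience path: allocate a zero-filled double array of matching length.
        out = array("d", [0.0]) * len(a)
    elif len(out) != len(a):
        raise ValueError("out must have the same length as a")
    for i, x in enumerate(a):
        out[i] = factor * x
    return out


# Reuse one preallocated buffer across many calls ...
a = array("d", [1.0, 2.0, 3.0])
buf = array("d", [0.0]) * len(a)
for k in range(1, 4):
    scale(a, float(k), out=buf)  # fills buf in place, no new allocation
# ... or let the function allocate when convenience matters more.
fresh = scale(a, 2.0)
```

Returning `out` in both paths means callers can write `result = scale(a, f)` or `scale(a, f, out=buf)` with the same function.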

Another pattern is:

def do_something(np.ndarray[double] a,
                 np.ndarray[double] sin_of_a=None):
    ...

so if your caller happens to have already computed the "something" 
(here, sin_of_a), the function uses it; otherwise, since the "something" 
is a function of the inputs, it can be computed on the fly. AND, 
sometimes it can be computed on the fly more efficiently than the caller 
could have done, because of memory bus issues etc.
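A plain-Python sketch of the optional-precomputed-input pattern, assuming a hypothetical reduction `sum_sin_squared` in place of `do_something`. If the caller supplies `sin_of_a`, it is trusted and reused; otherwise sin(x) is computed inside the loop, so no temporary sin(a) sequence is ever built.

```python
import math


def sum_sin_squared(a, sin_of_a=None):
    """Return the sum of sin(x)**2 over a (hypothetical example function).

    If the caller already has sin_of_a (assumed to equal [sin(x) for x in a]),
    it is reused; otherwise sin(x) is computed on the fly inside the loop.
    """
    if sin_of_a is None:
        # Compute on the fly: no temporary sin(a) sequence is built.
        return math.fsum(math.sin(x) ** 2 for x in a)
    if len(sin_of_a) != len(a):
        raise ValueError("sin_of_a must have the same length as a")
    # Reuse the caller's precomputed values.
    return math.fsum(s * s for s in sin_of_a)


a = [0.0, math.pi / 2, math.pi / 6]
on_the_fly = sum_sin_squared(a)
precomputed = sum_sin_squared(a, [math.sin(x) for x in a])
```

In compiled code the fused on-the-fly loop avoids writing and re-reading a temporary array, which is the kind of memory-traffic saving alluded to above; in pure Python the effect is only illustrative.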

Both of these can be "fixed" by a) not allowing the convenient 
shorthand, or b) declaring the argument as "object" first and then 
typing it after the "preamble".

So the REAL reason I'm arguing this case is consistency with cdef classes.

>
>
>> It's really no different from cdef classes.
>
> I find it at least a bit more surprising because a buffer unpacking
> argument is a rather strong hint that you expect something that supports
> this protocol. The fact that you type your function argument with it hints
> at the intention to properly unpack it on entry. I'm sure there are lots of
> users who were or will be surprised when they realise that that doesn't
> exclude None values.

Whereas I think there would be more users surprised by the opposite.

So there -- we won't know who's right without actually finding some 
users. And chances are we are both right, since users are different from 
one another.

>
>
>>> And I remember that we wanted to change the default settings for extension
>>> type arguments from "or None" to "not None" years ago but never actually
>>> did it.
>>
>> I remember that there was such a debate, but I certainly don't remember
>> that this was the conclusion :-)
>
> Maybe not, yes.
>
>
>> I didn't agree with that view then and
>> I don't now. I don't remember what Robert's view was...
>>
>> As far as I can remember (which might be biased towards my personal
>> view), the conclusion was that we left the current semantics in place,
>> relying on better control flow analysis to make None-checks cheaper, and
>> when those are cheap enough, make the nonecheck directive default to
>> True
>
> At least for buffer arguments, it silently corrupts data or segfaults in
> the current state of affairs, as you pointed out. Not exactly ideal.

No different from writing to a field of a cdef class argument that is None...

>
> That's another reason why I see a difference between the behaviour of
> extension types and that of buffer arguments. Buffer indexing is also way
> more performance critical than the average method call or attribute access
> on a cdef class.

Perhaps, but that's a bit hand-wavy to turn into a principle of language 
design? "This is performance critical, so therefore we suddenly invert 
the normal rule"?

I just think we should be consistent, and not have more special rules 
for buffers than we need to.

The intention all the time was that "np.ndarray[double]" is just a 
glorified "np.ndarray". People expect it to behave like an optimized 
"np.ndarray". If "np.ndarray" can be None, why can't "np.ndarray[double]"?

BTW, with the coming of memoryviews, Mark and I talked about just 
deprecating the use of "mytype[...]" to mean buffers, and instead 
treating np.ndarray, array.array etc. as some sort of "template types". 
That is, we would disallow "object[int]" and require some special 
declarations in the relevant pxd files.

>> (Java is sort of prior art that this can indeed be done?).
>
> Java was designed to have a JIT compiler underneath which handles external
> parameters, and its compilers are way smarter than Cython. I agree that
> there is still a lot we can do based on better static analysis, but there
> will always be limits.

Any static analysis will be able to infer "not None" if the user has 
written a manual test. And the Python way is often to just spell things 
out rather than aim for brevity; I think an explicit if-test is much 
more newbie-friendly than "not None", "or None", etc.

Performance beyond that is rather theoretical for the moment.
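The two styles under debate can be contrasted in plain Python. `reject_none` is a hypothetical decorator standing in for a declarative "not None" argument check done once at the entry point (it is not part of Cython or the standard library); `total_explicit` spells the same check out as an if-test in the body. Both raise TypeError in this sketch before any element is touched.

```python
import functools
import inspect


def reject_none(*names):
    """Hypothetical decorator emulating a declarative "not None" argument check.

    The listed arguments are checked once, at the entry point, before the
    function body runs.
    """
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()  # so omitted arguments see their defaults
            for name in names:
                if bound.arguments[name] is None:
                    raise TypeError(f"Argument '{name}' must not be None")
            return func(*args, **kwargs)

        return wrapper

    return decorate


@reject_none("a")
def total_declarative(a):
    return sum(a)


def total_explicit(a):
    # The spelled-out style: the None test is visible in the body itself.
    if a is None:
        raise TypeError("Argument 'a' must not be None")
    return sum(a)
```

The explicit form is longer but readable without knowing any extra syntax; the declarative form keeps the body clean and makes the contract part of the signature.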

I agree that for memoryviews that can be passed in an acquired state to 
cdef functions there is the question of eliminating an extra branch or 
so, but that is still far-fetched, and I'd rather Mark raise the issue 
if it becomes an issue than the two of us bikeshedding over it.

I'll try to make this my last post to this thread, I feel we're slipping 
into Dag-and-Stefan-endless-thread territory...

Dag