[Cython] Fwd: Re: [cython-users] checking for "None" in nogil function

Mon May 7 13:51:00 CEST 2012

On 05/07/2012 01:48 PM, Dag Sverre Seljebotn wrote:
> On 05/07/2012 01:10 PM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 07.05.2012 12:40:
>>> moving to dev list
>>
>> Makes sense.
>>
>>> On 05/07/2012 11:17 AM, Stefan Behnel wrote:
>>>> Dag Sverre Seljebotn, 07.05.2012 10:44:
>>>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote:
>>>>>> I wonder why a memory view should be allowed to be None in the first
>>>>>> place. Buffer arguments aren't (because they get unpacked on entry), so
>>>>>> why should memory views?
>>>>>
>>>>> ? At least when I implemented it, buffers get unpacked but the case of a
>>>>> None buffer is treated specially, and passing None is fully allowed (and
>>>>> you segfault if you index it with []).
>>>>
>>>> Hmm, ok, maybe I just got confused by the code then.
>>>>
>>>> I think the docs should state that buffer arguments are best used
>>>> together with the "not None" declaration then.
>>
>> ... which made me realise that that wasn't even supported. I can't believe
>> no-one ever reported that as a bug...
>>
>> https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b
>>
>>
>> It's still not supported for memory views.
>>
>> BTW, is there a reason why we shouldn't allow a "not None" declaration for
>> cdef functions? Obviously, the caller would have to do the check in that
>> case. Hmm, maybe it's not that important, because None checks are best done
>> at entry points from user code, which usually means Python code. It seems
>> like "not None" is not supported on cpdef functions, though.
>>
>>
>>> I use them with "=None" default values all the time... then do a
>>> None-check manually.
>>
>> Interesting. Could you give an example? What's the advantage over letting
>> Cython raise an error for you? And, since you are using it as a default
>> argument, why would someone want to call your code entirely without a
>> buffer argument?
>
> Here you go:
>
> import numpy as np
> cimport numpy as np
>
> def foo(np.ndarray[double] a, np.ndarray[double] out=None):
>     if out is None:
>         out = np.empty_like(a)
>     # compute result in out
>     return out
>
> The pattern of handing in the memory area to write to is one of the
> fundamental basics of numerical computing; you often just can't
> implement an algorithm if the called function returns the result in a
> newly-allocated array. I can explain why that is in detail, but I'd
> rather you just trusted the testimony of somebody doing numerical
> computation...
>
> It's just a convenience, but often (in particular when testing) it's
> incredibly convenient to not have to bother with allocating the output
> array.
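For readers without NumPy at hand, here is a minimal plain-Python sketch of the same "out=None" convention. It assumes the stdlib array module as a stand-in for np.ndarray, and a hypothetical doubling step in place of "compute result in out". Cython accepts this untyped Python syntax too; the typed np.ndarray[double] signature above is what makes the element access fast.

```python
from array import array


def foo(a, out=None):
    """Write the result into `out`; allocate it only if the caller passed None."""
    if out is None:
        # Convenience path (assumption: zero-filled, same length as `a`),
        # playing the role of np.empty_like(a).
        out = array("d", [0.0]) * len(a)
    # Hypothetical computation standing in for "compute result in out".
    for i, x in enumerate(a):
        out[i] = 2.0 * x
    return out


a = array("d", [1.0, 2.0, 3.0])

# Caller-managed buffer: the function fills it in place, no new allocation.
buf = array("d", [0.0]) * len(a)
result = foo(a, buf)

# Convenience call, e.g. while testing: the function allocates the output.
fresh = foo(a)
```

Here `result is buf`, so a caller looping over many inputs can reuse one buffer, while `fresh` is a newly allocated array with the same contents.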
>
> Another pattern is:
>
> def do_something(np.ndarray[double] a,
>                  np.ndarray[double] sin_of_a=None):
>     ...
>
> so if your caller happened to already have computed something, the
> function uses it, but OTOH the "something" is a function of the inputs
> and can be computed on the fly. AND, sometimes it can be computed on the
> fly in ways more efficient than what the caller could have done, because
> of memory bus issues etc. etc.
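The optional-precomputed-input pattern can be sketched the same way in plain Python. The body below is hypothetical (it multiplies each element by its sine, standing in for the elided "..."); only the None-defaulting logic is the point.

```python
import math


def do_something(a, sin_of_a=None):
    # Reuse the caller's precomputed values when given; otherwise derive
    # them from the inputs on the fly.
    if sin_of_a is None:
        sin_of_a = [math.sin(x) for x in a]
    # Hypothetical computation standing in for "..." in the original.
    return [x * s for x, s in zip(a, sin_of_a)]


a = [0.0, 0.5, 1.0]
precomputed = [math.sin(x) for x in a]

with_cache = do_something(a, precomputed)  # caller already had sin(a)
on_the_fly = do_something(a)               # function computes it itself
```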
>
> Both of these can be "fixed" by a) not allowing the convenient
> shorthand, or b) declaring the argument as "object" first and then typing
> it after the "preamble".
>
> So the REAL reason I'm arguing this case is consistency with cdef classes.
>
>
>
>>
>>
>>> It's really no different from cdef classes.
>>
>> I find it at least a bit more surprising because a buffer unpacking
>> argument is a rather strong hint that you expect something that supports
>> this protocol. The fact that you type your function argument with it hints
>> at the intention to properly unpack it on entry. I'm sure there are lots of
>> users who were or will be surprised when they realise that that doesn't
>> exclude None values.
>
> Whereas I think there would be more users surprised by the opposite.
>
> So there -- we won't know who's right without actually finding some
> users. And chances are we are both right, since users are different from
> one another.
>
>>
>>
>>>> And I remember that we wanted to change the default settings for
>>>> extension type arguments from "or None" to "not None" years ago but never
>>>> actually did it.
>>>
>>> I remember that there was such a debate, but I certainly don't remember
>>> that this was the conclusion :-)
>>
>> Maybe not, yes.
>>
>>
>>> I didn't agree with that view then and
>>> I don't now. I don't remember what Robert's view was...
>>>
>>> As far as I can remember (which might be biased towards my personal
>>> view), the conclusion was that we left the current semantics in place,
>>> relying on better control flow analysis to make None-checks cheaper, and
>>> when those are cheap enough, make the nonecheck directive default to
>>> True
>>
>> At least for buffer arguments, it silently corrupts data or segfaults in
>> the current state of affairs, as you pointed out. Not exactly ideal.
>
> No different than writing to a field in a cdef class...

Also, I believe that in the strided case, the strides are all set to 0
and the data pointer is NULL, so you will never corrupt data; you will
always try to access *NULL and segfault.

Though if you use mode='c' and a very high index, you'll corrupt data.
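To make the arithmetic concrete, here is a toy model in plain Python of the address a 1-D buffer access touches when the argument is None. It takes the description above as given (data pointer NULL, strides 0) and assumes that with mode='c' the innermost step is the item size rather than the stored stride. Addresses are modelled as integers; this is not Cython's generated code.

```python
ITEMSIZE = 8  # assumption: sizeof(double) on a typical platform
NULL = 0      # data pointer of a None buffer, per the description above


def element_address(data, stride, index, contiguous):
    """Byte address read or written by buf[index] in this toy model.

    contiguous=False models the strided case: the step is the stored stride.
    contiguous=True models mode='c': the step is assumed to be the item size.
    """
    step = ITEMSIZE if contiguous else stride
    return data + index * step


# Strided None buffer: every index collapses onto address 0 -> segfault.
strided = [element_address(NULL, 0, i, contiguous=False) for i in (0, 1, 10**6)]

# mode='c' None buffer: a large index lands on a non-NULL address that
# may well be mapped memory -> silent corruption instead of a crash.
contig = element_address(NULL, 0, 10**6, contiguous=True)
```

Small indices in the mode='c' case still land in the low, normally unmapped pages near NULL, which is why it takes a very high index to corrupt data.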

Dag

>
>>
>> That's another reason why I see a difference between the behaviour of
>> extension types and that of buffer arguments. Buffer indexing is also way
>> more performance critical than the average method call or attribute
>> access
>> on a cdef class.
>
> Perhaps, but that's a bit hand-wavy to turn into a principle of language
> design? "This is performance critical, so therefore we suddenly invert
> the normal rule"?
>
> I just think we should be consistent, not have more special rules for
> buffers than we need to.
>
> The intention all the time was that "np.ndarray[double]" is just a
> glorified "np.ndarray". People expect it to behave like an optimized
> "np.ndarray". If "np.ndarray" can be None, why can't "np.ndarray[double]"?
>
> BTW, with the coming of memoryviews, Mark and I talked about
> deprecating the general meaning of "mytype[...]" as a buffer, and instead
> treating np.ndarray, array.array etc. as some sort of "template types".
> That is, we would disallow "object[int]" and require some special
> declarations in the relevant pxd files.
>
>>> (Java is sort of prior art that this can indeed be done?).
>>
>> Java was designed to have a JIT compiler underneath which handles external
>> parameters, and its compilers are way smarter than Cython. I agree that
>> there is still a lot we can do based on better static analysis, but there
>> will always be limits.
>
> Any static analysis will be able to get you to the point of "not None"
> if the user has a manual test. And the Python way is often to spell
> things out rather than to favour brevity; I think an explicit if-test is
> much more newbie-friendly than "not None", "or None", etc.
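As a sketch of the explicit style being argued for, here is the spelled-out if-test in plain Python (the function name and error message are hypothetical). In a Cython def function the declarative alternative would be to write the argument as `np.ndarray[double] a not None`, which the thread reports being added for buffer arguments in the linked commit and still missing for memory views.

```python
def sum_of_squares(a):
    # Explicit entry check: the None case is spelled out where a reader
    # sees it, instead of being implied by a "not None" declaration.
    if a is None:
        raise TypeError("argument 'a' must not be None")
    # Past this point `a` is known not to be None, which is the fact a
    # flow-analysing compiler could exploit to drop later None checks.
    return sum(x * x for x in a)


print(sum_of_squares([3.0, 4.0]))  # 25.0
```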
>
> Performance beyond that is rather theoretical for the moment.
>
> I agree that for memoryviews that can be passed in acquired state to
> cdef functions there is the question of eliminating an extra branch or
> so, but that is still far-fetched, and I'd rather Mark raise the issue
> if it becomes an issue than the two of us bikeshedding over it.
>
> I'll try to make this my last post to this thread, I feel we're slipping
> into Dag-and-Stefan-endless-thread territory...
>
> Dag