[Cython] buffer syntax vs. memory view syntax

Thu May 10 10:44:29 CEST 2012

On 10 May 2012 08:37, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> On 05/09/2012 09:08 PM, mark florisson wrote:
>>
>> On 9 May 2012 19:56, Robert Bradshaw <robertwb at gmail.com> wrote:
>>>
>>> On Tue, May 8, 2012 at 3:35 AM, mark florisson
>>> <markflorisson88 at gmail.com> wrote:
>>>>
>>>> On 8 May 2012 10:47, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no>
>>>> wrote:
>>>>>
>>>>>
>>>>> After some thinking I believe I can see more clearly where Mark is
>>>>> coming from. To sum up, it's either
>>>>>
>>>>> A) Keep both np.ndarray[double] and double[:] around, with clearly
>>>>> defined and separate roles. np.ndarray[double] implementation is
>>>>> revamped to allow fast slicing etc., based on the double[:]
>>>>> implementation.
>>>>>
>>>>> B) Deprecate np.ndarray[double] sooner rather than later, but make
>>>>> double[:] have functionality that is *really* close to what
>>>>> np.ndarray[double] currently does. In most cases one should be able
>>>>> to basically replace np.ndarray[double] with double[:] and the code
>>>>> should continue to work just like before; the difference is that if
>>>>> you pass in anything other than a NumPy array, it will likely fail
>>>>> with a runtime AttributeError at some point rather than fail a
>>>>> PyType_Check.
>>>>
>>>>
>>>> That's a good summary. I have a big preference for B here, but I agree
>>>> that treating a typed memoryview as both a user object (possibly
>>>> converted through callback) and a typed memoryview "subclass" is quite
>>>> magicky.
>>>
>>>
>>> With the talk of overlay modules and go-style interface, being able to
>>> specify the type of an object as well as its bufferness could become
>>> more interesting than it even is now. The notion of supporting
>>> multiple interfaces, e.g.
>>>
>>> cdef np.ndarray & double[:] my_array
>>>
>>>
>>> could obviate the need for np.ndarray[double]. Until we support
>>> something like this, or decide to reject it, I think we need to keep
>>> the old-style syntax around. (np.ndarray[double] could even become
>>> this intersection type to gain all the new features before we decide
>>> on an appropriate syntax).
>>
>>
>> It's kind of interesting but also kind of a pain to declare everywhere
>> like that. Buffer syntax should by no means be deprecated in the near
>> future, but at some point it will be better to have one way to do
>> things, whether slightly magicky or more convoluted or not. Also, as
>> Dag mentioned, if we want fused extension types it makes more sense to
>> remove buffer syntax to disambiguate this and avoid context-dependent
>> special casing (e.g. np.ndarray and array.array).
>
>
> I don't think it hurts to have two ways of doing things if they are
> sufficiently well-motivated, sufficiently well-defined, and sufficiently
> different from one another.
>
> The original reason I wanted double[:] was to stop tying ourselves to NumPy
> and not promise to be compatible with it, because of the polymorphic aspect
> of NumPy. I think that in the future, the Python behaviour of, say, + on
> np.ndarray is going to be different from what we have today. You'll have +
> fetching data over the network in some cases, or treating NA in special ways
> (I think there might be over a thousand emails about NA on the NumPy list
> by now?). In short, lots of stuff can be going on that we can't emulate in
> Cython.
>
> OTOH, perhaps that doesn't matter -- we just raise an exception for the
> NumPy arrays that we can't deal with, and move on...
>

Basically, the only things that both np.ndarray and memoryviews
guarantee are that they operate through the buffer interface, and that
they obtain this view at certain points (assignment). Hence, if you
decide to resize your array, or swap your axes or whatever, then your
object view may no longer be consistent with your buffer. Whether and
when your buffer view changes isn't even defined; it is more or less
dictated by the implementation.
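To make the "view obtained at assignment" point concrete, here is a minimal sketch in plain Python using the standard-library memoryview and array.array. It is only an analogy for Cython's typed memoryviews and np.ndarray[double] buffers, not how either is implemented, and demo_buffer_snapshot is a hypothetical helper. The view binds to the buffer when it is created: it sees in-place writes, it keeps pointing at the old data after the name is rebound, and the exporter refuses to resize while the buffer is held.

```python
import array


def demo_buffer_snapshot():
    """Hypothetical demo: a memoryview binds to a buffer when it is created."""
    a = array.array("d", [1.0, 2.0, 3.0])
    m = memoryview(a)             # buffer acquired here (the "assignment" point)

    a[0] = 10.0                   # in-place writes go through the shared buffer
    shared = m[0]                 # the view sees 10.0

    try:
        a.append(4.0)             # resizing while a buffer is exported is refused
        resized = True
    except BufferError:
        resized = False

    original = a
    a = array.array("d", [7.0])   # rebinding the name does not update the view
    stale = m[0]                  # still 10.0, read from the original array

    m.release()                   # hand the buffer back to the exporter
    original.append(4.0)          # now the original array may grow
    return shared, resized, stale, len(original)


print(demo_buffer_snapshot())
```

Here array.array protects itself by raising BufferError. An exporter that did not do so, or a change of shape or strides made through the object rather than the buffer, would leave the view silently out of sync.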

So if memoryviews overload +, then that + will always be triggered
on a typed view. I do suppose that if people rely on type inference
getting the type right, things start to get messy. As for NA, maybe
they will extend the buffer interface at some point, but on the other
hand Python people may feel that it is too specific a use case
(wild guess). Until then, keep your separate masks around :)

Anyway, a valid point. It's hard to see where this is going and how
future proof it is.
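The trade-off in option B (a late AttributeError where np.ndarray[double] would fail a type check up front) and the 'm.obj' idea quoted below can be illustrated with Python's own memoryview.obj, which returns the object that exported the buffer. This is a sketch under stated assumptions. SummableArray, total_via_obj and total_checked are hypothetical stand-ins, so no NumPy is needed. The isinstance check plays the role of PyType_Check. It says nothing about what Cython would actually generate.

```python
import array


class SummableArray(array.array):
    """Hypothetical stand-in for np.ndarray: a buffer exporter with a .sum() method."""

    def sum(self):
        return sum(self)


def total_via_obj(data):
    """Option-B style: accept any buffer, reach the exporter through .obj."""
    m = memoryview(data)          # accepts anything supporting the buffer protocol
    return m.obj.sum()            # fails late, with AttributeError, if .sum is missing


def total_checked(data, expected_type):
    """Buffer-syntax style: reject the wrong type up front (like a PyType_Check)."""
    if not isinstance(data, expected_type):
        raise TypeError(f"expected {expected_type.__name__}, got {type(data).__name__}")
    return data.sum()


print(total_via_obj(SummableArray("d", [1.0, 2.0])))   # works: prints 3.0
try:
    total_via_obj(array.array("d", [1.0]))             # plain array has no .sum
except AttributeError as exc:
    print("late failure:", exc)
try:
    total_checked(array.array("d", [1.0]), SummableArray)
except TypeError as exc:
    print("early failure:", exc)
```

Both versions reject the plain array. They differ in when they fail and in which exception they raise, and that difference is the one the B summary describes.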

>>>> I wouldn't particularly mind something concise like 'm.obj'.
>>>> The AttributeError would be the case as usual, when a python object
>>>> doesn't have the right interface.
>>>
>>>
>>> Having to insert the .obj in there does make it more painful to
>>> convert existing Python code.
>>
>>
>> Yes, hence my slight bias towards magicky. But I do fully agree with
>> all opposing arguments that say "too much magic". I just prefer to be
>> pragmatic here :)
>
>
> It's a very big decision. I think two or three alternatives are starting to
> crystallise, but choosing between them calls for a CEP with code
> examples, and a request for comment on both cython-users and
> numpy-discussion.
>
> Until that happens, avoiding any magic seems like a conservative
> forward-compatible default.
>
> Dag
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel