[Cython] buffer syntax vs. memory view syntax

mark florisson markflorisson88 at gmail.com
Tue May 8 11:22:24 CEST 2012


On 8 May 2012 09:36, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> On 05/08/2012 10:18 AM, Stefan Behnel wrote:
>>
>> Dag Sverre Seljebotn, 08.05.2012 09:57:
>>>
>>> On 05/07/2012 11:21 PM, mark florisson wrote:
>>>>
>>>> On 7 May 2012 19:40, Dag Sverre Seljebotn wrote:
>>>>>
>>>>> mark florisson wrote:
>>>>>>
>>>>>> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote:
>>>>>>>
>>>>>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote:
>>>>>>>>
>>>>>>>> Stefan Behnel, 07.05.2012 15:04:
>>>>>>>>>
>>>>>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48:
>>>>>>>>>>
>>>>>>>>>> BTW, with the coming of memoryviews, me and Mark talked about just
>>>>>>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it
>>>>>>>>>> as np.ndarray, array.array etc. being some sort of "template
>>>>>>>>>> types". That is, we disallow "object[int]" and require some special
>>>>>>>>>> declarations in the relevant pxd files.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hmm, yes, it's unfortunate that we have two different types of
>>>>>>>>> syntax now, one that declares the item type before the brackets and
>>>>>>>>> one that declares it afterwards.
>>>>>>>>
>>>>>>>> Should we consider the
>>>>>>>> buffer interface syntax deprecated and focus on the memory view
>>>>>>>> syntax?
>>>>>>>
>>>>>>>
>>>>>>> I think that's the very-long-term intention. Then again, it may be
>>>>>>> too early to really tell yet, we just need to see how the memory
>>>>>>> views play out in real life and whether they'll be able to replace
>>>>>>> np.ndarray[double] among real users. We don't want to shove things
>>>>>>> down users' throats.
>>>>>>>
>>>>>>> But the use of the trailing-[] syntax needs some cleaning up. Me and
>>>>>>> Mark agreed we'd put this proposal forward when we got around to it:
>>>>>>>
>>>>>>>   - Deprecate the "object[double]" form, where [dtype] can be stuck
>>>>>>>   on any extension type
>>>>>>>
>>>>>>>   - But, do NOT (for the next year at least) deprecate
>>>>>>>   np.ndarray[double], array.array[double], etc. Basically, there
>>>>>>>   should be a magic flag in extension type declarations saying
>>>>>>>   "I can be a buffer".
>>>>>>>
>>>>>>> For one thing, that is sort of needed to open up things for templated
>>>>>>> cdef classes/fused types cdef classes, if that is ever implemented.
>>>>>>
>>>>>>
>>>>>> Deprecating is definitely a good start. I think at least if you only
>>>>>> allow two types as buffers it will be at least reasonably clear when
>>>>>> one is dealing with fused types or buffers.
>>>>>>
>>>>>> Basically, I think memoryviews should live up to demands of the users,
>>>>>> which would mean there would be no reason to keep the buffer syntax.
>>>>>
>>>>>
>>>>> But they are different approaches -- use a different type/API, or just
>>>>> try to speed up parts of NumPy..
>>>>>
>>>>>> One thing to do is make memoryviews coerce cheaply back to the
>>>>>> original objects if wanted (which is likely). Writing
>>>>>> np.asarray(mymemview) is kind of annoying.
>>>>>
>>>>>
>>>>> It is going to be very confusing to have type(mymemview),
>>>>> repr(mymemview), and so on come out as NumPy arrays, but not have the
>>>>> full API of NumPy. Unless you auto-convert on getattr to...
>>>>
>>>>
>>>> Yeah, the idea is very simple, as you mention: just keep the object
>>>> around cached, and when you slice, construct one lazily.
>>>>
>>>>> If you want to eradicate the distinction between the backing array and
>>>>> the memory view and make it transparent, I really suggest you kick back
>>>>> alive np.ndarray (it can exist in some 'unrealized' state with delayed
>>>>> construction after slicing, and so on). Implementation much the same
>>>>> either way, it is all about how it is presented to the user.
>>>>
>>>>
>>>> You mean the buffer syntax?
>>>>
>>>>> Something like mymemview.asobject() could work though, and while not
>>>>> much shorter, it would have some polymorphism that np.asarray does not
>>>>> have (based probably on some custom PEP 3118 extension)
>>>>
>>>>
>>>> I was thinking you could allow the user to register a callback, and
>>>> use that to coerce from a memoryview back to an object (given a
>>>> memoryview object). For numpy this would be np.asarray, and the
>>>> implementation is allowed to cache the result (which it will).
>>>> It may be too magicky though... but it will be convenient. The
>>>> memoryview will act as a subclass, meaning that any of its methods
>>>> will override methods of the converted object.
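
A rough pure-Python sketch of the callback idea described above, to make the caching and the "acts as a subclass" lookup order concrete. Everything named here (`View`, `register_coercer`, `asobject`, `to_array`) is hypothetical and is not Cython API; `array.array` stands in for NumPy so the example only needs the standard library.

```python
from array import array

# Hypothetical registry: exporter type -> callback that turns a view back
# into an object of that type (for NumPy this would be np.asarray).
_coercers = {}

def register_coercer(exporter_type, callback):
    _coercers[exporter_type] = callback

class View:
    """Stand-in for a typed memoryview that remembers how to coerce back."""

    def __init__(self, exporter):
        self._mv = memoryview(exporter)
        self._exporter_type = type(exporter)
        self._cached = None

    @property
    def shape(self):
        # Defined on the view itself, so it wins over the converted object.
        return self._mv.shape

    def asobject(self):
        # Run the registered callback once and cache the result.
        if self._cached is None:
            callback = _coercers[self._exporter_type]
            self._cached = callback(self._mv)
        return self._cached

    def __getattr__(self, name):
        # Only reached when the view has no such attribute: fall through
        # to the lazily converted object.
        return getattr(self.asobject(), name)

calls = []  # records how often the callback actually runs

def to_array(mv):
    calls.append(1)
    return array(mv.format, mv.tolist())

register_coercer(array, to_array)

v = View(array("d", [1.0, 2.0, 3.0]))
print(v.shape)                        # attribute of the view itself
print(v.count(2.0))                   # method of the converted array.array
print(v.asobject() is v.asobject())   # the conversion is cached
```

Note that the sketch resolves `v.count` only at run time, which is exactly the kind of late binding the type-inference objection below is concerned with.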
>>>
>>>
>>> My point was that this seems *way* too magicky.
>>>
>>> Beyond "confusing users" and so on that are sort of subjective, here's a
>>> fundamental problem for you: We're making it very difficult to type-infer
>>> memoryviews. Consider:
>>>
>>> cdef double[:] x = ...
>>> y = x
>>> print y.shape
>>>
>>> Now, because y is not typed, you're semantically throwing in a conversion
>>> on line 2, so that line 3 says that you want the attribute access to be
>>> invoked on "whatever object x coerced back to". And we have no idea what
>>> kind of object that is.
>>>
>>> If you don't transparently convert to object, it'd be safe to
>>> automatically infer y as a double[:].
>>
>>
>> Why can't y be inferred as the type of x due to the assignment?
>>
>>
>>> On a related note, I've said before that I dislike the notion of
>>>
>>> cdef double[:] mview = obj
>>>
>>> I'd rather like
>>>
>>> cdef double[:] mview = double[:](obj)
>>
>>
>> Why? We currently allow
>>
>>     cdef char* s = some_py_bytes_string
>>
>> Auto-coercion is a serious part of the language, and I don't see the
>> advantage of requiring the redundancy in the case above. It's clear enough
>> to me what the typed assignment is intended to mean: get me a buffer view
>> on the object, regardless of what it is.
>>
>>
>>> I support Robert in that "np.ndarray[double]" is the syntax to use when
>>> you want this kind of transparent "be an object when I need to and a
>>> memory view when I need to".
>>>
>>> Proposal:
>>>
>>>  1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that in
>>> the language. It means exactly what you would like double[:] to mean,
>>> i.e. a variable that is a memoryview when you need it to be and an object
>>> otherwise. When you use this type, you bear the consequences of
>>> early-binding things that could in theory be overridden.
>>>
>>>  2) double[:] is for when you want to access data of *any* Python object
>>> in a generic way. Raw PEP 3118. In those situations, access to the
>>> underlying object is much less useful.
>>>
>>>   2a) Therefore we require that you do "mview.asobject()" manually; doing
>>> "mview.foo()" is a compile-time error
>>
>>
>> Sounds good. I think that would clean up the current syntax overlap very
>> nicely.
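
To illustrate points 2) and 2a) outside Cython, here is a small sketch using Python's built-in `memoryview`, which is also a raw PEP 3118 consumer. The helper `total` is a made-up name, and `view.obj` is used only as a rough analogue of the proposed `mview.asobject()`; it is not the proposed Cython API.

```python
from array import array
import struct

def total(obj):
    """Sum the doubles exposed by any object supporting the buffer protocol."""
    view = memoryview(obj)
    if view.format == "B":
        # Raw bytes: reinterpret them as native doubles.
        view = view.cast("d")
    elif view.format != "d":
        raise TypeError("expected a buffer of doubles or raw bytes")
    return sum(view)

a = array("d", [1.0, 2.5, 4.0])
raw = struct.pack("3d", 1.0, 2.5, 4.0)

print(total(a))     # works on array.array
print(total(raw))   # works on bytes holding the same data

view = memoryview(a)
print(view is a)      # the view is a different object from its exporter
print(view.obj is a)  # the exporter has to be asked for explicitly
```

The function never needs to know which type exported the buffer, which is why access to the underlying object matters less in this style of code.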
>>
>>
>>>   2b) To drive the point home among users, and aid type inference and
>>> overall language clarity, we REMOVE the auto-acquisition and require that
>>> you do
>>>
>>>     cdef double[:] mview = double[:](obj)
>>
>>
>> I don't see the point, as noted above. Either "obj" is statically typed
>> and the bare assignment becomes a no-op, or it's not typed and the
>> assignment coerces by creating a view. As with all other typed assignments.
>>
>>
>>>   2c) Perhaps: Do not even coerce to a Python memoryview and disallow
>>> "print mview"; instead require that you do "print mview.asmemoryview()"
>>> or "print memoryview(mview)" or somesuch.
>>
>>
>> This seems to depend on 2b.
>
>
> This I don't understand. The question of 2c) is the analogue to
> auto-coercion of "char*" to bytes; approving 2c) would put memoryviews in
> line with char*.
>
> Then again, we could in future auto-coerce char* to a ctypes pointer, and in
> that case, coercing a memoryview to an object representing that memoryview
> would be OK.

Character pointers coerce to strings. Hell, even structs coerce to and
from Python dicts, so disallowing the same for memoryviews would just
be inconsistent and inconvenient.

> Either way, you would never get back the same object that you coerced from!
>
> Dag
>

