[Cython] Upcoming cython/numpy breakage with stride checking

Tue Apr 9 14:05:20 CEST 2013

On 9 April 2013 02:04, Dave Hirschfeld <dave.hirschfeld at gmail.com> wrote:

> Dag Sverre Seljebotn <d.s.seljebotn at ...> writes:
>
> >
> > cdef np.ndarray[double, mode='fortran'] arr
> >
> > that relies on PEP 3118 contiguous-flags and I did no checking myself.
> > Lots of Cython code does this instead of memoryviews (I still write my
> > own code that way).
> >
> > The memory views OTOH do their own checking, but I also see plenty of
> > references to PyBUF_C_CONTIGUOUS etc. inside
> > Cython/Utility/MemoryView.pyx, so perhaps it does both. Mark would have
> > the definitive answer here.
> >
> > Dag Sverre
> >
>
> Is it the case that MemoryViews are doing some redundant checking?
>
> If so, is that the cause of the order of magnitude performance difference
> between the MemoryView and ndarray syntax that I observed in the following
> thread?
>
> http://thread.gmane.org/gmane.comp.python.cython.devel/14626
>
> Sorry for hijacking the thread, but it would be fantastic if the
> performance issues could be addressed at the same time if they are
> indeed linked.
>
>
> Thanks,
> Dave
>

Hey Dave,

I can't see what 'x' is in that thread, but I assume it's a NumPy array.
The difference between memoryviews and the buffer syntax is that
memoryviews require conversion to a different format. The validation part
is more or less the same. But this conversion means that if you go back to
an object, Cython will return a cython.memoryview. Going from that back to
a NumPy array means you again go through the buffer interface and buffer
format parsing, in addition to chaining views together (doing this often,
e.g. in a loop, can even make you run out of memory).
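
As a rough illustration of what "going through the buffer interface"
involves, here is a minimal pure-Python sketch. It uses the built-in
memoryview rather than Cython's own memoryview class (an assumption made
so that it runs without a compiler), and the array.array exporter and all
variable names are illustrative only. Every acquisition hands the consumer
a format string, shape, strides and contiguity information that it has to
inspect before it can trust the data.

```python
from array import array

# Stand-in exporter: any object supporting the buffer protocol behaves alike.
data = array('d', [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# Acquiring a view goes through the buffer interface.
view = memoryview(data)

# Metadata a consumer has to parse and validate on every acquisition.
print(view.format, view.shape, view.strides)

# Reinterpret the same memory as a 2x3 block. The built-in cast() needs a
# byte format on one side, hence the intermediate cast to 'B'.
grid = view.cast('B').cast('d', shape=[2, 3])

# The contiguity flags are what a declaration such as mode='fortran'
# would be checked against; a 2x3 row-major block is C-contiguous only.
print(grid.c_contiguous, grid.f_contiguous)
```

The 1-D view is both C- and Fortran-contiguous, whereas the 2-D view is
only C-contiguous, so a consumer requesting Fortran order would have to
reject it.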

There are advantages and disadvantages to memoryviews. One advantage is
that we can pass them around without the GIL and slice them quickly
compared to NumPy (the buffer syntax). However, the conversion cost can be
prohibitive.

We actually have a check that avoids creating a new memoryview object and
avoids the buffer format re-parse altogether. However, the memoryview class
is duplicated in every Cython module, which means a memoryview object from
another module will fail this check. This is a general problem in Cython
that could be worked around for memoryviews, but in general the lack of a
Cython runtime is a blessing for distribution purposes and a wart for most
other purposes.
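
The failing check can be sketched in plain Python. This is a toy model
under stated assumptions, not Cython's actual implementation: the factory
make_module_namespace, the class MemoryView and the function coerce are
all hypothetical names, and each call to the factory stands in for one
compiled module carrying its own copy of the class.

```python
def make_module_namespace():
    """Hypothetical stand-in for one compiled module with its own class copy."""

    class MemoryView:
        def __init__(self, exporter):
            # Keeps the wrapped object alive, like a view over a buffer.
            self.exporter = exporter

    def coerce(obj):
        # Fast path: the object is already this module's own view type,
        # so it can be reused without re-wrapping or re-parsing.
        if type(obj) is MemoryView:
            return obj, "fast"
        # Slow path: anything else gets wrapped in a new view object.
        return MemoryView(obj), "slow"

    return MemoryView, coerce


# Two "modules", each with a distinct class object of the same name.
ViewA, coerce_a = make_module_namespace()
ViewB, coerce_b = make_module_namespace()

mv = ViewA(bytearray(8))

same, path_a = coerce_a(mv)      # same module: type check succeeds
wrapped, path_b = coerce_b(mv)   # other module: type check fails

print(path_a, path_b)
print(wrapped.exporter is mv)    # the new view is chained onto the old one
```

In this model the second module does not recognise the first module's
object, so it wraps it again, which is the chaining of views described
earlier.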

In any case, this is only a small part of the problem. I think memoryviews
could perhaps be better implemented by keeping an ndarray alive and, if the
view has been sliced, by constructing the NumPy view more quickly by simply
throwing in pre-constructed dtypes. Conversion back from objects could be
fast if we could all agree on a standard that allows us to verify the type
in a few cycles, with a fallback to the buffer interface.

Cheers,

Mark