[Cython] Multidimensional indexing of C++ objects

Sat Jul 4 00:43:51 CEST 2015

Hi everyone,
I'm a GSoC student working on a Cython API for DyND. DyND
<https://github.com/libdynd/libdynd> is a relatively new n-dimensional
array library in C++ that is based on NumPy. A full set of Python bindings
(created using Cython) is provided as a separate package. The goal of my
project is to make DyND arrays easy to use from within Cython, so that an
n-dimensional array object can be used without any of the corresponding
Python overhead.

Currently, there isn't a good way to assign to multidimensional slices of
C++ objects within Cython. Since the indexing operator in C++ is limited to
a single argument, DyND uses the call operator to represent
multidimensional indexing, and then uses a proxy class to perform
assignment to a slice. In C++, assigning to a slice along the second axis
of a DyND array looks like this:

a(irange(), 1).vals() = 0;
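
To make the pattern concrete, here is a minimal C++ sketch of a call
operator that returns a proxy, with a vals() helper that accepts the
assignment. Array2D, ColumnView, ColumnValues and this empty irange tag are
hypothetical stand-ins written for this example; they are not DyND's real
classes, which are far more general.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are not DyND's real classes.
struct irange {};  // tag meaning "every index along this axis"

class Array2D;

// Helper returned by vals(); its operator= writes into the selected column.
struct ColumnValues {
    Array2D &parent;
    std::size_t col;
    ColumnValues &operator=(int value);
};

// Proxy describing one column of an Array2D.
struct ColumnView {
    Array2D &parent;
    std::size_t col;
    ColumnValues vals() { return ColumnValues{parent, col}; }
};

class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols, int fill)
        : rows_(rows), cols_(cols), data_(rows * cols, fill) {}

    // The call operator takes one argument per axis, unlike operator[].
    ColumnView operator()(irange, std::size_t col) {
        return ColumnView{*this, col};
    }

    int &at(std::size_t row, std::size_t col) { return data_[row * cols_ + col]; }
    std::size_t rows() const { return rows_; }
    const std::vector<int> &data() const { return data_; }

private:
    std::size_t rows_, cols_;
    std::vector<int> data_;  // row-major storage
};

// Broadcast a scalar into every row of the selected column.
ColumnValues &ColumnValues::operator=(int value) {
    for (std::size_t row = 0; row < parent.rows(); ++row) {
        parent.at(row, col) = value;
    }
    return *this;
}

// Fills a 3x2 array with 7, then zeroes column 1 through the proxy.
std::vector<int> demo_slice_assignment() {
    Array2D a(3, 2, 7);
    a(irange(), 1).vals() = 0;
    assert(a.at(0, 0) == 7);  // column 0 is untouched
    return a.data();
}
```

The assignment target in the last function is the result of a method call,
which is exactly the construct Cython does not accept on the left-hand side
of an assignment.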

Unfortunately, in Cython, only the index operator can be used for
assignment, so following the C++ syntax isn't currently possible. Does
anyone know of a good way to address this? I'm willing to spend some time
implementing a new feature if we can reach a consensus on a good way to
deal with this. Here are some possible solutions I've thought of:

1. We could allow assignment to C++ method and function calls that return
references. This has the advantage that it matches the existing syntax in
C++ for dealing with C++ objects. Though Cython is a Python-like language,
the ability to manipulate C++ objects directly is a key part of its feature
set. Since the native way to do things like multidimensional indexing in
C++ is via the call operator, it seems sensible to allow assignment to
C++-level call operations in Cython as well. This could be controlled by a
Cython compiler directive that is disabled by default, giving an interface
similar to the existing cdivision, wraparound, and boundscheck directives.
Users would avoid unexpected results by default, but could get the needed
functionality simply by enabling the directive.

2. We could recommend that all assignment operations of this nature be
wrapped in a fake method whose C-level name contains the assignment itself.
This has the advantage that it works in current and past versions of
Cython, but it is a rather unusual hack. For example, something like the
following would work right now:

# declared as a method in a pxd file:
void assign "vals() = "(int value) except +

# used in a pyx file to perform assignment to a slice of an array a:
a(irange(), 1).assign(0)

For DyND, at least for now, this would be a workable solution since the
difference lies primarily in the placement of the parentheses and the
presence of the assignment operator. The syntax is less clear than it could
be, but it would work. On the other hand, other libraries may not be so
lucky since this involves replacing assignment to a slice with a method
call. For example, the expression template libraries Eigen and Blaze-lib
would encounter incompatibility to varying degrees if someone were to try
using them within Cython. This method also has the disadvantage that it
creates an interface that is fundamentally different from both the Python
and C++ interfaces.
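
For comparison, the following C++ sketch shows the kind of statement that
option 1 would let Cython express directly: an assignment whose target is a
call returning a reference. The Matrix class and the corner() function are
hypothetical and written only for this example; nothing here is proposed
Cython syntax or taken from Eigen or Blaze-lib.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matrix type for illustration; not taken from any real library.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols, 0.0) {}

    // Element access through the call operator, returning a reference.
    double &operator()(std::size_t row, std::size_t col) {
        return data_[row * cols_ + col];
    }

    // A named method that also returns a reference.
    double &front() { return data_.front(); }

private:
    std::size_t cols_;
    std::vector<double> data_;  // row-major storage
};

// Free function returning a reference into its argument.
double &corner(Matrix &m) { return m(1, 1); }

// Each assignment below targets the result of a call.
double demo_reference_assignment() {
    Matrix m(2, 2);
    m(0, 1) = 2.0;    // call operator as the assignment target
    m.front() = 1.0;  // method call as the assignment target
    corner(m) = 4.0;  // free function call as the assignment target
    assert(m(1, 0) == 0.0);  // the remaining element keeps its initial value
    return m(0, 0) + m(0, 1) + m(1, 0) + m(1, 1);
}
```

Under option 2, each of these three statements would instead need its own
fake method declaration, which is where the mismatch with the C++
interface comes from.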

I have also considered writing a proxy class that can serve as an
effective temporary value while a multidimensional index is constructed
from a series of calls to operator[]. This is a reasonable approach, but it
leads to unnecessary code bloat. It also complicates the interface exposed
to users, since operator[] would be needed for left-hand values and
operator() would be needed for right-hand values. Users who want to use
these C++ classes in Cython would also have to include and link against
another set of headers and libraries to be able to use the proxy class.
The maintenance burden for Python bindings created in this way would be
greater as well. This also isn't a viable
approach for using any C++ class that overloads both operators.
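
A rough C++ sketch of that proxy idea, for a two-dimensional case only: the
first operator[] returns a temporary that remembers the row, and the second
one returns a reference to the element. Array2D and RowIndex are
hypothetical names made up for this example, and a real version would need
one proxy level per dimension plus slice support.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array type for illustration; not DyND's real class.
class Array2D {
    std::size_t cols_;
    std::vector<int> data_;  // row-major storage

public:
    Array2D(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols, 0) {}

    // Temporary that remembers the first index until the second one arrives.
    class RowIndex {
        Array2D &parent_;
        std::size_t row_;

    public:
        RowIndex(Array2D &parent, std::size_t row)
            : parent_(parent), row_(row) {}

        int &operator[](std::size_t col) {
            return parent_.data_[row_ * parent_.cols_ + col];
        }
    };

    // Left-hand values: a[row][col] = value
    RowIndex operator[](std::size_t row) { return RowIndex(*this, row); }

    // Right-hand values: a(row, col)
    int operator()(std::size_t row, std::size_t col) const {
        return data_[row * cols_ + col];
    }
};

// Writes through chained operator[] and reads back through operator().
int demo_chained_index() {
    Array2D a(2, 3);
    a[1][2] = 42;
    assert(a(0, 0) == 0);  // other elements are unchanged
    return a(1, 2);
}
```

Even in this small case the write path and the read path use different
operators, which is the inconsistency described above.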

Another option I have considered is allowing Cython's indexing operator to
dispatch to a different function. Currently, user-defined cname entries for
overloaded operators are not used. If this were changed for the indexing
operator, indexing could be performed at the C++ level using some other
method. This doesn't look like a viable approach though, since, for this to
really work, users would need some way to call different methods when a C++
object is being indexed and when it is being assigned to. Using operator[]
for left-hand values and operator() for right-hand values is a possible
solution, but that isn't a very consistent interface. Doing this would also
increase the complexity of the existing code for indexing in the Cython
compiler and could lead to name collisions for classes that overload both
operator[] and operator().

Are any of these acceptable ways to go forward? Does anyone have any better
ideas? My preference would definitely be toward allowing C++ calls
returning references to be used as lvalues, but I'd really appreciate any
alternative solutions.

Thanks!
-Ian Henriksen