[SciPy-dev] Inclusion of cython code in scipy

Thu Apr 24 06:39:17 EDT 2008

2008/4/24 Prabhu Ramachandran <prabhu at aero.iitb.ac.in>:
>  Let's take a simple case of someone wanting to handle a growing
>  collection of say a million particles and do something to them.  How do
>  you do that in cython/pyrex and get the performance of C and interface
>  to numpy?  Worse, even if it were possible, you'll still need to know
>  something about allocating memory in C and manipulating pointers.  I can
>  do that with C++ and SWIG today.

That's the point: you, being a well-established programmer, can do it
easily, but most Python programmers would struggle to do that through
a C or C++ API.  I think this would be pretty easy to do in Cython:

1. Write a function, say create_workspace(nr_elements), that creates a
new ndarray and returns it:

    cdef ndarray results_arr = np.empty((nr_elements,), dtype=np.double)

2. Grab a pointer to the memory (this should become a lot easier after
GSOC 2008):

    cdef double* results = <double*>results_arr.data

3. Run your loop in which you produce data points.  The moment you
have more results than the output array can hold, call
create_workspace(current_size * 2), and use normal numpy indexing to
copy the old results to the new location:

    new_results_arr[:current_size] = old_results_arr

4. Rinse and repeat.
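
The steps above can be sketched in plain Python with numpy.  This is
only an illustration, not the Cython version: it leaves out the cdef
declarations and the raw double* pointer.  The collect() helper, the
initial_size parameter, and the choice to double the capacity on each
resize are assumptions of this sketch.

```python
import numpy as np


def create_workspace(nr_elements):
    """Step 1: allocate an uninitialised 1-D array of doubles."""
    return np.empty((nr_elements,), dtype=np.double)


def collect(values, initial_size=4):
    """Store values from an iterable in a workspace that grows on demand.

    Hypothetical helper: the name and the doubling policy are
    illustrative and do not come from the original post.
    """
    results_arr = create_workspace(max(1, initial_size))
    count = 0  # number of valid entries currently stored
    for value in values:
        if count == results_arr.shape[0]:
            # Step 3: the workspace is full, so allocate a larger one
            # and copy the old results across with numpy indexing.
            new_results_arr = create_workspace(2 * count)
            new_results_arr[:count] = results_arr
            results_arr = new_results_arr
        results_arr[count] = value
        count += 1
    # Return only the filled part; the rest is uninitialised memory.
    return results_arr[:count]


particles = collect(float(i) for i in range(10))
```

Growing by a constant factor means the copy in step 3 only happens a
logarithmic number of times, so nearly all iterations are a plain
store into the array.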

The beauty of the Cython approach is that you

a) Never have to worry about INCREF and DECREF

b) Can use Python calls within C functions.  You don't want to do that
in your fast inner loop, but take the example above: we only copy
arrays infrequently, and then we'd like to have the full power of
numpy indexing.  Suddenly, sorting, averaging, and summing each become
a one-liner, just like in Python, at the expense of one Python call
(and this won't noticeably affect execution time in the above example).

c) Debug in a much cleaner way than C++ or C code: fewer memory leaks,
introspection of source etc.
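
To illustrate point b), here is a small Python sketch of the
one-liners that become available once the results live in an ndarray.
The names workspace, filled, and valid, and the sample values, are
made up for this example; in Cython the same calls would simply appear
outside the fast inner loop.

```python
import numpy as np

# An over-allocated workspace of which only the first `filled` slots
# hold valid results; the remaining slots are uninitialised.
workspace = np.empty((8,), dtype=np.double)
filled = 5
workspace[:filled] = [3.0, 1.0, 2.0, 5.0, 4.0]

# Slicing gives a view onto the valid part, without copying.
valid = workspace[:filled]

total = valid.sum()       # summing
mean = valid.mean()       # averaging
ordered = np.sort(valid)  # sorting (returns a sorted copy)
```

Each of these is a single Python-level call that runs its loop in
compiled code, so calling them occasionally costs very little next to
the main loop.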

Cheers
Stéfan