[SciPy-dev] Inclusion of cython code in scipy
Francesc Altet
faltet at carabos.com
Thu Apr 24 07:42:14 EDT 2008
On Thursday 24 April 2008, Stéfan van der Walt wrote:
> 2008/4/24 Prabhu Ramachandran <prabhu at aero.iitb.ac.in>:
> > Let's take a simple case of someone wanting to handle a growing
> > collection of, say, a million particles and do something to them.
> > How do you do that in cython/pyrex and get the performance of C and
> > interface to numpy? Worse, even if it were possible, you'll still
> > need to know something about allocating memory in C and
> > manipulating pointers. I can do that with C++ and SWIG today.
>
> That's the point: you, being a well-established programmer can do it
> easily, but most Python programmers would struggle doing that through
> some C or C++ API. I think this would be pretty easy to do in
> Cython:
>
> 1. Write a function, say create_workspace(nr_elements), that creates
> a new ndarray and returns it:
>
> cdef ndarray results_arr = np.empty((nr_elements,),
> dtype=np.double)
>
> 2. Grab a pointer to the memory (this should become a lot easier
> after GSOC 2008):
>
> cdef double* results = <double*>results_arr.data
>
> 3. Run your loop in which you produce data points. The moment you
> have more results than
> the output array can hold, call create_workspace(current_size**2),
> and use normal numpy indexing to copy the old results to the new
> location:
>
> new_results_arr[:current_size] = old_results_arr
>
> 4. Rinse and repeat
>
> The beauty of the Cython approach is that you
>
> a) Never have to worry about INCREF and DECREF
>
> b) Can use Python calls within C functions. You don't want to do
> that in your fast inner loop, but take the example above: we only
> copy arrays infrequently, and then we'd like to have the full power
> of numpy indexing. Suddenly, sorting, averaging, summing becomes a
> one-liner, just like in Python, at the expense of one Python call
> (and this won't affect execution time in the above example).
>
> c) Debug in a much cleaner way than C++ or C code: fewer memory
> leaks, introspection of source etc.
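The growing-workspace steps 1-4 quoted above can be sketched in plain Python/NumPy, which Cython will also compile. This is only an illustration under stated assumptions: create_workspace() follows the quoted description, while collect(), initial_size and the doubling growth factor are hypothetical choices made for this sketch.

```python
import numpy as np


def create_workspace(nr_elements):
    """Allocate an uninitialised 1-D array of doubles (step 1)."""
    return np.empty((nr_elements,), dtype=np.double)


def collect(points, initial_size=4):
    """Store values from an iterable of unknown length in a growing array.

    collect() and initial_size are hypothetical names for this sketch.
    """
    results_arr = create_workspace(max(1, initial_size))
    count = 0
    for value in points:
        if count == results_arr.shape[0]:
            # Workspace is full: allocate a larger one and copy the old
            # results across with ordinary numpy slicing (step 3).
            # Doubling the size is an assumption of this sketch.
            new_results_arr = create_workspace(2 * results_arr.shape[0])
            new_results_arr[:count] = results_arr
            results_arr = new_results_arr
        results_arr[count] = value
        count += 1
    # Return only the filled part; the tail is uninitialised memory.
    return results_arr[:count]


particles = collect(float(i) ** 0.5 for i in range(1000))
```

In a Cython version the inner loop would write through the double* taken from results_arr.data (step 2), and that pointer would have to be fetched again after every reallocation, since the old one refers to the previous array.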
Stéfan has made excellent points about Pyrex/Cython. Let me just add
that once you have a large library of extensions, you can also avoid
the cost of a Python call when one extension method needs to call
another extension method.
For example, when I know that a method is going to be public, I usually
declare two versions: one that is callable directly from another
extension (i.e. without the Python call cost) and another that is
callable from Python. So, in the code:
  def getitem(self, long nslot, ndarray nparr, long start):
    self.getitem_(nslot, nparr.data, start)

  cdef getitem_(self, long nslot, void *data, long start):
    cdef void *cachedata

    cachedata = self.getitem1_(nslot)
    # Copy the data in cache to destination
    memcpy(<char *>data + start * self.itemsize, cachedata,
           self.slotsize * self.itemsize)
calling MyClass.getitem_() from another extension will save you the
Python call. This does not matter in most cases, but it can certainly
matter in others.
My two cents,
--
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"