[SciPy-Dev] Optimizing scipy.misc.comb

Tue Mar 11 22:05:34 EDT 2014

Hey everyone,
I'm working on a project that relies on scipy, numpy, and scikits to
segment three-dimensional images of brain tissue into discrete neurons. The
project is here <https://github.com/janelia-flyem/gala> iand the contest is
here <http://brainiac2.mit.edu/SNEMI3D/>.

I just profiled the code to discover that 8.8% of our (gigantic) runtime is
spent in the comb (n-choose-k) function in scipy/misc/common.py. I
re-implemented the n-choose-k functionality in Cython and found dramatic
speed improvements.

Scipy (float)  time across 100000 trials: 5.127797 seconds
Scipy (long)  time across 100000 trials: 1.583159 seconds
Cython         time across 100000 trials: 0.074289 seconds

I also wrote tests that ensure that it produces the same answer as the
scipy implementation. The code is at the end of the email. It currently
only handles one number at a time rather than an ndarray, and it always
does it with exact-long-int. I have two questions -

1) Why is float precision the default for comb? By my measurements, its
both slower and less precise. What's the idea there?

2) Should I contribute this, and if so where should it go? I'm happy to
write tests, a wrapper to allow it to handle ndarrays, etc, but I need to
know that I'm barking up the right tree. I also am unclear on how scipy
handles Cython and where would be the appropriate place to move the
function.

Thanks so much!
Neal Donnelly

code:

cdef long _nchoosek(int n, int k):
>     cdef long accumulator = 1
>     cdef long i
>     for i in range(1, k+1):
>         accumulator *= (n+1-i)
>         accumulator /= i
>     return accumulator
>
> def nchoosek(n, k):
>     return _nchoosek(n, k)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20140311/84311941/attachment.html>