[SciPy-Dev] Optimizing scipy.misc.comb

Eric Moore ewm at redtetrahedron.org
Wed Mar 12 06:50:30 EDT 2014


On Tuesday, March 11, 2014, Neal Donnelly <nmckenna at princeton.edu> wrote:

> Hey everyone,
> I'm working on a project that relies on scipy, numpy, and scikits to
> segment three-dimensional images of brain tissue into discrete neurons. The
> project is here <https://github.com/janelia-flyem/gala> iand the contest
> is here <http://brainiac2.mit.edu/SNEMI3D/>.
>
> I just profiled the code to discover that 8.8% of our (gigantic) runtime
> is spent in the comb (n-choose-k) function in scipy/misc/common.py. I
> re-implemented the n-choose-k functionality in Cython and found dramatic
> speed improvements.
>
> Scipy (float)  time across 100000 trials: 5.127797 seconds
> Scipy (long)  time across 100000 trials: 1.583159 seconds
> Cython         time across 100000 trials: 0.074289 seconds
>
> I also wrote tests that ensure that it produces the same answer as the
> scipy implementation. The code is at the end of the email. It currently
> only handles one number at a time rather than an ndarray, and it always
> does it with exact-long-int. I have two questions -
>
> 1) Why is float precision the default for comb? By my measurements, its
> both slower and less precise. What's the idea there?
>
>

Your cython code uses a C long which is at best 64 bit. In cases where comb
gets really big the value will overflow the int. It takes a good bit more
to overflow a double. Presumably the cases you care about are all small
enough that this isn't a problem.


> 2) Should I contribute this, and if so where should it go? I'm happy to
> write tests, a wrapper to allow it to handle ndarrays, etc, but I need to
> know that I'm barking up the right tree. I also am unclear on how scipy
> handles Cython and where would be the appropriate place to move the
> function.
>
> Thanks so much!
> Neal Donnelly
>
> code:
>
> cdef long _nchoosek(int n, int k):
>>     cdef long accumulator = 1
>>     cdef long i
>>     for i in range(1, k+1):
>>         accumulator *= (n+1-i)
>>         accumulator /= i
>>     return accumulator
>>
>> def nchoosek(n, k):
>>     return _nchoosek(n, k)
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20140312/010f3c2a/attachment.html>


More information about the SciPy-Dev mailing list