[SciPy-User] fast small matrix multiplication with cython?
Skipper Seabold
jsseabold at gmail.com
Thu Dec 9 17:01:55 EST 2010
On Thu, Dec 9, 2010 at 4:33 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Wed, Dec 8, 2010 at 11:28 PM, <josef.pktd at gmail.com> wrote:
>>>
>>> It looks like I don't save too much time with just Python/scipy
>>> optimizations. Apparently ~75% of the time is spent in l-bfgs-b,
>>> judging by its user time output and the profiler's CPU time output(?).
>>> Non-cython versions:
>>>
>>> Brief and rough profiling on my laptop for ARMA(2,2) with 1000
>>> observations. Optimization uses fmin_l_bfgs_b with m = 12 and iprint
>>> = 0.
>>
>> Completely different idea: How costly are the numerical derivatives in l-bfgs-b?
>> With l-bfgs-b, you should be able to replace the derivatives with the
>> complex step derivatives that calculate the loglike function value and
>> the derivatives in one iteration.
>>
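For anyone who has not seen the complex-step idea suggested above, here is a minimal sketch of it. This is not the approx_fprime_cs code; `complex_step_grad`, `toy_objective`, and the step size `h` are names and values made up for illustration. It assumes the objective accepts complex input and is analytic (no abs, no conjugating transposes, and so on).

```python
import numpy as np


def complex_step_grad(func, x, args=(), h=1e-20):
    """Return (value, gradient) of a real-valued func at x using complex steps.

    Each evaluation of func(x + 1j*h*e_i) yields the function value in its
    real part (up to O(h**2)) and the i-th partial derivative in imag / h.
    """
    x = np.asarray(x, dtype=float)
    grad = np.empty(x.size)
    value = float("nan")
    for i in range(x.size):
        xc = x.astype(complex)      # fresh complex copy of the parameters
        xc[i] += 1j * h             # perturb only the i-th parameter
        fx = func(xc, *args)
        grad[i] = fx.imag / h       # no subtraction, so no cancellation error
        value = fx.real
    return value, grad


def toy_objective(x, scale):
    # Hypothetical stand-in for a log-likelihood; works for complex x.
    return scale * np.sum(x ** 2) + np.sin(x[0])


x0 = np.array([0.5, -1.0, 2.0])
value, grad = complex_step_grad(toy_objective, x0, args=(3.0,))
print(value, grad)
```

Because there is no difference of two nearby function values, `h` can be tiny without the round-off problems of forward differences. The cost is still one function evaluation per parameter, now in complex arithmetic.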
>
> I couldn't figure out how to use it without some hacks.
> fmin_l_bfgs_b calls both f and fprime as (x, *args), but
> approx_fprime and approx_fprime_cs have to be called as
> approx_fprime(x, func, args=args), and they in turn call
> func(x, *args). I changed fmin_l_bfgs_b to make the gradient call
> this way, and I get (on a different computer)
>
>
> Using approx_fprime_cs
> -----------------------------------
> 861609 function calls (861525 primitive calls) in 3.337 CPU seconds
>
> Ordered by: internal time
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 70 1.942 0.028 3.213 0.046 kalmanf.py:504(loglike)
> 840296 1.229 0.000 1.229 0.000 {numpy.core._dotblas.dot}
> 56 0.038 0.001 0.038 0.001 {numpy.linalg.lapack_lite.zgesv}
> 270 0.025 0.000 0.025 0.000 {sum}
> 90 0.019 0.000 0.019 0.000 {numpy.linalg.lapack_lite.dgesdd}
> 46 0.013 0.000 0.014 0.000 function_base.py:494(asarray_chkfinite)
> 162 0.012 0.000 0.014 0.000 arima.py:117(_transparams)
>
>
> Using approx_grad = True
> ---------------------------------------
> 1097454 function calls (1097370 primitive calls) in 3.615 CPU seconds
>
> Ordered by: internal time
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 90 2.316 0.026 3.489 0.039 kalmanf.py:504(loglike)
> 1073757 1.164 0.000 1.164 0.000 {numpy.core._dotblas.dot}
> 270 0.025 0.000 0.025 0.000 {sum}
> 90 0.020 0.000 0.020 0.000 {numpy.linalg.lapack_lite.dgesdd}
> 182 0.014 0.000 0.016 0.000 arima.py:117(_transparams)
> 46 0.013 0.000 0.014 0.000 function_base.py:494(asarray_chkfinite)
> 46 0.008 0.000 0.023 0.000 decomp_svd.py:12(svd)
> 23 0.004 0.000 0.004 0.000 {method 'var' of 'numpy.ndarray' objects}
>
>
> Definitely fewer function calls and a little faster, but I had to write
> some hacks to get it to work.
>
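One way the call-signature mismatch described above could probably be handled without patching fmin_l_bfgs_b is a small closure that adapts the gradient helper to the fprime(x, *args) convention. This is only a sketch: `make_fprime`, `forward_diff`, and `quadratic` are hypothetical names, and `forward_diff` is a stand-in that merely mimics the approx_fprime(x, func, args=args) call shape, not the real helper.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def make_fprime(func, grad_approx):
    """Wrap grad_approx(x, func, args=args) so it can be called as fprime(x, *args)."""
    def fprime(x, *args):
        return grad_approx(x, func, args=args)
    return fprime


def forward_diff(x, func, args=(), eps=1e-8):
    # Hypothetical stand-in with the same call shape as approx_fprime.
    x = np.asarray(x, dtype=float)
    f0 = func(x, *args)
    grad = np.empty(x.size)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps                      # step along the i-th coordinate
        grad[i] = (func(xp, *args) - f0) / eps
    return grad


def quadratic(x, center):
    # Toy objective standing in for the negative log-likelihood.
    return np.sum((x - center) ** 2)


center = np.array([1.0, -2.0])
xopt, fval, info = fmin_l_bfgs_b(
    quadratic,
    np.zeros(2),
    fprime=make_fprime(quadratic, forward_diff),
    args=(center,),
    m=12,
)
print(xopt, fval, info["funcalls"])
```

The optimizer still sees an ordinary fprime(x, *args); the closure carries the reference to func that the gradient helper needs.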
This is more like it! With fast recursions in Cython:
15186 function calls (15102 primitive calls) in 0.750 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
18 0.622 0.035 0.625 0.035 kalman_loglike.pyx:15(kalman_loglike)
270 0.024 0.000 0.024 0.000 {sum}
90 0.019 0.000 0.019 0.000 {numpy.linalg.lapack_lite.dgesdd}
156 0.013 0.000 0.013 0.000 {numpy.core._dotblas.dot}
46 0.013 0.000 0.014 0.000 function_base.py:494(asarray_chkfinite)
110 0.008 0.000 0.010 0.000 arima.py:118(_transparams)
46 0.008 0.000 0.023 0.000 decomp_svd.py:12(svd)
23 0.004 0.000 0.004 0.000 {method 'var' of 'numpy.ndarray' objects}
26 0.004 0.000 0.004 0.000 tsatools.py:109(lagmat)
90 0.004 0.000 0.042 0.000 arima.py:197(loglike_css)
81 0.004 0.000 0.004 0.000 {numpy.core.multiarray._fastCopyAndTranspose}
I can live with this for now.
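For anyone wanting to produce comparable listings: the tables above have the layout that Python's cProfile and pstats print when sorted by internal time. A rough sketch of how such a report can be generated follows, assuming that is the tool in use; `profile_by_internal_time` and `busy_work` are hypothetical names, and `busy_work` just stands in for the model fit.

```python
import cProfile
import io
import pstats


def profile_by_internal_time(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by tottime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "tottime" is the time spent inside each function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(top)
    return result, stream.getvalue()


def busy_work(n):
    # Hypothetical workload standing in for the ARMA fit.
    total = 0
    for i in range(n):
        total += i * i
    return total


result, report = profile_by_internal_time(busy_work, 10000)
print(report)
```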
Skipper