[SciPy-User] fast small matrix multiplication with cython?
Skipper Seabold
jsseabold at gmail.com
Wed Dec 8 23:15:55 EST 2010
On Wed, Dec 8, 2010 at 8:08 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Tue, Dec 7, 2010 at 7:37 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>> On Tue, Dec 7, 2010 at 5:09 PM, <josef.pktd at gmail.com> wrote:
>>>
>>> On Tue, Dec 7, 2010 at 6:45 PM, Charles R Harris
>>> <charlesr.harris at gmail.com> wrote:
>>> >
>>> >
>>> > On Tue, Dec 7, 2010 at 10:39 AM, Skipper Seabold <jsseabold at gmail.com>
>>> > wrote:
>>> >>
>>> >> On Tue, Dec 7, 2010 at 12:17 PM, Charles R Harris
>>> >> <charlesr.harris at gmail.com> wrote:
>>> >> >
>>> >> >
>>> >> > On Tue, Dec 7, 2010 at 10:05 AM, <josef.pktd at gmail.com> wrote:
>>> >>
>>> >> <snip>
>>> >>
>>> >> >> It's still a linear filter, non-linear optimization comes in because
>>> >> >> the exact loglikelihood function for ARMA is non-linear in the
>>> >> >> coefficients.
>>> >> >> (There might be a way to calculate the derivative in the same loop,
>>> >> >> but that's a different issue.)
>>> >> >>
>>> >> >
>>> >> > The unscented Kalman filter is a better way to estimate the
>>> >> > covariance
>>> >> > of a
>>> >> > non-linear process, think of it as a better integrator. If the
>>> >> > propagation
>>> >> > is easy to compute, which seems to be the case here, it will probably
>>> >> > save
>>> >> > you some time. You might even be able to use the basic idea and skip
>>> >> > the
>>> >> > Kalman part altogether.
>>> >> >
>>> >> > My general aim here is to optimize the algorithm first before getting
>>> >> > caught
>>> >> > up in the details of matrix multiplication in c. Premature
>>> >> > optimization
>>> >> > and
>>> >> > all that.
>>> >> >
>>> >>
>>> >> Hmm, I haven't seen this mentioned much in what I've been reading or
>>> >> in the documentation for existing ARMA software, so I never thought
>>> >> much about it. I will have a closer look. Well, Google turns up this
>>> >> thread...
>>> >>
>>> >
>>> > I've started reading up a bit on what you are doing and the application
>>> > doesn't use extended Kalman filters, so the suggestion to use unscented
>>> > Kalman filters is irrelevant. Sorry about that ;) I'm still wading
>>> > through
>>> > the various statistical notation thickets to see if there might be a
>>> > better
>>> > form to use for the problem but I don't see one at the moment.
>>>
>>> There are faster ways to get the likelihood for a simple ARMA process
>>> than using a Kalman Filter. I think the main advantage, and the reason
>>> for the popularity of the Kalman Filter here, is that it is easier to
>>> extend. So using too many tricks that are specific to the simple ARMA
>>> case might take away much of the advantage of getting a fast Kalman Filter.
>>>
>>> I didn't read much of the details for the Kalman Filter for this, but
>>> that was my conclusion from the non-Kalman Filter literature.
>>>
>
> That's the idea. The advantages of the KF are that it's inherently
> structural and it's *very* general. The ARMA case was just a jumping
> off point, but has also proved to be a sticking point. I'd like to
> have a fast and general linear Gaussian KF available for larger state
> space models, as it's the baseline workhorse for estimating linearized
> large macroeconomic models at the moment.
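For what it's worth, the common-form filter for this case is fairly short. Here is a minimal sketch of the Gaussian ARMA loglikelihood via the Kalman filter in Harvey's state-space form; the function name, the stationary initialization via the vec trick, and the absence of diffuse initialization or numerical safeguards are all my own simplifications, not the kalmanf.py implementation:

```python
import numpy as np

def arma_loglike(y, phi, theta, sigma2=1.0):
    """Gaussian loglikelihood of an ARMA(p, q) series via the Kalman filter.

    Harvey's state-space form with state dimension m = max(p, q + 1):
        y_t = Z a_t,    a_{t+1} = T a_t + R eta_t,   eta_t ~ N(0, sigma2).
    A sketch only -- no diffuse initialization or numerical safeguards.
    """
    p, q = len(phi), len(theta)
    m = max(p, q + 1)
    T = np.zeros((m, m))
    T[:p, 0] = phi                 # AR coefficients in the first column
    T[:-1, 1:] = np.eye(m - 1)     # companion-shift structure
    R = np.zeros(m)
    R[0] = 1.0
    R[1:q + 1] = theta             # MA coefficients
    Z = np.zeros(m)
    Z[0] = 1.0
    Q = sigma2 * np.outer(R, R)

    # stationary initialization: solve vec(P) = (I - T kron T)^{-1} vec(Q)
    P = np.linalg.solve(np.eye(m * m) - np.kron(T, T), Q.ravel()).reshape(m, m)
    a = np.zeros(m)

    loglike = 0.0
    for yt in y:
        v = yt - a[0]              # innovation (Z a is the first state element)
        F = P[0, 0]                # innovation variance (no measurement noise)
        K = T @ P[:, 0] / F        # Kalman gain
        loglike += -0.5 * (np.log(2 * np.pi * F) + v * v / F)
        a = T @ a + K * v
        L = T - np.outer(K, Z)
        P = T @ P @ L.T + Q        # the O(m^3) step the fast recursions skip
    return loglike
```

For white noise (p = q = 0) this reduces to the iid normal loglikelihood, which is an easy sanity check.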
>
>>
>> Well, there are five forms of the standard Kalman filter that I am somewhat
>> familiar with, and some are better suited to certain applications than others.
>> But at this point I don't see any reason not to use the common
>> form for the ARMA case. It would be interesting to see some profiling, since
>> the matrix inversions are likely to dominate as the number of variables goes
>> up.
>>
>
> If interested, I am using Durbin and Koopman's "Time Series Analysis
> by State Space Methods" and Andrew Harvey's "Forecasting, Structural
> Time Series Models, and the Kalman Filter" as my main references for
> this. The former is nice and concise but has a lot of details,
> suggestions, and use cases.
>
> I have looked some more, and it does seem that the filter converges to
> its steady state after maybe 2% of the iterations, depending on the
> properties of the series. So for the ARMA case I can switch to the
> fast recursions, updating only the state (not quite sure of the time
> savings yet), but I am moving away from my goal of a fast and general
> KF implementation...
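The switch to the fast recursions can be sketched as follows: once the innovation variance F (and hence the gain K) stops changing, freeze both and update only the state, skipping the covariance update entirely. The function name and the convergence test on F alone are illustrative; a careful filter would check the whole P matrix:

```python
import numpy as np

def arma_loglike_fast(y, T, Z, Q, a0, P0, tol=1e-8):
    """Kalman filter loglikelihood with a switch to steady-state recursions.

    Once F stops changing (to within `tol`), K and F are frozen and only
    the state `a` is updated, avoiding the O(m^3) covariance update.
    A sketch with illustrative names, assuming a univariate observation.
    """
    a, P = a0.copy(), P0.copy()
    loglike = 0.0
    F_prev = np.inf
    converged = False
    for yt in y:
        if not converged:
            F = P[0, 0]                    # innovation variance
            K = T @ P[:, 0] / F            # gain
            L = T - np.outer(K, Z)
            P = T @ P @ L.T + Q            # full covariance update
            if abs(F - F_prev) < tol:
                converged = True           # freeze K and F from here on
            F_prev = F
        v = yt - a[0]
        loglike += -0.5 * (np.log(2 * np.pi * F) + v * v / F)
        a = T @ a + K * v                  # state-only ("fast") recursion
    return loglike
```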
>
> About the matrix inversions: the ARMA model right now is only
> univariate, so there is no real inverting of matrices. For larger,
> multivariate cases, Durbin and Koopman suggest splitting the problem
> into a series of univariate updates in order to avoid inversion.
> In the book they provide some benchmarks on computational efficiency,
> in terms of the number of multiplications needed, based on their
> experience writing http://www.ssfpack.com/index.html.
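That univariate treatment amounts to processing the observation vector one element at a time, so each step divides by a scalar F_i instead of inverting an F matrix. A sketch of one time step, with illustrative names, assuming the measurement-noise covariance is diagonal (for correlated noise, Durbin and Koopman transform the observations first):

```python
import numpy as np

def sequential_update(a, P, y_t, Z, H_diag):
    """One multivariate Kalman measurement update done element by element
    (Durbin & Koopman's univariate treatment).  Assumes the measurement
    noise covariance is diagonal, so every step is scalar arithmetic and
    no matrix inverse is needed.  Illustrative sketch only."""
    loglike = 0.0
    for i in range(len(y_t)):
        Zi = Z[i]                       # i-th row of the design matrix
        v = y_t[i] - Zi @ a             # scalar innovation
        F = Zi @ P @ Zi + H_diag[i]     # scalar innovation variance
        K = P @ Zi / F                  # gain (a vector)
        a = a + K * v
        P = P - np.outer(K, Zi @ P)     # rank-one downdate, no inversion
        loglike += -0.5 * (np.log(2 * np.pi * F) + v * v / F)
    return a, P, loglike
```

With a diagonal H this gives the same posterior as the joint update with F inverted as a matrix, which is where the multiplication-count savings come from.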
>
> Skipper
>
It looks like I don't save too much time with just Python/scipy
optimizations. Apparently ~75% of the time is spent in l-bfgs-b,
judging by its user time output and the profiler's CPU time output(?).
Non-Cython versions:
Brief and rough profiling on my laptop for ARMA(2,2) with 1000
observations. Optimization uses fmin_l_bfgs_b with m = 12 and iprint
= 0.
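For reference, timings like those below can be produced by wrapping the optimizer call in cProfile. The objective here is a stand-in quadratic, since kalmanf.py's loglike is not shown; the original runs also passed iprint = 0, omitted here:

```python
import cProfile
import pstats
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Stand-in objective: in the original runs this was the negative
# ARMA loglikelihood from kalmanf.py.
def neg_loglike(params):
    return np.sum((params - np.array([0.5, -0.3])) ** 2)

profiler = cProfile.Profile()
profiler.enable()
xopt, fopt, info = fmin_l_bfgs_b(neg_loglike, x0=np.zeros(2),
                                 approx_grad=True, m=12)
profiler.disable()

# same ordering as the listings below: by internal (tottime) time
pstats.Stats(profiler).sort_stats("tottime").print_stats(10)
```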
Full Kalman Filter, starting parameters found via iterations in Python
-----------------------------------------------------------------------------------------------------
1696041 function calls (1695957 primitive calls) in 7.622 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
114 3.226 0.028 5.123 0.045 kalmanf.py:504(loglike)
1368384 1.872 0.000 1.872 0.000 {numpy.core._dotblas.dot}
102 1.736 0.017 2.395 0.023 arima.py:196(loglike_css)
203694 0.622 0.000 0.622 0.000 {sum}
90 0.023 0.000 0.023 0.000 {numpy.linalg.lapack_lite.dgesdd}
218 0.020 0.000 0.024 0.000 arima.py:117(_transparams)
46 0.015 0.000 0.016 0.000 function_base.py:494(asarray_chkfinite)
1208 0.013 0.000 0.013 0.000 {numpy.core.multiarray.array}
102163 0.012 0.000 0.012 0.000 {method 'append' of 'list' objects}
46 0.010 0.000 0.028 0.001 decomp_svd.py:12(svd)
<snip>
Full Kalman Filter, starting parameters found with scipy.signal.lfilter
--------------------------------------------------------------------------------------------------
1249493 function calls (1249409 primitive calls) in 4.596 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
102 2.862 0.028 4.437 0.043 kalmanf.py:504(loglike)
1224360 1.556 0.000 1.556 0.000 {numpy.core._dotblas.dot}
270 0.029 0.000 0.029 0.000 {sum}
90 0.025 0.000 0.025 0.000 {numpy.linalg.lapack_lite.dgesdd}
194 0.018 0.000 0.021 0.000 arima.py:117(_transparams)
46 0.016 0.000 0.017 0.000 function_base.py:494(asarray_chkfinite)
46 0.011 0.000 0.029 0.001 decomp_svd.py:12(svd)
<snip>
Kalman Filter with fast recursions, starting parameters with lfilter
---------------------------------------------------------------------------------------------
1097454 function calls (1097370 primitive calls) in 4.465 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
90 2.860 0.032 4.305 0.048 kalmanf.py:504(loglike)
1073757 1.431 0.000 1.431 0.000 {numpy.core._dotblas.dot}
270 0.029 0.000 0.029 0.000 {sum}
90 0.025 0.000 0.025 0.000 {numpy.linalg.lapack_lite.dgesdd}
182 0.016 0.000 0.019 0.000 arima.py:117(_transparams)
46 0.016 0.000 0.018 0.000 function_base.py:494(asarray_chkfinite)
46 0.011 0.000 0.030 0.001 decomp_svd.py:12(svd)
<snip>
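On the thread's subject line: the profiles above show over a million calls to numpy's dot on tiny matrices, and for matrices that small the per-call overhead swamps the arithmetic, which is the case for moving the loop to Cython. A rough way to see it (timings are machine-dependent, so no numbers claimed; the unrolled version here is only meant to isolate the call overhead):

```python
import timeit
import numpy as np

A = np.random.rand(2, 2)
B = np.random.rand(2, 2)

def dot_unrolled(a, b):
    # hand-unrolled 2x2 matrix multiply, returned as nested tuples
    return ((a[0, 0] * b[0, 0] + a[0, 1] * b[1, 0],
             a[0, 0] * b[0, 1] + a[0, 1] * b[1, 1]),
            (a[1, 0] * b[0, 0] + a[1, 1] * b[1, 0],
             a[1, 0] * b[0, 1] + a[1, 1] * b[1, 1]))

t_dot = timeit.timeit(lambda: np.dot(A, B), number=100000)
t_unrolled = timeit.timeit(lambda: dot_unrolled(A, B), number=100000)
print(f"np.dot: {t_dot:.3f}s   unrolled: {t_unrolled:.3f}s")
```

In Cython with typed buffers the unrolled version compiles down to plain C arithmetic, which is where the real win would come from.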
Skipper