[SciPy-User] question: OLS with sparse design and dense parameter

josef.pktd at gmail.com josef.pktd at gmail.com
Thu May 1 11:00:21 EDT 2014


On Mon, Apr 28, 2014 at 2:45 AM, Severin Holzer-Graf
<sholzergr at gmail.com> wrote:
> I needed SuiteSparseQR for my work and added the backslash function of
> it to scikits.sparse.
> The modification is available here: https://github.com/duga3/scikits-sparse
>
> 2014-04-28 8:42 GMT+02:00 Nathaniel Smith <njs at pobox.com>:
>> I guess the classic solution for ols are sparse cholesky or sparse qr.
>> scikits.sparse has a cholmod wrapper, and could be modified to have a
>> suitesparseqr wrapper relatively easily.
>>
>> On 28 Apr 2014 03:31, <josef.pktd at gmail.com> wrote:
>>>
>>> (sunday night quest for another sparse recipe)
>>>
>>> This is a sparse algorithm question:
>>>
>>> scipy.sparse.linalg has a choice of linear system solvers, but I have not
>>> much idea about the relative merits. Is there a recommendation for the
>>> following case?
>>>
>>> suppose I want to do estimate a standard linear model by OLS
>>> y = x * b,  find b = argmin(y - x*b)
>>>
>>> x is sparse, with blocks, medium large (20000 by 2000) but if it works
>>> quite a bit larger,
>>> but we don't have a small number of dominant singular values, the solution
>>> of the linear system will be dense.
>>>
>>> the basic parts of my minimum example
>>>
>>> nobsi, n_groups, k_vars = 10, 2000, 3
>>>
>>> #nobsi, n_groups, k_vars = 2, 3, 3
>>>
>>>
>>> xi = np.arange(nobsi)[:,None]**np.arange(3) #vander
>>>
>>> x_sp = sparse.kron(sparse.eye(n_groups), xi).tocsr()
>>>
>>> (balanced kron structure is just to make a quick example)
>>>
>>>
>>> x_sp.shape
>>>
>>> (20000, 6000)
>>>
>>>
>>> first try
>>>
>>>
>>> xpx = x_sp.T.dot(x_sp) # sparse
>>>
>>> xpy = x_sp.T.dot(y) # dense
>>>
>>>
>>> p_dense = np.linalg.solve(xpx.toarray(), xpy)
>>>
>>> p_sp1 = sparse.linalg.spsolve(xpx, xpy)
>>>
>>> p_sp2 = sparse.linalg.lsmr(x_sp, y)[0]
>>>
>>>
>>> timing: 15.6659998894 0.00500011444092 0.00600004196167
>>>
>>>
>>> Is this too small to worry, and anything works, or is there a
>>> recommendation?
>>>
>>> definitely don't use dense.
>>>
>>>
>>> Thanks,
>>>
>>>
>>> Josef

Thanks for the answers,

I would like to stick to scipy for now.
I don't know if sparse usage in statsmodels will get heavy enough to
require additional optional dependencies.
At least not any time soon.

scipy.sparse has several solvers for linear systems and one of them
should work well enough for my purposes (traditional
statistics/econometrics instead of "Big Data"), for now.

Josef


>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user



More information about the SciPy-User mailing list