[SciPy-User] Least squares speed

Nathaniel Smith njs at pobox.com
Tue Oct 1 17:24:23 EDT 2013


Here are several ways to solve the least squares problem XB = Y:

scipy.linalg.lstsq(x, y)
np.linalg.lstsq(x, y)
np.dot(scipy.linalg.pinv(x), y)
np.dot(np.linalg.pinv(x), y)

>From the documentation, I would expect these to be ordered by speed,
fastest up top and slowest at the bottom. It turns out they *are*
ordered by speed, except, on my system (linux x86-64, numpy 1.7.1,
scipy 0.12.0, ubuntu-installed atlas), it's exactly backwards, with
very substantial differences (more than a factor of 20!):

# Typical smallish problem for me:
In [40]: x = np.random.randn(780, 2)

In [41]: y = np.random.randn(780, 32 * 275)

In [42]: %timeit scipy.linalg.lstsq(x, y)
1 loops, best of 3: 494 ms per loop

In [43]: %timeit np.linalg.lstsq(x, y)
1 loops, best of 3: 356 ms per loop

In [44]: %timeit np.dot(scipy.linalg.pinv(x), y)
10 loops, best of 3: 62.2 ms per loop

In [45]: %timeit np.dot(np.linalg.pinv(x), y)
10 loops, best of 3: 23.2 ms per loop

Is this expected? I'm particularly confused at why scipy's lstsq
should be almost 40% slower than numpy's. (And shouldn't we have a
fast one-function way to solve least squares problems?)

Any reason not to use the last option? Is it as numerically stable as lstsq?

Is there any other version that's even faster or even more numerically stable?

-n



More information about the SciPy-User mailing list