[Numpy-discussion] matrix default to column vector?

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Jun 5 13:18:56 EDT 2009


On Fri, Jun 5, 2009 at 11:38 AM, Olivier Verdier <zelbier at gmail.com> wrote:
> I agree. It would be a good idea to have matrices out of numpy as a
> standalone package.
> Indeed, having matrices in the numpy core comes at a pedagogical cost.
> Newcomers (as I once was) do not know which to use. Matrix or array? It
> turns out that the vast majority of numpy/scipy modules use arrays, so
> arrays is the preferred way to go.
> It would thus be clearer to have arrays in numpy and matrices available as
> an external package.
> Besides, I think matrices can be pretty tricky when used for teaching. For
> instance, you have to explain that all the operators work component-wise,
> except the multiplication! Another caveat is that since matrices are always
> 2x2, the "scalar product" of two column vectors computed as " x.T * y" will
> not be a scalar, but a 2x2 matrix. There is also the fact that you must cast
> all your vectors to column/raw matrices (as in matlab). For all these
> reasons, I prefer to use arrays and dot for teaching, and I have never had
> any complaints.

For anyone with an econometrics background in gauss or matlab, numpy
arrays have a huge adjustment cost.
I'm only using arrays for consistency, but my econometrics code is
filled with arr[np.newaxis,:] .

I wouldn't recommend my friends switching from gauss to numpy/scipy
when they want to write or develop some econometrics algorithm. gauss
feels like copying the formula from the notes or a book. With numpy
arrays I always have to recover one dimension, and I have to verify
the shape of any array more often.
In econometrics the default vector is often a 2d vector so x.T*x and
x*x.T have very different meanings.
python/numpy/scipy have a lot of other advantages compared to gauss or
matlab, but it's not the linear algebra syntax.

So, I don't see any sense in removing matrices, you can just ignore
them and tell your students to do so. It increases the initial
switching cost, even if users, that get more into it, then switch to
arrays.

Josef

(BTW: I think scipy.stats will soon become matrix friendly)

>>> X = np.matrix([[1,0],[1,1]])
>>> X.mean()
0.75
>>> X.mean(0)
matrix([[ 1. ,  0.5]])
>>> X-X.mean(0)
matrix([[ 0. , -0.5],
        [ 0. ,  0.5]])


>>> Y = np.array([[1,0],[1,1]])
>>> Y.mean(0).shape
(2,)
>>> Y - Y.mean(0)[np.newaxis,:]
array([[ 0. , -0.5],
       [ 0. ,  0.5]])

>>> np.matrix([[1,0],[1,1]])**2
matrix([[1, 0],
        [2, 1]])
>>> np.array([[1,0],[1,1]])**2
array([[1, 0],
       [1, 1]])

>>> np.matrix([[1,0],[1,1]])**(-1)
matrix([[ 1.,  0.],
        [-1.,  1.]])
>>> np.array([[1,0],[1,1]])**(-1)
array([[          1, -2147483648],
       [          1,           1]])


>>> x = np.matrix([[1],[0]])
>>> x.T*x
matrix([[1]])
>>> (x.T*x).shape
(1, 1)
>>> (x*x.T)
matrix([[1, 0],
        [0, 0]])
>>> (x*x.T).shape
(2, 2)


>>> y = np.array([1,0])
>>> np.dot(y,y)
1
>>> np.dot(y,y.T)
1
>>> np.dot(y[:,np.newaxis], y[np.newaxis,:])
array([[1, 0],
       [0, 0]])



More information about the NumPy-Discussion mailing list