[Numpy-discussion] Einstein summation convention

George Nurser gnurser at gmail.com
Thu Jan 27 05:04:14 EST 2011


Hi Mark,
I was very interested to see that you had written an implementation of
the Einstein summation convention for numpy.

I'd thought about this last year, and wrote some notes on what I
thought might be a reasonable interface. Unfortunately I was not in a
position to actually implement it myself, so I left it.

But I'll set them out here in case they are useful to you, or anyone else.


The discussion about the dot product earlier last year, e.g.
http://mail.scipy.org/pipermail/numpy-discussion/2010-April/050160.html
http://mail.scipy.org/pipermail/numpy-discussion/2010-April/050202.html
the more recent work on tensor manipulation,
http://mail.scipy.org/pipermail/numpy-discussion/2010-June/050945.html
and the named-arrays proposal,
http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
all suggested to me that there is a lot of interest in writing tensor
and vector products more explicitly. Rather than defining a new binary
operator for matrix multiplication, perhaps the most pressing need is
for an easy, intuitive notation to define the axes that are being
summed over.

My initial thought was that perhaps one could write something like
a[:,:*]*b[:*,:] = dot(a,b), where the * would denote the axis being
summed over. So

a[:,:*]*b[:,:*] = dot(a,b.T)

and perhaps more labels could be allowed, e.g. &:

a[:&,:*]*b[:&,:*] = tensordot(a,b,axes=([0,1],[0,1]))

I'm not sure whether this is possible to implement, though. Even if it
were, I suppose it would require major changes to the Python slicing
mechanism.
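
For reference, the operations that notation is aiming at can already be
written (much more verbosely) with dot and tensordot; a throwaway check
with made-up square arrays:

    import numpy as np

    n = 3
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    # a[:,:*]*b[:*,:]  -- sum over a's axis 1 against b's axis 0
    print(np.allclose(np.dot(a, b), np.tensordot(a, b, axes=([1], [0]))))
    # a[:,:*]*b[:,:*]  -- sum over axis 1 of both operands
    print(np.allclose(np.dot(a, b.T), np.tensordot(a, b, axes=([1], [1]))))
    # a[:&,:*]*b[:&,:*] -- sum over both axes of both operands
    print(np.allclose(np.tensordot(a, b, axes=([0, 1], [0, 1])),
                      (a * b).sum()))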

A rather more cumbersome, though more powerful, idea might be to
somehow parse operations written in terms of the Einstein summation
convention.

So in this case a function einst might be implemented such that, supposing

a.shape = (M,N), b.shape = (N,P), c.shape = (M,N)
d.shape = (M,), e.shape = (M,), f.shape = (M,M), g.shape = (M,N,N,P)

from numpy import einst as E


A sum (dot product) would be taken over any letter that appears in both
lower case and upper case, but no sum would be taken where the same
letter is in lower case both times. So

E(d,'i',e,'I') = sum_i d_{i}*e_{i} = dot(d,e)
E(a,'ik',b,'Kj') = sum_k a_{ik}*b_{kj} = dot(a,b)
E(a,'ik',c,'IK') = sum_{ik} a_{ik}*c_{ik} = tensordot(a,c,axes=([0,1],[0,1]))
E(f,'ik',f,'KI') = sum_{ik} f_{ik}*f_{ki} = tensordot(f,f,axes=([0,1],[1,0]))
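
These identities are easy to check with a throwaway script (the sizes
below are made up):

    import numpy as np

    M, N, P = 3, 4, 5
    a = np.random.rand(M, N); b = np.random.rand(N, P)
    c = np.random.rand(M, N); f = np.random.rand(M, M)
    d = np.random.rand(M);    e = np.random.rand(M)

    # E(d,'i',e,'I'): sum over i
    print(np.allclose(np.dot(d, e), (d * e).sum()))
    # E(a,'ik',b,'Kj'): ordinary matrix product, summing over k
    print(np.allclose(np.dot(a, b), np.tensordot(a, b, axes=([1], [0]))))
    # E(a,'ik',c,'IK'): full double contraction
    print(np.allclose(np.tensordot(a, c, axes=([0, 1], [0, 1])),
                      (a * c).sum()))
    # E(f,'ik',f,'KI'): double contraction with transposed index order
    print(np.allclose(np.tensordot(f, f, axes=([0, 1], [1, 0])),
                      (f * f.T).sum()))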

Contraction, diagonalization and the outer product emerge naturally:

E(f,'iI') = sum_i f_{ii} = f.trace()
E(f,'ii') = f_{ii} = f.diagonal()
E(d,'i',e,'k') = d_{i}*e_{k} = outer(d,e)
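
If I've understood the subscript-string notation of your einsum
correctly (my reading of it, so the exact spelling below is an
assumption on my part), these three would come out as:

    import numpy as np

    d = np.random.rand(3); e = np.random.rand(3)
    f = np.random.rand(3, 3)

    print(np.allclose(np.einsum('ii->', f), f.trace()))             # E(f,'iI')
    print(np.allclose(np.einsum('ii->i', f), f.diagonal()))         # E(f,'ii')
    print(np.allclose(np.einsum('i,j->ij', d, e), np.outer(d, e)))  # E(d,'i',e,'k')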

Multiplication of more than two tensors could be performed:

E(d,'i',f,'Ik',e,'K') = sum_{ik} d_{i}*f_{ik}*e_{k}
      = dot(d,dot(f,e)) = d.dot(f.dot(e))
E(a,'ik',g,'IKlm',b,'LM') = sum_{iklm} a_{ik}*g_{iklm}*b_{lm}
      = tensordot(a,tensordot(g,b,axes=([2,3],[0,1])),axes=([0,1],[0,1]))
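
Checking the three-factor and four-index cases against brute-force
loops (small made-up sizes so the loops stay cheap):

    import numpy as np

    M, N, P = 2, 3, 4
    d = np.random.rand(M); e = np.random.rand(M); f = np.random.rand(M, M)
    a = np.random.rand(M, N); b = np.random.rand(N, P)
    g = np.random.rand(M, N, N, P)

    # E(d,'i',f,'Ik',e,'K') = sum_{ik} d_i*f_{ik}*e_k
    brute = sum(d[i] * f[i, k] * e[k] for i in range(M) for k in range(M))
    print(np.allclose(d.dot(f.dot(e)), brute))

    # E(a,'ik',g,'IKlm',b,'LM') = sum_{iklm} a_{ik}*g_{iklm}*b_{lm}
    inner = np.tensordot(g, b, axes=([2, 3], [0, 1]))      # contracts l and m
    total = np.tensordot(a, inner, axes=([0, 1], [0, 1]))  # contracts i and k
    brute = sum(a[i, k] * g[i, k, l, m] * b[l, m]
                for i in range(M) for k in range(N)
                for l in range(N) for m in range(P))
    print(np.allclose(total, brute))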

Multiplication term by term, without summation, is implied by repeated
letters of the same case. Multiplication over axes of unequal length
would raise an error.

E(d,'j',e,'j') = d_{j}*e_{j} = d*e
E(a,'ij',c,'ij') = a_{ij}*c_{ij} = a*c
E(a,'ij',f,'ij')  undefined, since a.shape[1] != f.shape[1]
E(f,'ij',f,'ji') = f_{ij}*f_{ji} = f*f.T
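
In subscript-string form (again assuming I have your einsum's spelling
right), repeated lower-case letters simply stay in the output:

    import numpy as np

    M, N = 3, 4
    d = np.random.rand(M); e = np.random.rand(M)
    a = np.random.rand(M, N); c = np.random.rand(M, N)
    f = np.random.rand(M, M)

    print(np.allclose(np.einsum('j,j->j', d, e), d * e))       # E(d,'j',e,'j')
    print(np.allclose(np.einsum('ij,ij->ij', a, c), a * c))    # E(a,'ij',c,'ij')
    print(np.allclose(np.einsum('ij,ji->ij', f, f), f * f.T))  # E(f,'ij',f,'ji')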

Broadcasting would be explicit:

E(f,'ij',d,'j') = f_{ij}*d_{j} = f*d
E(f,'ij',d,'i') = f_{ij}*d_{i} = f*d[:,None]
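
The same goes for the broadcasting cases:

    import numpy as np

    M = 3
    f = np.random.rand(M, M); d = np.random.rand(M)

    print(np.allclose(np.einsum('ij,j->ij', f, d), f * d))           # E(f,'ij',d,'j')
    print(np.allclose(np.einsum('ij,i->ij', f, d), f * d[:, None]))  # E(f,'ij',d,'i')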

If a definite order of precedence for the sums were established (e.g.
doing the rightmost multiplication first), dotting and term-by-term
multiplication could be mixed:

E(d,'I',d,'i',e,'i') = sum_i d_i*d_i*e_i
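
In fact the whole case convention can be translated mechanically into
subscript strings: lower-case every letter, and keep as output axes
only the letters that were never written in upper case. A rough sketch
of einst along those lines (the name, and the use of your einsum as
the backend, are just my assumptions):

    import numpy as np

    def einst(*args):
        # args alternate: array, index string, array, index string, ...
        ops = args[0::2]                        # the arrays
        subs = [s.lower() for s in args[1::2]]  # case-folded index strings
        # A letter ever written in upper case is summed over; the rest
        # label output axes, in order of first appearance.
        summed = set(c.lower() for s in args[1::2] for c in s if c.isupper())
        out = ''
        for s in subs:
            for c in s:
                if c not in summed and c not in out:
                    out += c
        return np.einsum(','.join(subs) + '->' + out, *ops)

    # E(d,'I',d,'i',e,'i') = sum_i d_i*d_i*e_i
    d = np.array([1.0, 2.0]); e = np.array([3.0, 4.0])
    print(einst(d, 'I', d, 'i', e, 'i'))   # 1*1*3 + 2*2*4 = 19.0
    print(np.allclose(einst(d, 'i', e, 'k'), np.outer(d, e)))  # outer product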


I hope this is of some use.

--George Nurser


