[Numpy-discussion] Repeated dot products

Thu Jan 7 09:05:19 EST 2010

On 12/12/2009 22:55, T J wrote:
> Hi,
>
> Suppose I have an array of shape:  (n, k, k).  In this case, I have n
> k-by-k matrices.  My goal is to compute the product of a (potentially
> large) user-specified selection (with replacement) of these matrices.
> For example,
>
>     x = [0,1,2,1,3,3,2,1,3,2,1,5,3,2,3,5,2,5,3,2,1,3,5,6]

TJ,
   what are your n, k, len(x) ?

_dotblas.dot is fast: dot( 10x10 matrices ) takes ~ 22 usec on my g4 ppc,
which is ~ 15 clock cycles (700 MHz) per mem access * +.

A hack to find repeated pairs (or triples ...) follows.
Your sequence above has only (3,2) 4 times, no win.

(Can someone give a probabilistic estimate of the number of non-overlapping pairs
in N letters from an alphabet of size A ?)

#!/usr/bin/env python
# numpy-discuss 2009 12dec TJ repeated dot products

from __future__ import division
from collections import defaultdict
import numpy as np

__version__ = "2010 7jan denis"

def pairs( s, Len=2 ):
     """ repeated non-overlapping pairs (substrings, subwords)
         "abracadabra" -> ab ra [[0 7] [2 9]], not br
         Len=3: triples, 4 ...
     """
     # bruteforce
     # grow repeated 2 3 ... ?
     pairs = defaultdict(list)
     for j in range(len(s)-Len+1):
         pairs[ s[j:j+Len] ].append(j)
     min2 = filter( lambda x: len(x) > 1, pairs.values() )
     min2.sort( key = lambda x: len(x), reverse=True )
         # remove overlaps --
         # (if many, during init scan would be faster)
     runs = np.zeros( len(s), np.uint8 )
     run = np.ones( Len, np.uint8 )
     run[0] = Len
     chains = []
     for ovchain in min2:
         chain = []
         for c in ovchain:
             if not runs[c:c+Len].any():
                 runs[c:c+Len] = run
                 chain.append(c)
         if len(chain) > 1:
             chains.append(chain)
     return (chains, runs)

#...............................................................................
if __name__ == "__main__":
     import sys
     abra = "abracadabra"
     alph = 5
     randlen = 100
     randseed = 1
     exec( "\n".join( sys.argv[1:] ))  # Test= ...

     print "pairs( %s ) --" % abra
     print pairs( abra )  # ab [0, 7], br [2, 9]]
     print pairs( abra, 3 )  # abr [0, 7]

     np.random.seed( randseed )
     r = np.random.random_integers( 1, alph, randlen )
     chains, runs = pairs( tuple(r) )
     npair = sum([ len(c) for c in chains ])
     print "%d repeated pairs in %d random %d" % (npair, randlen, alph)
         # 35 repeated pairs in 100 random 5  (prob estimate this ?)
         # 25 repeated pairs in 100 random 10