Sequences in list

Michael Spencer mahs at telcopartners.com
Wed May 4 18:00:49 EDT 2005


temp at mclift.fsnet.co.uk wrote:

> Your code will identify sequences in a list, but how to index them? I
> have an idea, which seems ridiculously long-winded, but should work.
> First, put the groups from the ‘make_diffs’ function into a list
> and do a search for the sequence identified from the
> ‘compare_forms’ function using the ‘Knuth-Morris-Pratt’
> function in the python cookbook. Then the positions in that list will
> obviously be the same as those in the original.
> 
> Next is the problem of the interval between the individual groups of
> the sequence. Using something along the lines of your ‘make_diffs’
> it should be a simple matter to see by what interval the groups in the
> sequence are transposed.

Hold on! Adding the index position and transposition offset is a small change to 
the existing code:

# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

# Bad sequences
b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']


def make_groups(sequence, n, anywhere = False):
     """yields tuples of sub-sequences of length n, and their position
         from sequence.

     bool: anywhere
       False: look only at positions n*i for i = 1,2,3...
       True: return every (overlapping sequence) of length n

     Usage:
     >>> list(make_groups(s1,3))
     [(['a', 'b', 'c'], (0, 3)), (['b', 'c', 'd'], (3, 6)), (['c', 'd', 'e'], 
(6, 9))]
     """
     length = len(sequence)
     if (length % n) and not anywhere:
         raise ValueError, "Sequence length %s is not multiple of %s"\
                             % (length, n)
     for i in xrange(0, length-n+1, anywhere or n):
         yield sequence[i:i+n], (i, i+n)

def ord_note(note, note_base = "a"):
     """Return the ordinal value of a note, or the interval between two notes"""
     return (ord(note)-ord(note_base)) % 7

def make_diffs(groups):
     """return groups in canonical form where the first note
     in each sequence is 0, and each subsequent note is expressed as
     a positive interval from the first.
     Example:
      >>> list(make_diffs(make_groups(s1,3)))
     [((0, 1, 2), (0, 3)), ((0, 1, 2), (3, 6)), ((0, 1, 2), (6, 9))]
     """
     for group, index in groups:
         seq, base_note = [0], group[0]
         for note in group[1:]:
             interval = ord_note(note, base_note)
             seq.append(interval)
         yield tuple(seq), index, base_note

def group_forms(sequence, anywhere = False, max_length = 4):
     """group sequences of similar form

     yields {form_tuple: [index tuples where form occurs]}
     starting with longer forms
     """

     for length in range(max_length, 1, -1):
         forms = {}
         try:
             for group, index, base_note in make_diffs(make_groups(sequence, 
length, anywhere)):
                 forms.setdefault(group,[]).append((index,base_note))
             yield forms
         except ValueError:
             # raised if the sequence is not a multiple of length
             pass

def search(sequence, max_length = 4, min_repeats = 3):
     """print all sequences that occur least min_repeats times"""
     found = 0
     print "Finding forms length 2-%s, occuring at minimum %s times" \
             % (max_length, min_repeats)
     for form_dict in group_forms(sequence, anywhere = True):
         for form, data in form_dict.iteritems():
             if len(data) >= min_repeats:
                 print "%s:%s" % (form, ["%s:%s" % (index, ord_note(offset))
                         for index, offset in data])
                 found +=1
     print "Found: %s" % found

  >>> search(s1)
  Finding forms length 2-4, occuring at minimum 3 times
  (0, 1, 2):['(0, 3):0', '(3, 6):1', '(6, 9):2']
  (0, 1):['(0, 2):0', '(1, 3):1', '(3, 5):1', '(4, 6):2', '(6, 8):2', '(7, 9):3']
  Found: 2


Given the increasingly complex tuples between the different stages it would 
probably be worth defining a class to represent a form_match object

Michael




More information about the Python-list mailing list