Sequences in list

Michael Spencer mahs at telcopartners.com
Mon May 2 17:18:01 EDT 2005


temp at mclift.fsnet.co.uk wrote:
> Hi All,
> 
> I wonder could someone help me with this?
> 
> What I want to do is search through a list of letters and look for
> adjacent groups of letters that form sequences, not in the usual way of
> matching say abc to another appearance later on in the list but to look
> for transposed patterns.
> 
> The groups of letters can be made up of between 2 to 4 letters.
> It is not know beforehand how the groups are formed, so I can't say
> here is a group, look for a sequence formed from this.
> There must be at least three appearances of the groups for the sequence
> to be pass.
> More than one sequence may show in a list, i.e. first abc, bcd and cde
> followed by bd, df, fa, ac at some later point.
> Ideally I would like to know the start and end position of the
> sequence/sequences.
> 
> I'm looking for patterns such as these:
> 
> ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
> ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
> ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']
> 
> But these would be wrong:
> 
> ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one
> fails because of the 'jump' of the last group of three letters in
> relation to the others. To pass the last group would need to have been
> def. However, the previous three groups are ok and would be passed as a
> sequence.
> 
> ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the
> first and last groups have a similar sequence this would fail because
> of the intervening group (afe) not following the pattern.
> 
> 
> 
> Thanks for any help,
> 
> Malcolm
> 
I'm not sure I understand exactly what you want to do, but some of the following 
may help

# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

# Bad sequences
b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']


def make_groups(sequence, n):
     """return groups of length n from sequence
     Usage:
      >>> list(make_groups(s1,3))
     [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
     """
     length = len(sequence)
     if length % n:
         raise ValueError, "Sequence length %s is not multiple of %s"\
                             % (length, n)
     for i in xrange(0, length, n):
         yield sequence[i:i+n]

def make_diffs(groups):
     """return groups in canonical form where the first note
     in each sequence is 0, and each subsequent note is expressed as
     a positive interval from the first.
     Example:
      >>> list(make_diffs([['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]))
     [(0, 1, 2), (0, 1, 2), (0, 1, 2)]
     """
     for group in groups:
         base, basechr = [0], ord(group[0])
         for char in group[1:]:
             diff = ord(char) - basechr
             # although not specified it seems that you want
             # to compare diffs mod 7 i.e. -2 == +5
             base.append(diff % 7)
         yield tuple(base)

def compare_forms(sequence):
     """test whether all groups of a sequence have the same form"""
     for length in range(4,2,-1): # look for longest match first
         forms = {}
         try:
             for group in make_diffs(make_groups(sequence, length)):
                 forms[group] = forms.get(group,0)+1
             print forms
         except ValueError:
             # raised if the sequence is not a multiple of length
             pass

  >>> compare_forms(s1)
  {(0, 1, 2): 3}  # found three instances of (0,1,2)
  >>> compare_forms(s2)
  {(0, 2, 5): 3} # found three instances
  >>> compare_forms(s3)
  {(0, 5, 5, 3): 2} # found two instances
  >>> compare_forms(b1)
  {(0, 1, 3, 6): 1, (0, 1, 2, 1): 1, (0, 1, 0, 1): 1} # found 3 4-tuples
  {(0, 1, 2): 3, (0, 2, 5): 1} # found 3 instances of (0,1,2) and 1 other
  >>> compare_forms(b2)
  {(0, 5, 4): 1, (0, 2, 5): 2}
  >>>

HTH
Michael




More information about the Python-list mailing list