Sequences in list
Michael Spencer
mahs at telcopartners.com
Wed May 4 18:00:49 EDT 2005
temp at mclift.fsnet.co.uk wrote:
> Your code will identify sequences in a list, but how to index them? I
> have an idea, which seems ridiculously long-winded, but should work.
> First, put the groups from the ‘make_diffs’ function into a list
> and do a search for the sequence identified from the
> ‘compare_forms’ function using the ‘Knuth-Morris-Pratt’
> function in the python cookbook. Then the positions in that list will
> obviously be the same as those in the original.
>
> Next is the problem of the interval between the individual groups of
> the sequence. Using something along the lines of your ‘make_diffs’
> it should be a simple matter to see by what interval the groups in the
> sequence are transposed.
Hold on! Adding the index position and transposition offset is a small change to
the existing code:
# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']
# Bad sequences
b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']
def make_groups(sequence, n, anywhere = False):
"""yields tuples of sub-sequences of length n, and their position
from sequence.
bool: anywhere
False: look only at positions n*i for i = 1,2,3...
True: return every (overlapping sequence) of length n
Usage:
>>> list(make_groups(s1,3))
[(['a', 'b', 'c'], (0, 3)), (['b', 'c', 'd'], (3, 6)), (['c', 'd', 'e'],
(6, 9))]
"""
length = len(sequence)
if (length % n) and not anywhere:
raise ValueError, "Sequence length %s is not multiple of %s"\
% (length, n)
for i in xrange(0, length-n+1, anywhere or n):
yield sequence[i:i+n], (i, i+n)
def ord_note(note, note_base = "a"):
"""Return the ordinal value of a note, or the interval between two notes"""
return (ord(note)-ord(note_base)) % 7
def make_diffs(groups):
"""return groups in canonical form where the first note
in each sequence is 0, and each subsequent note is expressed as
a positive interval from the first.
Example:
>>> list(make_diffs(make_groups(s1,3)))
[((0, 1, 2), (0, 3)), ((0, 1, 2), (3, 6)), ((0, 1, 2), (6, 9))]
"""
for group, index in groups:
seq, base_note = [0], group[0]
for note in group[1:]:
interval = ord_note(note, base_note)
seq.append(interval)
yield tuple(seq), index, base_note
def group_forms(sequence, anywhere = False, max_length = 4):
"""group sequences of similar form
yields {form_tuple: [index tuples where form occurs]}
starting with longer forms
"""
for length in range(max_length, 1, -1):
forms = {}
try:
for group, index, base_note in make_diffs(make_groups(sequence,
length, anywhere)):
forms.setdefault(group,[]).append((index,base_note))
yield forms
except ValueError:
# raised if the sequence is not a multiple of length
pass
def search(sequence, max_length = 4, min_repeats = 3):
"""print all sequences that occur least min_repeats times"""
found = 0
print "Finding forms length 2-%s, occuring at minimum %s times" \
% (max_length, min_repeats)
for form_dict in group_forms(sequence, anywhere = True):
for form, data in form_dict.iteritems():
if len(data) >= min_repeats:
print "%s:%s" % (form, ["%s:%s" % (index, ord_note(offset))
for index, offset in data])
found +=1
print "Found: %s" % found
>>> search(s1)
Finding forms length 2-4, occuring at minimum 3 times
(0, 1, 2):['(0, 3):0', '(3, 6):1', '(6, 9):2']
(0, 1):['(0, 2):0', '(1, 3):1', '(3, 5):1', '(4, 6):2', '(6, 8):2', '(7, 9):3']
Found: 2
Given the increasingly complex tuples between the different stages it would
probably be worth defining a class to represent a form_match object
Michael
More information about the Python-list
mailing list