Sequences in list
Michael Spencer
mahs at telcopartners.com
Mon May 2 17:18:01 EDT 2005
temp at mclift.fsnet.co.uk wrote:
> Hi All,
>
> I wonder could someone help me with this?
>
> What I want to do is search through a list of letters and look for
> adjacent groups of letters that form sequences, not in the usual way of
> matching say abc to another appearance later on in the list but to look
> for transposed patterns.
>
> The groups of letters can be made up of between 2 to 4 letters.
> It is not know beforehand how the groups are formed, so I can't say
> here is a group, look for a sequence formed from this.
> There must be at least three appearances of the groups for the sequence
> to be pass.
> More than one sequence may show in a list, i.e. first abc, bcd and cde
> followed by bd, df, fa, ac at some later point.
> Ideally I would like to know the start and end position of the
> sequence/sequences.
>
> I'm looking for patterns such as these:
>
> ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
> ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
> ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']
>
> But these would be wrong:
>
> ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one
> fails because of the 'jump' of the last group of three letters in
> relation to the others. To pass the last group would need to have been
> def. However, the previous three groups are ok and would be passed as a
> sequence.
>
> ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the
> first and last groups have a similar sequence this would fail because
> of the intervening group (afe) not following the pattern.
>
>
>
> Thanks for any help,
>
> Malcolm
>
I'm not sure I understand exactly what you want to do, but some of the following
may help
# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']
# Bad sequences
b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']
def make_groups(sequence, n):
"""return groups of length n from sequence
Usage:
>>> list(make_groups(s1,3))
[['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
"""
length = len(sequence)
if length % n:
raise ValueError, "Sequence length %s is not multiple of %s"\
% (length, n)
for i in xrange(0, length, n):
yield sequence[i:i+n]
def make_diffs(groups):
"""return groups in canonical form where the first note
in each sequence is 0, and each subsequent note is expressed as
a positive interval from the first.
Example:
>>> list(make_diffs([['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]))
[(0, 1, 2), (0, 1, 2), (0, 1, 2)]
"""
for group in groups:
base, basechr = [0], ord(group[0])
for char in group[1:]:
diff = ord(char) - basechr
# although not specified it seems that you want
# to compare diffs mod 7 i.e. -2 == +5
base.append(diff % 7)
yield tuple(base)
def compare_forms(sequence):
"""test whether all groups of a sequence have the same form"""
for length in range(4,2,-1): # look for longest match first
forms = {}
try:
for group in make_diffs(make_groups(sequence, length)):
forms[group] = forms.get(group,0)+1
print forms
except ValueError:
# raised if the sequence is not a multiple of length
pass
>>> compare_forms(s1)
{(0, 1, 2): 3} # found three instances of (0,1,2)
>>> compare_forms(s2)
{(0, 2, 5): 3} # found three instances
>>> compare_forms(s3)
{(0, 5, 5, 3): 2} # found two instances
>>> compare_forms(b1)
{(0, 1, 3, 6): 1, (0, 1, 2, 1): 1, (0, 1, 0, 1): 1} # found 3 4-tuples
{(0, 1, 2): 3, (0, 2, 5): 1} # found 3 instances of (0,1,2) and 1 other
>>> compare_forms(b2)
{(0, 5, 4): 1, (0, 2, 5): 2}
>>>
HTH
Michael
More information about the Python-list
mailing list