[Tutor] Consecutive Sequence

eryksun eryksun at gmail.com
Wed Oct 17 15:43:27 CEST 2012


On Wed, Oct 17, 2012 at 8:02 AM, Dave Angel <d at davea.name> wrote:
>
> Is this a homework assignment where you might be expected to build a
> complex loop, or is it a problem you're solving where you would be
> allowed to use itertools.groupby ?

May as well post a groupby solution. I doubt it would be accepted for homework:

    >>> from itertools import groupby
    >>> seq = 'atttttaattaaacagagtgagcagaaaat'

    >>> groups = (''.join(g) for _, g in groupby(seq))
    >>> [g for g in groups if len(g) > 2]
    ['ttttt', 'aaa', 'aaaa']

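groupby() yields a (key, _grouper) tuple for each run of consecutive
items that share the same key, which is exactly what this problem needs.
With no key function, each item is its own key (equivalent to
key=lambda x: x). The _grouper objects share the underlying iterator,
so each one must be consumed before groupby() advances to the next
group; once it moves on, the previous _grouper yields nothing.
Typically each one is consumed immediately as the data streams through.
In the generator expression, each _grouper g is joined into a string.
The list comprehension then keeps only the strings longer than 2
characters, i.e. runs of 3 or more.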
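If you also want to know where each run starts, the same groupby() pass can track a position counter. This is a minimal sketch; the function name find_runs and its min_len parameter are hypothetical, chosen here for illustration. Each _grouper is counted immediately, before groupby() moves on:

```python
from itertools import groupby

def find_runs(seq, min_len=3):
    """Return (item, start, length) for each run of min_len or more equal items.

    find_runs is a hypothetical helper name used for illustration.
    """
    runs = []
    pos = 0
    for item, group in groupby(seq):
        # Count this _grouper right away, before groupby() advances
        # the shared iterator to the next run.
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((item, pos, length))
        pos += length
    return runs

seq = 'atttttaattaaacagagtgagcagaaaat'
print(find_runs(seq))
# [('t', 1, 5), ('a', 10, 3), ('a', 25, 4)]
```

Because groupby() works on any iterable, the same helper accepts a list of numbers just as well as a string.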

Alternatively, you can use a regular expression:

    >>> import re
    >>> [m.group() for m in re.finditer(r'(\w)\1{2,}', seq, re.U)]
    ['ttttt', 'aaa', 'aaaa']

\w matches an alphanumeric character or the underscore (the re.U flag
extends this to Unicode letters and digits; in Python 3, str patterns
use Unicode matching by default). The parentheses (round brackets)
capture group 1, which is the single character matched by \w. The
backreference \1 then matches that same character again, and the
quantifier {2,} requires it at least twice more, for a total of 3 or
more consecutive occurrences. m.group() returns the entire match.
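The minimum run length can be made a parameter by building the {n,} quantifier from it, and the match object also reports where each run starts. This is a sketch under the same assumptions as the pattern above; the name find_runs_re and its min_len parameter are hypothetical. Unlike groupby(), it only finds runs of \w characters, so runs of punctuation such as '---' are skipped:

```python
import re

def find_runs_re(seq, min_len=3):
    """Return (char, start, length) for each run of min_len or more word characters.

    find_runs_re is a hypothetical helper name used for illustration.
    """
    if min_len < 1:
        raise ValueError('min_len must be at least 1')
    # (\w) captures one word character as group 1; \1{k,} requires that same
    # character at least k more times, so each whole match has length >= min_len.
    pattern = re.compile(r'(\w)\1{%d,}' % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group())) for m in pattern.finditer(seq)]

seq = 'atttttaattaaacagagtgagcagaaaat'
print(find_runs_re(seq))
# [('t', 1, 5), ('a', 10, 3), ('a', 25, 4)]
print(find_runs_re('xéééy'))
# [('é', 1, 3)]
```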

