Splitting a sequence into pieces with identical elements

Chris Rebert clp2 at rebertia.com
Tue Aug 10 21:11:00 EDT 2010


On Tue, Aug 10, 2010 at 5:37 PM, candide <candide at free.invalid> wrote:
> Suppose you have a sequence s, a string say, for instance this one:
>
> spppammmmegggssss
>
> We want to split s into the following parts :
>
> ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
>
> i.e. each part is a word made of a single repeated character.
>
> What is the pythonic way to answer this question?

If you're doing an operation on an iterable, always leaf thru itertools first:
http://docs.python.org/library/itertools.html

from itertools import groupby

def split_into_runs(seq):
    # groupby yields one (key, run) pair per stretch of equal adjacent items
    return ["".join(run) for _, run in groupby(seq)]
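For illustration, here is a small self-contained sketch of the groupby approach applied to the string from the question. The second function, split_into_runs_generic, is a hypothetical addition (not part of the original reply) that keeps each run as a list, on the assumption that you may want to handle sequences other than strings, where "".join would not apply.

```python
from itertools import groupby

def split_into_runs(seq):
    # Each run is an iterator over equal adjacent characters; join it into a word
    return ["".join(run) for _, run in groupby(seq)]

def split_into_runs_generic(seq):
    # Hypothetical variant: keep each run as a list so non-string items work too
    return [list(run) for _, run in groupby(seq)]

print(split_into_runs("spppammmmegggssss"))
# ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
print(split_into_runs_generic([1, 1, 2, 3, 3, 3]))
# [[1, 1], [2], [3, 3, 3]]
```

Note that groupby only groups adjacent equal items, which is exactly what is wanted here: the leading 's' and the trailing 'ssss' stay separate.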


If itertools didn't exist:

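def split_into_runs(seq):
    # Assumes seq is a string: each run is rebuilt as letter * count.
    if not seq:
        return []

    iterator = iter(seq)
    letter = next(iterator)  # character of the current run
    count = 1                # length of the current run
    words = []
    for c in iterator:
        if c == letter:
            count += 1
        else:
            # The run ended: store it and start a new one at c
            words.append(letter * count)
            letter = c
            count = 1
    words.append(letter * count)  # flush the final run
    return words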
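The hand-written loop above relies on two string-friendly assumptions: that "not seq" detects an empty input, and that letter * count rebuilds a run. Neither holds for an arbitrary iterable (a generator is always truthy, and 1 * 2 is not a run of two 1s). As a rough sketch only, a generator variant that collects each run in a list avoids both; the name iter_runs is made up for this example and does not come from the original reply.

```python
def iter_runs(iterable):
    """Yield each run of equal adjacent items as a list."""
    iterator = iter(iterable)
    try:
        current = [next(iterator)]  # start the first run
    except StopIteration:
        return  # empty input: yield nothing
    for item in iterator:
        if item == current[-1]:
            current.append(item)    # same item: extend the current run
        else:
            yield current           # run ended: hand it out
            current = [item]        # and start a new one
    yield current                   # flush the final run

print(["".join(run) for run in iter_runs("spppammmmegggssss")])
# ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
print(list(iter_runs(x for x in [1, 1, 2])))
# [[1, 1], [2]]
```

In practice groupby already covers these cases, so this is mainly useful for seeing what it does under the hood.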

Cheers,
Chris
--
http://blog.rebertia.com


