Extracting subsequences composed of the same character

Roy Smith roy at panix.com
Thu Mar 31 21:40:38 EDT 2011


In article <4d952008$0$3943$426a74cc at news.free.fr>,
 candide <candide at free.invalid> wrote:

> Suppose you have a string, for instance
> 
> "pyyythhooonnn ---> ++++"
> 
> and you search for the subsequences composed of the same character; here
> you get:
> 
> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
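A minimal alternative sketch, not taken from either poster: the standard library's itertools.groupby yields consecutive groups of equal elements, so the runs can be collected and filtered directly. The function name `repeated_runs` is hypothetical.

```python
from itertools import groupby


def repeated_runs(s):
    """Return every maximal run of one repeated character with length > 1."""
    runs = []
    # groupby splits s into consecutive groups of identical characters.
    for _char, group in groupby(s):
        run = ''.join(group)
        if len(run) > 1:
            runs.append(run)
    return runs


print(repeated_runs("pyyythhooonnn ---> ++++"))
# ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
```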

I came up with the following. It's O(n), with one minor exception: the
repeated string addition isn't, but that's trivial to fix, and in practice
the bunches are short enough that it hardly matters.

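#!/usr/bin/env python

s = "pyyythhooonnn ---> ++++"
answer = ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']

last = None      # previous character seen
bunches = []     # every maximal run of identical characters
bunch = ''       # the run currently being built
for c in s:
    if c == last:
        # Same character as before: extend the current run.
        bunch += c
    else:
        # Character changed: close off the previous run, start a new one.
        if bunch:
            bunches.append(bunch)
        bunch = c
        last = c
bunches.append(bunch)   # don't forget the final run

# Keep only runs of two or more characters.
multiples = [b for b in bunches if len(b) > 1]
print(multiples)
assert multiples == answer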
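The string-addition caveat can be sidestepped by collecting each run's characters in a list and joining once per run, which keeps the whole pass linear regardless of run length. This is a sketch under that assumption; the name `bunches_linear` is hypothetical, and this version also returns an empty list for an empty string.

```python
def bunches_linear(s):
    """Split s into maximal runs of identical characters in O(len(s))."""
    bunches = []
    chars = []          # characters of the run currently being built
    for c in s:
        if chars and c != chars[-1]:
            # Character changed: join the finished run exactly once.
            bunches.append(''.join(chars))
            chars = []
        chars.append(c)
    if chars:
        bunches.append(''.join(chars))   # flush the final run
    return bunches


s = "pyyythhooonnn ---> ++++"
multiples = [b for b in bunches_linear(s) if len(b) > 1]
print(multiples)
# ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
```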


[eagerly awaiting a PEP for collections.bunch and 
collections.frozenbunch]



More information about the Python-list mailing list