Split a string based on change of character

attn.steven.kuo at gmail.com
Sun Jul 29 01:31:36 EDT 2007


On Jul 28, 9:46 pm, Andrew Savige <ajsav... at yahoo.com.au> wrote:
> Python beginner here.
>
> For a string 'ABBBCC', I want to produce a list ['A', 'BBB', 'CC'].
> That is, break the string into pieces based on change of character.
> What's the best way to do this in Python?
>
> Using Python 2.5.1, I tried:
>
> import re
> s = re.split(r'(?<=(.))(?!\1)', 'ABBBCC')
> for e in s: print e
>
> but was surprised when it printed:
>
> ABBBCC
>
> I expected something like:
>
> A
> A
> BBB
> B
> CC
> C
>
> (the extra fields because of the capturing parens).


Using itertools:

import itertools

s = 'ABBBCC'
# groupby yields one (key, group) pair per run of equal consecutive characters
print([''.join(grp) for key, grp in itertools.groupby(s)])
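
To make it clearer what groupby is doing for you here, below is a rough
hand-written equivalent. It is only a sketch: the function name
split_runs is made up for illustration, and it handles strings only,
whereas groupby accepts any iterable.

```python
def split_runs(s):
    """Split s into runs of identical consecutive characters."""
    runs = []
    for ch in s:
        if runs and runs[-1][-1] == ch:
            # Same character as the current run: extend that run.
            runs[-1] += ch
        else:
            # Character changed (or first character): start a new run.
            runs.append(ch)
    return runs


print(split_runs('ABBBCC'))  # ['A', 'BBB', 'CC']
```

An empty string gives an empty list, which matches what the groupby
version returns.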


Using re:

import re

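# (\w) captures one word character; \2* extends the match over its repeats
pat = re.compile(r'((\w)\2*)')
# findall returns (run, char) tuples here, so keep only the full run
print([t[0] for t in pat.findall(s)])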
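
One caveat with \w is that it only matches word characters, so runs of
spaces or punctuation are silently dropped. If you want every character
covered, a possible variant (a sketch only; the names run_pat and
split_runs_re are made up for illustration) is to match any character
with "." plus re.DOTALL and take the whole match from finditer:

```python
import re

# (.) captures any single character (DOTALL lets "." match a newline too);
# \1* then consumes the following repeats of that same character.
run_pat = re.compile(r'(.)\1*', re.DOTALL)


def split_runs_re(s):
    """Return the runs of identical consecutive characters in s."""
    # group(0) is the whole match, i.e. the complete run.
    return [m.group(0) for m in run_pat.finditer(s)]


print(split_runs_re('ABBBCC'))  # ['A', 'BBB', 'CC']
print(split_runs_re('a  b!!'))  # ['a', '  ', 'b', '!!']
```

Using finditer with group(0) also avoids the extra outer capturing
group that findall needs in order to return the full run.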


By the way, your pattern does work in Perl:

$ perl -le '$, = " "; print split(/(?<=(.))(?!\1)/, "ABBBCC");'
A A BBB B CC C

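Was that the kind of output you were expecting? Python's re.split (as
of 2.5) never splits on an empty match, and your pattern only ever
matches the empty string, which is why you got the whole string back.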
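
If you would like to keep your zero-width pattern in Python, one
possible workaround is to let finditer locate the boundaries and slice
the string yourself. This is only a sketch: the names boundary and
split_at_boundaries are made up for illustration, re.DOTALL is my
addition so that newlines count as characters, and it assumes finditer
reports zero-width matches.

```python
import re

# Zero-width match at every position where the previous character
# differs from the next one (or where the string ends).
boundary = re.compile(r'(?<=(.))(?!\1)', re.DOTALL)


def split_at_boundaries(s):
    """Slice s at each position where the character changes."""
    pieces = []
    start = 0
    for m in boundary.finditer(s):
        # m.end() is the boundary position; take the run that ends there.
        pieces.append(s[start:m.end()])
        start = m.end()
    return pieces


print(split_at_boundaries('ABBBCC'))  # ['A', 'BBB', 'CC']
```

Because the slices are taken by hand, the captured character from the
lookbehind does not show up as an extra field the way it does with
split.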

--
Hope this helps,
Steven




