[Tutor] re module / separator

Thu Jun 25 14:44:29 CEST 2009

Thanks Kent! Once more you go straight to the point!

Kent Johnson <kent37 at tds.net> writes:
> On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga<tiagosaboga at gmail.com> wrote:
>> In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
>> Out[33]: 'a45453b. a325643b. '
>
> group(0) is the entire match so this returns what you expect. But what
> is group(1)?
>
> In [6]: re.search("(a[^.]*?b\.\s?){2}", text).group(1)
> Out[6]: 'a325643b. '
>
> Repeated groups are tricky; the returned value contains only the last
> match for the group, not the earlier repeats.

The problem was exactly that. I had seen that findall returns the first
group of each match, but not that the group would not span the repeats.
It makes sense, though, as the repeat count comes after the parens.
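
For reference, a minimal sketch of the behaviour discussed above. The
sample string is an assumption (the original `text` is not shown in this
thread); it was chosen only so that the outputs match the ones quoted.
Raw strings are used so that `\.` and `\s` reach the re module unchanged.

```python
import re

# Hypothetical sample text, not the original data from the thread.
text = "a2345b. xx a45453b. a325643b. a435643b. yy"

# A capturing group with a repeat count placed after the closing paren.
pattern = r"(a[^.]*?b\.\s?){2}"

m = re.search(pattern, text)
whole = m.group(0)        # the entire match, covering both repeats
one_repeat = m.group(1)   # only one iteration of the group is kept

# findall returns group 1 for each match, so it shows the same thing.
found = re.findall(pattern, text)

print(repr(whole))       # 'a45453b. a325643b. '
print(repr(one_repeat))  # 'a325643b. '
print(found)             # ['a325643b. ']
```

With this input, group(1) holds the final iteration of the repeated
group, which agrees with Out[6] in the quoted session.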

> If you change the inner parentheses to be non-grouping then you get
> pretty much what you want:
>
> In [8]: re.findall("((?:a[^.]*?b\.\s?)+)", text)
> Out[8]: ['a2345b. ', 'a45453b. a325643b. a435643b. ']

And the trick of the non-grouping parens is great too. Thanks again!
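
A short sketch of the non-grouping approach, again on an assumed sample
string (hypothetical, not the original `text`). The second step, which
splits each run back into its items, is not from the thread; it is just
one possible way to recover the individual repeats.

```python
import re

# Hypothetical sample text, not the original data from the thread.
text = "a2345b. xx a45453b. a325643b. a435643b. yy"

# (?:...) groups without capturing, so the outer parens capture the
# whole run of repeats instead of a single iteration.
runs = re.findall(r"((?:a[^.]*?b\.\s?)+)", text)

# Assumed follow-up step: split each run into its individual items.
items = [re.findall(r"a[^.]*?b\.", run) for run in runs]

print(runs)   # ['a2345b. ', 'a45453b. a325643b. a435643b. ']
print(items)  # [['a2345b.'], ['a45453b.', 'a325643b.', 'a435643b.']]
```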

Tiago.