regex question

Thomas Jollans t at jollybox.de
Fri Jul 29 11:45:10 EDT 2011


On 29/07/11 16:53, rusi wrote:
> Can someone throw some light on this anomalous behavior?
>
>>>> import re
>>>> r = re.search('a(b+)', 'ababbaaabbbbb')
>>>> r.group(1)
> 'b'
>>>> r.group(0)
> 'ab'
>>>> r.group(2)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IndexError: no such group
>
>>>> re.findall('a(b+)', 'ababbaaabbbbb')
> ['b', 'bb', 'bbbbb']
>
> So evidently group counts by number of '()'s and not by number of
> matches (and this is the case whether one uses match or search). So
> then whats the point of search-ing vs match-ing?
>
> Or equivalently how to move to the groups of the next match in?
>
> [Side note: The docstrings for this really suck:
>
>>>> help(r.group)
> Help on built-in function group:
>
> group(...)
>

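This is pretty standard regex behaviour: group 1 is the first pair of
parentheses in the pattern, group 2 is the second, and so on. Group 0 is
the whole match. A match object describes a single match, so the number
of groups depends on the pattern, not on how often it matches.
The difference between matching and searching is that match() only tries
the pattern at the start of the string, while search() scans forward to
the first position where it matches (this is documented in the library
docs). re.match(exp, s) is roughly equivalent to
re.search(r'\A(?:' + exp + ')', s). Simply prefixing '^' is close, but
'^' also matches after a newline under re.MULTILINE, and it only binds
to the first alternative if exp contains '|'.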
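
A small sketch of the points above, using the string from the original
question. The variable names are only for illustration, and the
'a\nb' string is an extra example that was not in the original post:

```python
import re

s = 'ababbaaabbbbb'

# Group numbers follow the parentheses in the pattern, not the matches.
m = re.search('a(b+)', s)
whole, first = m.group(0), m.group(1)

# match() only tries position 0; search() scans forward through s.
no_match = re.match('b+', s)      # s starts with 'a', so this is None
found = re.search('b+', s)        # first run of b's, at index 1

# Anchoring search() with \A behaves like match() here.
anchored = re.search(r'\A(?:b+)', s)

# With re.MULTILINE, '^' also matches after a newline, so a '^' prefix
# is not the same as match(); \A still means "start of string".
multi_caret = re.search('^b', 'a\nb', re.MULTILINE)
multi_match = re.match('b', 'a\nb', re.MULTILINE)
multi_abs = re.search(r'\Ab', 'a\nb', re.MULTILINE)
```

Here only found and multi_caret (besides m) are match objects; the
others are None.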

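findall() returns the content of the group for each match when the
pattern has exactly one group. With several groups it returns a list of
tuples, and with no groups it returns the whole matches. This is
documented in the re module docs.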
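
To get at the groups of each successive match, which is what the
original question was after, re.finditer() yields one match object per
match. A minimal sketch with the same pattern and string (the names
pattern, per_match and so on are just for illustration):

```python
import re

s = 'ababbaaabbbbb'
pattern = re.compile('a(b+)')

# One group in the pattern: findall() returns that group for each match.
one_group = pattern.findall(s)

# No groups: findall() returns the whole matches.
no_groups = re.findall('ab+', s)

# Two groups: findall() returns a tuple of groups for each match.
two_groups = re.findall('(a)(b+)', s)

# finditer() gives a match object per match, so start(), group(0) and
# group(1) are all available for every match in turn.
per_match = [(m.start(), m.group(0), m.group(1))
             for m in pattern.finditer(s)]
```

Each entry in per_match holds the start index, the whole match and
group 1 of one match, in the order the matches occur in s.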

 - Thomas


