I don't understand this regex.groups() behaviour

Harvey Thomas hst at empolis.co.uk
Fri Jun 20 03:51:15 EDT 2003


Grzegorz Adam Hankiewicz wrote:
> Hi.
> 
> I don't understand why the last two sentences of the following
> interactive session don't return more than two groups.
> 
> Python 2.2.3 (#1, Jun  5 2003, 14:02:17)
> Type "copyright", "credits" or "license" for more information.
> 
> In [1]: import re
> 
> In [2]: c = '"A";"AA";"AAA";"AAAA";"AAAAA"'
> 
> In [3]: re.match(r'("A+?";)("A+?"$)', c)
> 
> In [4]: re.match(r'("A+?";)+("A+?"$)', c).groups()
> Out[4]: ('"AAAA";', '"AAAAA"')
> 
> In [5]: re.match(r'("A+?";){4}("A+?"$)', c).groups()
> Out[5]: ('"AAAA";', '"AAAAA"')
> 
> Could somebody please explain why multiple groups aren't returned?
> 
The short answer is that the number of groups returned is always equal to the number of pairs of capturing parentheses in the RE. Since you have two pairs of capturing parentheses, you will get two groups returned. When a group is repeated, each repetition replaces the previous capture, so the first group is "overwritten" three times and ends up holding only its last match, '"AAAA";'. AFAIK this behaviour is common to the RE implementations in Python, Perl, Java, PHP... Jeffrey Friedl's book "Mastering Regular Expressions" (2nd Edition) is an excellent exposition of the subject.
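
If what you actually want is every quoted field, one possible approach is to match a single field at a time instead of repeating a group. The following is only a sketch: it assumes a current Python 3 interpreter rather than the 2.2.3 from your session, and the choice of re.findall and the variable names m and fields are mine, not anything from your code.

```python
import re

c = '"A";"AA";"AAA";"AAAA";"AAAAA"'

# Two pairs of capturing parentheses give two groups, however many
# times the first one repeats; it keeps only its last match.
m = re.match(r'("A+?";)+("A+?"$)', c)
print(m.groups())  # ('"AAAA";', '"AAAAA"')

# With no capturing groups in the pattern, findall returns every
# non-overlapping match of the whole pattern as a list of strings.
fields = re.findall(r'"A+"', c)
print(fields)  # ['"A"', '"AA"', '"AAA"', '"AAAA"', '"AAAAA"']
```

For data this regular, c.split(';') would give the same list without using a regex at all.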
