I don't understand this regex.groups() behaviour

Grzegorz Adam Hankiewicz gradha at titanium.sabren.com
Thu Jun 19 16:59:31 EDT 2003


On 2003-06-16, Michael Chermside <mcherm at mcherm.com> wrote:
> But from the looks of your example, what you REALLY want is to take
> a regular expression and find all the places where it matches. The
> function you want is called "findall", and the syntax for it reads
> like this:

Thanks for trying to read my mind, but no, findall is not for me.

> >>> import re
> >>> c = '"A";"AA";"AAA";"AAAA";"AAAAA"'
> >>> re.findall(r'("A+?";?)', c)
> ['"A";', '"AA";', '"AAA";', '"AAAA";', '"AAAAA"']
> 
> I have simplified your regular expression somewhat so that I no
> longer require the ; between all fields, but I expect that after
> you realize that findall is what you want, you'll be able to
> straighten out the details fairly easily.

And hopefully the following will help you realize that altering the
specification doesn't help at all:

>>> c = '"A";"AA";"BB";"AA";"AAAAA"'
>>> re.findall(r'("A+?";?)', c)
['"A";', '"AA";', '"AA";', '"AAAAA"']

That is a false positive: the string contains a "BB" field, yet
findall simply returns every match of the pattern it can find. What I
specified is a pattern composed of two parts: the first is repeated
an unknown number of times and has to start at the beginning (hence
the use of .match), the second has to be exactly at the end:

>>> re.match(r'("A+?";)+("A+?"$)', c).groups()
AttributeError: 'NoneType' object has no attribute 'groups'

This last result is correct, because the given string doesn't
match the regular expression. However, it looks like, whether through
a bug or a limitation of the re engine, it's impossible to retrieve
every repetition of a repeated group in a regular expression.
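
For reference, the behaviour is easy to reproduce on a string that
does match. The sketch below is only an illustration (the names `ok`
and `m` are not from the thread): it applies the same pattern to a
string without the stray "BB" field and prints what .groups() keeps
for the repeated group.

```python
import re

# A string that satisfies the specification (no stray "BB" field).
ok = '"A";"AA";"AAA";"AAAA";"AAAAA"'

m = re.match(r'("A+?";)+("A+?"$)', ok)

# Group 1 took part in four repetitions, but the match object keeps
# only the text of the last one.
print(m.groups())   # ('"AAAA";', '"AAAAA"')
```

This is consistent with how the re documentation describes groups
that match more than once: only the last match is accessible.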

Please prove this last point wrong, instead of working around the
specification. Of course I accept suggestions to use another regex
if and only if it matches the same cases as the one I am trying to
use.

PS: In case you care, .findall() is as useless as .groups(): it
doesn't retrieve all the repeated groups which were matched.

>>> c = '"A";"AA";"AAA";"AAAA";"AAAAA"'
>>> re.findall(r'("A+?";)+("A+?"$)', c)
[('"AAAA";', '"AAAAA"')]
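
One possible way to get every field while keeping the specification
intact is a two-step approach: validate the whole string with the
unchanged pattern, then extract the fields from the text that
matched. This does not make the re engine return repeated groups; it
is a minimal sketch, and the names `WHOLE`, `FIELD` and
`split_fields` are hypothetical helpers introduced here.

```python
import re

# Same pattern as in the post: one or more '"A...";' fields from the
# start of the string, then a final '"A..."' field at the very end.
WHOLE = re.compile(r'("A+?";)+("A+?"$)')

# A single field, with or without its trailing separator.
FIELD = re.compile(r'"A+";?')


def split_fields(text):
    """Return every field if text matches WHOLE, otherwise None."""
    m = WHOLE.match(text)
    if m is None:
        return None
    # WHOLE has already validated the structure, so findall cannot
    # pick up fields from a string that breaks the specification.
    return FIELD.findall(m.group(0))


print(split_fields('"A";"AA";"AAA";"AAAA";"AAAAA"'))
# ['"A";', '"AA";', '"AAA";', '"AAAA";', '"AAAAA"']
print(split_fields('"A";"AA";"BB";"AA";"AAAAA"'))
# None
```

Because the second step only runs after the original pattern has
matched, the function accepts and rejects exactly the same strings as
the .match call above.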

-- 
 Please don't send me private copies of your public answers. Thanks.




