I don't understand this regex.groups() behaviour

Michael Chermside mcherm at mcherm.com
Fri Jun 20 13:53:08 EDT 2003


Grzegorz writes:
> > >>> import re
> > >>> c = '"A";"AA";"AAA";"AAAA";"AAAAA"'
> > >>> re.findall(r'("A+?";?)', c)
> > ['"A";', '"AA";', '"AAA";', '"AAAA";', '"AAAAA"']
> > 
> > I have simplified your regular expression somewhat so that I no
> > longer require the ; between all fields, but I expect that after
> > you realize that findall is what you want, you'll be able to
> > straighten out the details fairly easily.
> 
> And hopefully the following will help you realize that by altering
> the requisite specification you aren't helping at all:
> 
> >>> c = '"A";"AA";"BB";"AA";"AAAAA"'
> >>> re.findall(r'("A+?";?)', c)
> ['"A";', '"AA";', '"AA";', '"AAAAA"']
> 
> That is a false positive, because findall returns every match of the
> pattern, whereas I specified that I want to extract a pattern composed
> of two parts. The first part is repeated an unknown number of times
> and has to start at the beginning (hence the use of .match); the
> second part has to be exactly at the end:
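
The subject line's surprise has a documented cause in the `re` module: when a capturing group is repeated, `groups()` reports only its last repetition. The sketch below assumes Python 3, and its pattern is a hypothetical reconstruction of the two-part requirement described above, not the regex from the original post.

```python
import re

# Hypothetical pattern: one or more '"A+";' fields anchored at the start
# (via match()), then a final '"A+"' field anchored at the end (via $).
pattern = re.compile(r'("A+";)+("A+")$')

m = pattern.match('"A";"AA";"AAA";"AAAA";"AAAAA"')
# Group 1 matched four times, but only its last repetition is kept.
print(m.groups())  # ('"AAAA";', '"AAAAA"')

# The "BB" field breaks the pattern, so match() returns None rather than
# skipping the field the way findall() does.
print(pattern.match('"A";"AA";"BB";"AA";"AAAAA"'))  # None
```

So an anchored match can reject the malformed string, but it cannot return all the repeated fields by itself.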

I'm sorry, I didn't realize that this was your actual requirement; I
thought it was a "toy problem" meant to demonstrate the issue. In that
case, all you need to do is add one line to your code, as follows:

>>> c = '"A";"AA";"BB";"AA";"AAAAA"'
>>> x = re.findall(r'("A+?";?)', c)
>>> x = [x[:-1], x[-1]]
>>> x
[['"A";', '"AA";', '"AA";'], '"AAAAA"']
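
The one-liner above splits whatever findall returns, so the "BB" field is still skipped without complaint. If malformed input must be rejected, one option is to validate the whole string before splitting it. The sketch below is an illustration only: it assumes Python 3.4 or later (for `re.fullmatch`), assumes every field is a quoted run of A's separated by `;`, and `split_fields` is a hypothetical helper name.

```python
import re

# Assumed field format, taken from the examples in this thread.
FIELD = r'"A+"'
# Zero or more 'field;' entries followed by one final field.
WHOLE = re.compile(r'(?:%s;)*%s' % (FIELD, FIELD))

def split_fields(s):
    """Return [leading_fields, last_field], or None if s is malformed."""
    # fullmatch() anchors at both the start and the end of the string.
    if WHOLE.fullmatch(s) is None:
        return None
    # No capturing group, so findall() returns the whole matched text.
    fields = re.findall(r'"A+";?', s)
    return [fields[:-1], fields[-1]]

print(split_fields('"A";"AA";"AAA";"AAAA";"AAAAA"'))
print(split_fields('"A";"AA";"BB";"AA";"AAAAA"'))  # None
```

The first call yields `[['"A";', '"AA";', '"AAA";', '"AAAA";'], '"AAAAA"']`; the second returns `None` because of the "BB" field.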

Happy-to-help,

-- Michael Chermside

More information about the Python-list mailing list