How do I get to *all* of the groups of an re search?

John Machin sjmachin at lexicon.net
Fri Jan 10 18:47:09 EST 2003


Kyler Laird <Kyler at news.Lairds.org> wrote in message news:<sl51f-6dj.ln1 at news.lairds.org>...
> http://www.python.org/doc/current/lib/re-syntax.html
> 	(...)
> 	    Matches whatever regular expression is inside the
> 	    parentheses, and indicates the start and end of a 
> 	    group; the contents of a group can be retrieved
> 	    after a match has been performed, [...]
> 
> Sounds good, so I tried it.
> 
> 	import re
> 
> 	text = 'foo foo1 foo2 bar bar1 bar2 bar3'
> 
> 	test_re = re.compile('([a-z]+)( \\1[0-9]+)+')
> 
> 	print test_re.findall(text)
> 
> I expected the matches to be something like
> 	[('foo', [' foo1', ' foo2']), ('bar', [' bar1', ' bar2', ' bar3'])]
> but it's just this.
> 	[('foo', ' foo2'), ('bar', ' bar3')]
> 
> How do I get to the other groups that were matched?  (Is this
> an FAQ?  I don't know where to start looking.)

As you have found, a repeated group keeps only its last repetition, not
a list of all of them. The following may help: it puts the repeats in a
non-capturing group, (?:...), wrapped in a single capturing group, so the
whole run comes back as one string. To get a real list, though, I suspect
you will have to explicitly code a loop or two to munch through your
input string. You may also want to look at parsing engines like SPARK or
the one in the mxTextTools package.

>>> import re
>>> test_re = re.compile('([a-z]+)((?: \\1[0-9]+)+)')
>>> text = 'foo foo1 foo2 bar bar1 bar2 bar3'
>>> test_re.findall(text)
[('foo', ' foo1 foo2'), ('bar', ' bar1 bar2 bar3')]
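
If you do want the nested list from your original post, one way to code the loop is a second pass over each captured run. This is only a sketch: the names outer_re and grouped_repeats are made up for illustration, and it assumes the input has the shape shown above.

```python
import re

# Outer pattern: a word, then the whole run of " word<digits>" repeats
# captured as one string (the inner group is non-capturing).
outer_re = re.compile(r'([a-z]+)((?: \1[0-9]+)+)')

def grouped_repeats(text):
    """Return [(prefix, [repeat, ...]), ...] for each match in text."""
    results = []
    for m in outer_re.finditer(text):
        prefix, run = m.group(1), m.group(2)
        # Second pass: split the captured run into its individual repeats.
        repeats = re.findall(' ' + re.escape(prefix) + '[0-9]+', run)
        results.append((prefix, repeats))
    return results

text = 'foo foo1 foo2 bar bar1 bar2 bar3'
print(grouped_repeats(text))
# [('foo', [' foo1', ' foo2']), ('bar', [' bar1', ' bar2', ' bar3'])]
```

The re.escape call is not strictly needed here, since the prefix can only contain lowercase letters, but it keeps the inner pattern safe if the outer one is later loosened.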



