How do I get to *all* of the groups of an re search?
John Machin
sjmachin at lexicon.net
Fri Jan 10 18:47:09 EST 2003
Kyler Laird <Kyler at news.Lairds.org> wrote in message news:<sl51f-6dj.ln1 at news.lairds.org>...
> http://www.python.org/doc/current/lib/re-syntax.html
> (...)
> Matches whatever regular expression is inside the
> parentheses, and indicates the start and end of a
> group; the contents of a group can be retrieved
> after a match has been performed, [...]
>
> Sounds good, so I tried it.
>
> import re
>
> text = 'foo foo1 foo2 bar bar1 bar2 bar3'
>
> test_re = re.compile('([a-z]+)( \\1[0-9]+)+')
>
> print test_re.findall(text)
>
> I expected the matches to be something like
> [('foo', [' foo1', ' foo2']), ('bar', [' bar1', ' bar2', ' bar3'])]
> but it's just this.
> [('foo', ' foo2'), ('bar', ' bar3')]
>
> How do I get to the other groups that were matched? (Is this
> an FAQ? I don't know where to start looking.)
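As you have found, a group that sits inside a repeat captures only its
last repetition, not a list of all of them. The following may help: it
puts the repeated part in a non-capturing group (?:...) and wraps that
in a capturing group, so group 2 holds the whole run of repeats as one
string. To get each repeat separately, I suspect you will have to
explicitly code a loop or two to munch through your input string. You
may also want to look at parsing engines like SPARK or the one in the
mxText package.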
>>> import re
>>> test_re = re.compile('([a-z]+)((?: \\1[0-9]+)+)')
>>> text = 'foo foo1 foo2 bar bar1 bar2 bar3'
>>> test_re.findall(text)
[('foo', ' foo1 foo2'), ('bar', ' bar1 bar2 bar3')]
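One way to write that loop is sketched below. It is only a sketch, not the only approach. It takes each combined match from the pattern above, then runs a second findall over group 2 to split the run into its individual repeats. The function name group_repeats is made up for illustration.

```python
import re

# Group 1 is the word; group 2 is the whole run of " <word><digits>" repeats,
# captured together because the repeat itself is a non-capturing group.
outer_re = re.compile(r'([a-z]+)((?: \1[0-9]+)+)')

def group_repeats(text):
    """Hypothetical helper: return [(word, [repeat, ...]), ...]."""
    results = []
    for m in outer_re.finditer(text):
        word = m.group(1)
        # Second pass: split group 2 into the individual " <word><digits>" items.
        item_re = re.compile(' ' + re.escape(word) + '[0-9]+')
        results.append((word, item_re.findall(m.group(2))))
    return results

print(group_repeats('foo foo1 foo2 bar bar1 bar2 bar3'))
# -> [('foo', [' foo1', ' foo2']), ('bar', [' bar1', ' bar2', ' bar3'])]
```

That gives the nested structure you were expecting. The second regex is safe to build from the matched word because re.escape() quotes it first.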