Regular Expression Grouping

Paul McGuire ptmcg at austin.rr.com
Sun Aug 12 16:45:55 EDT 2007


On Aug 12, 12:21 pm, linnew... at gmail.com wrote:
>
> I cannot understand why 'c' constitutes a group here without being
> surrounded by "(" ,")" ?
>
> >>> import re
> >>> m = re.match("([abc])+", "abc")
> >>> m.groups()
>
> ('c',)
>

From the other replies, it sounds as though this is just the way re's
work: if a group matches multiple times within the matched text, only
the last match is returned for that group.
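
For reference, here is a minimal sketch of that behavior using only the
standard re module. The variable name "outer" is my own and is not from
the original post. Wrapping the repetition in a second, outer group is
one way to get the whole matched run back alongside the last repeat.

```python
import re

# The group sits inside the repetition, so it is overwritten on each
# pass and only the final repeat survives.
m = re.match(r"([abc])+", "abc")
print(m.group(0))   # whole match: 'abc'
print(m.groups())   # ('c',)

# An outer group around the repetition captures the full run;
# the inner group still holds only the last character.
outer = re.match(r"(([abc])+)", "abc")
print(outer.groups())   # ('abc', 'c')
```

The full match is always available as group(0), so the outer group is
only needed if you want the run returned by groups().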

This last-match-wins behavior is similar to what pyparsing does when a
results name is set on an expression that matches more than once.  Here
is an annotated session using pyparsing to extract this data.  The
explicit OneOrMore and Group classes and the oneOf helper function give
you a little more control over the collection and structure of the
results.

-- Paul

Set up pyparsing and define the input string.
>>> from pyparsing import *
>>> data = "abc"

Use a simple pyparsing expression - matches and returns each separate
character.  Each inner match can be returned as element [0], [1], or
[2] of the parsed results.
>>> print OneOrMore( oneOf("a b c") ).parseString(data)
['a', 'b', 'c']

Add use of Group - each single-character match is wrapped in a
subgroup.
>>> print OneOrMore( Group(oneOf("a b c")) ).parseString(data)
[['a'], ['b'], ['c']]

Instead of Group, set a results name on the entire pattern.
>>> pattern = OneOrMore( oneOf("a b c") ).setResultsName("char")
>>> print pattern.parseString(data)['char']
['a', 'b', 'c']

Set results name on the inner expression - this behavior seems most
like the regular expression behavior described in the original post.
>>> pattern = OneOrMore( oneOf("a b c").setResultsName("char") )
>>> print pattern.parseString(data)['char']
c

Pass listAllMatches=True when setting the results name to retain all
of the matched characters under that name.
>>> pattern = OneOrMore( oneOf("a b c").setResultsName("char", listAllMatches=True) )
>>> print pattern.parseString(data)['char']
['a', 'b', 'c']
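
For comparison, a rough stdlib-only analogue of the listAllMatches
result, offered as a sketch rather than an equivalent of what pyparsing
does internally.  The names "run" and "chars" are my own.  The idea is
to match the whole run first, then re-scan it one character at a time
with a named group.

```python
import re

data = "abc"

# First match the contiguous run of allowed characters at the start.
run = re.match(r"[abc]+", data)

# Then scan that run one character at a time, collecting every value
# of the named group instead of keeping only the last one.
chars = [m.group("char") for m in re.finditer(r"(?P<char>[abc])", run.group(0))]
print(chars)   # ['a', 'b', 'c']
```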



