question about grouping in RE

Tim Peters tim.one at home.com
Thu Jan 4 23:04:08 EST 2001


[Shen Wang]
> Currently in RE (A)|(B)|(C) create 3 groups, I wonder if there is a
> way to make it generate only 1 group.

Sure:

    (A|B|C)

It only creates groups where *you* put in parentheses!  Since the precedence
of "|" is very low, you usually don't need parentheses just to alter
precedence between "|" branches.  If you do need parentheses to alter
precedence, but do not want to create a group, use the non-grouping
parentheses construct (?:pattern).  For example,

    ((?:A)|(?:B)|(?:C))

matches the same strings but creates only one group.
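To check the group count directly, here is a small sketch that assumes a modern Python 3 `re`. The subject string "xxBxx" is made up for illustration. A compiled pattern's `groups` attribute reports how many capturing groups the pattern defines, and `m.groups()` shows what each one captured:

```python
import re

# Three pairs of capturing parentheses: three groups.
three = re.compile(r"(A)|(B)|(C)")
# One pair around the whole alternation: one group.
one = re.compile(r"(A|B|C)")
# (?:...) groups for precedence without capturing, so still one group.
one_nc = re.compile(r"((?:A)|(?:B)|(?:C))")

for pat in (three, one, one_nc):
    m = pat.search("xxBxx")
    print(pat.pattern, pat.groups, m.groups())
# (A)|(B)|(C) 3 (None, 'B', None)
# (A|B|C) 1 ('B',)
# ((?:A)|(?:B)|(?:C)) 1 ('B',)
```

With three groups, the branches that did not participate show up as None, so you have to scan for the one that isn't. With a single group, the matched text is always in group 1.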

> Which is more meaningful in my opinion.

As above, you can have it either way, so we don't have to decide whose
opinion is better <wink>.

> But things are different for named group, in pattern like
> (?P<name1>(sub1)(sub2))|(?P<name2>(pat1)), I did want (sub1) and
> (pat1) refer to group 2, (sub2) refer to group 3, but I don't want
> group 'name1' and 'name2' all refer to group 1, because then I
> have no way to tell which part matches. Maybe we can change named
> group's number to negative?
>
> That's all. I don't know if any guru can patch RE to add new flag
> like "merge group name".

Heh -- sorry, but I have no idea what you just said.  If it turns out that
nobody else does either, it might be helpful to post a specific, concrete,
executable example, spelling out exactly what you're trying to accomplish.

Note that there's an undocumented (because still experimental -- it may go
away) feature in Python 2.0's new re implementation:

>>> import re
>>> m = re.search("(A)|(B)|(C)", "abcABC")
>>> m.lastindex
1
>>> m = re.search("(A)|(B)|(C)", "abcCBA")
>>> m.lastindex
3
>>>

That is, a match object has a new attribute "lastindex" that gives the
number of the "last group" that matched (but m.lastindex is None if no group
matched).  There's also a "lastgroup" attribute that gets created if the
last group that matched was a named group.  That may or may not be along the
lines of what you're looking for.  Fredrik is experimenting with this
because, when writing a regexp-based scanner in Python, you need to know which
group matched (is the next token an integer?  a string?  an xyz?), and
that's clumsy and slow without something like lastindex/lastgroup.  There's
also an undocumented Scanner class using this feature in 2.0's sre.py, if
that interests you.
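For the scanner case, here is a minimal sketch in modern Python 3, where `lastindex` and `lastgroup` are now documented match-object attributes. The token names (`INT`, `STRING`, `NAME`, `SKIP`) and the `tokenize` helper are hypothetical, chosen only for illustration. This is not the `Scanner` class from `sre.py`. Each alternative is a named group, so `m.lastgroup` names the kind of token that matched:

```python
import re

# Hypothetical token kinds, one named group per alternative.
TOKEN_RE = re.compile(r"""
    (?P<INT>\d+)
  | (?P<STRING>"[^"]*")
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<SKIP>\s+)
""", re.VERBOSE)

def tokenize(text):
    """Yield (kind, value) pairs, where kind is the name of the group that matched."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)  # anchored at pos, so nothing is skipped silently
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind = m.lastgroup  # which alternative matched, with no loop over groups
        if kind != "SKIP":
            yield kind, m.group()
        pos = m.end()

print(list(tokenize('x 42 "hi"')))
# [('NAME', 'x'), ('INT', '42'), ('STRING', '"hi"')]

# When the match succeeds without any group participating, lastindex is None.
m = re.search(r"x|(A)", "x")
print(m.lastindex, m.lastgroup)  # None None
```

Calling `match` at an explicit position, rather than using `finditer`, means an unrecognized character raises an error instead of being skipped silently.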

simple-regexps-for-simple-tasks-makes-for-a-simple-life-ly y'rs  - tim





More information about the Python-list mailing list