Regular expression (re) anomaly with group names and alternatives

Alexander, Bob BobAlex at uppercase.xerox.com
Thu Jul 8 17:59:08 EDT 1999


It would be nice (IMO) if group names could be duplicated in different
alternatives (sections separated by the | operator) of a regular expression,
with the name taking on the value in the alternative that matches (if any).
However, that doesn't work -- is that a bug or by design?

In the following Python session, the pattern is such that the name A can be
assigned in either of the two alternatives. Since only one alternative can
succeed, it seems that A should have the value of whichever alternative
succeeded. In this example, the first alternative succeeds, which should
assign 'a' to A. However, the presence of A in the second alternative, which
didn't even take part in this match, causes A to be set to None.

>>> import re
>>> m = re.match('(?P<A>.)(?P<B>.)|(?P<A>x)', 'ab')
>>> m
<re.MatchObject instance at 7fe110>
>>> m.groupdict()
{'B': 'b', 'A': None}

My vote: it's a bug. This would be a useful behavior in some cases, such as
this one where I discovered this anomaly:

	r'("(?P<reference>[^"\n]*)(?P<completed>"?)|(?P<reference>.*))'

After the match succeeds, I could simply use group('reference'), rather than
write some additional code to analyze the situation.
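
For comparison, the additional code might look roughly like the sketch
below. It assumes each alternative gets its own group name; ref_quoted,
ref_bare, and the helper reference_of are hypothetical names chosen for
illustration and are not part of the original pattern.

```python
import re

# Same pattern as above, except that the duplicated name 'reference' is
# split into two distinct (hypothetical) names, one per alternative.
PATTERN = re.compile(
    r'("(?P<ref_quoted>[^"\n]*)(?P<completed>"?)|(?P<ref_bare>.*))'
)

def reference_of(text):
    """Return the reference text from whichever alternative matched."""
    m = PATTERN.match(text)
    if m is None:
        return None
    quoted = m.group('ref_quoted')
    # Compare against None rather than testing truthiness, so that an
    # empty quoted reference ('""') is not mistaken for "did not match".
    return quoted if quoted is not None else m.group('ref_bare')
```

With this, reference_of('"abc"') and reference_of('abc') both give 'abc',
which is the value group('reference') would have supplied directly.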

Bob



