[Python-Dev] Behavior of matching backreferences

Gustavo Niemeyer niemeyer@conectiva.com
Sat, 22 Jun 2002 17:10:36 -0300


> I think the re module worked correctly.
> 
> If you write your expression without the ambiguity:

I must confess I see no ambiguity in my expression.

> yours: "^(?P<a>a)?(?P=a)$"
> re-1a: "^((?P<a>a)(?P=a))?$"

Using "aa" was just an example, of course. If I wanted to match "aa" or
"", I wouldn't use this at all.

> re-2a: "^(?P<a>a?)(?P=a)$"
> 
> your test data ebc does not match either 'aa' or ''. Try removing
> the $ so that it will match '' at the start of the string.

Sorry, I pasted the wrong test into the message.

> re-1b: "^((?P<a>a)(?P=a))?"
> re-2b: "^(?P<a>a?)(?P=a)"
> 
> I think the re-2b form is the way to deal with the optional quotes.
> 
> I'm not sure a patch is needed for this.

If you think about a match with more characters, you'll end up with
something like "^(?P<a>(abc)?)(?P=a)" instead of "^(?P<a>abc)?(?P=a)".
Besides a small difference in meaning (m.group(1) is '' for the first
and None for the second), it looks like you're working around an
existing problem, though you may argue that this opinion is a personal
one.
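
To make the difference concrete, here is a minimal sketch using the re
module. The variable names and the test strings "abcabc" and "xyz" are
illustrative assumptions, not taken from the original report. As far as I
can tell, with the unpatched module the second form returns no match
object at all when "abc" is absent, so the None mentioned above is what
group(1) would presumably be once that match is allowed to succeed.

```python
import re

# Workaround form: the named group always takes part, possibly matching ''.
optional_content = re.compile(r"^(?P<a>(abc)?)(?P=a)")
# Form from the original report: the named group itself is optional.
optional_group = re.compile(r"^(?P<a>abc)?(?P=a)")

# Both forms agree when "abc" is present twice.
m1 = optional_content.match("abcabc")
m2 = optional_group.match("abcabc")
print(m1.group("a"), m2.group("a"))

# They differ when "abc" is absent (illustrative input).
m3 = optional_content.match("xyz")
m4 = optional_group.match("xyz")
print(repr(m3.group(1)))  # the group took part and matched the empty string
print(m4)                 # no match object: the backreference fails
```

Note that the workaround also adds an extra numbered group, so any code
that reads groups by index has to be adjusted as well.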

Thus, my main point is that the second regular expression will never
work as expected as things stand, and I see no reason not to fix it,
given that a fix is possible and has already been written.

If you find an example where it *should* fail, working as it does now, I
promise I'll shut up and withdraw. :-)

-- 
Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5  60E2 2253 B29A 6664 3A0C ]