Bug in regular expressions ?

David LeBlanc whisper at oz.net
Fri May 17 16:03:15 EDT 2002


Oops - looking at your example more carefully, I was mistaken about what you
asked - sorry.

What you're seeing is correct behavior. Alternatives are tried from left to
right and the first one that matches wins, so 'a|aa' applied to 'aa' matches
the first 'a' and stops there; 'aa' is tried only if 'a' (or whatever follows
it in the pattern) fails. Greediness comes into play only within a pattern,
through the quantifiers, and not across the alternatives of '|'. For example,
'a.*b' should match up to the last 'b' of 'aaaaaaaabaaab' and not the first.
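
Here is a small sketch of both points, plus one possible way to get the
"longest alternative wins" behavior that was asked about. The helper
longest_match() is a hypothetical name of my own and not part of the re
module; it assumes you can list the alternatives separately and that a
reasonably recent Python 3 is available.

```python
import re

# Alternation is ordered: the first alternative that matches wins.
first_wins = re.match(r'a|aa', 'aa').span()    # (0, 1)
longer_first = re.match(r'aa|a', 'aa').span()  # (0, 2)
optional = re.match(r'aa?', 'aa').span()       # (0, 2)

# Greediness applies to quantifiers inside a single pattern.
text = 'aaaaaaaabaaab'
greedy = re.match(r'a.*b', text).span()   # runs up to the last 'b'
lazy = re.match(r'a.*?b', text).span()    # stops at the first 'b'


def longest_match(alternatives, string):
    """Hypothetical helper (not in re): try every alternative at the
    start of the string and return the longest match, or None."""
    matches = [m for m in (re.match(p, string) for p in alternatives) if m]
    return max(matches, key=lambda m: m.end(), default=None)


longest = longest_match(['a', 'aa'], 'aa').span()  # (0, 2)
```

If the alternatives are plain literals, sorting them longest first before
joining them with '|' gives the same result with a single pattern.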

David LeBlanc
Seattle, WA USA

> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of Christophe Delord
> Sent: Friday, May 17, 2002 8:55
> To: python-list at python.org
> Subject: Bug in regular expressions ?
>
>
>
> Hi,
>
> I thought that regular expressions were greedy, so that the
> longest match is returned by match().
> Consider these expressions: 'a|aa', 'aa|a' and 'aa?'
> These expressions may match 'a' and 'aa' and should be equivalent.
> When applied to 'aa', match only sees the first 'a' when using
> the first regular expression ('a|aa').
>
> >>> import re
> >>> p=re.compile('a|aa')
> >>> p.match('aa').span()
> (0, 1)                           <- 'aa' (2 chars) should have been matched ???
> >>> p=re.compile('aa|a')
> >>> p.match('aa').span()
> (0, 2)                           <- ok, two characters have been matched
> >>> p=re.compile('aa?')
> >>> p.match('aa').span()
> (0, 2)                           <- ok
> >>>
>
> So A|B and B|A are not always equivalent. When both A and B match, B
> is ignored even if the text it matches is longer.
> Is this a bug in the re module?
> Is there a way to tell re to be "totally greedy"?
>
> Thanks,
>
> --
> Christophe Delord
> http://christophe.delord.free.fr/





