Bug in regular expressions ?

David LeBlanc whisper at oz.net
Fri May 17 15:55:51 EDT 2002


RE quantifiers can be either greedy or non-greedy. It depends on the
implementation, and even an implementation that is greedy by default can
match non-greedily depending on how the RE pattern is coded (in Python's re
module, by adding a '?' after the quantifier, as in '*?' or '+?'). You may
find the Regex HOWTO at http://py-howto.sourceforge.net/regex/regex.html of
interest, particularly section 6.3, "Greedy vs. Non-Greedy".
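
As a rough illustration of the greedy/non-greedy distinction, here is a
minimal sketch using Python's re module. The sample string and variable
names are made up for the example.

```python
import re

text = "<b>bold</b>"

# Greedy: '.*' takes as much as it can, so the match runs to the last '>'.
greedy = re.match(r"<.*>", text)

# Non-greedy: '.*?' takes as little as it can, so the match stops at the
# first '>'.
lazy = re.match(r"<.*?>", text)

print(greedy.span())  # (0, 11)
print(lazy.span())    # (0, 3)
```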

David LeBlanc
Seattle, WA USA

> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of Christophe Delord
> Sent: Friday, May 17, 2002 8:55
> To: python-list at python.org
> Subject: Bug in regular expressions ?
>
>
>
> Hi,
>
> I thought that regular expressions were greedy, so that the
> longest match is returned by match().
> Consider these expressions: 'a|aa', 'aa|a' and 'aa?'
> These expressions may match 'a' and 'aa' and should be equivalent.
> When applied to 'aa', match only sees the first 'a' when using
> the first regular expression ('a|aa').
>
> >>> import re
> >>> p=re.compile('a|aa')
> >>> p.match('aa').span()
> (0, 1)                           <- 'aa' (2 chars) should have been matched ???
> >>> p=re.compile('aa|a')
> >>> p.match('aa').span()
> (0, 2)                           <- ok, two characters have been matched
> >>> p=re.compile('aa?')
> >>> p.match('aa').span()
> (0, 2)                           <- ok
> >>>
>
> So A|B and B|A are not always equivalent. When both A and B match, B
> is ignored even if the text it matches is longer.
> Is this a bug in the re module?
> Is there a way to tell re to be "totally greedy"?
>
> Thanks,
>
> --
> Christophe Delord
> http://christophe.delord.free.fr/
> --
> http://mail.python.org/mailman/listinfo/python-list
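
For what it's worth, the behaviour in the session above comes from how
alternation works in Python's re module rather than from quantifier
greediness: alternatives are tried left to right and the first one that
lets the whole pattern succeed wins. The sketch below shows this, plus two
possible workarounds. Note that longest_match is a hypothetical helper
written for this example, not something the re module provides.

```python
import re

def longest_match(patterns, text):
    """Match each pattern at the start of text; return the longest match or None."""
    best = None
    for pat in patterns:
        m = re.match(pat, text)
        if m is not None and (best is None or m.end() > best.end()):
            best = m
    return best

# Alternation is ordered: the first alternative that succeeds is used.
print(re.match("a|aa", "aa").span())        # (0, 1)
print(re.match("aa|a", "aa").span())        # (0, 2)

# Anchoring the end forces backtracking into the second alternative.
print(re.match("(?:a|aa)$", "aa").span())   # (0, 2)

# Trying each alternative separately and keeping the longest result.
print(longest_match(["a", "aa"], "aa").span())  # (0, 2)
```

Listing the longer alternative first ('aa|a') is usually the simplest fix
when the alternatives are known in advance.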
