re module non-greedy matches broken

lothar lothar at ultimathule.nul
Sun Apr 3 21:37:24 EDT 2005


this response is nothing but a description of the behavior i reported.

as to whether this behaviour was intended, one would have to ask the module
writer about that.
because of the statement in the documentation, which places no qualification
on how the scan for the shortest possible match is to be done, my guess is
that this problem was overlooked.

to produce a truly non-greedy (minimal length) match, the start of the
match must be moved right repeatedly, to the last position where the
left-hand part of the pattern (the part preceding the .*?) still matches.

why would someone want a non-greedy (minimal length) match that was not
always non-greedy (minimal length)?



"André Malo" <auch-ich-m at g-kein-spam.com> wrote in message
news:20611953.SkRsez8GTE at news.perlig.de...
* lothar wrote:

> re:
> 4.2.1 Regular Expression Syntax
> http://docs.python.org/lib/re-syntax.html
>
>   *?, +?, ??
>   Adding "?" after the qualifier makes it perform the match in non-greedy
>   or minimal fashion; as few characters as possible will be matched.
>
> the regular expression module fails to perform non-greedy matches as
> described in the documentation: more than "as few characters as possible"
> are matched.
>
> this is a bug and it needs to be fixed.

The documentation is just incomplete. Non-greedy regexps still start
matching at the leftmost possible position. So instead of the longest of
the leftmost matches you get the shortest of the leftmost matches. One may
consider this a documentation bug, yes.
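
A small illustration of the point above, using a made-up sample string:
both searches start at the first "a", and the "?" only changes how far the
match extends from there.

```python
import re

text = "aXaYb--b"  # sample input, chosen for illustration

lazy = re.search(r"a.*?b", text)    # shortest match from the leftmost start
greedy = re.search(r"a.*b", text)   # longest match from the leftmost start

print(lazy.group())    # aXaYb
print(greedy.group())  # aXaYb--b
```

The shorter candidate "aYb" is never considered, because a match starting
at index 0 is found first.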

nd
--
# André Malo, <http://www.perlig.de/> #





