re module non-greedy matches broken

lothar lothar at ultimathule.nul
Sun Apr 3 11:47:05 EDT 2005


re:
4.2.1 Regular Expression Syntax
http://docs.python.org/lib/re-syntax.html

  *?, +?, ??
  Adding "?" after the qualifier makes it perform the match in non-greedy or
minimal fashion; as few characters as possible will be matched.

the regular expression module does not perform non-greedy matches as
described in the documentation: in the examples below, findall returns
matches that are longer than "as few characters as possible".

this is a bug and it needs to be fixed.

examples follow.

lothar at erda /ntd/vl
$ cat vwre.py
#! /usr/bin/env python

import re

vwre = re.compile("V.*?W")
vwlre = re.compile("V.*?WL")

if __name__ == "__main__":

  newdoc = "V1WVVV2WWW"
  vwli = re.findall(vwre, newdoc)
  print "vwli[], expect", ['V1W', 'V2W']
  print "vwli[], return", vwli

  newdoc = "V1WLV2WV3WV4WLV5WV6WL"
  vwlli = re.findall(vwlre, newdoc)
  print "vwlli[], expect", ['V1WL', 'V4WL', 'V6WL']
  print "vwlli[], return", vwlli

lothar at erda /ntd/vl
$ python vwre.py
vwli[], expect ['V1W', 'V2W']
vwli[], return ['V1W', 'VVV2W']
vwlli[], expect ['V1WL', 'V4WL', 'V6WL']
vwlli[], return ['V1WL', 'V2WV3WV4WL', 'V5WV6WL']

lothar at erda /ntd/vl
$ python -V
Python 2.3.3
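
for comparison, a minimal sketch of patterns that do return the expected
lists on the same inputs. it assumes the intent is "the shortest V...W span
with no other V inside it". the names vwre_nov and vwlre_nov are made up for
this illustration and are not part of vwre.py. one reading of the output
above is that findall scans left to right and each match begins at the first
V from which any match is possible, so the "?" only limits how far the match
extends from that starting point; excluding V from the gap is one way to
rule out the longer spans.

```python
import re

# hypothetical variants of vwre and vwlre: the gap between V and W may not
# contain another V, so a match cannot start at an earlier V and run
# across later ones
vwre_nov = re.compile("V[^V]*?W")
vwlre_nov = re.compile("V[^V]*?WL")

vwli = vwre_nov.findall("V1WVVV2WWW")
vwlli = vwlre_nov.findall("V1WLV2WV3WV4WLV5WV6WL")

# single-argument print calls, so this form works with or without
# the print statement
print("vwli[], return %r" % (vwli,))
print("vwlli[], return %r" % (vwlli,))
```

with these patterns vwli is ['V1W', 'V2W'] and vwlli is
['V1WL', 'V4WL', 'V6WL'], which are the "expect" lists from vwre.py.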




