[Python-bugs-list] [ python-Bugs-448951 ] Bug in re group handling

noreply@sourceforge.net
Thu, 04 Oct 2001 21:39:24 -0700


Bugs item #448951, was opened at 2001-08-07 17:19
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=448951&group_id=5470

Category: Regular Expressions
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Fredrik Lundh (effbot)
Summary: Bug in re group handling

Initial Comment:
#
# read it or run it!
#
import re,sys
print sys.version
#
# Bug in 're' lib in Python 2.1
#
# Consider this regexp : (?:([0-3]):)?0#
# This will match one of
# '0#', '0:0#', '1:0#', '2:0#', '3:0#' 
#
# The matching itself works fine, but group(1) should
# be None for the '0#' case, and 'x' for the 'x:0#' cases.
# For '0#', the optional '([0-3]):' part of the
# r.e. (enclosed in (?: )) does not match anything, and that
# is what contains group 1.
#
# The actual result is that group(1) is '0' for both '0#' and '0:0#'.
# Likely this happens because when '0' is seen, the state machine
# cannot yet determine whether the ([0-3]): should be matched,
# but has already seen enough of it to know what group(1) would be,
# assuming it does match. The recorded capture needs to be discarded
# once the containing ? fails. Indeed, if the group is expanded to
# include the ':', as in '(?:([0-3]:))?0#', or just '([0-3]:)?0#',
# '0#' produces group(1)=None as it should.
#
# Also, this is a good time to point out an error in the
# docs. The docs say that group(n) returns -1 when the
# group is in an unmatched part of the r.e.; actually
# it returns None, which is more sensible.
#

rexp = '(?:([0-3]):)?0#'
mat1 = re.compile(rexp)

print "Re = ", rexp

for str in [ '2:0#', '0:0#', '0#', '0:#', ':0#']:
	print "\n-----<<", str, ">>-----"
	mat = mat1.match(str)
	if mat:
		print "    group(0) = ", mat.group(0)
		print "    group(1) = ", mat.group(1)
	else:
		print "    no match"
#
# output is below
#
#################################
# Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
# Re =  (?:([0-3]):)?0#
# 
# -----<< 2:0# >>-----
#     group(0) =  2:0#
#     group(1) =  2
# 
# -----<< 0:0# >>-----
#     group(0) =  0:0#
#     group(1) =  0
# 
# -----<< 0# >>-----
#     group(0) =  0#
#     group(1) =  0
# 
# -----<< 0:# >>-----
#     no match
# 
# -----<< :0# >>-----
#     no match
#
############################################
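
The behaviour the report asks for can be sketched as a small check.
This is only an illustration, written for a current Python 3
interpreter rather than in the 2.1 syntax of the script above. The
names group1, prefix_digit, PATTERN and WIDE_PATTERN are hypothetical
and do not come from the report. On an interpreter affected by this
bug, the '0#' case with PATTERN would be expected to report '0'
instead of None.

```python
import re

# Pattern from the report: an optional "<digit>:" prefix, then "0#".
PATTERN = r'(?:([0-3]):)?0#'
# Variant mentioned in the report: the capture group also takes the ':'.
WIDE_PATTERN = r'(?:([0-3]:))?0#'


def group1(pattern, text):
    """Return group(1) of a match at the start of text, or 'no match'."""
    m = re.match(pattern, text)
    if m is None:
        return 'no match'
    # None (not -1) when the optional prefix took no part in the match.
    return m.group(1)


def prefix_digit(text):
    """Hypothetical workaround: capture 'x:' and strip the ':' afterwards."""
    captured = group1(WIDE_PATTERN, text)
    if captured is None or captured == 'no match':
        return captured
    return captured[:-1]


if __name__ == '__main__':
    for s in ['2:0#', '0:0#', '0#', '0:#', ':0#']:
        print(repr(s), '->', repr(group1(PATTERN, s)))
```

The prefix_digit helper follows the observation in the report that
widening the group to include the ':' avoids the stale capture; it is
one possible workaround, not the fix applied to the re module.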

----------------------------------------------------------------------

Comment By: Matthew Mueller (donut)
Date: 2001-10-04 21:39

Message:
Logged In: YES 
user_id=65253

I posted a fix as patch #468169 since I don't seem to have
access to add it here.


----------------------------------------------------------------------

Comment By: Gregory Smith (gregsmith)
Date: 2001-08-30 09:30

Message:
Logged In: YES 
user_id=292741

This appears to be the same bug as #429357, albeit using
a simpler test case. I have added a comment to that
one.

----------------------------------------------------------------------
