[ python-Bugs-817234 ] re.finditer hangs on final empty match
SourceForge.net
noreply at sourceforge.net
Fri Sep 3 14:04:08 CEST 2004
Bugs item #817234, was opened at 2003-10-03 17:01
Message generated for change (Comment added) made by effbot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=817234&group_id=5470
Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Kevin J. Butler (kevinbutler)
>Assigned to: Gustavo Niemeyer (niemeyer)
Summary: re.finditer hangs on final empty match
Initial Comment:
The iterator returned by re.finditer appears not to terminate if the
final match is empty; instead it keeps returning the final (empty)
match.
Is this a bug in _sre? If so, I'll be happy to file it, though fixing
it is a bit beyond my _sre experience level at this point. The solution
would appear to be either to check for a duplicate match in
iterator.next(), or to increment the position by one after returning an
empty match (which should be OK, because if a non-empty match started
at that location, we would have returned it instead of the empty
match).
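The second idea (step one position past an empty match) can be sketched in pure Python. This is only an illustration built on the public pattern.search(string, pos) call, written in current Python 3 syntax; the helper name finditer_advancing is hypothetical and this is not the _sre implementation.

```python
import re

def finditer_advancing(pattern, string):
    """Yield successive matches, stepping one position past each empty match.

    Hypothetical helper for illustration; not the _sre implementation.
    """
    compiled = re.compile(pattern)
    pos = 0
    while pos <= len(string):
        m = compiled.search(string, pos)
        if m is None:
            break
        yield m
        # An empty match ends where it starts; step past it so the same
        # position is not searched again.
        pos = m.end() + 1 if m.end() == m.start() else m.end()

spans = [m.span() for m in finditer_advancing(".*", "asdf")]
```

Because the loop stops once pos passes the end of the string, the final empty match is returned once and the iteration then ends.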
Code to illustrate the failure:
from re import finditer
last = None
for m in finditer( ".*", "asdf" ):
    if last == m.span():
        print "duplicate match:", last
        break
    print m.group(), m.span()
    last = m.span()
---
asdf (0, 4)
(4, 4)
duplicate match: (4, 4)
---
findall works:
print re.findall( ".*", "asdf" )
['asdf', '']
The workaround is to explicitly check for a duplicate span, as I did
above, or to check for a duplicate end(), which avoids the final empty
match.
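The duplicate-span workaround can be wrapped in a small generator. The sketch below is in current Python 3 syntax; the name unique_span_matches is hypothetical, and because a fixed interpreter no longer shows the bug, the stuck iterator is simulated with itertools rather than produced by re.finditer.

```python
import itertools
import re

def unique_span_matches(matches):
    """Yield matches until one repeats the span of the previous match."""
    last_span = None
    for m in matches:
        if m.span() == last_span:
            break  # the iterator is stuck on the final empty match
        last_span = m.span()
        yield m

# Simulate the reported behaviour: the final empty match repeats forever.
pattern = re.compile(".*")
first = pattern.match("asdf")      # span (0, 4)
final = pattern.match("asdf", 4)   # span (4, 4)
stuck = itertools.chain([first], itertools.repeat(final))

spans = [m.span() for m in unique_span_matches(stuck)]
```

On an affected interpreter the wrapper would be applied directly to the result of re.finditer.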
Seo Sanghyeon sent the following fix to python-dev list:
The attached one-line patch fixes the re.finditer bug reported by
Kevin J. Butler. I read the cvs log to find out why this code was
introduced, and it seems to be related to SF bug #581080. But that bug
didn't appear after my patch, so I wonder why it was introduced in the
first place. It seems beyond my understanding. Please enlighten me.
To test:
#581080
import re
list(re.finditer('\s', 'a b'))
# expected: one item list
# bug: hang
#Kevin J. Butler
import re
list(re.finditer('.*', 'asdf'))
# expected: two item list (?)
# bug: hang
Seo Sanghyeon
-------------- next part --------------
? patch
Index: Modules/_sre.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/_sre.c,v
retrieving revision 2.99
diff -c -r2.99 _sre.c
*** Modules/_sre.c 26 Jun 2003 14:41:08 -0000 2.99
--- Modules/_sre.c 2 Oct 2003 03:48:55 -0000
***************
*** 3062,3069 ****
          match = pattern_new_match((PatternObject*) self->pattern,
                                     state, status);
!         if ((status == 0 || state->ptr == state->start) &&
!             state->ptr < state->end)
              state->start = (void*) ((char*) state->ptr + state->charsize);
          else
              state->start = state->ptr;
--- 3062,3068 ----
          match = pattern_new_match((PatternObject*) self->pattern,
                                     state, status);
!         if (status == 0 || state->ptr == state->start)
              state->start = (void*) ((char*) state->ptr + state->charsize);
          else
              state->start = state->ptr;
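To see what the changed condition does at the end of the string, the branch can be modelled with plain integers. This is a simplified model of only the lines in the diff, assuming ptr is the end of the last match, start is where that match began, and end is the end of the string; the function next_start and its patched flag are hypothetical and not part of _sre.c.

```python
def next_start(status, start, ptr, end, charsize=1, patched=True):
    """Return the next search position chosen by the branch in the diff.

    Integer offsets stand in for the C pointers (a simplification).
    """
    empty_or_failed = status == 0 or ptr == start
    if patched:
        advance = empty_or_failed
    else:
        # Pre-patch condition: never advance once ptr has reached the end.
        advance = empty_or_failed and ptr < end
    return ptr + charsize if advance else ptr

# Final empty match on "asdf": start == ptr == end == 4.
before = next_start(1, 4, 4, 4, patched=False)
after = next_start(1, 4, 4, 4, patched=True)
```

In this model the pre-patch branch leaves the position at 4, so the same empty match would be found again, while the patched branch moves it to 5, past the end of the string.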
----------------------------------------------------------------------
>Comment By: Fredrik Lundh (effbot)
Date: 2004-09-03 14:04
Message:
Logged In: YES
user_id=38376
Still there in 2.4a3, as the following revised example shows:
import re
m = re.finditer(".*", "asdf")
print m.next().span()
print m.next().span()
print m.next().span() # this should raise an exception
Gustavo, can you look at this patch too?
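For comparison, the expected behaviour can be written as a check in current Python 3 syntax (next(it) instead of it.next()). It assumes an interpreter in which the bug is fixed; on an affected build the third call would return the (4, 4) match again instead of raising.

```python
import re

it = re.finditer(".*", "asdf")
spans = [next(it).span(), next(it).span()]

# The third call should signal exhaustion rather than repeat the match.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
```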
----------------------------------------------------------------------
Comment By: Kevin J. Butler (kevinbutler)
Date: 2003-10-03 20:16
Message:
Logged In: YES
user_id=117665
The above patch does resolve the problem.
The code was introduced in rev 2.85
http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Modules/_sre.c
to resolve bug 581080
http://sourceforge.net/tracker/index.php?func=detail&aid=581080&group_id=5470&atid=105470
but removing this line does not re-introduce that bug.
Thanks, and kudos to Seo...
----------------------------------------------------------------------