Bug in re.findall()?

Jonathan Feinberg jdf at pobox.com
Mon May 7 20:25:49 EDT 2001


ActivePython 2.0, build 203 (ActiveState Tool Corp.)
based on Python 2.0 (#8, Mar  7 2001, 16:04:37) [MSC 32 bit (Intel)]
on win32

#-----------------------------------------------------
import re
r = re.compile(r"(?: (apple) | (banana) )\s*", re.X)
tests = [ 'apple apple apple',
          'banana banana banana',
          'apple apple banana banana',
          'apple banana apple banana' ]
for t in tests:
    print r.findall(t)
#-----------------------------------------------------
[('apple', ''), ('apple', ''), ('apple', '')]
[('', 'banana'), ('', 'banana'), ('', 'banana')]
[('apple', ''), ('apple', ''), ('', 'banana'), ('', 'banana')]
[('apple', ''), ('', 'banana'), ('apple', 'banana'), ('', 'banana')]
                                           ^^^^^^
                                        should not be!

Am I wrong?

If you remove the \s* from the end of the regex, the problem goes
away.
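
A minimal sketch for narrowing this down, assuming only the standard re
module. It runs the pattern with and without the trailing \s* and lists
any findall() tuple in which both alternatives are filled. The helper
name stale_groups is hypothetical and not part of re.

```python
import re

# Pattern from the report, plus the variant without the trailing \s*.
with_ws = re.compile(r"(?: (apple) | (banana) )\s*", re.X)
without_ws = re.compile(r"(?: (apple) | (banana) )", re.X)

def stale_groups(pattern, text):
    # Collect findall() tuples where both groups are non-empty.
    # Only one side of the alternation can take part in any single
    # match, so such tuples should never appear.
    return [pair for pair in pattern.findall(text) if pair[0] and pair[1]]

text = 'apple banana apple banana'
print(stale_groups(with_ws, text))
print(stale_groups(without_ws, text))
```

An interpreter without the problem should print an empty list for both
patterns. A non-empty first list alongside an empty second one would
match the behaviour described above.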
-- 
Jonathan Feinberg   jdf at pobox.com   Sunny Brooklyn, NY
http://pobox.com/~jdf


