[Python-bugs-list] [ python-Bugs-436596 ] re.findall() bad with third argument
noreply@sourceforge.net
Fri, 06 Jul 2001 09:14:44 -0700
Bugs item #436596, was opened at 2001-06-26 19:10
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=436596&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Fredrik Lundh (effbot)
Summary: re.findall() bad with third argument
Initial Comment:
On Wed, 27 Jun 2001, Dan Tropp wrote:
> I tried these in my python shell. Why do the last two give what they do?
>
> >>> print re.findall('<.*?>','<a> </a> <a> </a>')
> ['<a>', '</a>', '<a>', '</a>']
> >>> print re.findall('<.*?>','<1> </2> \n<3> </4>')
> ['<1>', '</2>', '<3>', '</4>']
> >>> print re.findall('<.*?>','<1> </2> \n<3> </4>', re.I|re.S)
> []
> >>> print re.findall('<.*?>','<1> </2> \n<3> </4>', re.I)
> ['</2>', '<3>', '</4>']
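The integers behind the last two calls can be checked directly. This sketch assumes a current Python 3 interpreter rather than the 2001-era one in the report; there the flag constants are `re.RegexFlag` members, hence the `int()` calls. It shows that `re.I` is 2 and that `re.I | re.S` happens to equal the length of the test string:

```python
import re

# The test string from the report ('\n' is a single newline character).
s = '<1> </2> \n<3> </4>'

# Regex flags are small integer bit values.
print(int(re.I))          # 2
print(int(re.I | re.S))   # 18 (2 | 16)
print(len(s))             # 18
```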
Now this is curious, because according to the documentation at:
http://python.org/doc/current/lib/Contents_of_Module_re.html
re.findall() is only supposed to take two arguments. In fact, in
Python 1.5.2, Python complains:
###
# in Python 1.5.2:
>>> print re.findall('<.*?>','<1> </2> \n<3> </4>', re.I)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: too many arguments; expected 2, got 3
###
Let me check if the same behavior happens in 2.1:
###
# in Python 2.1
>>> re.findall('<.*?>','<1> </2> \n<3> </4>', re.I)
['</2>', '<3>', '</4>']
###
Now that is weird! This looks like it might be a bug. Let's take a look
at the source code to see why it's doing that.
###
## source code in sre.py
def findall(pattern, string, maxsplit=0):
    """Return a list of all non-overlapping matches in the string.
    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result."""
    return _compile(pattern, 0).findall(string, maxsplit)
###
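The wrapper's third argument is never used as a split count: it is forwarded positionally as the second argument of the compiled pattern's `findall()` method, which is the index where scanning starts (`pos`). That accounts for both odd results: a start of 2 skips the `<` of `<1>`, and a start of 18 is already the end of the string. A minimal sketch reproducing the effect in current Python; `findall_like_sre_21` is a hypothetical name for a copy of the quoted wrapper:

```python
import re

s = '<1> </2> \n<3> </4>'
tag_re = re.compile('<.*?>')

# Pattern.findall(string, pos, endpos): the second positional argument
# is the index at which scanning starts.
print(tag_re.findall(s))      # ['<1>', '</2>', '<3>', '</4>']
print(tag_re.findall(s, 2))   # ['</2>', '<3>', '</4>']; index 2 is past the '<' of '<1>'
print(tag_re.findall(s, 18))  # []; 18 == len(s), so nothing is left to scan

def findall_like_sre_21(pattern, string, maxsplit=0):
    # Hypothetical copy of the quoted wrapper: the third argument is
    # forwarded unchanged and lands in Pattern.findall's `pos` slot.
    return re.compile(pattern).findall(string, maxsplit)

print(findall_like_sre_21('<.*?>', s, re.I))         # ['</2>', '<3>', '</4>']
print(findall_like_sre_21('<.*?>', s, re.I | re.S))  # []
```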
Weird! findall() in its current incarnation does take a third argument,
contrary to the HTML documentation. But this makes no sense to me: why
should findall() need a maxsplit parameter, when maxsplit is something
that only split() works with? This really looks like a bug to me.
Hmmm... well, the definition of findall() is right next to split(), so
perhaps someone made a mistake and accidentally added maxsplit as an
argument. I believe that the corrected code in sre.py should be:
###
def findall(pattern, string):
    """Return a list of all non-overlapping matches in the string.
    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result."""
    return _compile(pattern, 0).findall(string)
###
instead.
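Dropping the argument, as proposed above, restores the documented two-argument form. Another option, and the one current Python's `re.findall(pattern, string, flags=0)` takes, is to treat the third argument as regex flags and compile them into the pattern. A sketch of such a wrapper; the name `findall_with_flags` is hypothetical:

```python
import re

def findall_with_flags(pattern, string, flags=0):
    # Hypothetical wrapper: the third argument is compiled into the
    # pattern as regex flags, so scanning always starts at index 0.
    return re.compile(pattern, flags).findall(string)

s = '<1> </2> \n<3> </4>'
print(findall_with_flags('<.*?>', s, re.I | re.S))  # ['<1>', '</2>', '<3>', '</4>']
print(findall_with_flags('<.*?>', s, re.I))         # ['<1>', '</2>', '<3>', '</4>']
```

With the third argument handled as flags, both problem calls from the report return all four tags, matching the two-argument result.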
Ever since June 1, 2000, the findall() code in sre.py has contained this
weird behavior:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/sre.py?rev=1.5&content-type=text/vnd.viewcvs-markup
and even in the current development sources, it still has it!
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/sre.py?rev=1.25.2.1&content-type=text/vnd.viewcvs-markup
Dan, I think we should report this to the implementors and see what they
think about it. Good catch! *grin* Do you want to submit this to
SourceForge?
----------------------------------------------------------------------
Comment By: Danny Yoo (dyoo)
Date: 2001-06-27 08:30
Message:
Logged In: YES
user_id=49843
More details here:
http://mail.python.org/pipermail/tutor/2001-June/006891.html
----------------------------------------------------------------------