[Python-bugs-list] [ python-Bugs-436596 ] re.findall() bad with third argument

noreply@sourceforge.net noreply@sourceforge.net
Fri, 06 Jul 2001 09:14:44 -0700


Bugs item #436596, was opened at 2001-06-26 19:10
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=436596&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Fredrik Lundh (effbot)
Summary: re.findall() bad with third argument

Initial Comment:
On Wed, 27 Jun 2001, Dan Tropp wrote:

> I tried these in my python shell. Why do the last two give what they do?
> 
> >>> print re.findall('<.*?>','<a> </a> <a> </a>')
> ['<a>', '</a>', '<a>', '</a>']
> >>> print re.findall('<.*?>','<1> </2> \n<3> </4>')
> ['<1>', '</2>', '<3>', '</4>']
> >>> print re.findall('<.*?>','<1> </2> \n<3> </4>', re.I|re.S)
> []
> >>> print re.findall('<.*?>','<1> </2> \n<3> </4>', re.I)
> ['</2>', '<3>', '</4>']

Now this is curious, because according to the documentation at:

    http://python.org/doc/current/lib/Contents_of_Module_re.html

re.findall() is only supposed to take two arguments.  In fact,
Python 1.5.2 complains that:

###
# in Python 1.5.2:
>>> print re.findall('<.*?>','<1> </2> \n<3> </4>', re.I)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: too many arguments; expected 2, got 3
###


Let me check if the same behavior happens in 2.1:

###
# in Python 2.1
>>> re.findall('<.*?>','<1> </2> \n<3> </4>', re.I)
['</2>', '<3>', '</4>']
###

Now that is weird!  This looks like it might be a bug.  Let's take a
look at the source code, to see why it's doing that.


###
## source code in sre.py
def findall(pattern, string, maxsplit=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, 0).findall(string, maxsplit)
###

Weird!  findall() in its current incarnation does take a third
argument, contrary to the HTML documentation.  But this makes no sense
to me.  Why should findall() need a maxsplit parameter, when maxsplit
is something that only split() works with?  This really looks like a
bug to me.
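
One possible reading of the odd outputs above, offered as a sketch and not a confirmed diagnosis: if the compiled pattern's findall() treats its second argument as a start position (as pattern.findall(string, pos) does in current Python 3; that sre in 2.1 behaved the same way is an assumption here), then the flag constants are simply being used as offsets into the string.  re.I has the integer value 2, and re.I|re.S has the value 18, which happens to be the length of the test string.

```python
import re

text = '<1> </2> \n<3> </4>'
pattern = re.compile('<.*?>')

# The flag constants are small integers.
flag_i = int(re.I)           # 2
flag_is = int(re.I | re.S)   # 2 | 16 == 18

# Assumption: the extra argument ends up as the start position (pos)
# of the compiled pattern's findall(), not as a set of flags.
from_i = pattern.findall(text, flag_i)    # search starts after '<1'
from_is = pattern.findall(text, flag_is)  # search starts at end of string

print(flag_i, flag_is, len(text))
print(from_i)
print(from_is)
```

Under that assumption, starting at offset 2 skips the opening '<' of the first tag, and starting at offset 18 leaves nothing to search, which would account for both results in the original report.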


Hmmm... well, the definition of findall() is adjacent to split(), so
perhaps someone made a mistake and accidentally added maxsplit as an
argument.  I believe that the corrected code in sre.py should be:

###
def findall(pattern, string):
    """Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, 0).findall(string)
###

instead.
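
As a rough illustration of what the two-argument signature would change, here is a standalone sketch.  It uses the public re.compile() in place of the module-private _compile(), and the local name findall is only a stand-in for the function in sre.py; both are assumptions made so the snippet runs on its own under current Python.

```python
import re

def findall(pattern, string):
    """Stand-in for the proposed two-argument module-level findall()."""
    # re.compile() is used here instead of sre's private _compile().
    return re.compile(pattern).findall(string)

text = '<1> </2> \n<3> </4>'

# With two arguments, every tag is found.
result = findall('<.*?>', text)

# A stray third argument is now rejected instead of silently
# changing the result.
try:
    findall('<.*?>', text, re.I)
    rejected = False
except TypeError:
    rejected = True

print(result)
print(rejected)
```

With this signature the mistaken call fails loudly with a TypeError, much as it did in the 1.5.2 session shown earlier.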

Ever since June 1, 2000, the findall() code in sre.py has contained
this weird behavior:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/sre.py?rev=1.5&content-type=text/vnd.viewcvs-markup

and even in the current development sources, it still has it!

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/sre.py?rev=1.25.2.1&content-type=text/vnd.viewcvs-markup


Dan, I think we should report this to the Implementors and see what
they think about it.  Good catch!  *grin*  Do you want to submit this
to sourceforge?


----------------------------------------------------------------------

Comment By: Danny Yoo (dyoo)
Date: 2001-06-27 08:30

Message:
Logged In: YES 
user_id=49843

More details here:

http://mail.python.org/pipermail/tutor/2001-June/006891.html

