[Python-bugs-list] [ python-Bugs-505997 ] string.split docs are inconsistent

noreply@sourceforge.net noreply@sourceforge.net
Wed, 30 Jan 2002 08:17:15 -0800


Bugs item #505997, was opened at 2002-01-20 01:24
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=505997&group_id=5470

Category: Documentation
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Matt Zimmerman (mzimmerman)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: string.split docs are inconsistent

Initial Comment:
string.split.__doc__ says:

split(s [,sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string s, using sep as the
    delimiter string.  If maxsplit is given, splits into at most
    maxsplit words.  If sep is not specified, any whitespace string
    is a separator.

    (split and splitfields are synonymous)

This implies that len(split(s, sep, maxsplit)) <= maxsplit.  In reality,
however, it is <= maxsplit+1.  This seems to be explained by the library
documentation:

<quote>
split(s[, sep[, maxsplit]])
Return a list of the words of the string s. If the optional second argument
sep is absent or None, the words are separated by arbitrary strings of
whitespace characters (space, tab, newline, return, formfeed). If the second
argument sep is present and not None, it specifies a string to be used as
the word separator. The returned list will then have one more item than the
number of non-overlapping occurrences of the separator in the string. The
optional third argument maxsplit defaults to 0. If it is nonzero, at most
maxsplit number of splits occur, and the remainder of the string is returned
as the final element of the list (thus, the list will have at most
maxsplit+1 elements).
</quote>
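
[Editorial illustration] The bound stated in the quoted library text can be
checked with a quick sketch. It assumes Python 3, where only the str.split
method exists (the report concerns the string.split function of Python 2.x),
and both the sample string and the helper name split_lengths are made up for
this example.

```python
# Sketch only: uses the str.split method on invented sample data.
sample = "a,b,c,d,e"


def split_lengths(text, sep, limits):
    """Map each maxsplit value to the length of the list split() returns."""
    return {n: len(text.split(sep, n)) for n in limits}


lengths = split_lengths(sample, ",", range(0, 6))
for n, size in lengths.items():
    # maxsplit counts splits, so the result holds at most n + 1 items.
    assert size <= n + 1
    print(n, size, sample.split(",", n))
```

The list length grows by one per allowed split until the separators run out,
which is the "splits" reading of maxsplit rather than the "words" reading.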

Which indicates that maxsplit is in units of "splits" rather than "words",
where words = splits + 1.  Personally, I find the "number of splits"
behaviour very counter-intuitive, and would much prefer "number of words".
At any rate, the inconsistency needs to be corrected.

Also, the sentence "The optional third argument maxsplit defaults to 0"
implies that specifying maxsplit=0 is the same as not specifying it at all.
This is not the case, however:

Python 2.2 (#1, Jan  8 2002, 01:13:32) 
[GCC 2.95.4 20011006 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print "1x2x3".split('x')
['1', '2', '3']
>>> print "1x2x3".split('x',0)
['1x2x3']

Instead, it seems to cause sep to be disregarded, making split(anything,0)
equivalent to split().

I don't have the Python 2.1 documentation installed at the moment, so I can't
check the library reference for that version, but at least the
string.split.__doc__ there is inconsistent with behaviour.

This was originally submitted as Debian bug #129272.

----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-01-30 08:17

Message:
Logged In: YES 
user_id=3066

Fixed in Lib/string.py revisions 1.61, 1.60.16.1, and 1.59.4.1.

----------------------------------------------------------------------

Comment By: Matt Zimmerman (mzimmerman)
Date: 2002-01-21 00:38

Message:
Logged In: YES 
user_id=196786

By the way, the reason I ended up looking at the library docs (and docstring)
for the string module was that I did a simple text search on index.html from
the library reference (I'm new to Python).

The first match is UserString, and the second is 4. String Services, under
which can be found section 4.1 "string -- Common string operations".  "string"
is in a monospaced font, and looks as much like a type as a module name, so I
assumed that it applied to the built-in string type.  I later found the
documentation for the string type in section 2.2.6.1 "String Methods".


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-01-20 20:45

Message:
Logged In: YES 
user_id=31435

Guido's right, I did

print "".split.__doc__

without even considering that someone may still be doing 
the archaic <wink>

print string.split.__doc__

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2002-01-20 20:30

Message:
Logged In: NO 

Tim was looking at the doc string for the split *method* of
string objects, which is correct. But the complaint was
about the split *function* in the (no longer needed, but
still supported) string *module*, which is indeed wrong --
still in 2.2.

--Guido (not logged in)

----------------------------------------------------------------------

Comment By: Matt Zimmerman (mzimmerman)
Date: 2002-01-20 20:20

Message:
Logged In: YES 
user_id=196786

Thanks for responding.

The docstring was from Python 2.1.2 (Debian 2.1.2-2):

Python 2.1.2 (#1, Jan 18 2002, 18:05:45) 
[GCC 2.95.4  (Debian prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import string
>>> print string.split.__doc__
split(s [,sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string s, using sep as the
    delimiter string.  If maxsplit is given, splits into at most
    maxsplit words.  If sep is not specified, any whitespace string
    is a separator.

    (split and splitfields are synonymous)

In 2.2, it seems to be corrected:

Python 2.2 (#1, Jan  8 2002, 01:13:32) 
[GCC 2.95.4 20011006 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> print string.split.__doc__
split(s [,sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string s, using sep as the
    delimiter string.  If maxsplit is given, splits into at most
    maxsplit words.  If sep is not specified, any whitespace string
    is a separator.

    (split and splitfields are synonymous)

The library documentation for 2.2 still says that maxsplit defaults to 0,
though apparently it defaults to -1, so that needs to be fixed.


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-01-20 19:38

Message:
Logged In: YES 
user_id=31435

I don't know which version of Python they're using, but the 
docstring doesn't match what's claimed here in 2.0.1, 2.1 
or 2.2.  Assigned to Fred for resolution (probably "Fixed").

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2002-01-20 07:05

Message:
Logged In: NO 

The docs and docstring seem wrong; the behavior is correct.
maxsplit is the number of *separators* recognized; it
defaults to -1. Specifying maxsplit=0 makes it a no-op.

--Guido (can't log in right now)
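
[Editorial illustration] A minimal sketch of the behavior described in this
comment, reusing the "1x2x3" example from the initial report. It assumes
Python 3 and the str.split method; the variable names are invented for the
example.

```python
# Sketch only: compares omitting maxsplit with passing -1, 0 and 1.
text = "1x2x3"

default = text.split("x")        # maxsplit omitted
unlimited = text.split("x", -1)  # the effective default: no limit
no_splits = text.split("x", 0)   # zero separators recognized
one_split = text.split("x", 1)   # only the first separator recognized

# Omitting maxsplit behaves like -1, not like 0.
assert default == unlimited
print(default, no_splits, one_split)
```

With maxsplit=0 the separator is never applied, so the whole string comes
back as the single element of the list.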

----------------------------------------------------------------------
