[ python-Bugs-1105286 ] Undocumented implicit strip() in split(None) string method

SourceForge.net noreply at sourceforge.net
Wed Jan 19 17:56:54 CET 2005


Bugs item #1105286, was opened at 2005-01-19 10:04
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1105286&group_id=5470

Category: Documentation
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: YoHell (yohell)
Assigned to: Nobody/Anonymous (nobody)
Summary: Undocumented implicit strip() in split(None) string method

Initial Comment:
Hi! 

I noticed that the string method split() first does an
implicit strip() before splitting when it's used with
no arguments or with None as the separator (sep in the
docs). There is no mention of this implicit strip() in
the docs.

Example 1:
s = " word1 word2 "

s.split() then returns ['word1', 'word2'] and not ['',
'word1', 'word2', ''] as one might expect.
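
As a minimal sketch of Example 1 (the variable names are illustrative only), compare the default separator with an explicit single space:

```python
s = " word1 word2 "

# No separator, or None: runs of whitespace delimit words, and
# leading/trailing whitespace produces no empty strings.
no_sep = s.split()
none_sep = s.split(None)

# Explicit single-space separator: every space is a delimiter,
# so the leading and trailing spaces yield empty strings.
space_sep = s.split(" ")

print(no_sep)     # ['word1', 'word2']
print(space_sep)  # ['', 'word1', 'word2', '']
```

Passing an explicit separator is therefore one way to get the empty strings, though it only splits on that exact character.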

WHY IS THIS BAD?

1. Because it's undocumented. See:
http://www.python.org/doc/current/lib/string-methods.html#l2h-197

2. Because it may lead to unexpected behavior in programs. 
Example 2:
FASTA sequence headers are one-line descriptors of
biological sequences and have this form:
">" + Identifier + whitespace + free text description.

Let sHeader be a Python string containing a FASTA
header. One could then use the following syntax to
extract the identifier from the header:

sID = sHeader[1:].split(None, 1)[0]

However, this does not work if sHeader contains a
faulty FASTA header where the identifier is missing or
consists of whitespace. In that case sID will contain
the first word of the free text description, which is
not the desired behavior. 
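
The failure mode can be sketched as follows. The sample headers and the helper extract_fasta_id are hypothetical, written only to illustrate one possible workaround (checking the character right after ">" before splitting); they are not part of any library:

```python
def extract_fasta_id(header):
    """Return the identifier of a FASTA header, or None if it is missing.

    Hypothetical helper for illustration only.
    """
    if not header.startswith(">"):
        raise ValueError("not a FASTA header: %r" % header)
    body = header[1:]
    # The identifier must start immediately after ">"; an empty body or
    # leading whitespace means the identifier is missing.
    if not body or body[0].isspace():
        return None
    return body.split(None, 1)[0]


good = ">seq1 some free text description"
bad = "> some free text description"  # identifier missing

# The one-liner from above silently picks up the first description word.
naive_good = good[1:].split(None, 1)[0]
naive_bad = bad[1:].split(None, 1)[0]

print(naive_good)             # seq1
print(naive_bad)              # some
print(extract_fasta_id(bad))  # None
```

Returning None for a missing identifier is a design choice made for this sketch; raising an exception would work equally well.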

WHAT SHOULD BE DONE?

The implicit strip() should be removed, or at least
programmers should be given the option to turn it off.
At the very least it should be documented so that
programmers have a chance of adapting their code to it.

Thank you for an otherwise splendid language!
/Joel Hedlund
Ph.D. Student
IFM Bioinformatics
Linköping University

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2005-01-19 11:56

Message:
I think the docs for split() under "String Methods" are quite 
clear:

"""
...

If sep is not specified or is None, a different splitting 
algorithm is applied. Words are separated by arbitrary length 
strings of whitespace characters (spaces, tabs, newlines, 
returns, and formfeeds). Consecutive whitespace delimiters 
are treated as a single delimiter ("'1 2 3'.split()" 
returns "['1', '2', '3']"). Splitting an empty string returns "['']". 
"""

This won't change, because mountains of code rely on this 
behavior -- it's probably the single most common use case 
for .split().
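
A small sketch of that use case, assuming a typical whitespace-delimited input line (the sample line is made up for illustration):

```python
# A line as it might come from a text file: irregular spacing,
# a tab, and a trailing newline.
line = "  alpha   beta\tgamma\n"

# With no separator, the result is just the words.
words = line.split()

# With an explicit separator, the caller has to clean up empty
# strings and leftover whitespace by hand.
fields = line.split(" ")

print(words)   # ['alpha', 'beta', 'gamma']
print(fields)  # ['', '', 'alpha', '', '', 'beta\tgamma\n']
```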


----------------------------------------------------------------------
