split weirds

Donn Cave donn at u.washington.edu
Thu Dec 14 12:45:16 EST 2000


Quoth Robin Becker <robin at jessikat.fsnet.co.uk>:
| In article <QI2l1FAXBMO6EwD6 at jessikat.demon.co.uk>, Robin Becker
| <robin at jessikat.fsnet.co.uk> writes
| Can somebody explain why split behaves differently with and without an
| argument?
| 
| >>> from string import split
| >>> split(' ')
| []
| >>> split(' ',' ')
| ['', '']
| >>> split('  ')
| []
| >>> split('  ',' ')
| ['', '', '']
| >>> 
| 
| with the default whitespace arg seems as though a run of whitespace is
| being treated as a single character.

| several people have pointed out that arbitrary whitespace in the docs
| means that so the no arg case really is special. I still find it a bit
| strange that
|
| split(' word ') ==> ['word'], but then the docs use the term separated
| for the no arg case which can't be done for absent words. 

Maybe it would be comforting to think of the default separator as
"not non-white space."

In ' word ', 'w' is non-white, but ' ' is not non-white -- and
so is "end of data" not non-white.  So the split classifies this
string as three parts: separator, separated, separator.
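
A rough sketch of that classification, assuming a current Python where
split is a string method rather than the string module function used
above.  The helper name classify is made up for illustration; it is not
how split is implemented, it only labels the runs the way the paragraph
above describes.

```python
from itertools import groupby

def classify(text):
    """Label each maximal run of characters as separator or separated.

    Hypothetical helper: whitespace runs count as separators,
    everything else as separated text.
    """
    return [("separator" if is_space else "separated", "".join(run))
            for is_space, run in groupby(text, key=str.isspace)]

# Three parts: separator, separated, separator.
parts = classify(" word ")

# With no argument, only the separated parts come back as fields,
# so leading and trailing whitespace produce no empty strings.
fields = " word ".split()
no_fields = "  ".split()
```

Here parts holds three labelled runs, fields is ['word'], and no_fields
is the empty list, matching the session quoted above.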

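Given a specific separator, say ':', and the equivalent string
':word:', null end of data does not qualify as a separator, so the
same analysis yields 5 parts: empty, separator, 'word', separator,
empty.  That is why split returns an empty string at each end.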
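
To see the parts for the explicit separator case, here is a small
sketch using re.split with a capturing group, which keeps the
separators in the result.  It assumes a current Python, and the helper
name parts_with_separator is only for illustration.

```python
import re

def parts_with_separator(text, sep):
    """Split text on sep, keeping each separator as its own part.

    Hypothetical helper: the capturing group makes re.split return
    the separators along with the text between them.
    """
    return re.split("(" + re.escape(sep) + ")", text)

# Every part, separators included.
parts = parts_with_separator(":word:", ":")

# The plain split keeps only the text between separators,
# including the empty strings at both ends.
fields = ":word:".split(":")
```

parts comes out as ['', ':', 'word', ':', ''] and fields as
['', 'word', ''].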

However you justify it, it's mighty convenient in practice.

	Donn Cave, donn at u.washington.edu


