[Python-ideas] str.split() oddness

Terry Reedy tjreedy at udel.edu
Sat Feb 26 19:52:19 CET 2011


On 2/26/2011 9:03 AM, Mart Sõmermaa wrote:
> IMHO, x.join(a).split(x) should be "idempotent"
> in regard to a.

Given that x.join is *not* 1 to 1,

 >>> 'a'.join([])
''
 >>> 'a'.join([''])
''

it cannot have an inverse for all outputs.
In particular, ''.split('a') cannot be both [] and [''].

This could only be fixed by changing the definition of join to disallow 
an empty list, but that would not be convenient. I believe joining is 
otherwise 1 to 1 and invertible for non-empty lists whose items do not 
contain the separator.

Of course, join input a can be any iterable of strings, whereas split 
produces a list, so your equality test can only work for list inputs 
unless generalized to c.join(a).split(c) == list(a).
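
As a rough sketch of that generalized test (round_trips is a hypothetical
helper name, not anything in the stdlib), the following checks a few
inputs. The last case, an item that itself contains the separator, is an
extra illustration not taken from the thread:

```python
def round_trips(sep, pieces):
    """Return True if sep.join(pieces).split(sep) == list(pieces)."""
    # Materialize first: a generator can only be consumed once.
    items = list(pieces)
    return sep.join(items).split(sep) == items

print(round_trips(',', ['a', 'b', 'c']))       # True
print(round_trips(',', (ch for ch in 'abc')))  # True, any iterable of strings
print(round_trips(',', ['']))                  # True
print(round_trips(',', []))                    # False: ''.split(',') == ['']
print(round_trips(',', ['a,b']))               # False: the item contains sep
```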

''.split('a') == [''], not [],  by the definition of s.split(c):
a list of pieces of s that were previously joined by c.
In particular, string_not_containing_sep.split(sep) ==
[string_not_containing_sep].

Note that empty pieces are inserted for repeated seps so that splitting 
on seps (unlike splitting on 'whitespace') *is* 1 to 1.
'abc'.split('b') == ['a','c']
'abbc'.split('b') == ['a','','c']
(whereas 'a c'.split() and 'a  c'.split() are both ['a','c'])

Therefore, sep splitting does have an inverse:
  c.join(s.split(c)) == s
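
A quick way to convince yourself of this, as a sketch only (the sample
strings and the helper name split_then_join are made up for illustration):

```python
def split_then_join(s, sep):
    """Split s on sep, then rejoin the pieces with the same sep."""
    return sep.join(s.split(sep))

# Includes the empty string, leading/trailing seps, and repeated seps.
samples = ['', 'abc', 'abbc', 'b', 'bb', 'xyz', 'babab']
for s in samples:
    assert split_then_join(s, 'b') == s

# Multi-character separators behave the same way.
assert split_then_join('aXXbXX', 'XX') == 'aXXbXX'
```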

The doc for str.split specifies the above and makes clear that splitting
with and without a separator are slightly different functions.
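
To make the difference concrete, here is a small side-by-side sketch (the
sample string is arbitrary):

```python
s = ' a  c '

by_sep = s.split(' ')  # explicit separator: empty pieces are kept
by_ws = s.split()      # no separator: runs of whitespace collapse

print(by_sep)          # ['', 'a', '', 'c', '']
print(by_ws)           # ['a', 'c']

# Only the explicit-separator form can be undone by join.
print(' '.join(by_sep) == s)  # True
print(' '.join(by_ws) == s)   # False

# The empty string also differs between the two forms.
print(''.split(' '))   # ['']
print(''.split())      # []
```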

> >>> assert ' '.join(foo).split() == foo

You have pulled a fast one here. ' ' does not equal 'whitespace' ;-)
If x in your original expression is nothing (to indicate 'whitespace'), 
then your desired equality becomes
   .join(a).split() == a
which is not legal ;-).

Some of the above is a rewording and expansion upon what Joao already 
said, which was all correct.

-- 
Terry Jan Reedy




