[Python-ideas] str.split() oddness

Mart Sõmermaa mrts.pydev at gmail.com
Sun Mar 6 19:32:07 CET 2011


First, sorry for the long delay in replying.

On Mon, Feb 28, 2011 at 2:13 AM, Guido van Rossum <guido at python.org> wrote:
> Does Ruby in general leave out empty strings from the result? What
> does it return when "x,,y" is split on "," ? ["x", "", "y"] or ["x", "y"]?

>> "x,,y".split(",")
=> ["x", "", "y"]

But let me remind you that the behaviour of foo.split(x) where
foo is a non-empty string is not in question at all; only the
behaviour when splitting the empty string is.

              Python           Ruby
join1     [''] => ''        [''] => ''
join2     [  ] => ''        [  ] => ''

              Python           Ruby
split      [''] <= ''        [  ] <= ''

As you can see, join1 and join2 behave identically in both
languages. Python has chosen to make split the inverse of
join1; Ruby, on the other hand, the inverse of join2.
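
The Python column of the tables above can be checked directly.
This is only a minimal sketch: the separator "," and the variable
names are arbitrary choices for illustration, and the Ruby column
is not exercised here.

```python
sep = ","

# join1 and join2: both lists collapse to the same empty string.
joined_from_one_empty = sep.join([""])
joined_from_no_items = sep.join([])
print(repr(joined_from_one_empty))  # ''
print(repr(joined_from_no_items))   # ''

# split: Python maps the empty string back to [''], i.e. it inverts join1.
split_of_empty = "".split(sep)
print(split_of_empty)               # ['']
```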

> In Python the generalization is that since "xx".split(",") is ["xx"],
> and "x".split(",") is ["x"], it naturally follows that "".split(",")
> is [""].

That is one line of reasoning that emphasizes the
"string-nature" of ''.
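
This line of reasoning can be read as an invariant: splitting on an
explicit separator yields one more item than there are separators
in the string. That reading is an assumption made for illustration,
not something stated in the quoted message; the sample strings
below are arbitrary.

```python
sep = ","
samples = ["xx", "x", "", "x,,y", ","]

# For each sample, compare the number of pieces with the separator count.
piece_counts = {s: len(s.split(sep)) for s in samples}
separator_counts = {s: s.count(sep) for s in samples}

for s in samples:
    print(repr(s), s.split(sep), piece_counts[s], separator_counts[s] + 1)
```

Under this reading, '' contains zero separators and therefore
produces exactly one piece, namely ''.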

However, I myself, the Ruby folks and Nick would rather
emphasize the "zero-element-nature" [1] of ''.

Both approaches are based on solid reasoning; the latter
just happens to be more practical. And I would still claim
that

"Applying the split operator to the zero element of
strings should result in the zero element of lists"

wins on theoretical grounds as well.

The general problem stems from the fact that my initial
expectation that

 f_x(a) = x.join(a).split(x), where x in strings, a in lists

should be an identity function cannot be satisfied, as join
is non-injective (both [''] and [] join to '', as shown above).
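
The round trip can be sketched in Python as follows, with sep as
the separator string and items as the list of strings. The helper
names, including the Ruby-like split variant, are hypothetical and
exist only for this illustration; they are not part of any
proposal or library.

```python
def round_trip(sep, items):
    """Join items with sep, then split the result on sep again."""
    return sep.join(items).split(sep)


def split_empty_as_no_items(text, sep):
    """Hypothetical Ruby-like split: the empty string yields no items."""
    return text.split(sep) if text else []


def round_trip_ruby_like(sep, items):
    """Same round trip, but using the hypothetical Ruby-like split."""
    return split_empty_as_no_items(sep.join(items), sep)


sep = ","
print(round_trip(sep, ["x", "", "y"]))   # ['x', '', 'y']  (identity holds)
print(round_trip(sep, [""]))             # ['']            (identity holds)
print(round_trip(sep, []))               # ['']            (identity fails)

print(round_trip_ruby_like(sep, []))     # []              (identity holds)
print(round_trip_ruby_like(sep, [""]))   # []              (identity fails)
```

Whichever list split returns for '', the other of [''] and []
cannot survive the round trip, because join has already mapped
both to the same string.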

[1] http://en.wikipedia.org/wiki/Zero_element


