Splitting a string

Sat Apr 3 05:17:36 EDT 2010

Patrick Maupin wrote:

> On Apr 2, 4:32 pm, Peter Otten <__pete... at web.de> wrote:
> 
>> _split = re.compile(r"(\d+)").split
>> def split(s):
>>     if not s:
>>         return ()
>>     parts = _split(s)
>>     parts[1::2] = map(int, parts[1::2])

       # because s is non-empty, parts contains at least one
       # item != "", so parts[-1] and parts[0] below cannot
       # fail with an IndexError
>>     if parts[-1] == "":
>>         del parts[-1]
>>     if parts[0] == "":
>>         del parts[0]
>>     return tuple(parts)
>>
> 
> That's certainly faster than a list comprehension (at least on long
> lists), but it might be a little obscure why the "if not s:" is
> needed, 

The function is small; with a test suite covering the corner cases and 
perhaps a comment*, nothing should go wrong.

(*) you can certainly improve on my attempt
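
A corner-case test suite of the kind mentioned above might look like the
following sketch. The function body is the one quoted earlier; the CASES
table and its expected tuples are illustrative assumptions, not something
posted in the thread.

```python
import re

_split = re.compile(r"(\d+)").split

def split(s):
    if not s:
        return ()
    parts = _split(s)
    # every odd index holds a captured run of digits
    parts[1::2] = map(int, parts[1::2])
    # s is non-empty, so at least one item != "" survives both deletions
    if parts[-1] == "":
        del parts[-1]
    if parts[0] == "":
        del parts[0]
    return tuple(parts)

# Hypothetical corner cases: empty input, text only, digits only,
# digits at either end, and a mixed string.
CASES = {
    "": (),
    "abc": ("abc",),
    "42": (42,),
    "abc42": ("abc", 42),
    "42abc": (42, "abc"),
    "a1b22c": ("a", 1, "b", 22, "c"),
}

for text, expected in CASES.items():
    assert split(text) == expected, (text, split(text))
```

The "42" case is the one that exercises both deletions: the regex split
yields an empty string on each side of the number.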

> so unless Thomas has a really long result list, he might want
> to just keep the list comprehension, which is (IMO) very readable.

Generally speaking, performing tests that you know cannot fail can confuse 
the reader just as much as tests with unobvious interdependencies.

> Alternatively, this is halfway between the previous example and the
> list comprehension:
> 
> _split = re.compile(r"(\d+)").split
> def split(s):
>     parts = _split(s)
>     parts[1::2] = map(int, parts[1::2])
>     for index in (-1, 0):
>         if parts and parts[index] == "":
>             del parts[index]
>     return tuple(parts)

I don't think that this is similar to the list comprehension approach, 
because it only tests the first and the last item instead of the whole list.
Both variants (mine with the "if not s:" guard and yours with the loop) 
should therefore perform equally well for all but the empty string argument. 
If that is only a theoretical case, you are free to choose the more readable 
variant.
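
To check that the two variants agree, a minimal sketch along these lines
could be used. The names split_guarded and split_looped and the SAMPLES
list are made up here for illustration; the function bodies are the ones
quoted in this message.

```python
import re

_split = re.compile(r"(\d+)").split

def split_guarded(s):
    # variant with the explicit empty-string guard
    if not s:
        return ()
    parts = _split(s)
    parts[1::2] = map(int, parts[1::2])
    if parts[-1] == "":
        del parts[-1]
    if parts[0] == "":
        del parts[0]
    return tuple(parts)

def split_looped(s):
    # variant that re-checks "parts" before touching each end
    parts = _split(s)
    parts[1::2] = map(int, parts[1::2])
    for index in (-1, 0):
        if parts and parts[index] == "":
            del parts[index]
    return tuple(parts)

# Illustrative inputs, including the empty string
SAMPLES = ["", "abc", "42", "abc42", "42abc", "a1b22c"]

for text in SAMPLES:
    assert split_guarded(text) == split_looped(text), text
```

Both functions inspect only the two ends of the list, so any difference in
running time should be limited to the empty-string case, where the guarded
variant returns before calling the regex at all.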

Peter