Splitting a string

Sat Apr 3 12:01:20 EDT 2010

On Apr 3, 4:17 am, Peter Otten <__pete... at web.de> wrote:
> Patrick Maupin wrote:
> > On Apr 2, 4:32 pm, Peter Otten <__pete... at web.de> wrote:
>
> >> _split = re.compile(r"(\d+)").split
> >> def split(s):
> >>     if not s:
> >>         return ()
> >>     parts = _split(s)
> >>     parts[1::2] = map(int, parts[1::2])
>
>        # because s is non-empty parts contains at least one
>        # item != "", and parts[x] below cannot fail with an
>        # IndexError
>
> >>     if parts[-1] == "":
> >>         del parts[-1]
> >>     if parts[0] == "":
> >>         del parts[0]
> >>     return tuple(parts)
>
> > That's certainly faster than a list comprehension (at least on long
> > lists), but it might be a little obscure why the "if not s:" is
> > needed,
>
> The function is small; with a test suite covering the corner cases and
> perhaps a comment* nothing should go wrong.
>
> (*) you can certainly improve on my attempt
>
> > so unless Thomas has a really long result list, he might want
> > to just keep the list comprehension, which is (IMO) very readable.
>
> Generally speaking, performing tests that you know can't fail can
> confuse the reader just as much as tests with unobvious interdependencies.

Yes, I see your point.  The list comprehension can only ever remove
items at the ends, and will always pass the middle through unchanged,
so someone could be confused about why the comprehension is there in
the first place.  I guess I'm used to doing this same thing on lists
that *could* have empty strings in the middle (with more complicated
regular expressions with multiple match cases), so I didn't consider
that.
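
To illustrate the difference, here is a minimal sketch.  The list
comprehension under discussion is not quoted in this message, so
drop_empty below is only an assumed form of it, and the second pattern
is a made-up example of a regular expression with multiple match cases.

```python
import re

# Single capture group: adjacent digits always merge into one match,
# so empty strings can only show up at the two ends of the result.
simple = re.split(r"(\d+)", "12ab34")

# Hypothetical multi-case pattern: two matches can touch each other,
# which leaves an empty string between them in the middle of the list.
multi = re.split(r"(\d+|,)", "1,2")


def drop_empty(parts):
    # Assumed form of the list comprehension; it checks every item.
    return [p for p in parts if p != ""]


print(simple)             # ['', '12', 'ab', '34', '']
print(multi)              # ['', '1', '', ',', '', '2', '']
print(drop_empty(multi))  # ['1', ',', '2']
```

With the simple pattern, filtering the whole list does the same work as
checking the two ends; with the second pattern, only the full filter
removes the empty strings in the middle.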

> > Alternatively, this is halfway between the previous example and the
> > list comprehension:
>
> > _split = re.compile(r"(\d+)").split
> > def split(s):
> >     parts = _split(s)
> >     parts[1::2] = map(int, parts[1::2])
> >     for index in (-1, 0):
> >         if parts and parts[index] == "":
> >             del parts[index]
> >     return tuple(parts)
>
> I don't think that this is similar to the list comprehension approach
> because it only tests the first and the last item instead of the whole list.
> Both variants should therefore perform equally well for all but the empty
> string argument. If that is a theoretical case you are free to choose the
> more readable variant.

I agree that "halfway" was not a very precise way of describing the
differences.  Like your solution, this only tests the outer elements.
Like the list comprehension, it requires no short-circuit test before
calling re.split.  Also like the list comprehension, the operation that
is applied to multiple elements is factored out so that it is coded
only once.
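
As a rough check of that comparison, the two quoted variants can be put
side by side.  The names split_guarded and split_looped are mine, chosen
only to tell them apart; the bodies follow the quoted code.

```python
import re

_split = re.compile(r"(\d+)").split


def split_guarded(s):
    # Variant with the up-front test: returning early for "" guarantees
    # that parts[-1] and parts[0] below cannot raise IndexError.
    if not s:
        return ()
    parts = _split(s)
    parts[1::2] = map(int, parts[1::2])
    if parts[-1] == "":
        del parts[-1]
    if parts[0] == "":
        del parts[0]
    return tuple(parts)


def split_looped(s):
    # Variant without the up-front test: "parts and" protects the second
    # pass when s == "" and the first pass has already emptied the list.
    parts = _split(s)
    parts[1::2] = map(int, parts[1::2])
    for index in (-1, 0):
        if parts and parts[index] == "":
            del parts[index]
    return tuple(parts)


for sample in ("", "abc", "123", "a1b22", "7up"):
    assert split_guarded(sample) == split_looped(sample)

print(split_looped("a1b22"))  # ('a', 1, 'b', 22)
print(split_looped(""))       # ()
```

The empty string is the only input for which the two take a different
path: re.split returns [''] for it, so the looped variant deletes that
single item on the first pass and skips the second.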

BUT...

All of this is just a user preference, and is extremely minor compared
to the observation that re.split() and extended string slicing can be
combined to give a very elegant solution to the problem!
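
For anyone reading this later without the earlier messages, a small
sketch of that combination on its own (the sample string is made up):

```python
import re

# With a capturing group, re.split keeps the matched text, so the
# result alternates: non-digit text, digits, non-digit text, ...
parts = re.split(r"(\d+)", "foo12bar3baz")
print(parts)        # ['foo', '12', 'bar', '3', 'baz']

# The odd indexes are exactly the captured digit runs.
digits = parts[1::2]
text = parts[0::2]
print(digits)       # ['12', '3']
print(text)         # ['foo', 'bar', 'baz']

# Extended slice assignment converts them in place; the right-hand
# side must supply exactly as many items as the slice selects.
parts[1::2] = [int(d) for d in digits]
print(parts)        # ['foo', 12, 'bar', 3, 'baz']
```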

Regards,
Pat