a splitting headache

Mensanator mensanator at aol.com
Tue Oct 20 18:22:55 EDT 2009


On Oct 20, 1:51 pm, David C Ullrich <dullr... at sprynet.com> wrote:
> On Thu, 15 Oct 2009 18:18:09 -0700, Mensanator wrote:
> > All I wanted to do is split a binary number into two lists, a list of
> > blocks of consecutive ones and another list of blocks of consecutive
> > zeroes.
>
> > But no, you can't do that.
>
> >>>> c = '0010000110'
> >>>> c.split('0')
> > ['', '', '1', '', '', '', '11', '']
>
> > Ok, the consecutive delimiters appear as empty strings for reasons
> > unknown (except for the first one). Except when they start or end the
> > string in which case the first one is included.
>
> > Maybe there's a reason for this inconsistent behaviour but you won't
> > find it in the documentation.
>
> Wanna bet? I'm not sure whether you're claiming that the behavior
> is not specified in the docs or the reason for it. The behavior
> certainly is specified. I conjecture you think the behavior itself
> is not specified,

The problem is that the docs give a single example

>>> '1,,2'.split(',')
['1', '', '2']

ignoring the special case of leading/trailing delimiters. Yes, if you
think it through, ',1,,2,'.split(',') should return ['','1','','2','']
for exactly the reasons you give.
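To make that concrete, here is a quick sketch of the case the docs' example leaves out. The variable names are mine, just for illustration. The point is that n delimiters always produce n + 1 fields, and the fields at the ends are allowed to be empty.

```python
# The example from the docs: two delimiters, three fields.
doc_example = '1,,2'.split(',')

# Same data with a leading and a trailing delimiter added:
# four delimiters, five fields, and the outer two are empty.
text = ',1,,2,'
edge_example = text.split(',')

print(doc_example)   # ['1', '', '2']
print(edge_example)  # ['', '1', '', '2', '']

# Field count is always delimiter count plus one.
print(len(edge_example) == text.count(',') + 1)  # True
```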

Trouble is, we often find ourselves doing ' 1  2  '.split(), which
returns ['1', '2'].

I'm not saying either behaviour is wrong. It's just not obvious that
the one behaviour doesn't follow from the other, and the documentation
could be a little clearer on this matter. It might make a bit more
sense to actually mention the split(sep) behaviour that split()
doesn't share.
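Here is the contrast I mean, as a small sketch (again, the variable names are just mine). With no argument, split() treats runs of whitespace as one separator and drops empty strings at the ends. With an explicit separator, every single occurrence counts.

```python
s = ' 1  2  '  # one leading space, two in the middle, two trailing

# No separator given: whitespace runs collapse, no empty strings.
no_sep = s.split()

# Explicit separator: each of the five spaces delimits a field,
# so we get six fields, four of them empty.
with_sep = s.split(' ')

print(no_sep)    # ['1', '2']
print(with_sep)  # ['', '1', '', '2', '', '']
```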


> because your description of what's happening,
>
> "consecutive delimiters appear as empty strings for reasons
>
> > unknown (except for the first one). Except when they start or end the
> > string in which case the first one is included"
>
> is at best an awkward way to look at it. The delimiters
> are not appearing as empty strings.
>
> You're asking to split  '0010000110' on '0'.
> So you're asking for strings a, b, c, etc such that
>
> (*) '0010000110' = a + '0' + b + '0' + c + '0' + etc
>
> The sequence of strings you're getting as output satisfies
> (*) exactly; the first '' is what appears before the first
> delimiter, the second '' is what's between the first and
> second delimiters, etc.
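That reading is easy to check. As a rough sketch (the names parts, ones and zeroes are mine, not anything from the docs): joining the pieces back together with the delimiter should reproduce the original string, and filtering out the empty strings gives the blocks I was after in the first place.

```python
c = '0010000110'

# Seven '0' delimiters, so eight fields, most of them empty.
parts = c.split('0')
print(parts)                 # ['', '', '1', '', '', '', '11', '']

# The (*) identity: putting the delimiter back between the
# fields rebuilds the original string exactly.
print('0'.join(parts) == c)  # True

# Dropping the empty fields leaves the blocks of consecutive
# ones; splitting on '1' the same way gives the zero blocks.
ones = [p for p in c.split('0') if p]
zeroes = [p for p in c.split('1') if p]
print(ones)                  # ['1', '11']
print(zeroes)                # ['00', '0000', '0']
```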



