Splitting on '^' ?

Ethan Furman ethan at stoneleaf.us
Fri Aug 14 19:13:56 EDT 2009


MRAB wrote:
> Ethan Furman wrote:
> 
>> kj wrote:
>>
>>>
>>> Sometimes I want to split a string into lines, preserving the
>>> end-of-line markers.  In Perl this is really easy to do, by splitting
>>> on the beginning-of-line anchor:
>>>
>>>   @lines = split /^/, $string;
>>>
>>> But I can't figure out how to do the same thing with Python.  E.g.:
>>>
>>>   >>> import re
>>>   >>> re.split('^', 'spam\nham\neggs\n')
>>>   ['spam\nham\neggs\n']
>>>   >>> re.split('(?m)^', 'spam\nham\neggs\n')
>>>   ['spam\nham\neggs\n']
>>>   >>> bol_re = re.compile('^', re.M)
>>>   >>> bol_re.split('spam\nham\neggs\n')
>>>   ['spam\nham\neggs\n']
>>>
>>> Am I doing something wrong?
>>>
>>> kynn
>>
>>
>> As you probably noticed from the other responses:  No, you can't split 
>> on _and_ keep the splitby text.
>>
> You _can_ split and keep what you split on:
> 
>  >>> re.split("(x)", "abxcd")
> ['ab', 'x', 'cd']
> 
> You _can't_ split on a zero-width match:
> 
>  >>> re.split("(x*)", "abxcd")
> ['ab', 'x', 'cd']
> 
> but you can use re.sub to replace zero-width matches with something
> that's not zero-width and then split on that (best with str.split):
> 
>  >>> re.sub("(x*)", "@", "abxcd")
> '@a@b@c@d@'
>  >>> re.sub("(x*)", "@", "abxcd").split("@")
> ['', 'a', 'b', 'c', 'd', '']

Wow!  I stand corrected, although I'm in danger of falling over from the 
dizziness!  :)

As impressive as that is, I don't think it does what the OP is looking 
for.  rurpy reminded us (or at least me ;) of .splitlines(), which seems 
to do exactly what the OP is looking for.  I do take some comfort that 
my little snippet works for more than newlines alone, although I'm not 
aware of any other use-cases.  :(
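
For the record, a minimal sketch of the .splitlines() approach on the OP's
sample string. Passing True as the keepends argument keeps each terminator
attached to its line. One caveat: .splitlines() also breaks on other line
boundaries (for example '\r' and '\r\n'), so it is not strictly a
split-on-'\n'-only operation.

```python
text = 'spam\nham\neggs\n'

# Passing True (keepends) keeps each line terminator attached to its line.
lines = text.splitlines(True)
print(lines)

# Joining the pieces reproduces the original string exactly.
assert ''.join(lines) == text
```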

~Ethan~

Oh, hey, how about this?

re.compile('(^[^\n]*\n?)', re.M).findall('text\ntext\ntext')

Although this does give me an extra blank segment at the end... oh well.
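
A possible way around the blank segment, offered only as a sketch: the empty
string shows up when the input ends with a newline, because with re.M the '^'
also matches right after that final newline and every other part of the
pattern is optional. If each alternative has to consume at least one
character, an empty match cannot occur. The helper name split_keep_newlines
below is made up for illustration.

```python
import re

# Each alternative consumes at least one character, so no empty match is possible:
#   [^\n]*\n   a (possibly empty) line together with its newline
#   [^\n]+     a final line that has no trailing newline
line_re = re.compile(r'[^\n]*\n|[^\n]+')

def split_keep_newlines(s):
    """Split s into lines, keeping each '\\n' attached to its line."""
    return line_re.findall(s)

print(split_keep_newlines('text\ntext\ntext\n'))
print(split_keep_newlines('text\ntext\ntext'))
```

Unlike .splitlines(True), this treats only '\n' as a line boundary.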


