Splitting on '^' ?

MRAB python at mrabarnett.plus.com
Fri Aug 14 18:30:52 EDT 2009


Ethan Furman wrote:
> kj wrote:
>>
>> Sometimes I want to split a string into lines, preserving the
>> end-of-line markers.  In Perl this is really easy to do, by splitting
>> on the beginning-of-line anchor:
>>
>>   @lines = split /^/, $string;
>>
>> But I can't figure out how to do the same thing with Python.  E.g.:
>>
>> >>> import re
>> >>> re.split('^', 'spam\nham\neggs\n')
>> ['spam\nham\neggs\n']
>>
>> >>> re.split('(?m)^', 'spam\nham\neggs\n')
>> ['spam\nham\neggs\n']
>>
>> >>> bol_re = re.compile('^', re.M)
>> >>> bol_re.split('spam\nham\neggs\n')
>> ['spam\nham\neggs\n']
>>
>> Am I doing something wrong?
>>
>> kynn
> 
> As you probably noticed from the other responses:  No, you can't split
> on some text _and_ keep the text you split by.
> 
You _can_ split and keep what you split on:

 >>> re.split("(x)", "abxcd")
['ab', 'x', 'cd']
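
Applied to the original question, the same capturing-group trick can keep
the newlines: split on "(\n)" and glue each separator back onto the piece
before it. A minimal sketch, not a definitive implementation; the helper
name split_keep_newlines is made up for illustration:

```python
import re

def split_keep_newlines(text):
    """Hypothetical helper: split text into lines, keeping each '\n'."""
    # With a capturing group, re.split returns the separators too:
    # [piece, '\n', piece, '\n', ..., last_piece]
    parts = re.split("(\n)", text)
    # Pair each piece with the newline that follows it.
    lines = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    # Keep a final line that has no trailing newline; drop an empty tail.
    if parts[-1]:
        lines.append(parts[-1])
    return lines

print(split_keep_newlines("spam\nham\neggs\n"))
```

This should print ['spam\n', 'ham\n', 'eggs\n'].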

You _can't_ split on a zero-width match (the empty matches of x* are
ignored, so only the real 'x' splits the string):

 >>> re.split("(x*)", "abxcd")
['ab', 'x', 'cd']

but you can use re.sub to replace zero-width matches with something
that's not zero-width and then split on that (best with str.split):

 >>> re.sub("(x*)", "@", "abxcd")
'@a@b@c@d@'
 >>> re.sub("(x*)", "@", "abxcd").split("@")
['', 'a', 'b', 'c', 'd', '']
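
For the line-splitting case in the original question, here is a rough
sketch of the same sub-then-split idea, next to the built-in
str.splitlines, which is probably the simplest route. The sentinel
character is an assumption: it must never occur in the text being split.
Note also that on Python 3.7 and later re.split itself splits on empty
matches, so the transcripts above will not reproduce exactly there.

```python
import re

text = "spam\nham\neggs\n"

# Simplest route: str.splitlines with keepends=True keeps the line endings.
# Caveat: it also breaks on other line boundaries such as '\r' and '\r\n'.
lines = text.splitlines(True)

# The sub-then-split approach: mark every beginning-of-line with a
# sentinel, then split on the sentinel with str.split.
SENTINEL = "\x1e"  # assumed never to appear in the text
marked = re.sub("(?m)^", SENTINEL, text)
# Drop the empty strings produced at the edges of the string.
pieces = [p for p in marked.split(SENTINEL) if p]

print(lines)
print(pieces)
```

Both lists should come out as ['spam\n', 'ham\n', 'eggs\n'].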


