Splitting text at whitespace but keeping the whitespace in the returned list

MRAB python at mrabarnett.plus.com
Sun Jan 24 12:24:40 EST 2010


python at bdurham.com wrote:
> I need to parse some ASCII text into 'word' sized chunks of text AND 
> collect the whitespace that separates the split items. By 'word' I mean 
> any string of characters separated by whitespace (newlines, carriage 
> returns, tabs, spaces, soft-spaces, etc). This means that my split text 
> can contain punctuation and numbers - just not whitespace.
>  
> The split( None ) method works fine for returning the word sized chunks 
> of text, but destroys the whitespace separators that I need.
>  
> Is there a variation of split() that returns delimiters as well as tokens?
>  
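I'd use the re module. Putting the pattern in a capturing group makes
re.split() return the separators as well as the tokens: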

 >>> import re
 >>> re.split(r'(\s+)', "Hello world!")
['Hello', ' ', 'world!']
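
One thing to watch for: if the text starts or ends with whitespace,
re.split() puts an empty string at that end of the list. A minimal sketch
that drops those, assuming you don't need the empty strings
(split_keep_whitespace is just a name made up for illustration):

```python
import re

def split_keep_whitespace(text):
    """Split text into words and whitespace runs, keeping both."""
    # The parentheses form a capturing group, so re.split() includes
    # each matched run of whitespace in the result.
    parts = re.split(r'(\s+)', text)
    # Leading/trailing whitespace yields empty strings at the ends;
    # filter them out (assumption: they are not wanted).
    return [p for p in parts if p]

tokens = split_keep_whitespace("  Hello\tworld!\n")
print(tokens)
# Joining the pieces gives back the original text.
print(''.join(tokens) == "  Hello\tworld!\n")
```

Since every character ends up in exactly one list item, joining the
items reproduces the original string, so no whitespace is lost.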


