How to prevent re.split() from removing part of string

MRAB python at mrabarnett.plus.com
Mon Nov 30 19:24:43 EST 2009


Jeremy wrote:
> I am using re.split to... well, split a string into sections.  I want
> to split when, following a new line, there are 4 or fewer spaces.  The
> pattern I use is:
> 
>         sections = re.split('\n\s{,4}[^\s]', lineoftext)
> 
> This splits appropriately but I lose the character matched by [^\s].  I
> know I can put parentheses around [^\s] and keep the matched character,
> but the character is placed in its own element of the list instead of
> with the rest of the lineoftext.
> 
> Does anyone know how I can accomplish this without losing the matched
> character?
> 
First of all, \s matches any _whitespace_ character (space, "\t", "\n",
"\r", "\f"), not just a space, so \s{,4} can also consume tabs and
newlines. There's also \S, which matches any character that's not
whitespace.

But in answer to your question, use a look-ahead, which checks that a
non-whitespace character follows without making it part of the match:

     sections = re.split(r'\n {,4}(?=\S)', lineoftext)
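
To see the difference side by side, here is a small sketch. The sample text is made up for illustration (section lines indented by at most 4 spaces, continuation lines indented further); the two patterns are the ones from this thread, written as raw strings.

```python
import re

# Hypothetical sample: "alpha", "beta" and "gamma" start new sections
# (0 to 4 leading spaces); the deeper-indented lines are continuations.
lineoftext = "alpha\n      more alpha\n  beta\n        more beta\ngamma"

# Original pattern: [^\s] consumes the first character of the next
# section, so re.split() throws it away with the rest of the separator.
lossy = re.split(r'\n\s{,4}[^\s]', lineoftext)

# Look-ahead pattern: (?=\S) only asserts that a non-whitespace character
# comes next; it is not part of the match, so it stays in the section.
sections = re.split(r'\n {,4}(?=\S)', lineoftext)

print(lossy)
# ['alpha\n      more alpha', 'eta\n        more beta', 'amma']
print(sections)
# ['alpha\n      more alpha', 'beta\n        more beta', 'gamma']
```

Lines indented by more than 4 spaces are not split on, because after at most 4 spaces the next character is still a space and the look-ahead fails.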



