re.split() not keeping matched text

Peter Otten __peter__ at web.de
Mon Jul 26 10:49:40 EDT 2004


mark at wutka.com wrote:

> I don't know if this will save you any processing time, but you can just
> replace the split with a findall like this:
> l = re.findall("[^.?!]+[?!.]+", x)
> 
> This should handle your example, plus it handles multiple occurrences of
> the punctuation at the end of the sentence.

One caveat: the invariant 

"".join(re.findall("[^?!.]+[?!.]+", s)) == s

will no longer hold, because findall() silently drops leading punctuation
and any trailing text that is not followed by punctuation:

>>> re.findall("[^?!.]+[?!.]+", "!so what! you're done? yes done")
['so what!', " you're done?"]
>>>
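
If the invariant matters, one possible workaround is to make the closing
punctuation optional and add a second alternative for a punctuation run that
has no text in front of it. This is only a sketch and not part of the
original suggestion; the helper name split_keep_all is made up for
illustration.

```python
import re

# Hypothetical helper, not from the original post.
# First alternative: a run of text plus optional closing punctuation.
# Second alternative: a punctuation run with no text before it.
_CHUNK = re.compile(r"[^?!.]+[?!.]*|[?!.]+")

def split_keep_all(s):
    """Split s into sentence-like chunks without dropping any characters."""
    return _CHUNK.findall(s)

s = "!so what! you're done? yes done"
parts = split_keep_all(s)
print(parts)  # ['!', 'so what!', " you're done?", ' yes done']

# Every character is either punctuation or not, so nothing is skipped.
assert "".join(parts) == s
```

The leading "!" comes back as its own item and the trailing " yes done" is
kept. re.split("([?!.]+)", s) with a capturing group also keeps everything,
but it returns the punctuation as separate list items.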

Peter



