re.split() not keeping matched text

mark at wutka.com
Mon Jul 26 09:57:21 EDT 2004


I don't know if this will save you any processing time, but you can just
replace the split with a findall like this:
l = re.findall("[^.?!]+[?!.]+", x)

This should handle your example, and it also handles multiple occurrences of
punctuation at the end of a sentence.
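
Here is a minimal sketch of that findall approach run against your sample
string, next to a second option: re.split() only returns the matched text
when the pattern contains capturing parentheses, so wrapping the character
class in a group keeps the punctuation too. The variable names
(findall_sentences, split_sentences) are just for illustration, and the
pairing step is one way to glue the pieces back together, not the only one.

```python
import re

x = "The dog ran. The cat eats! The bird flies? Done."

# Option 1: findall. Each match is a run of non-terminators followed by
# one or more terminators, so the punctuation stays attached.
pattern = r"[^.?!]+[?!.]+"
findall_sentences = [s.strip() for s in re.findall(pattern, x)]

# Option 2: split with a capturing group. The captured delimiters are
# returned in the result list, alternating with the text between them.
parts = re.split(r"([.?!]+)", x)
texts = parts[0::2]        # text pieces (even positions)
delimiters = parts[1::2]   # captured punctuation (odd positions)
split_sentences = [(t + d).strip() for t, d in zip(texts, delimiters)]

for s in findall_sentences:
    print(s)
```

One caveat with the findall pattern: text after the last terminator is
not matched at all, so a trailing fragment with no closing punctuation is
silently dropped.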

Robert Oschler <no_replies at fake_email_address.invalid> wrote:
> Hello,
> 
> Given the following program:
> 
> --------------
> 
> import re
> 
> x = "The dog ran. The cat eats! The bird flies? Done."
> l = re.split("[.?!]", x)
> 
> for s in l:
>  print s.strip()
> # for
> ---------------
> 
> I am getting the following output:
> 
> The dog ran
> The cat eats
> The bird flies
> Done
> 
> As you can see the end of sentence punctuation marks are being removed.  Yet
> the docs for re.split() say that the matched text is supposed to be
> returned.  I want to keep the punctuation marks.
> 
> Where am I going wrong here?
> 
> Thanks,


