re.split() not keeping matched text
mark at wutka.com
Mon Jul 26 09:57:21 EDT 2004
I don't know if this will save you any processing time, but you can just
replace the split with a findall like this:
l = re.findall("[^.?!]+[?!.]+", x)
This should handle your example, and it also handles multiple occurrences of
the punctuation at the end of a sentence.
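
As a rough sketch of the two options (the variable names "sentences" and
"parts" are mine, not from the original program): re.split() only returns
the matched delimiter text when the pattern contains capturing parentheses,
which is probably what the docs were referring to. The findall pattern above
avoids the issue by matching each sentence together with its punctuation.
Note that findall will silently drop any trailing text that has no
terminating punctuation.

```python
import re

x = "The dog ran. The cat eats! The bird flies? Done."

# findall: each match is a run of non-terminators followed by one or
# more terminators, so the punctuation stays attached to its sentence.
sentences = [s.strip() for s in re.findall(r"[^.?!]+[.?!]+", x)]

# split with a capturing group: the delimiters are kept, but as
# separate list items (plus a trailing empty string here, because
# the input ends with a delimiter).
parts = re.split(r"([.?!])", x)

for s in sentences:
    print(s)
```

With the capturing-group split you would still have to glue each sentence
back onto the delimiter that follows it, so findall is likely the shorter
route for this case.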
Robert Oschler <no_replies at fake_email_address.invalid> wrote:
> Hello,
>
> Given the following program:
>
> --------------
>
> import re
>
> x = "The dog ran. The cat eats! The bird flies? Done."
> l = re.split("[.?!]", x)
>
> for s in l:
>     print s.strip()
> # for
> ---------------
>
> I am getting the following output:
>
> The dog ran
> The cat eats
> The bird flies
> Done
>
> As you can see, the end-of-sentence punctuation marks are being removed. Yet
> the docs for re.split() say that the matched text is supposed to be
> returned. I want to keep the punctuation marks.
>
> Where am I going wrong here?
>
> Thanks,