Identifying Unicode punctuation characters with Python regex

Shiao multiseed at gmail.com
Fri Nov 14 05:23:08 EST 2008


Hello,
I'm trying to build a regex in Python that matches punctuation
characters in all languages. Some regex implementations support an
extended syntax, \p{P}, that does just that. As far as I know, Python's
re module doesn't. Any ideas for a possible alternative?

Apart from manually listing the punctuation character ranges for each
and every language, I don't see how this can be done.

Thanks in advance for any suggestions.

John
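
One possible alternative, offered only as a rough sketch: the standard
unicodedata module reports the general category of each character, and
the punctuation categories all start with "P", which is the set that
\p{P} covers. The character class can therefore be generated instead of
typed by hand. The function name build_punctuation_pattern is made up
for this illustration, and the code assumes Python 3, where str is
Unicode and chr() accepts any code point.

```python
import re
import sys
import unicodedata


def build_punctuation_pattern():
    """Compile a character class of every code point in a "P" category.

    Hypothetical helper for illustration; not part of the re module.
    """
    # Walk the whole code point range once and keep the characters whose
    # general category is Pc, Pd, Ps, Pe, Pi, Pf or Po.
    chars = [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith("P")
    ]
    # re.escape protects characters such as ], \ and - that would
    # otherwise have a special meaning inside the brackets.
    return re.compile("[" + re.escape("".join(chars)) + "]")


PUNCT = build_punctuation_pattern()
print(PUNCT.findall("Hello, world! ¿Qué? 「日本語」。"))
```

Building the pattern scans the full code point range, so it is probably
best done once and reused. Note that symbols such as "$" and "+" belong
to the "S" categories rather than "P", so they are not matched here,
just as they would not be by \p{P}. If a third-party dependency is
acceptable, the separate regex package on PyPI is another option, since
it understands \p{P} directly.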



More information about the Python-list mailing list