Identifying Unicode punctuation characters with Python regex

Shiao multiseed at gmail.com
Fri Nov 14 05:31:09 EST 2008


On Nov 14, 11:27 am, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> > I'm trying to build a regex in Python to identify punctuation
> > characters in all languages. Some regex implementations support an
> > extended syntax, \p{P}, that does just that. As far as I know, Python's
> > re module doesn't. Any idea of a possible alternative?
>
> You should use character classes. You can generate them automatically
> from the unicodedata module: check whether unicodedata.category(c)
> starts with "P".
>
> Regards,
> Martin

Thanks, Martin. I'll do this.
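
For the archive, here is a minimal sketch of the approach Martin describes.
It assumes Python 3 (where chr() covers the full code point range), and the
names build_punctuation_pattern and PUNCT are my own, not anything from the
standard library. It walks every code point, keeps those whose
unicodedata.category() starts with "P", and joins them into one character
class.

```python
import re
import sys
import unicodedata


def build_punctuation_pattern():
    """Compile a character class matching any Unicode punctuation character.

    Hypothetical helper: it collects every code point whose general
    category starts with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    """
    chars = [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith("P")
    ]
    # re.escape protects characters that are special inside a class,
    # such as "]", "\", "^" and "-".
    return re.compile("[" + re.escape("".join(chars)) + "]")


# Build once and reuse; scanning all code points takes a moment.
PUNCT = build_punctuation_pattern()

print(PUNCT.findall("Hello, world!"))
print(PUNCT.sub("", "「こんにちは」"))
```

Note that symbols such as "$" and "+" fall in the "S" categories rather than
"P", so this class does not match them. The exact set of characters depends
on the Unicode database version bundled with the interpreter.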


