unicode categories -- regex

koara koara at atlas.cz
Sat Sep 22 11:57:30 EDT 2007


Hello all,

my question is about the special metacharacters in the re module. The
re documentation says that '\w' matches any alphanumeric Unicode
character (when the re.UNICODE flag is set). However, I found nothing
on constructing patterns for other Unicode categories, such as purely
alphabetic characters or punctuation symbols.

I found that this category information actually IS available in
Python, in the standard module unicodedata. For example,
unicodedata.category(u'.') returns 'Po', meaning 'Punctuation, Other'.
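
To illustrate the lookup, here is a minimal sketch that prints the
category code for a few characters. The sample characters and the
variable names are my own choices for illustration; the only library
call used is unicodedata.category.

```python
import unicodedata

# Sample characters chosen for illustration only.
samples = [u'.', u'a', u'A', u'5', u' ']

# Map each character to its two-letter general category code.
categories = dict((ch, unicodedata.category(ch)) for ch in samples)

for ch in samples:
    print('%r -> %s' % (ch, categories[ch]))
```

The first letter of each code is the major class (L for letters, P for
punctuation, N for numbers, Z for separators), and the second letter
narrows it down.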

So how do I use this information in a regular expression pattern? Any
ideas? Thanks.


I'm using Python 2.5.
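
One possible approach, sketched here under assumptions rather than
offered as an established recipe: since re has no category escapes,
build an explicit character class by scanning code points with
unicodedata.category. The helper name category_class and its limit
parameter are hypothetical, and the scan stops at the Basic
Multilingual Plane (0x10000) to keep it quick and to stay within what
a narrow build can represent.

```python
import re
import unicodedata

try:
    unichr            # exists on Python 2
except NameError:
    unichr = chr      # Python 3: chr already returns a Unicode string


def category_class(prefixes, limit=0x10000):
    """Build a regex character class for the given category prefixes.

    prefixes: category codes or their leading letters, e.g. ['P'] for
    all punctuation or ['Lu', 'Ll'] for cased letters.
    limit: exclusive upper bound of the code points scanned
    (hypothetical parameter, defaults to the BMP).
    """
    wanted = tuple(prefixes)
    chars = []
    for cp in range(limit):
        ch = unichr(cp)
        if unicodedata.category(ch).startswith(wanted):
            # Escape each character so ']', '\\', '^' and '-' stay literal.
            chars.append(re.escape(ch))
    return u'[' + u''.join(chars) + u']'


# Match runs of punctuation from any 'P*' category.
punct = re.compile(category_class(['P']) + u'+', re.UNICODE)
found = punct.findall(u'Hello, world! \u00bfQu\u00e9 tal?')
print(found)
```

Building the class costs one pass over the scanned range, so it makes
sense to compile the pattern once and reuse it. Listing every
character individually keeps the sketch short; merging consecutive
code points into ranges would give a more compact pattern.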



