regexps with unicode-aware characterclasses?

Tue Aug 30 10:33:07 EDT 2005

Stefan Rank wrote:

> I know that there is a re.U switch that makes \w match all unicode word
> characters, but there are no subclasses of that ([[:upper:]] or preferably \u).

unicode character classes are not supported by the current RE engine.

it's usually possible to work around this by matching all characters ("\w") in Unicode
mode ("(?u)"), and postprocessing the result to get rid of invalid matches.

</F>