regexps with unicode-aware characterclasses?

Fredrik Lundh fredrik at pythonware.com
Tue Aug 30 10:33:07 EDT 2005


Stefan Rank wrote:

> I know that there is a re.U switch that makes \w match all unicode word
> characters, but there are no subclasses of that ([[:upper:]] or preferably \u).

unicode character classes are not supported by the current RE engine.

it's usually possible to work around this by matching all word characters ("\w") in
Unicode mode ("(?u)"), and then postprocessing the result to discard the matches that
don't belong to the class you want.
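a minimal sketch of that workaround, assuming the goal is to find words that start
with an uppercase letter. the helper name capitalized_words and the choice of the
"Lu" (uppercase letter) category are illustrative assumptions, not something from
the original question; the filter uses the standard unicodedata module.

```python
import re
import unicodedata

# (?u) asks for Unicode-aware \w; in Python 3 this is already the
# default for str patterns, so the flag is harmless but redundant there.
WORD = re.compile(r"(?u)\w+")

def capitalized_words(text):
    """Return the words in text whose first character is an uppercase letter.

    Hypothetical helper: the regex over-matches (any run of word characters),
    and the Unicode category check then throws away the unwanted matches.
    """
    return [
        m.group()
        for m in WORD.finditer(text)
        if unicodedata.category(m.group()[0]) == "Lu"  # Lu = Letter, uppercase
    ]
```

for example, capitalized_words("Ärger mit Österreich") keeps "Ärger" and
"Österreich" and drops "mit". swapping "Lu" for another category (say "Ll" for
lowercase letters) gives a different class with the same two-step approach.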

</F> 





