problems with regex in Japanese?

Martin von Loewis loewis at informatik.hu-berlin.de
Sat Aug 11 06:03:16 EDT 2001


Joe Strout <joe at strout.net> writes:

> > Python no longer uses pcre; the pcre-based regexp module
> > was replaced by a new Unicode-aware implementation called sre (written
> > by Fredrik Lundh). sre is much faster too...
> 
> Wow, I didn't know that.  Where can I find out more about sre?

In Python 2.x, the re module is really sre, not pcre. I recommend that
you not pass UTF-8 byte strings to it, but instead decode them to
Unicode objects and pass those to your regular expressions. Please read
the re module documentation for details; take particular notice of the
UNICODE flag, which determines whether character properties (as used by
classes such as \w) come from the Unicode character database.
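
As a rough sketch of the difference, here is a small example. It is
written for current Python 3 rather than the 2.x discussed above: there,
decoding bytes yields a str, which plays the role of the 2.x Unicode
object, and UNICODE matching is already the default for str patterns (in
2.x the flag had to be passed explicitly). The sample text and variable
names are made up for illustration.

```python
import re

# Hypothetical sample data: Japanese text as it might arrive, UTF-8 encoded.
utf8_bytes = "日本語 テキスト".encode("utf-8")

# Matching on the raw bytes: \w only knows ASCII word characters,
# so none of the multi-byte UTF-8 sequences count as "word" characters.
byte_words = re.findall(rb"\w+", utf8_bytes)

# Decode first, then match on the Unicode string. With re.UNICODE,
# \w takes its character properties from the Unicode character database.
text = utf8_bytes.decode("utf-8")
unicode_words = re.findall(r"\w+", text, re.UNICODE)

# For contrast: forcing ASCII-only properties on the same Unicode string.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(byte_words)
print(unicode_words)
print(ascii_words)
```

Only the decoded string matched with Unicode properties should find the
two Japanese words; the byte string and the ASCII-only match should find
nothing.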

Regards,
Martin



