Regex: Matching characters, but not digits

Martin v. Löwis loewis at informatik.hu-berlin.de
Tue Nov 5 07:24:45 EST 2002


Thomas Guettler <zopestoller at thomas-guettler.de> writes:

> \w matches characters and digits. But,
> how can I match only characters?
> 
> This should work for unicode, too

This is currently not supported directly. SRE should be able to
support POSIX regex categories, such as [:alpha:], and Unicode
categories, such as [:Ll:], but at the moment it doesn't.

Patches in this area are welcome.

Your best bet is to generate a character class yourself. For Unicode,
you can use

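import sys

# Collect every alphabetic code point into one character class.
alphaclass = [u"["]
for i in range(32, sys.maxunicode + 1):  # + 1 so the last code point is included
    c = unichr(i)
    if c.isalpha():
        alphaclass.append(c)
alphaclass.append(u"]")
alphaclass = u"".join(alphaclass)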

Compiling this particular regular expression will be expensive, but
matching it won't: it is compiled into a bitmap internally, with a
constant-time test. 
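
For readers on Python 3, here is a minimal sketch of how such a
generated class might be compiled and used. It is an adaptation, not
the code above: chr() stands in for unichr(), and build_alpha_class
and its limit parameter are hypothetical names introduced only for
illustration. The default limit of U+FFFF is an assumption chosen to
keep compilation quick; pass sys.maxunicode to cover every code point.

```python
import re


def build_alpha_class(limit=0xFFFF):
    """Return a character class listing each alphabetic code point in 32..limit.

    Hypothetical helper; the default limit (U+FFFF) is an assumption.
    """
    letters = [chr(i) for i in range(32, limit + 1) if chr(i).isalpha()]
    return "[" + "".join(letters) + "]"


# Build the class once and reuse the compiled pattern; compiling is the slow part.
ALPHA = build_alpha_class()
word = re.compile(ALPHA + "+")

# Runs of letters are matched; digits and spaces separate them.
print(word.findall("abc123 Löwis x9"))  # ['abc', 'Löwis', 'x']
```

No escaping is needed inside the class because none of the characters
that are special there (], \, ^, -) count as alphabetic.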

Regards,
Martin


