Regex: Matching characters, but not digits
Martin v. Löwis
loewis at informatik.hu-berlin.de
Tue Nov 5 07:24:45 EST 2002
Thomas Guettler <zopestoller at thomas-guettler.de> writes:
> \w matches characters and digits. But,
> how can I match only characters?
>
> This should work for unicode, too
This is currently not supported directly. SRE should be able to
support POSIX regex categories, such as [:alpha:], and Unicode
categories, such as [:Ll:], but at the moment it doesn't.
Patches in this area are welcome.
Your best bet is to generate a character class yourself. For Unicode,
you can use
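import sys

# Collect every alphabetic code point into one big character class.
# Digits and "_" fail str.isalpha(), so they are left out.
alphaclass = ["["]
for i in range(32, sys.maxunicode + 1):  # + 1 so the last code point is included
    c = chr(i)
    if c.isalpha():
        alphaclass.append(c)
alphaclass.append("]")
alphaclass = "".join(alphaclass)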
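The snippet above stops at building the pattern string. A minimal Python 3 sketch of putting such a class to use: compile it once with a `+` quantifier, then pull out runs of letters while skipping digits and underscores. The names `alpha_re` and `matches` and the sample text are illustrative only, not from the original post.

```python
import re
import sys

# Same idea as above: one character class listing every alphabetic
# code point. None of the class metacharacters (] \ ^ -) are alphabetic,
# so no escaping is needed.
alphaclass = "[" + "".join(
    chr(i) for i in range(32, sys.maxunicode + 1) if chr(i).isalpha()
) + "]"

# Compiling is the slow, one-time step; reuse the compiled object.
alpha_re = re.compile(alphaclass + "+")

text = "abc123 d\u00e9f_4\u03a9"
matches = alpha_re.findall(text)  # ['abc', 'déf', 'Ω']
print(alpha_re.fullmatch("2002"))  # None: digits are never matched
```

Unlike \w, the class rejects both digits and the underscore, which is what the original question asked for.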
Compiling this particular regular expression will be expensive, but
matching against it won't be: the class is compiled into an internal
bitmap, so each membership test takes constant time.
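The constant-time claim can be illustrated with a toy bitmap: one bit per code point, so testing membership is one index plus one mask no matter how many characters the set holds. This is a hypothetical sketch of the principle only (`CodePointBitmap` is a made-up name); SRE's real compiled charsets use a more compact, chunked layout that differs in detail.

```python
import sys


class CodePointBitmap:
    """Toy set of code points: one bit per code point (illustrative only)."""

    def __init__(self, chars):
        # One bit for each possible code point, 0 .. sys.maxunicode.
        self.bits = bytearray((sys.maxunicode >> 3) + 1)
        for c in chars:
            cp = ord(c)
            self.bits[cp >> 3] |= 1 << (cp & 7)

    def __contains__(self, c):
        # Constant time: one byte index and one bit mask, whatever the set size.
        cp = ord(c)
        return bool(self.bits[cp >> 3] & (1 << (cp & 7)))


# Building the bitmap touches every code point once (the expensive part)...
alpha = CodePointBitmap(
    chr(i) for i in range(32, sys.maxunicode + 1) if chr(i).isalpha()
)

# ...but each lookup afterwards is a single O(1) bit test.
print("x" in alpha, "7" in alpha, "\u00e9" in alpha)  # True False True
```

The up-front cost mirrors the expensive compile step described above; the cheap lookup mirrors matching.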
Regards,
Martin
More information about the Python-list mailing list