Letter class in re

Albert-Jan Roskam fomcl at yahoo.com
Mon Mar 9 09:33:23 EDT 2015


--------------------------------------------
On Mon, 3/9/15, Tim Chase <python.list at tim.thechases.com> wrote:

 Subject: Re: Letter class in re
 To: python-list at python.org
 Date: Monday, March 9, 2015, 12:17 PM
 
 On 2015-03-09 11:37, Wolfgang Maier wrote:
 > On 03/09/2015 11:23 AM, Antoon Pardon wrote:
 >> Does anyone know what regular expression to use for a sequence of
 >> letters? There is a class for alphanumerics but I can't find one
 >> for just letters, which I find odd.
 >
 > how about [a-zA-Z] ?
 
 That breaks if you have Unicode letters.  While ugly, since "\w" is
 composed of "letters, numbers, and underscores", you can assert that
 the "\w" you find is not a number or underscore by using
 
   (?:(?!_|\d)\w)
 

I was going to make the same remark, but with a slightly different solution: 
In [1]: repr(re.search("[a-zA-Z]", "é"))
Out[1]: 'None'
 
In [2]: repr(re.search(u"[^\d\W_]+", u"é", re.I | re.U))
Out[2]: '<_sre.SRE_Match object at 0x027CDB10>'

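"[^\d\W_]+" means something like "one or more (+) of 'not (a digit, a non-word character, or an underscore)'". Excluding \W leaves only word characters, and excluding \d and _ from those leaves the letters.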
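
For anyone reading this later on Python 3, here is a minimal sketch comparing the two patterns from this thread. It assumes Python 3, where str patterns are Unicode-aware by default, so the u"" prefix and the re.U flag are not needed. The names LETTERS_NEGATED, LETTERS_LOOKAHEAD and sample are just illustrative, and the trailing + on the lookahead pattern is my addition so that it matches runs of letters rather than a single one.

```python
import re

# Negated character class: "not a digit, not a non-word character, not an underscore".
LETTERS_NEGATED = re.compile(r"[^\d\W_]+")

# Lookahead form from the quoted message, with + added to match runs of letters.
LETTERS_LOOKAHEAD = re.compile(r"(?:(?!_|\d)\w)+")

# Non-ASCII letters are written as escapes so the example survives re-encoding:
# \u00e9 is e-acute, \u00ef is i-diaeresis, \u00dc is U-diaeresis.
sample = "caf\u00e9_42 na\u00efve \u00dcber x9"

negated = LETTERS_NEGATED.findall(sample)
lookahead = LETTERS_LOOKAHEAD.findall(sample)

print(negated)
print(lookahead)

# The ASCII-only class finds nothing in a lone accented letter.
print(re.search(r"[a-zA-Z]+", "\u00e9"))
```

Both patterns should split the sample into the same runs of letters, dropping the underscore and the digits. One caveat, as far as I can tell: \d only covers decimal digits, while \w also accepts some other numeric characters (fractions, for example), so neither pattern is a strict "letters only" test. If that matters, checking each character with str.isalpha() may be the safer route.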



