Letter class in re

Antoon Pardon antoon.pardon at rece.vub.ac.be
Mon Mar 9 10:29:51 EDT 2015


On 09-03-15 at 13:50, Tim Chase wrote:
> On 2015-03-09 13:26, Antoon Pardon wrote:
>> On 09-03-15 at 12:17, Tim Chase wrote:
>>>   (?:(?!_|\d)\w)
>> So if I understand correctly the following should be a regular
>> expression for a python3 identifier.
>>
>>   (?:(?!_|\d)\w)\w+
> If you don't have to treat it as an atom, you can simplify that to
> just
>
>   (?!_|\d)\w+
>
> which just means that the first character can't be an underscore or
> digit.
>
> Though for a Py3 identifier, the underscore is acceptable as a first
> character ("__init__"), so you can simplify it even further to just
>
>   (?!\d)\w+

No, that doesn't work. To begin with, my attempt above should have been:

    (?:(?!_|\d)\w)\w*

because an identifier can be just one letter. So when I change the '+'
into a '*' in your suggestion, I get this:

>>> import re
>>> r = re.compile(r"(?!\d)\w*")
>>> r.match('√')
<_sre.SRE_Match object; span=(0, 0), match=''>

But √ is not a letter.
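The empty match happens because the lookahead (?!\d) consumes nothing and
\w* is allowed to match zero characters, so match() succeeds at position 0
without ever reading the '√'. A minimal sketch contrasting the two patterns
(the helper name matched_text is hypothetical, introduced only for this
illustration):

```python
import re

# Lookahead followed by \w*: can succeed without consuming any character.
loose = re.compile(r"(?!\d)\w*")

# The first character is consumed by (?:(?!\d)\w), so at least one
# word character that is not a decimal digit is required.
strict = re.compile(r"(?:(?!\d)\w)\w*")


def matched_text(pattern, text):
    """Hypothetical helper: text matched at position 0, or None if no match."""
    m = pattern.match(text)
    return None if m is None else m.group()


print(repr(matched_text(loose, "√")))   # '' : empty match, nothing consumed
print(repr(matched_text(strict, "√")))  # None: √ is not a word character
print(repr(matched_text(strict, "x")))  # 'x': a single letter is enough
```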

I have done some tests with (?:(?!\d)\w)\w*, which seems to work.
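One caveat when testing: match() is only anchored at the start of the
string, so 'a√' still matches 'a'. For whole strings, re.fullmatch (added in
Python 3.4) fits better. A rough sketch that compares the pattern with
str.isidentifier() on a few sample strings; the sample list and the helper
looks_like_identifier are assumptions for illustration. Python's grammar
defines identifiers via the Unicode XID_Start/XID_Continue classes, which are
not defined the same way as \w, so agreement on these samples does not prove
the two checks are equivalent.

```python
import re

# Candidate pattern from the message: one word character that is not a
# decimal digit, followed by any number of word characters.
IDENT = re.compile(r"(?:(?!\d)\w)\w*")


def looks_like_identifier(text):
    """Hypothetical helper: True if the *whole* string matches IDENT."""
    return IDENT.fullmatch(text) is not None


samples = ["x", "__init__", "é", "x1", "√", "1abc", "a√", ""]

for s in samples:
    # Compare the regex verdict with Python's own identifier check.
    print(repr(s), looks_like_identifier(s), s.isidentifier())

# match() accepts a valid prefix; fullmatch() rejects the whole string.
print(IDENT.match("a√").group())  # a
```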



