Letter class in re

Antoon Pardon antoon.pardon at rece.vub.ac.be
Tue Mar 10 04:16:43 EDT 2015


On 09-03-15 at 16:17, Tim Chase wrote:
> On 2015-03-09 15:29, Antoon Pardon wrote:
>> On 09-03-15 at 13:50, Tim Chase wrote:
>>>>   (?:(?!_|\d)\w)\w+
>>> If you don't have to treat it as an atom, you can simplify that to
>>> just
>>>
>>>   (?!_|\d)\w+
>>>
>>> which just means that the first character can't be an underscore
>>> or digit.
>>>
>>> Though for a Py3 identifier, the underscore is acceptable as a
>>> first character ("__init__"), so you can simplify it even further
>>> to just
>>>
>>>   (?!\d)\w+
>> No, that doesn't work. To begin with, my attempt above should have
>> been:
>>
>>     (?:(?!_|\d)\w)\w*
> Did you actually test my suggestion?  The "(?!\d)\w+" means "one or
> more Word characters, but the first one can't be a digit" because
> the "(?!...)" is zero-width. This should match single-character
> strings including a single underscore.

I had done some tests, but due to a misunderstanding I broke off testing
prematurely. I didn't grasp the lookahead nature of the (?!...) construct
and saw it as just a negation of the regular expression involved.

But IIUC the (?!\d) will check that the next character is not a digit
without advancing the position in the string. So the later check for
\w+ happens as if (?!\d) hadn't been present. In effect, the same part
of the string is checked against two sub regular expressions.
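
A quick way to check that reading is the sketch below, assuming Python 3's
re module. The names ident, is_match and samples are only illustrative;
the pattern itself is the one Tim suggested.

```python
import re

# Tim's pattern: one or more word characters, where the first one
# must not be a digit. The (?!\d) lookahead consumes no input.
ident = re.compile(r"(?!\d)\w+")

def is_match(text):
    """Return True if the whole string matches the pattern."""
    return ident.fullmatch(text) is not None

samples = ["_", "x", "__init__", "abc123", "1abc", "9"]
results = {s: is_match(s) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")

# The lookahead on its own matches the empty string at position 0,
# so \w+ starts checking from that same position.
zero_width_end = re.match(r"(?!\d)", "abc").end()
print(zero_width_end)
```

If the above is right, only "1abc" and "9" are rejected, and the bare
lookahead ends at position 0, i.e. it has not advanced in the string.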

-- 
Antoon Pardon 



