Letter class in re

Tim Chase python.list at tim.thechases.com
Mon Mar 9 11:17:42 EDT 2015


On 2015-03-09 15:29, Antoon Pardon wrote:
> Op 09-03-15 om 13:50 schreef Tim Chase:
> >>   (?:(?!_|\d)\w)\w+
> > If you don't have to treat it as an atom, you can simplify that to
> > just
> >
> >   (?!_|\d)\w+
> >
> > which just means that the first character can't be an underscore
> > or digit.
> >
> > Though for a Py3 identifier, the underscore is acceptable as a
> > first character ("__init__"), so you can simplify it even further
> > to just
> >
> >   (?!\d)\w+
> 
> No, that doesn't work. To begin with, my attempt above should have
> been:
> 
>     (?:(?!_|\d)\w)\w*

Did you actually test my suggestion?  The "(?!\d)\w+" means "one or
more Word characters, but the first one can't be a digit" because
the "(?!...)" is zero-width. This should match single-character
strings including a single underscore.
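
For anyone who wants to check this at the prompt, here is a minimal
sketch, assuming Python 3 (where "\w" is Unicode-aware for str
patterns). The names "ident" and "first_match" are made up for this
illustration; only the pattern comes from the discussion above.

```python
import re

# Pattern from the discussion: one or more word characters,
# where the first one may not be a digit.
ident = re.compile(r"(?!\d)\w+")

def first_match(s):
    """Return the text matched at the start of s, or None if no match."""
    m = ident.match(s)
    return m.group() if m else None

print(first_match("x"))         # a single letter matches
print(first_match("_"))         # a single underscore matches
print(first_match("__init__"))  # leading underscores are fine
print(first_match("9lives"))    # leading digit: no match, prints None
```

The lookahead consumes nothing, so the "+" still only needs one
character to succeed.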

> because an identifier can just be one letter. So when I change the '+'
> into a '*' in your suggestion I get this:
> 
> >>> r = re.compile(r"(?!\d)\w*")
> >>> r.match('√')
> <_sre.SRE_Match object; span=(0, 0), match=''>
> 
> But the √ is not a letter.

Notice that you match an empty string there: the (?!\d) is
zero-width, so "\w*" is satisfied by matching zero word characters.
Try anchoring it with a "$" at the end to see that it doesn't
really match.
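
A small sketch of the difference, again assuming Python 3. The names
"loose" and "anchored" are hypothetical, chosen just for this
example.

```python
import re

loose = re.compile(r"(?!\d)\w*")      # the "*" variant from the quoted session
anchored = re.compile(r"(?!\d)\w*$")  # same pattern, anchored at the end

m = loose.match("√")
print(m.span())             # (0, 0): an empty match, nothing was consumed
print(repr(m.group()))      # ''

print(anchored.match("√"))  # None: the "√" can't be skipped to reach "$"
print(anchored.match("x"))  # a match object covering "x"

# re.fullmatch (Python 3.4+) is another way to require that the
# whole string is consumed, here with the original "+" form.
print(re.fullmatch(r"(?!\d)\w+", "√"))   # None
```

Note that the anchored "*" form still accepts the empty string, so
the "+" form is the safer one if an identifier must have at least
one character.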

-tkc







