Letter class in re
Tim Chase
python.list at tim.thechases.com
Mon Mar 9 11:17:42 EDT 2015
On 2015-03-09 15:29, Antoon Pardon wrote:
> On 09-03-15 at 13:50, Tim Chase wrote:
> >> (?:(?!_|\d)\w)\w+
> > If you don't have to treat it as an atom, you can simplify that to
> > just
> >
> > (?!_|\d)\w+
> >
> > which just means that the first character can't be an underscore
> > or digit.
> >
> > Though for a Py3 identifier, the underscore is acceptable as a
> > first character ("__init__"), so you can simplify it even further
> > to just
> >
> > (?!\d)\w+
>
> No, that doesn't work. To begin with, my attempt above should have
> been:
>
> (?:(?!_|\d)\w)\w*
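A quick check of the corrected pattern (an editorial sketch, not from the thread; the sample strings are arbitrary and `re.fullmatch` needs Python 3.4+). The `(?:(?!_|\d)\w)` group means "a word character that is neither an underscore nor a digit", which can also be spelled as the negated class `[^\W\d_]`. The sketch checks that both spellings accept and reject the same strings.

```python
import re

# Antoon's corrected pattern: the first char is a word character that is
# not "_" and not a digit, followed by zero or more word characters.
lookahead_form = re.compile(r"(?:(?!_|\d)\w)\w*")

# The same first-character set written as a negated class:
# "not a non-word char, not a digit, not an underscore".
class_form = re.compile(r"[^\W\d_]\w*")

# "\u00e9" is a precomposed "é"; "√" is U+221A SQUARE ROOT.
samples = ["a", "\u00e9", "abc_1", "_x", "9a", "√", ""]

for s in samples:
    a = bool(lookahead_form.fullmatch(s))
    b = bool(class_form.fullmatch(s))
    print(repr(s), a, b)
```

Both columns agree for every sample; only the first three strings are accepted.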
Did you actually test my suggestion? The "(?!\d)\w+" means "one or
more word characters, but the first one can't be a digit" because
the "(?!...)" is zero-width. This should match single-character
strings, including a single underscore.
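The claim above is easy to check. A minimal sketch (the sample strings are chosen for illustration; `re.fullmatch` requires Python 3.4+) that runs `(?!\d)\w+` against whole strings:

```python
import re

# One or more word characters; the zero-width lookahead (?!\d) only
# rejects a digit in the first position and consumes nothing itself.
ident = re.compile(r"(?!\d)\w+")

cases = ["_", "a", "__init__", "x1", "1x", "√"]
results = {s: bool(ident.fullmatch(s)) for s in cases}

for s, ok in results.items():
    print(repr(s), ok)
# '_' True, 'a' True, '__init__' True, 'x1' True, '1x' False, '√' False
```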
> because an identifier can be just one letter. So when I change the '+'
> into a '*' in your suggestion, I get this:
>
> >>> r = re.compile(r"(?!\d)\w*")
> >>> r.match('√')
> <_sre.SRE_Match object; span=(0, 0), match=''>
>
> But the √ is not a letter.
Notice that you match an empty string there because the (?!\d) is
zero-width, and thus you satisfy the 0-or-more-word-characters part
by matching nothing. Try anchoring it with a "$" at the end to see
that it doesn't really match.
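To see the difference described above, a sketch comparing the unanchored `*` pattern with anchored versions. `re.fullmatch` (Python 3.4+) is shown as an alternative to `$`; note that `$` also matches just before a trailing newline, while `fullmatch` requires the whole string. The last two lines show one more caveat: with `*`, even a fully anchored pattern accepts the empty string, which `+` does not.

```python
import re

star = r"(?!\d)\w*"

# Unanchored: match() succeeds with an empty match at position 0.
m = re.match(star, "√")
print(m.span(), repr(m.group()))   # (0, 0) ''

# Anchored with "$": the empty match can't reach the end of "√", so no match.
print(re.match(star + "$", "√"))   # None

# fullmatch requires the whole string to match, like anchoring with \Z.
print(re.fullmatch(star, "√"))     # None

# Caveat: "*" still accepts the empty string; "+" does not.
print(bool(re.fullmatch(star, "")))            # True
print(bool(re.fullmatch(r"(?!\d)\w+", "")))    # False
```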
-tkc