Letter class in re

Antoon Pardon antoon.pardon at rece.vub.ac.be
Mon Mar 9 11:00:51 EDT 2015


On 09-03-15 at 15:39, Chris Angelico wrote:
> On Tue, Mar 10, 2015 at 1:34 AM, Antoon Pardon
> <antoon.pardon at rece.vub.ac.be> wrote:
>>> There is str.isidentifier, which returns True if something is a valid
>>> identifier name:
>>>
>>>>>> '℮'.isidentifier()
>>> True
>> Which is not very useful in the context of lexical analysis. I don't need to know
>> whether a particular string is a valid identifier; I want to know which parts of
>> a text are identifiers.
> If you're doing lexical analysis, you probably want a lexer. For
> Python, I would recommend parsing to AST and doing your analysis on
> that; I've had pretty good success doing that, and then using the
> line/column info to go back to the original text if I need it. A regex
> is probably not going to be sufficient for that kind of work.
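A minimal sketch of the AST route described above, assuming Python 3.8 or later, where `col_offset` is documented as a UTF-8 byte offset. It only collects `ast.Name` nodes, i.e. names used as variables. Function and class names, attributes, arguments and import aliases live in other node types (`FunctionDef.name`, `Attribute.attr`, `arg.arg`, `alias.name`), so a real tool would need more cases. The helper names `find_names` and `char_column` are hypothetical.

```python
import ast

def find_names(source):
    """Yield (identifier, lineno, col_offset) for each ast.Name node.

    lineno is 1-based. col_offset is a 0-based offset counted in
    UTF-8 bytes of that line, not in characters.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            yield node.id, node.lineno, node.col_offset

def char_column(source, lineno, col_offset):
    """Convert a UTF-8 byte column into a character column.

    Assumes LF or CRLF line endings, so splitting on "\\n" matches
    the line numbering used by the parser.
    """
    line = source.split("\n")[lineno - 1]
    return len(line.encode("utf-8")[:col_offset].decode("utf-8"))

src = "é = 1\nx = é + é\n"
# ast.walk is breadth-first, so sort to get source order.
for name, line, col in sorted(find_names(src), key=lambda t: (t[1], t[2])):
    print(name, line, char_column(src, line, col))
```

The byte-to-character conversion only matters once non-ASCII identifiers appear: the second `é` on line 2 has a byte offset of 9 but sits at character column 8.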

Maybe I'm behind the times, but the lexers I have used so far require a regular
expression for each kind of token you want to recognize. At least PLY still seems to
work that way. So if identifiers are one such kind of token, I need a regular
expression that matches an identifier.
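A sketch of what such a per-token regular expression could look like with the standard `re` module, assuming Python 3. `re` has no `\p{L}` letter class, but "a word character that is not a digit and not an underscore", `[^\W\d_]`, comes close (it also admits a few numeric characters such as `½`). Allowing an underscore at the start then gives an identifier pattern that could be handed to a regex-per-token lexer such as PLY. The token names and the tiny grammar below are hypothetical; the combined-pattern/`lastgroup` technique follows the tokenizer recipe in the `re` documentation. This only approximates Python's own identifier rules (PEP 3131 uses XID_Start/XID_Continue), so, for example, the `'℮'` quoted earlier passes `isidentifier()` but is not a `\w` character.

```python
import re

# "Letter": a word character that is neither a digit nor "_".
# Python's re has no \p{L}, so this set difference is a common stand-in.
LETTER = r"[^\W\d_]"
# Identifier: a letter or underscore, followed by any word characters.
IDENT = rf"(?:{LETTER}|_)\w*"

# Hypothetical token kinds for a toy expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", IDENT),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, value, start) triples; raise on unexpected input."""
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected {m.group()!r} at {m.start()}")
        yield kind, m.group(), m.start()

print(list(tokenize("größe = 2 * π + x_1")))
```

Non-ASCII letters such as `ö`, `ß` and `π` are picked up by the identifier rule without listing any ranges by hand.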

-- 
Antoon Pardon 