Where are the regexes for the Python language's tokenizer/lexer listed?

Robert Kern robert.kern at gmail.com
Sat Sep 12 19:07:05 EDT 2009


Dennis Lee Bieber wrote:
> On Fri, 11 Sep 2009 23:10:39 -0700 (PDT), Chris Seberino
> <cseberino at gmail.com> declaimed the following in
> gmane.comp.python.general:
> 
>> Where are the regexes for the Python language's tokenizer/lexer listed?
>>
>> If I'm not mistaken, the grammar is not sufficient to specify the
>> language; you also need to specify the regexes that define the
>> tokens, right? Where is that?
>>
> 	Pardon... I've been out of the "market", but I don't recall EVER
> seeing a "regex" used in a textbook for compiler/interpreter design.
> 
> 	BNF (or Pascal's bubble diagram equivalent) has always been used to
> define the syntactical components in those books in my possession, and
> parsers (tokenizers) were written using those implied algorithms (if the
> first character is numeric or "." it starts a number, otherwise treat it
> as an identifier, etc.),
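The character-dispatch approach described in the quoted text can be sketched in Python as follows. This is a minimal illustration, not code from the thread: the function name `scan` and the token kinds `NUMBER`, `NAME`, and `OP` are hypothetical. One character of lookahead is added to the quoted rule so that a bare "." is treated as an operator rather than the start of a number.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); kinds are hypothetical labels


def scan(source: str) -> List[Token]:
    """Hand-written lexer: choose a rule by looking at the first character."""
    tokens: List[Token] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            # Whitespace separates tokens but produces none.
            i += 1
        elif ch.isdigit() or (ch == "." and i + 1 < n and source[i + 1].isdigit()):
            # A digit, or a "." followed by a digit, starts a number.
            # Consume digits and at most one ".".
            start = i
            seen_dot = False
            while i < n and (source[i].isdigit()
                             or (source[i] == "." and not seen_dot)):
                if source[i] == ".":
                    seen_dot = True
                i += 1
            tokens.append(("NUMBER", source[start:i]))
        elif ch.isalpha() or ch == "_":
            # A letter or underscore starts an identifier.
            start = i
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("NAME", source[start:i]))
        else:
            # Anything else is treated as a one-character operator.
            tokens.append(("OP", ch))
            i += 1
    return tokens
```

For example, `scan("x1 = .5 + 42")` returns `[('NAME', 'x1'), ('OP', '='), ('NUMBER', '.5'), ('OP', '+'), ('NUMBER', '42')]`. The token rules live implicitly in the branching logic rather than in any declarative table.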

In actual implementations of lexers and the lexical analysis components of 
parsers, regexes are fairly common. For example, PLY (Python Lex-Yacc) defines
each token with a regular expression:

   http://www.dabeaz.com/ply/ply.html#ply_nn6
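For comparison, here is the regex-driven style, sketched with only the standard library's `re` module rather than PLY itself (PLY's actual rule syntax differs). Each token kind is one pattern, and the patterns are joined into a single alternation of named groups, the same technique as the tokenizer recipe in the `re` module documentation. `TOKEN_SPEC` and `tokenize_regex` are hypothetical names chosen for illustration.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token table: each kind is defined by one regular expression.
# This mirrors the idea behind PLY's token rules but is NOT PLY's API.
TOKEN_SPEC = [
    ("NUMBER", r"\d+\.\d*|\.\d+|\d+"),  # decimal or integer literal
    ("NAME", r"[A-Za-z_]\w*"),          # identifier
    ("OP", r"[+\-*/=().]"),             # single-character operators
    ("SKIP", r"\s+"),                   # whitespace, discarded
    ("MISMATCH", r"."),                 # any other character is an error
]

# One alternation of named groups; re tries the alternatives left to right
# at each position, so the order of TOKEN_SPEC matters.
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize_regex(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs, using match.lastgroup to name each token."""
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        yield kind, text
```

Because `re` takes the first alternative that matches rather than the longest one, the order of `TOKEN_SPEC` is part of the lexical definition: `NUMBER` must come before `OP` so that `.5` lexes as one number instead of an operator followed by `5`.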

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco
