lexical analysis of python

robert.muller2 at gmail.com
Tue Mar 10 21:31:01 EDT 2009


I am trying to implement a lexer and parser for a subset of Python
using lexer and parser generators. (It doesn't matter, but I happen to
be using ocamllex and ocamlyacc.) I've run into the following annoying
problem and am hoping someone can tell me what I'm missing. Lexers
generated by such tools return tokens one at a time as they consume
the input text. But Python's indentation appears to require
interrupting that stream. For example, in:
def f(x):
        statement1;
        statement2;
              statement3;
              statement4;
A

Between the '\n' at the end of statement4 and the A, a lexer for
Python should return 2 DEDENT tokens. But there is no way to interject
two DEDENT tokens into the token stream between the tokens for NEWLINE
and A. The generated lexer doesn't have any way to freeze the input
text pointer.

Does this mean that Python lexers are all written by hand? If not, how
do you do it with your favorite lexer generator?

Thanks!

Bob Muller



More information about the Python-list mailing list