lexical analysis of python

robert.muller2 at gmail.com
Tue Mar 10 21:53:59 EDT 2009


On Mar 10, 9:38 pm, Paul McGuire <pt... at austin.rr.com> wrote:
> On Mar 10, 8:31 pm, robert.mull... at gmail.com wrote:
>
>
>
> > I am trying to implement a lexer and parser for a subset of python
> > using lexer and parser generators. (It doesn't matter, but I happen to
> > be using
> > ocamllex and ocamlyacc). I've run into the following annoying problem
> > and hoping someone can tell me what I'm missing. Lexers generated by
> > such tools return tokens in a stream as they consume the input text.
> > But python's indentation appears to require interruption of that
> > stream. For example, in:
> > def f(x):
> >         statement1;
> >         statement2;
> >               statement3;
> >               statement4;
> > A
>
> > Between the '\n' at the end of statement4 and the A, a lexer for
> > Python should return 2 DEDENT tokens. But there is no way to interject
> > two DEDENT tokens within the token stream between the tokens for
> > NEWLINE and A.  The generated lexer doesn't have any way to freeze
> > the input text pointer.
>
> > Does this mean that python lexers are all written by hand? If not, how
> > do you do it using your favorite lexer generator?
>
> > Thanks!
>
> > Bob Muller
>
> In pyparsing's indentedBlock expression/helper, I keep a stack of
> column numbers representing indent levels.  When the indent level of a
> line is less than the column number at the top of the stack, I count
> one DEDENT for each level that I need to pop off the stack before I
> get the new indent column.  If I end up at a column that doesn't
> match any indent level remaining on the stack, then I know that this
> is an illegal indent (it doesn't line up with a previous indent).
> Also, when computing the column number, be wary of tab handling.
>
> -- Paul

Thank you Paul. I am also using the same stack-based approach that is
suggested in the documentation:

http://docs.python.org/reference/lexical_analysis.html
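
For what it's worth, here is roughly how I understand that scheme, as
a hedged Python sketch (the function name and token spellings are
mine, not from pyparsing or the reference docs):

```python
def indent_tokens(new_col, stack):
    """Given the column of a new logical line and the current stack of
    indent columns, return the INDENT/DEDENT tokens to emit."""
    tokens = []
    if new_col > stack[-1]:
        stack.append(new_col)
        tokens.append("INDENT")
    else:
        # Pop one level (and count one DEDENT) until the columns line up.
        while stack and new_col < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")
        if not stack or new_col != stack[-1]:
            raise IndentationError("unindent does not match any outer level")
    return tokens

stack = [0]
indent_tokens(8, stack)   # -> ["INDENT"]; stack is now [0, 8]
indent_tokens(14, stack)  # -> ["INDENT"]; stack is now [0, 8, 14]
indent_tokens(0, stack)   # -> ["DEDENT", "DEDENT"]; stack is back to [0]
```

So for the example above, the line containing A produces exactly two
DEDENTs, which matches what you describe.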

I understand the method, but when you say you "count one DEDENT for
each level": let's say you counted 3 of them. Do you have a way to
interject 3 consecutive DEDENT tokens into the token stream so that
the parser receives them before it receives the next real token?
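
What I am picturing is something like a pending-token queue wrapped
around the generated lexer, so that a rule which computes several
DEDENTs can queue them while the parser still pulls tokens one at a
time. A hypothetical Python sketch (none of these names come from
ocamllex; in OCaml the same idea would be a mutable token list kept
alongside the lexbuf):

```python
from collections import deque

class TokenStream:
    def __init__(self, raw_next_token):
        self.raw_next_token = raw_next_token  # the generated lexer's entry point
        self.pending = deque()                # tokens waiting to be handed out

    def next_token(self):
        # Drain queued tokens before touching the input again.
        if self.pending:
            return self.pending.popleft()
        tok = self.raw_next_token()
        if isinstance(tok, list):             # a rule returned several tokens
            self.pending.extend(tok[1:])
            return tok[0]
        return tok

# Usage with a fake raw lexer: one rule returns a list of three tokens,
# but the parser sees them one at a time, in order.
raw = iter(["NEWLINE", ["DEDENT", "DEDENT", "A"], "EOF"])
ts = TokenStream(lambda: next(raw))
# ts.next_token() now yields: NEWLINE, DEDENT, DEDENT, A, EOF
```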

Thanks much!

Bob Muller
