INDENT/DEDENT tokens considered hampering (Re: Code block literals)

Tue Oct 14 01:06:43 EDT 2003

Bengt Richter wrote:
> The thing is, the current tokenizer doesn't know def from foo, just that they're
> names. So either indenting has to be generated all the time, and the job of
> ignoring it passed on upwards, or the single keyword 'def' could be recognized
> by the parser in a bracketed context, and it would generate a synthetic indent token
> in front of the def name token as wide as if all spaces preceded the def, and then
> continue doing indent/dedent generation like for a normal def, until the def suite closed,
> at which point it would resume ordinary expression processing (if it was within brackets --
> otherwise it would just be a discarded expression evaluated in statement context, and
> in/de/dent processing would be on anyway). (This is speculative until really getting into it ;-)

I think there is a way of handling indentation that would make
changes like this easier to implement, but it would require a
complete re-design of the tokenizing and parsing system.

The basic idea would be to get rid of the indent/dedent tokens
altogether, and have the tokenizer keep track of the indent
level of the line containing the current token, as a separate
state variable.
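
To make that concrete, here is a rough Python sketch of such a
tokenizer. It is not how the real tokenizer works, and everything
in it (the Token tuple, TOKEN_RE, the toy token kinds) is made up
for illustration. It assumes indentation is spaces only, and it
emits NEWLINE on every line, brackets or not, leaving it to the
parser to ignore the ones it doesn't care about.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # 'NAME', 'NUMBER', 'OP' or 'NEWLINE' (toy token kinds)
    text: str
    indent: int  # indent level of the line this token came from


# Hypothetical toy lexical grammar: names, integers, one-character operators.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<OP>[^\s\w]))"
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens tagged with their line's indent; no INDENT/DEDENT tokens."""
    for line in source.splitlines():
        stripped = line.lstrip(" ")  # assumption: spaces only, no tabs
        if not stripped:
            continue  # blank lines carry no tokens
        indent = len(line) - len(stripped)
        pos = 0
        while True:
            m = TOKEN_RE.match(stripped, pos)
            if m is None:
                if stripped[pos:].strip():
                    raise SyntaxError(f"cannot tokenize {stripped[pos:]!r}")
                break  # only trailing whitespace is left
            yield Token(m.lastgroup, m.group(m.lastgroup), indent)
            pos = m.end()
        # The indent level is tracked the same way inside and outside brackets.
        yield Token("NEWLINE", "\n", indent)


for tok in tokenize("if x:\n    y = (1,\n         2)\n"):
    print(tok)
```

The point is only that the indent level rides along with every
token as plain state, so nothing has to be switched on or off
when a bracket is opened.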

Then parsing a suite would go something like

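    # remember the indent level of the line that opens the suite
    starting_level = current_indent_level
    expect(':')
    expect(NEWLINE)
    # the suite lasts for as long as lines are indented more deeply
    while current_indent_level > starting_level:
        parse_statement()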

The tokenizer would keep track of the current_indent_level
all the time, even inside brackets, but the parser would
choose whether to take notice of it or not, depending on
what it was doing. So switching back into indent-based
parsing in the middle of a bracketed expression wouldn't
be a problem.
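
As a sketch of how that might look, here is a toy recursive-descent
parser in Python that drops into indent-based parsing when it meets
a def inside parentheses. The grammar, the Parser class and the
hand-written token list are all invented for the example; tokens
are assumed to be (kind, text, indent) tuples. It also assumes the
closing bracket sits no further right than the def line, since
otherwise the suite loop would not know where to stop.

```python
class Parser:
    def __init__(self, tokens):
        # An EOF sentinel at indent -1 guarantees that every suite loop ends.
        self.tokens = list(tokens) + [("EOF", "", -1)]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    @property
    def current_indent_level(self):
        return self.current[2]

    def expect(self, kind, text=None):
        tok_kind, tok_text, _ = self.current
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind!r}, got {tok_text!r}")
        self.pos += 1
        return tok_text

    def skip_newlines(self):
        while self.current[0] == "NEWLINE":
            self.pos += 1

    def parse_suite(self):
        starting_level = self.current_indent_level
        self.expect("OP", ":")
        self.expect("NEWLINE")
        body = []
        while self.current_indent_level > starting_level:
            body.append(self.parse_statement())
        return body

    def parse_def(self):
        self.expect("NAME", "def")
        return ("def", self.expect("NAME"), self.parse_suite())

    def parse_statement(self):
        if self.current[:2] == ("NAME", "def"):
            return self.parse_def()
        node = self.parse_expr()
        self.expect("NEWLINE")
        return node

    def parse_expr(self):
        if self.current[:2] == ("OP", "("):
            return self.parse_bracketed()
        return self.expect("NAME")

    def parse_bracketed(self):
        self.expect("OP", "(")
        self.skip_newlines()  # inside brackets the parser ignores layout...
        items = []
        while self.current[:2] != ("OP", ")"):
            if self.current[:2] == ("NAME", "def"):
                items.append(self.parse_def())  # ...until it chooses not to
            else:
                items.append(self.parse_expr())
            self.skip_newlines()
            if self.current[:2] != ("OP", ","):
                break
            self.pos += 1
            self.skip_newlines()
        self.expect("OP", ")")
        return ("tuple", items)


# Hand-written tokens for:   (x,
#                             def g:
#                                 a
#                                 b
#                            )
TOKENS = [
    ("OP", "(", 0), ("NAME", "x", 0), ("OP", ",", 0), ("NEWLINE", "\n", 0),
    ("NAME", "def", 1), ("NAME", "g", 1), ("OP", ":", 1), ("NEWLINE", "\n", 1),
    ("NAME", "a", 5), ("NEWLINE", "\n", 5),
    ("NAME", "b", 5), ("NEWLINE", "\n", 5),
    ("OP", ")", 0), ("NEWLINE", "\n", 0),
]
print(Parser(TOKENS).parse_statement())
```

Note that parse_suite is the same loop as above, used unchanged
whether the def appears at statement level or inside the brackets.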

-- 
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg