Block comments

MartinRinehart at gmail.com
Tue Dec 11 17:31:08 EST 2007



Bruno Desthuilliers wrote:
> Is the array of lines the appropriate data structure here ?

I've done tokenizers both as an array of lines and as a long string.
The former has seemed easier when the language treats EOL as a
statement separator.
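The tradeoff can be sketched roughly like this (all names here are my own illustration, not any particular tokenizer's API): with an array of lines, the outer loop *is* the statement boundary; with one long string, EOL just becomes an ordinary token.

```python
# Sketch of the two input strategies for a language where EOL
# separates statements. Names are illustrative assumptions.

source = "a = 1\nb = a + 2"

# 1. As an array of lines: each line is one statement's tokens.
def tokenize_lines(text):
    for line in text.split("\n"):
        yield line.split()            # crude word-level "tokens"

# 2. As one long string: EOL is emitted as a token of its own.
def tokenize_string(text):
    token = ""
    for ch in text:
        if ch == "\n":
            if token:
                yield token
            yield "<EOL>"             # statement separator, made explicit
            token = ""
        elif ch.isspace():
            if token:
                yield token
            token = ""
        else:
            token += ch
    if token:
        yield token

print(list(tokenize_string(source)))
# ['a', '=', '1', '<EOL>', 'b', '=', 'a', '+', '2']
```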

Re not letting literal strings in code terminate blocks: I think it's
the tokenizer-writer's job to be nice to the tokenizer's users, the
first of whom will be me, and I'll definitely have string literals
that enclose what would otherwise be a block-end marker.
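The point can be sketched like this (the marker and function name are my assumptions, not anything from an actual tokenizer): when scanning for a block terminator, the scanner has to consume string literals atomically so a terminator inside quotes never counts.

```python
# Sketch: locate a block-end marker ('}' here, as an illustrative
# assumption) while treating quoted string literals as opaque.
def find_block_end(text, marker="}"):
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in ("'", '"'):          # entering a string literal
            quote = ch
            i += 1
            while i < len(text) and text[i] != quote:
                if text[i] == "\\":   # skip the escaped character too
                    i += 1
                i += 1
            i += 1                    # step past the closing quote
        elif ch == marker:
            return i                  # a real block end, outside quotes
        else:
            i += 1
    return -1

src = 'print("}") }'
print(find_block_end(src))  # 11 -- the '}' inside the quotes is ignored
```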

> While we're at it, you may not know but there are already a couple
> Python packages for building tokenizers/parsers

The tokenizer in the Python library is pretty close to what I want,
but it returns tuples, whereas I want an array of Token objects. It
also reads the source a line at a time, which seems a bit out of
date. Maybe two or three decades out of date.
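Getting Token objects out of the stdlib's tuples is a thin adaptation layer; here is a sketch of one way to do it (the `Token` class shape is my assumption, not Martin's design):

```python
# Sketch: wrap the tuples from the stdlib tokenize module in a small
# Token class, so callers get objects with named attributes.
import io
import tokenize as _tok

class Token:
    def __init__(self, type_, text, start, end):
        self.type = _tok.tok_name[type_]   # e.g. 'NAME', 'OP', 'NUMBER'
        self.text = text
        self.start = start                 # (line, column) pair
        self.end = end

    def __repr__(self):
        return f"Token({self.type}, {self.text!r})"

def tokens_of(source):
    readline = io.StringIO(source).readline
    return [Token(t.type, t.string, t.start, t.end)
            for t in _tok.generate_tokens(readline)]

for tok in tokens_of("x = 42\n"):
    print(tok)
```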

Actually, it takes about a day to write a reasonable tokenizer. (That
is, if you are writing in a language that you know.) Since I know
the problem thoroughly, it seemed like a good starting point for
learning Python.

There's a tokenizer I wrote in Java at http://www.MartinRinehart.com/src/language/Tokenizer.html
. Actually, that's an HTML page written by my "javasrc" (parallel to
Sun's javadoc), based on the Tokenizer's tokenizing of its own source.

Have I got those quotes right?
