Block comments
MartinRinehart at gmail.com
Tue Dec 11 17:31:08 EST 2007
Bruno Desthuilliers wrote:
> Is the array of lines the appropriate data structure here ?
I've done tokenizers both as an array of lines and as a long string.
The former has seemed easier when the language treats EOL as a
statement separator.
Re not letting literal strings in code terminate blocks: I think it's
the tokenizer-writer's job to be nice to the tokenizer's users, the
first of whom will be me, and I'll definitely have string literals
that enclose what would otherwise be a block-end marker.
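The scanning trick described above can be sketched as follows. This is a minimal illustration, not the author's actual tokenizer; the function name and the "end" marker are hypothetical. The idea is simply that the scanner treats quoted strings as opaque, so a marker inside a literal never terminates a block.

```python
def find_block_end(source, marker="end"):
    """Return the index of the first `marker` occurring outside
    string literals, or -1 if none is found."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch in ('"', "'"):
            quote = ch
            i += 1
            # Skip to the matching close quote, honoring backslash escapes.
            while i < len(source):
                if source[i] == '\\':
                    i += 2
                    continue
                if source[i] == quote:
                    break
                i += 1
            i += 1  # step past the closing quote
        elif source.startswith(marker, i):
            return i
        else:
            i += 1
    return -1
```

So `find_block_end('x = "end"; end')` skips the quoted "end" and reports only the real block terminator.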
> While we're at it, you may not know but there are already a couple
> Python packages for building tokenizers/parsers
The tokenizer in the Python library is pretty close to what I want,
but it returns tuples, whereas I want an array of Token objects. It
also reads the source a line at a time, which seems a bit out of
date. Maybe two or three decades out of date.
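For what it's worth, the library tokenizer's tuples are easy to wrap in objects. Here's a rough sketch of the "array of Token objects" idea using the stdlib tokenize module; the Token class and its attribute names are my own choices, not anything from the library.

```python
import io
import tokenize

class Token:
    """A thin object wrapper around one tuple from the tokenize module."""
    def __init__(self, type_, text, start, end):
        self.type = type_    # token type constant (NAME, OP, NUMBER, ...)
        self.text = text     # the matched source text
        self.start = start   # (row, col) where the token begins
        self.end = end       # (row, col) where the token ends

    def __repr__(self):
        return 'Token(%s, %r)' % (tokenize.tok_name[self.type], self.text)

def tokens_of(source):
    """Tokenize a whole source string into a list of Token objects."""
    readline = io.StringIO(source).readline
    return [Token(t.type, t.string, t.start, t.end)
            for t in tokenize.generate_tokens(readline)]
```

The whole source still gets read line by line underneath, but the caller sees one flat list of objects rather than a stream of tuples.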
Actually, it takes about a day to write a reasonable tokenizer. (That
is, if you are writing using a language that you know.) Since I know
the problem thoroughly, it seemed like a good starting point for
learning Python.
There's a tokenizer I wrote in Java at http://www.MartinRinehart.com/src/language/Tokenizer.html
. Actually, that's an HTML page generated by my "javasrc" tool
(parallel to Sun's javadoc) from the Tokenizer's tokenizing of its
own source.
Have I got those quotes right?