My first Python program -- a lexer
Robert Lehmann
stargaming at gmail.com
Mon Nov 10 03:24:37 EST 2008
On Sun, 09 Nov 2008 15:53:01 +0100, Thomas Mlynarczyk wrote:
> Arnaud Delobelle schrieb:
>
>> Adding to John's comments, I wouldn't have source as a member of the
>> Lexer object but as an argument of the tokenise() method (which I would
>> make public). The tokenise method would return what you currently call
>> self.result. So it would be used like this.
>
>>>>> mylexer = Lexer(tokens)
>>>>> mylexer.tokenise(source)
>>>>> mylexer.tokenise(another_source)
>
> At a later stage, I intend to have the source tokenised not all at once,
> but token by token, "just in time" when the parser (yet to be written)
> accesses the next token:
You don't have to introduce a `next` method to your Lexer class. You
could just transform your `tokenise` method into a generator by replacing
``self.result.append`` with `yield`. That gives you the just-in-time part
for free, without pulling your algorithm apart into tiny unrelated pieces.
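A minimal sketch of what I mean (the token table as (name, pattern) pairs
and the matching loop are my assumptions about your Lexer, not your actual
code):

```python
import re

class Lexer:
    """Lexer whose tokenise() is a generator: it yields tokens lazily
    instead of appending them to self.result."""

    def __init__(self, tokens):
        # tokens: list of (name, regex_pattern) pairs
        self.tokens = [(name, re.compile(pattern)) for name, pattern in tokens]

    def tokenise(self, source):
        pos = 0
        while pos < len(source):
            for name, regex in self.tokens:
                match = regex.match(source, pos)
                if match:
                    # formerly: self.result.append((name, match.group()))
                    yield (name, match.group())
                    pos = match.end()
                    break
            else:
                raise SyntaxError('no token matches at position %d' % pos)

mylexer = Lexer([('NUMBER', r'\d+'), ('PLUS', r'\+')])
print(list(mylexer.tokenise('1+2')))
```

The parser can then simply iterate over `mylexer.tokenise(source)` and
each token is produced only when the parser asks for it.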
> token = mylexer.next( 'FOO_TOKEN' )
> if not token: raise Exception( 'FOO token expected.' )
> # continue doing something useful with token
>
> Where next() would return the next token (and advance an internal
> pointer) *if* it is a FOO_TOKEN, otherwise it would return False. This
> way, the total number of regex matchings would be reduced: Only that
> which is expected is "tried out".
Python generators recently (2.5) grew a `send` method. You could use
`next` for unconditional tokenization and ``mytokenizer.send("expected
token")`` whenever you expect a special token.
See http://www.python.org/dev/peps/pep-0342/ for details.
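Roughly like this (a sketch only; the tokeniser is written as a plain
generator function here, and the False-on-mismatch behaviour is my guess
at what you want):

```python
import re

def tokenise(source, tokens):
    """next() yields the next token unconditionally; send(name) tries
    only the named token and yields False if it does not match there."""
    compiled = [(name, re.compile(pattern)) for name, pattern in tokens]
    pos = 0
    expected = None  # set by send(), None means "match anything"
    while pos < len(source):
        if expected is not None:
            match = dict(compiled)[expected].match(source, pos)
            if match:
                pos = match.end()
                expected = yield (expected, match.group())
            else:
                # expectation not met; position is left unchanged
                expected = yield False
        else:
            for name, regex in compiled:
                match = regex.match(source, pos)
                if match:
                    pos = match.end()
                    expected = yield (name, match.group())
                    break
            else:
                raise SyntaxError('no token matches at position %d' % pos)

lexer = tokenise('1+2', [('NUMBER', r'\d+'), ('PLUS', r'\+')])
print(next(lexer))         # unconditional: tries every regex
print(lexer.send('PLUS'))  # only the PLUS regex is tried
```

Note that a fresh generator must be started with `next()` before you can
`send()` it a value.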
HTH,
--
Robert "Stargaming" Lehmann