My first Python program -- a lexer

Robert Lehmann stargaming at gmail.com
Mon Nov 10 03:24:37 EST 2008


On Sun, 09 Nov 2008 15:53:01 +0100, Thomas Mlynarczyk wrote:

> Arnaud Delobelle schrieb:
> 
>> Adding to John's comments, I wouldn't have source as a member of the
>> Lexer object but as an argument of the tokenise() method (which I would
>> make public).  The tokenise method would return what you currently call
>> self.result.  So it would be used like this.
> 
> >> >>> mylexer = Lexer(tokens)
> >> >>> mylexer.tokenise(source)
> >> >>> mylexer.tokenise(another_source)
> 
> At a later stage, I intend to have the source tokenised not all at once,
> but token by token, "just in time" when the parser (yet to be written)
> accesses the next token:

You don't have to introduce a `next` method to your Lexer class. You 
could just transform your `tokenise` method into a generator by replacing 
``self.result.append(...)`` with `yield`. That gives you the just-in-time 
part for free without breaking your algorithm into tiny unrelated pieces.
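
As a rough sketch (the token table and the matching loop here are just 
guesses at what your code does, so adjust the names to taste):

    import re

    class Lexer(object):
        def __init__(self, tokens):
            # tokens is assumed to be a list of (name, pattern) pairs
            self.tokens = [(name, re.compile(pattern))
                           for name, pattern in tokens]

        def tokenise(self, source):
            # instead of self.result.append(...), just yield each token
            pos = 0
            while pos < len(source):
                for name, regex in self.tokens:
                    match = regex.match(source, pos)
                    if match:
                        yield name, match.group()
                        pos = match.end()
                        break
                else:
                    raise SyntaxError('no token matches at position %d' % pos)

The parser can then pull tokens one at a time:

    mylexer = Lexer([('NUMBER', r'\d+'), ('PLUS', r'\+'), ('SPACE', r'\s+')])
    for token in mylexer.tokenise('1 + 2'):
        print token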

>      token = mylexer.next( 'FOO_TOKEN' )
>      if not token: raise Exception( 'FOO token expected.' )
>      # continue doing something useful with token
> 
> Where next() would return the next token (and advance an internal
> pointer) *if* it is a FOO_TOKEN, otherwise it would return False. This
> way, the total number of regex matchings would be reduced: Only that
> which is expected is "tried out".

Python generators grew a `send` method in 2.5. You could use `next` for 
unconditional tokenization and ``mytokenizer.send("expected token")`` 
whenever you expect a specific token.

See http://www.python.org/dev/peps/pep-0342/ for details.
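
To sketch how that could look (the Lexer internals below are my own 
guesses; only the ``yield``/``send`` interplay is the point), each 
``yield`` hands a token to the caller, and whatever the caller passes to 
``send`` becomes the expectation for the next match, so only that one 
regex is tried:

    import re

    class Lexer(object):
        def __init__(self, tokens):
            self.tokens = [(name, re.compile(pattern))
                           for name, pattern in tokens]

        def tokenise(self, source):
            pos = 0
            expected = None
            while pos < len(source):
                if expected is None:
                    # plain next(): try every rule
                    candidates = self.tokens
                else:
                    # send('NAME'): try only the expected rule
                    candidates = [(name, regex) for name, regex in self.tokens
                                  if name == expected]
                for name, regex in candidates:
                    match = regex.match(source, pos)
                    if match:
                        pos = match.end()
                        # the value passed to send() shows up here;
                        # a plain next() call sends None
                        expected = yield name, match.group()
                        break
                else:
                    raise SyntaxError('no token matches at position %d' % pos)

    lexer = Lexer([('FOO', r'foo'), ('BAR', r'bar')])
    tokens = lexer.tokenise('foobar')
    print tokens.next()       # ('FOO', 'foo'), every rule tried
    print tokens.send('BAR')  # ('BAR', 'bar'), only the BAR rule tried

If you would rather get False back (as in your example) instead of an 
exception when the expected token is not there, the ``else`` branch could 
yield that and let the caller decide how to recover.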

HTH,

-- 
Robert "Stargaming" Lehmann


