My first Python program -- a lexer

Sun Nov 9 18:45:43 EST 2008

On Nov 10, 9:33 am, Thomas Mlynarczyk <tho... at mlynarczyk-webdesign.de>
wrote:
> John Machin wrote:
>
> >>> dict.iter<anything>() will return its results in essentially random
> >>> order.
> > A list of somethings does seem indicated.
>
> On the other hand: If all my tokens are "mutually exclusive" then,

But they won't *always* be mutually exclusive; another example is
relational operators (< vs <=, > vs >=). AFAICT there is nothing
useful that the lexer can do with an assumption/guess/input about
whether they are mutually exclusive.
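
To make the relational-operator case concrete, here is a small sketch.
The token names LT and LE and the sample text are made up for
illustration; they are not from the original code. It shows two
patterns matching at the same offset:

```python
import re

# Hypothetical token names and patterns, for illustration only.
patterns = [("LT", r"<"), ("LE", r"<=")]

text = "a <= b"
offset = 2  # position of the "<" character in text

# Collect every pattern that matches at this one offset.
matching = [name for name, rx in patterns
            if re.compile(rx).match(text, offset)]

print(matching)
```

Both names end up in the list, so which token is produced depends
entirely on which pattern is tried first.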

> in
> theory, the order in which they are tried, should not matter, as at most
> one token could match at any given offset. Still, having the most
> frequent tokens being tried first should improve performance.

Your Lexer class should promise to check the regexes in the order
given. Then the users of your lexer can arrange the order to suit
themselves.
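
A minimal sketch of what such a promise could look like, assuming the
rules are passed as a sequence of (name, regex) pairs. The class name,
method name, and sample rules below are illustrative guesses, not the
original poster's code:

```python
import re


class Lexer:
    """Illustrative lexer: rules are tried strictly in the order given."""

    def __init__(self, rules):
        # rules is a sequence of (token_name, regex_source) pairs.
        # A list keeps the caller's order; no mapping is required.
        self.rules = [(name, re.compile(rx)) for name, rx in rules]

    def tokens(self, text):
        pos = 0
        while pos < len(text):
            for name, rx in self.rules:
                m = rx.match(text, pos)
                # Ignore empty matches so the loop always advances.
                if m and m.end() > pos:
                    yield name, m.group()
                    pos = m.end()
                    break
            else:
                raise ValueError("no rule matches at offset %d: %r"
                                 % (pos, text[pos:pos + 10]))


# The caller decides the order: LE is listed before LT on purpose.
rules = [
    ("LE", r"<="),
    ("LT", r"<"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("WS", r"\s+"),
]
result = list(Lexer(rules).tokens("a <= b"))
print(result)
```

With LT listed before LE, this sketch would consume only the "<" and
then fail on the leftover "=", which is why the ordering is best left
to the caller.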

>
> > A dict is a hashtable, intended to provide a mapping from keys to
> > values. It's not intended to have order. In any case your code doesn't
> > use the dict as a mapping.
>
> I map token names to regular expressions. Isn't that a mapping?

Your code uses dict methods; this forces your callers to *create* a
mapping. However (as I said) your code doesn't *use* that mapping --
there is no RHS usage of dict[key] or dict.get(key) etc. In fact I'm
having difficulty imagining what possible practical use there could be
for a mapping from token-name to regex.

> >>>> return "\n".join(
> >>>>     [ "[L:%s]\t[O:%s]\t[%s]\t'%s'" %
> > The first 3 are %s, the last one is '%s'
>
> I only put the single quotes so I could better "see" whitespace in the
> output.

To *best* see whitespace (e.g. Is that a TAB or multiple spaces?), use
%r.

General advice: What you think you see is often not what you've
actually got. repr() is your friend; use it.
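
A quick illustration of the difference, using two made-up sample
strings (one containing a TAB, one containing four spaces):

```python
# Hypothetical sample values: one holds a TAB, the other four spaces.
tab_token = "x\ty"
spaces_token = "x    y"

# Quoting by hand shows where the string starts and ends,
# but a TAB is still printed as a raw TAB character.
with_quotes = ["'%s'" % t for t in (tab_token, spaces_token)]

# %r formats the value with repr(), so the TAB appears as the
# two visible characters backslash and t.
with_repr = ["%r" % t for t in (tab_token, spaces_token)]

for q, r in zip(with_quotes, with_repr):
    print(q, "|", r)
```

In the '%s' column the two values can look alike on a terminal; in the
%r column the TAB is shown as an escape sequence, so they cannot be
confused.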

Cheers,
John