My first Python program -- a lexer

Sun Nov 9 14:42:29 EST 2008

On Nov 10, 1:39 am, Thomas Mlynarczyk <tho... at mlynarczyk-webdesign.de> wrote:
> John Machin schrieb:
>
> > Be consistent with your punctuation style. I'd suggest *not* having a
> > space after ( and before ), as in the previous line. Read
> > http://www.python.org/dev/peps/pep-0008/
>
> What were the reasons for preferring (foo) over ( foo )? This PEP gives
> recommendations for coding style, but (naturally) it does not mention
> the reasons why the recommended way is preferable. I suppose these
> matters have all been discussed -- is there a synopsis available?

Not AFAIK.
>
> > Thirdly, if source does contain \r\n and there is an error, the
> > reported value of offset will be incorrect. Consider retaining the
> > offset of the last newline seen, so that your error reporting can
> > include the line number and (include or use) the column position in
> > the line.
>
> Indeed, I had not thought of that detail -- if I mess with the newlines,
> the offset will be wrong with respect to the original source. But with
> the universal newlines option mentioned above, the problem is already
> solved :-)

NOT solved. You have TWO problems: (1) Reporting the error location as
(offset from the start of the file) instead of (line number, column
position) would get you an express induction into the User Interface
Hall of Shame. (2) In the case of a file with lines terminated by \r\n,
the offset is ambiguous: it may or may not count the \r characters,
depending on whether the newlines were translated.
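A minimal sketch of the "retain the offset of the last newline seen" idea, written for Python 3 (the thread is from the Python 2 era). It assumes the source was read with universal newlines (e.g. `open(path)` in Python 3 text mode, or mode "rU" in Python 2), so every line break is a single "\n". `LineTracker` and `advance()` are hypothetical names, not part of Thomas's lexer: the lexer would call `advance()` with the text of each token it consumes, and report `(tracker.line, tracker.column())` on error instead of a raw offset.

```python
class LineTracker:
    """Hypothetical helper: track (line, column) as the lexer consumes text.

    Assumes newlines have already been normalised to "\n".
    """

    def __init__(self):
        self.offset = 0        # characters consumed so far
        self.line = 1          # 1-based line number of the next character
        self.line_start = 0    # offset where the current line begins

    def advance(self, text):
        """Record that `text` has just been consumed from the source."""
        newlines = text.count("\n")
        if newlines:
            self.line += newlines
            # The current line starts just after the last newline in `text`.
            self.line_start = self.offset + text.rfind("\n") + 1
        self.offset += len(text)

    def column(self):
        """1-based column of the next character to be consumed."""
        return self.offset - self.line_start + 1


tracker = LineTracker()
tracker.advance("foo = 1\n")
tracker.advance("bar")
print(tracker.line, tracker.column())   # line 2, column 4
```

Updating the position per token keeps error reporting cheap; recomputing it from the offset (by counting newlines from the start of the file) would also work but rescans the source on every error.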

>
> >> while self.offset < len( self.source ):
> > You may like to avoid getting len(self.source) for each token.
>
> Yes, I should change that. Unless there is a more elegant way to detect
> the end of the source?

I see no inelegance in a while statement being used in the manner for
which it was intended, nor any plausible reason for another construct.
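For illustration, a hypothetical whitespace-splitting scanner in Python 3 that keeps the plain while loop but looks up the length only once. `len()` on a str is constant-time, so what this saves is the repeated call per token, not a scan of the string; `scan_words` is an invented name.

```python
import re

WORD = re.compile(r"\S+")
BLANK = re.compile(r"\s+")


def scan_words(source):
    """Yield (offset, word) for each run of non-blank characters."""
    offset = 0
    end = len(source)                 # computed once, before the loop
    while offset < end:
        m = BLANK.match(source, offset)
        if m:                         # skip whitespace, emit nothing
            offset = m.end()
            continue
        m = WORD.match(source, offset)   # must match: char is non-blank
        yield offset, m.group()
        offset = m.end()


print(list(scan_words("ab  cd")))
```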

> >> for name, regex in self.tokens.iteritems():
> > dict.iter<anything>() will return its results in essentially random
> > order.
>
> Ouch! I must do something about that. Thanks for pointing it out. So if
> I want a certain order, I must use a list of tuples?

A list of somethings does seem indicated.

> Or is there a way
> to have order with dicts?

A dict is a hashtable, intended to provide a mapping from keys to
values. It's not intended to have order. In any case your code doesn't
use the dict as a mapping.
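A sketch of the list-of-tuples approach in Python 3, with a made-up token set; `TOKENS` and `tokenize` are hypothetical names. Because the table is a list, the patterns are tried in a fixed order and the first match wins, which is why "==" is listed before "=". (Since Python 3.7 dicts do preserve insertion order, but a list states the intent directly, and the code never looks anything up by key anyway.)

```python
import re

# Hypothetical token table: order matters, first match wins.
TOKENS = [
    ("NUMBER", re.compile(r"\d+")),
    ("NAME",   re.compile(r"[A-Za-z_]\w*")),
    ("EQ",     re.compile(r"==")),      # must come before ASSIGN
    ("ASSIGN", re.compile(r"=")),
    ("SPACE",  re.compile(r"\s+")),
]


def tokenize(source):
    """Return a list of (name, text) pairs, dropping whitespace."""
    tokens = []
    offset = 0
    end = len(source)
    while offset < end:
        for name, regex in TOKENS:          # predictable order
            m = regex.match(source, offset)
            if m:
                break
        else:                               # no pattern matched
            raise ValueError("unexpected character %r at offset %d"
                             % (source[offset], offset))
        if name != "SPACE":
            tokens.append((name, m.group()))
        offset = m.end()
    return tokens


print(tokenize("x == 10"))
```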

>
> >> return "\n".join(
> >>     [ "[L:%s]\t[O:%s]\t[%s]\t'%s'" %
> > For avoidance of ambiguity, you may like to change that '%s' to %r
>
> In which way would there be ambiguity? The first two are integers, the
> last two strings.

The first three are %s; the last one is '%s', so a value that itself
contains a quote or a tab prints ambiguously. %r would escape it.
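To make the ambiguity concrete, a small Python 3 sketch with an invented token value containing both a quote and a tab. With '%s' the value's own quote and tab run into the delimiters; %r (i.e. repr()) escapes the tab and picks quotes that cannot be confused with the content.

```python
# Hypothetical token: a STRING whose value contains a quote and a tab.
line, offset, name, value = 3, 17, "STRING", "it's\tok"

ambiguous = "[L:%s]\t[O:%s]\t[%s]\t'%s'" % (line, offset, name, value)
unambiguous = "[L:%s]\t[O:%s]\t[%s]\t%r" % (line, offset, name, value)

print(ambiguous)     # embedded quote and real tab blur the field boundary
print(unambiguous)   # repr() shows the tab as \t and uses double quotes
```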

Cheers,
John