My first Python program -- a lexer

Sun Nov 9 09:39:30 EST 2008

John Machin wrote:

> Be consistent with your punctuation style. I'd suggest *not* having a
> space after ( and before ), as in the previous line. Read
> http://www.python.org/dev/peps/pep-0008/

What were the reasons for preferring (foo) over ( foo )? This PEP gives 
recommendations for coding style, but (naturally) it does not mention 
the reasons why the recommended way is preferable. I suppose these 
matters have all been discussed -- is there a synopsis available?

>> self.source = re.sub( r"\r?\n|\r\n", "\n", source )

> Firstly, would you not expect to be getting your text from a text file
> (perhaps even one opened with the universal newlines option) i.e. by
> the time it's arrived here, source has already had \r\n changed to \n?

I was not aware of the universal newlines option. This would then indeed 
make my newline conversion superfluous.
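
For reference, a minimal sketch of what the universal newlines option does. It assumes a current Python 3, where text mode with newline=None (the default) performs the translation on reading; in the Python 2 of this thread the same effect came from opening the file with mode "rU". The helper names read_normalized and demo are made up for illustration.

```python
import os
import tempfile


def read_normalized(path):
    # newline=None (the default) enables universal newlines on input:
    # "\r\n" and a lone "\r" both arrive in the program as "\n".
    with open(path, "r", newline=None, encoding="utf-8") as handle:
        return handle.read()


def demo():
    # Write a file with mixed line endings in binary mode, then read it back.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"one\r\ntwo\rthree\n")
        return read_normalized(path)
    finally:
        os.remove(path)
```

With the file read this way, the lexer never sees "\r" at all, so no re.sub pass is needed.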

> Secondly, that's equivalent to
>    re.sub(r"\n|\r\n|\r\n", "\n", source)

My mistake. I meant r"\r?\n|\r" ("\n", "\r\n" or "\r").
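
A small sketch comparing the two patterns, in case the source does not come from a file opened with universal newlines. The function names are hypothetical; only re.sub and the two patterns come from the discussion above.

```python
import re


def normalize_as_posted(source):
    # Pattern from the original post: the second alternative is redundant,
    # and a lone "\r" is never matched.
    return re.sub(r"\r?\n|\r\n", "\n", source)


def normalize_as_intended(source):
    # Corrected pattern: matches "\n", "\r\n" or a lone "\r".
    return re.sub(r"\r?\n|\r", "\n", source)
```

The difference only shows up on old Mac-style line endings: normalize_as_posted leaves "a\rb" unchanged, while normalize_as_intended turns it into "a\nb".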

> Thirdly, if source does contain \r\n and there is an error, the
> reported value of offset will be incorrect. Consider retaining the
> offset of the last newline seen, so that your error reporting can
> include the line number and (include or use) the column position in
> the line.

Indeed, I had not thought of that detail -- if I mess with the newlines, 
the offset will be wrong with respect to the original source. But with 
the universal newlines option mentioned above, the problem is already 
solved :-)
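
Once the source contains only "\n", a line number and column can be derived from the offset. The sketch below computes them on demand when an error is reported, which is a variation on the suggestion of remembering the offset of the last newline seen; the function name line_and_column and the 1-based numbering are assumptions.

```python
def line_and_column(source, offset):
    # Lines are numbered from 1: count the newlines before the offset.
    line = source.count("\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, which makes the
    # column come out 1-based on the first line as well.
    last_newline = source.rfind("\n", 0, offset)
    column = offset - last_newline
    return line, column
```

Computing this only when an error is reported keeps the scanning loop free of bookkeeping, at the cost of rescanning the source once per error.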

>> while self.offset < len( self.source ):

> You may like to avoid getting len(self.source) for each token.

Yes, I should change that. Or is there a more elegant way to detect 
the end of the source?
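
One possible shape for the loop, as a sketch only: the length is taken once before the loop, and a compiled pattern is matched at the current offset instead of slicing the string. The pattern TOKEN and the function scan are invented for this example and are not the lexer from the original post.

```python
import re

# Hypothetical catch-all pattern: whitespace, digits, words, or any one character.
TOKEN = re.compile(r"\s+|\d+|\w+|.", re.DOTALL)


def scan(source):
    end = len(source)  # computed once, not once per token
    offset = 0
    tokens = []
    while offset < end:
        # match(string, pos) starts matching at pos without copying the string.
        match = TOKEN.match(source, offset)
        tokens.append(match.group())
        offset = match.end()
    return tokens
```

Because the last alternative matches any single character, the match cannot fail before the end of the source, so the loop always advances.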

>> for name, regex in self.tokens.iteritems():

> dict.iter<anything>() will return its results in essentially random
> order.

Ouch! I must do something about that. Thanks for pointing it out. So if 
I want a certain order, I must use a list of tuples? Or is there a way 
to have order with dicts?
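
A list of (name, regex) tuples is the straightforward way to get a fixed order. The sketch below uses made-up rule names to show why the order matters; note also that in current Python (3.7 and later) plain dicts preserve insertion order, which was not the case for the Python 2 dicts discussed here.

```python
import re

# Hypothetical rules; earlier entries win, so KEYWORD must come before IDENT.
TOKEN_RULES = [
    ("KEYWORD", re.compile(r"(?:if|else)\b")),
    ("IDENT", re.compile(r"[A-Za-z_]\w*")),
    ("NUMBER", re.compile(r"\d+")),
    ("SPACE", re.compile(r"\s+")),
]


def first_match(source, offset=0):
    # Try the rules in list order and return the first one that matches.
    for name, regex in TOKEN_RULES:
        match = regex.match(source, offset)
        if match:
            return name, match.group()
    return None
```

If IDENT were listed before KEYWORD, "if" would be reported as an identifier, which is exactly the kind of bug a randomly ordered dict would produce now and then.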

>> return "\n".join(
>>     [ "[L:%s]\t[O:%s]\t[%s]\t'%s'" %

> For avoidance of ambiguity, you may like to change that '%s' to %r

In which way would there be ambiguity? The first two are integers, the 
last two strings.
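
One plausible reading of the ambiguity, as a sketch: a token's text may itself contain quotes, tabs or newlines, and '%s' prints those raw, while %r prints them escaped. The two helper names are hypothetical.

```python
def show_with_s(text):
    # Wraps the raw text in quotes; control characters are emitted as-is.
    return "'%s'" % text


def show_with_r(text):
    # repr() escapes control characters and picks suitable quotes itself.
    return "%r" % text
```

For a token consisting of a tab, show_with_s yields a quote, a real tab and a quote, which is hard to tell apart from spaces in tab-separated output; show_with_r yields the four visible characters '\t'.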

Thanks for your feedback.

Greetings,
Thomas

-- 
Just because many of them are wrong doesn't mean they are right!
(Coluche)