Other notes

Steve Holden steve at holdenweb.com
Fri Jan 7 06:36:31 EST 2005


Andrew Dalke wrote:

> Bengt Richter:
> 
>>But it does look ahead to recognize += (i.e., it doesn't generate two
>>successive also-legal tokens of '+' and '=')
>>so it seems it should be a simple fix.
> 
> 
> But that works precisely because of the greedy nature of tokenization.
> Given "a+=2" the longest token it finds first is "a" because "a+"
> is not a valid token.  The next token is "+=".  It isn't just "+"
> because "+=" is valid.  And the last token is "2".
> 
[...]

You're absolutely right, of course, Andrew, and personally I don't think 
that this is worth trying to fix. But the original post I responded to 
suggested that an LL(1) grammar couldn't disambiguate "1." and "1..3". 
That assertion blurs the line between lexical and syntactic analysis: 
splitting the input into tokens is the tokenizer's job, not the 
grammar's, and I didn't want to leave that distinction unsharpened.

The fact that Python's existing tokenizer doesn't allow multi-character 
tokens beginning with a dot after a digit (roughly speaking) is what 
makes the whole syntax proposal so hard to accommodate.
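
For anyone who wants to see the greedy behaviour directly, here is a 
minimal sketch using the standard library's tokenize module. It assumes 
a present-day Python 3 interpreter, whose tokenizer may differ in detail 
from the one discussed in this thread, and the helper name token_strings 
is made up for illustration.

```python
import io
import tokenize

def token_strings(source):
    """Return the token strings the standard tokenizer produces for
    `source`, leaving out the bookkeeping tokens (NEWLINE, NL, ENDMARKER)."""
    readline = io.StringIO(source).readline
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in skip]

# Greedy matching picks "+=" rather than "+" followed by "=".
augmented = token_strings("a+=2")

# The same greediness consumes "1." as a float literal, so the rest is
# ".3" (another float), never "1" followed by a ".." token.
dotted = token_strings("1..3")

print(augmented)  # ['a', '+=', '2']
print(dotted)     # ['1.', '.3']
```

Both results come from the same longest-match rule: it is what makes 
"+=" a single token, and it is also what stops the tokenizer from ever 
producing a ".." token after a digit.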

regards
  Steve
-- 
Steve Holden               http://www.holdenweb.com/
Python Web Programming  http://pydish.holdenweb.com/
Holden Web LLC      +1 703 861 4237  +1 800 494 3119