[Python-Dev] one more thing for 2.2?

Thomas Wouters thomas@xs4all.net
Fri, 13 Jul 2001 00:42:27 +0200


On Thu, Jul 12, 2001 at 05:28:42PM -0400, Guido van Rossum wrote:

> > I have to agree that a nice, clean, powerful parser that can deal
> > better with ambiguities (an LR parser, is that what it's called ?
> > :P)

> Yes, why the :P)?

Because I was guessing, as I know practically naught about parsers and
parsing techniques. For instance, I was not aware that a yacc-based parser
would be LL(x) (for some small value of x). ':P)' was tongue-in-cheek,
followed by a closing parenthesis.

> > is a much better solution, but in some cases, a hack is better than
> > nothing.
> 
> Adopting this particular hack means you can never go back.  It
> effectively "unreserves" most keywords most of the time, and that
> means that you can no longer use other parser technologies to parse
> Python.  E.g. suppose someone has a Yacc-based parser for Python.  It
> would be quite a feat to hack the Yacc driver to do the same retrying
> that his hack does.  I bet it would also require a major effort to get
> tokenize.py to work correctly again.

[ and ]

> An approach that might work for this is to pick a FEW keywords
> (e.g. those that are not reserved words in C or Java or C++) and add
> those to a FEW places in the grammar.  E.g. add a rule
> 
>     extended_name: NAME | 'print'   # plus a few others
> 
> and then use extended_name instead of NAME in the rules for attribute
> selection and function definition:
> 
>     funcdef: 'def' extended_name parameters ':' suite
>        .
>        .
>        .
>     trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' extended_name
> 
> This would be unambiguous.

This has been discussed before. The main problem with this is that no one's
done it :) I've done a quick test-hack, but ran into so many unguarded
'STR(node)' calls in compile.c that expected a NAME, not an extended_name,
that I gave up. It also wouldn't really alleviate the tokenize.py problem --
if adding a few keywords-as-identifiers is doable, so is adding a lot of
them :) And there's the maintenance problem on the Grammar... when adding a
new keyword, you need to carefully consider where to allow it. However, it's
not like adding a new keyword is done more than once a lustrum ;)
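
As a rough illustration of the extended_name idea quoted above, here is a
minimal sketch in present-day Python. It is not the pgen grammar change itself
and not anything from compile.c. The helper names and the EXTENDED_KEYWORDS set
are hypothetical, and 'print' is treated as reserved only to mimic the 2.x
situation. It checks a dotted chain against `NAME ('.' extended_name)*`, so a
chosen keyword is accepted after a dot but still rejected as a plain name.

```python
import io
import keyword
import tokenize

# Hypothetical set: keywords allowed wherever the grammar says extended_name.
# 'print' was a keyword in Python 2.x; the other entries are illustrative.
EXTENDED_KEYWORDS = {"print", "pass", "del"}


def is_reserved(word):
    """True for a current keyword, or 'print' (assumed reserved, as in 2.x)."""
    return keyword.iskeyword(word) or word == "print"


def is_name(word):
    """The plain NAME rule: an identifier that is not reserved."""
    return word.isidentifier() and not is_reserved(word)


def is_extended_name(word):
    """extended_name: NAME | 'print' | ... (a few chosen keywords)."""
    return is_name(word) or word in EXTENDED_KEYWORDS


def accepts_attribute_chain(source):
    """Check a one-line string against NAME ('.' extended_name)*."""
    ignorable = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    words = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in ignorable:
            continue
        if tok.type not in (tokenize.NAME, tokenize.OP):
            return False  # numbers, strings, indentation: not a dotted chain
        words.append(tok.string)
    if not words or not is_name(words[0]):
        return False
    rest = words[1:]
    if len(rest) % 2:
        return False  # dangling '.' or a missing one
    # The remainder must be ('.', extended_name) pairs.
    return all(
        dot == "." and is_extended_name(word)
        for dot, word in zip(rest[0::2], rest[1::2])
    )
```

Note that the tokenizer hands back keywords as ordinary NAME tokens, so the
decision about where a keyword may appear sits entirely in the rule that
consumes the token. In this sketch that is the difference between is_name and
is_extended_name.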

But I don't have any real need for keywords as identifiers, so I don't mind
if we keep the current limitations.

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!