[Python-Dev] one more thing for 2.2?

Guido van Rossum guido@digicool.com
Thu, 12 Jul 2001 17:28:42 -0400


> > > http://mail.python.org/pipermail/python-list/2001-June/047996.html
> >
> > Wow, an impressive hack.  But a hack!  Lots of special casing, and
> > breaks abstractions: the parser driver is supposed to know nothing
> > about the actual grammar embodied in its tables.
> 
> But does it hurt if it does ? It's not like we use it as a general purpose
> parser right now, and would we really want to use the current parser as a
> general purpose one ?

Well, it *is* used to parse its own input. :-)

> I have to agree that a nice, clean, powerful parser that can deal
> better with ambiguities (an LR parser, is that what it's called ?
> :P)

Yes, why the :P)?

LR parsers deal better with ambiguities at the grammar level --
actually, not so much with real ambiguities, but things that look
ambiguous until you've seen more of the input.  For example, an LR
grammar can correctly express that

   f(a, b) = 12

is invalid; the current LL parser can't.  Therefore this has to be
rejected in a separate pass.  Currently I believe that's the code
generation pass but it could be a separate pass altogether.
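
A minimal sketch of the same point, assuming a present-day Python 3
interpreter rather than the 2001 one discussed here.  The statement is
still refused, and compile() reports it as a SyntaxError.  The helper
name is_rejected is made up for illustration.

```python
def is_rejected(source):
    """Return True if the interpreter refuses to compile `source`."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False

# Assigning to a call is refused; assigning to a subscript of a call is fine.
call_target = is_rejected("f(a, b) = 12")
subscript_target = is_rejected("f(a, b)[0] = 12")
```

Which internal pass raises the error is an implementation detail and
is not visible from this sketch.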

> is a much better solution, but in some cases, a hack is better than
> nothing.

Adopting this particular hack means you can never go back.  It
effectively "unreserves" most keywords most of the time, and that
means that you can no longer use other parser technologies to parse
Python.  E.g. suppose someone has a Yacc-based parser for Python.  It
would be quite a feat to hack the Yacc driver to do the same retrying
that his hack does.  I bet it would also require a major effort to get
tokenize.py to work correctly again.

The hack effectively makes it impossible to give a specification of
the real grammar of the language -- you have to try and see if the
parser accepts something or not.
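
For contrast, a small sketch of what a fixed set of reserved words
gives you, assuming the standard keyword module as shipped with a
present-day Python 3.  A tool can ask whether a word is reserved
without running the parser at all.  The sample words are arbitrary.

```python
import keyword

# Each answer comes from a fixed table, independent of where the word
# would appear in a program.
candidates = ("def", "while", "pprint")
reserved = [word for word in candidates if keyword.iskeyword(word)]
```

With context-dependent keywords no such table lookup would be enough.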

> No, but it will help with bindings to languages that require
> keywords. .NET comes to mind, again, as does Java. It would also be
> very cool if we could rename pprint.pprint to pprint.print ;P

An approach that might work for this is to pick a FEW keywords
(e.g. those that are not reserved words in C or Java or C++) and add
those to a FEW places in the grammar.  E.g. add a rule

    extended_name: NAME | 'print'   # plus a few others

and then use extended_name instead of NAME in the rules for attribute
selection and function definition:

    funcdef: 'def' extended_name parameters ':' suite
       .
       .
       .
    trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' extended_name

This would be unambiguous.
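
A toy sketch of the trailer rule above, not the real parser.  It
assumes a made-up four-word keyword set and a hypothetical helper
accepts_attribute_chain that recognizes only NAME ('.' extended_name)*.
The funcdef rule would be handled the same way.

```python
import re

KEYWORDS = {"def", "print", "if", "while"}  # tiny illustrative subset
EXTENDED = {"print"}                        # keywords allowed by extended_name

# An identifier-like word, or any single non-space character.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\S)")
WORD_RE = re.compile(r"[A-Za-z_]\w*")


def is_name(tok):
    """NAME: an identifier that is not a keyword."""
    return WORD_RE.fullmatch(tok) is not None and tok not in KEYWORDS


def is_extended_name(tok):
    """extended_name: NAME | 'print'"""
    return is_name(tok) or tok in EXTENDED


def accepts_attribute_chain(text):
    """Accept NAME ('.' extended_name)* and nothing else."""
    toks = TOKEN_RE.findall(text)
    if not toks or not is_name(toks[0]):
        return False
    i = 1
    while i < len(toks):
        # Every trailer must be a dot followed by an extended_name.
        if toks[i] != "." or i + 1 >= len(toks):
            return False
        if not is_extended_name(toks[i + 1]):
            return False
        i += 2
    return True
```

Here accepts_attribute_chain("pprint.print") is true, while
"print.pprint" and "a.while" are refused: 'print' is only admitted in
the positions that name extended_name, and other keywords stay
reserved everywhere.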

--Guido van Rossum (home page: http://www.python.org/~guido/)