[Python-ideas] Hooking between lexer and parser

Sat Jun 6 04:57:54 CEST 2015

I don't see why it makes anything simpler.  Your lexing rules just live
alongside your parsing rules.  And I also don't see why it has to be faster
to do the lexing in a separate part of the code.  Wouldn't the parser
generator realize that some of the rules don't use the stack and so
they would end up just as fast as any lexer?
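
To make that concrete, here is a minimal sketch of lexing rules living
alongside parsing rules.  It is a toy set of PEG-style combinators; every
name in it (regex, seq, many, parse_sum) is made up for illustration and
does not come from any existing library.  The rules built from regex() alone
never recurse, which is the property a generator could in principle detect
and compile down to a plain scanner.  Whether a given tool actually does
that is a separate question.

```python
import re

# Every rule, "lexical" or "syntactic", has the same shape:
# rule(text, pos) -> (value, new_pos) on success, or None on failure.

def regex(pattern):
    """Rule that matches a regular expression (no recursion, no stack)."""
    compiled = re.compile(pattern)
    def rule(text, pos):
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return rule

def seq(*rules):
    """Rule that matches each sub-rule in order."""
    def rule(text, pos):
        values = []
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return rule

def many(r):
    """Rule that matches a sub-rule zero or more times."""
    def rule(text, pos):
        values = []
        while True:
            result = r(text, pos)
            # Stop on failure, or on an empty match (avoids looping forever).
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return rule

# "Lexical" rules.
ws = regex(r"[ \t]*")
number = regex(r"[0-9]+")
plus = regex(r"\+")

# "Syntactic" rule, written with the same building blocks:
#   total <- ws number (ws '+' ws number)* ws
total = seq(ws, number, many(seq(ws, plus, ws, number)), ws)

def parse_sum(text):
    """Parse and evaluate a sum such as '1 + 2+3'."""
    result = total(text, 0)
    if result is None or result[1] != len(text):
        raise SyntaxError(f"cannot parse {text!r}")
    (_, first, rest, _), _ = result
    return int(first) + sum(int(item[3]) for item in rest)
```

Here parse_sum("1 + 2+3") evaluates the sum, and parse_sum("1 +") raises
SyntaxError.  There is no separate token stream anywhere; whitespace handling
is just another rule in the grammar.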

On Fri, Jun 5, 2015 at 10:55 PM, Ryan Gonzalez <rymg19 at gmail.com> wrote:

> IMO, lexer and parser separation is sometimes great. It also makes
> hand-written parsers much simpler.
>
> "Modern" parsing with no lexer and EBNF can sometimes be slower than the
> classics, especially if one is using an ultra-fast lexer generator such as
> re2c.
>
>
> On June 5, 2015 9:21:08 PM CDT, Neil Girdhar <mistersheik at gmail.com>
> wrote:
>
>> Back in the day, I remember Lex and Yacc, then came Flex and Bison, and
>> then ANTLR, which unified lexing and parsing under one common language.  In
>> general, I like the idea of putting everything together.  I think that
>> because of Python's separation of lexing and parsing, it accepts weird text
>> like "(1if 0else 2)", which is crazy.
>>
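
A minimal sketch of why a separate lexer accepts that text.  This is a toy
regex lexer with made-up token patterns (TOKEN_SPEC and toy_tokenize are
hypothetical names), far simpler than CPython's real tokenizer, but it shows
the same effect: the number pattern stops at the first non-digit, so "1if"
splits into two tokens and the parser never learns the space was missing.

```python
import re

# Toy token patterns (hypothetical, for illustration only).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def toy_tokenize(text):
    """Return (kind, string) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# Both spellings produce the same token stream, so any parser fed by
# this lexer cannot tell them apart.
same = toy_tokenize("(1if 0else 2)") == toy_tokenize("(1 if 0 else 2)")
```

With this lexer, rejecting "1if" would need an extra rule in the lexer
itself, because by the time the parser runs the whitespace is already gone.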
>> Here's what I think I want in a parser:
>>
>> Along with the grammar, you also give it code that it can execute as it
>> matches each symbol in a rule.  In Python for example, as it matches each
>> argument passed to a function, it would keep track of the count of *args,
>> **kwargs, keyword arguments, and regular arguments, and then raise a
>> syntax error if it encounters anything out of order.  Right now that check
>> is done in validate.c, which is really annoying.
>>
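
A rough sketch of the kind of per-symbol action described above.  Everything
here is hypothetical: ArgumentListChecker, classify and parse_arguments are
invented names, the "parser" just splits on commas, and the ordering rule is
a simplified stand-in for what Python really enforces.  The point is only
that the check runs as each argument is matched, instead of in a later
validation pass over the tree.

```python
class ArgumentListChecker:
    """Action object called once per argument as the parser matches it."""

    def __init__(self):
        self.counts = {"positional": 0, "keyword": 0, "star": 0, "starstar": 0}

    def on_argument(self, kind, text):
        seen = self.counts
        # Simplified rule: plain positional arguments must come before
        # any keyword argument or ** unpacking.
        if kind == "positional" and (seen["keyword"] or seen["starstar"]):
            raise SyntaxError(f"positional argument {text!r} is out of order")
        # Simplified rule: * unpacking may not follow ** unpacking.
        if kind == "star" and seen["starstar"]:
            raise SyntaxError(f"{text!r} follows ** unpacking")
        seen[kind] += 1


def classify(arg):
    """Crude classification of one argument string (toy logic)."""
    if arg.startswith("**"):
        return "starstar"
    if arg.startswith("*"):
        return "star"
    if "=" in arg:
        return "keyword"
    return "positional"


def parse_arguments(source):
    """Toy 'parser' for a comma-separated argument list."""
    checker = ArgumentListChecker()
    for raw in source.split(","):
        arg = raw.strip()
        if arg:
            # The action fires as soon as each argument is matched.
            checker.on_argument(classify(arg), arg)
    return checker.counts
```

For example, parse_arguments("a, *b, c=1, **d") returns the four counts,
while parse_arguments("a=1, b") raises SyntaxError at the second argument.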
>> I want to specify the lexical rules in the same way that I specify the
>> parsing rules.  And I think (after Andrew elucidates what he means by
>> hooks) I want the parsing hooks to be the same thing as lexing hooks, and I
>> agree with him that hooking into the lexer is useful.
>>
>> I want the parser module to be automatically-generated from the grammar
>> if that's possible (I think it is).
>>
>> Typically each grammar rule is implemented using a class.  I want the
>> code generation to be a method on that class.  This makes changing the AST
>> easy.  For example, it was suggested that we might change the grammar to
>> include a starstar_expr node.  This should be an easy change, but because
>> of the way every node validates its children, which it expects to have a
>> certain tree structure, it would be a big task with almost no payoff.
>>
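
A small sketch of code generation as a method on each rule's class.  The
node classes and the stack-machine instructions ("PUSH", "ADD", "NEG") are
invented for illustration and have nothing to do with CPython's actual AST
or bytecode.  Neg plays the role of a node added later: it needs only its
own class, and no existing node has to learn about it.

```python
class Node:
    """Base class: one subclass per grammar rule."""

    def generate(self):
        raise NotImplementedError


class Num(Node):
    def __init__(self, value):
        self.value = value

    def generate(self):
        return [("PUSH", self.value)]


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def generate(self):
        # Children generate their own code; Add does not inspect their types.
        return self.left.generate() + self.right.generate() + [("ADD",)]


class Neg(Node):
    """A node added later; no other class needs to change."""

    def __init__(self, operand):
        self.operand = operand

    def generate(self):
        return self.operand.generate() + [("NEG",)]


def run(code):
    """Evaluate the toy instruction list on a stack machine."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "NEG":
            stack.append(-stack.pop())
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack.pop()


code = Add(Num(1), Neg(Num(2))).generate()
```

Here run(code) evaluates 1 + (-2) on the toy machine.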
>> There's also a question of which parsing algorithm you use.  I wish I
>> knew more about state-of-the-art parsers.  I was interested because I
>> wanted to use Python to parse my LaTeX files.  I got the impression that
>> Earley parsers (https://en.wikipedia.org/wiki/Earley_parser) were state of
>> the art, but I'm not sure.
>>
>> I'm curious what other people will contribute to this discussion as I
>> think having no great parsing library is a huge hole in Python.  Having one
>> would definitely allow me to write better utilities using Python.
>>
>>
>> On Fri, Jun 5, 2015 at 6:55 PM, Luciano Ramalho <luciano at ramalho.org>
>> wrote:
>>
>>> On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar <mistersheik at gmail.com>
>>> wrote:
>>> > Modern parsers do not separate the grammar from tokenizing, parsing, and
>>> > validation.  All of these are done in one place, which not only simplifies
>>> > changes to the grammar, but also protects you from possible
>>> > inconsistencies.
>>>
>>> Hi, Neil, thanks for that!
>>>
>>> Having studied only ancient parsers, I'd love to learn new ones. Can
>>> you please post references to modern parsing? Actual parsers, books,
>>> papers, anything you may find valuable.
>>>
>>> I have a hunch you're talking about PEG parsers, but maybe something
>>> else, or besides?
>>>
>>> Thanks!
>>>
>>> Best,
>>>
>>> Luciano
>>>
>>> --
>>> Luciano Ramalho
>>> |  Author of Fluent Python (O'Reilly, 2015)
>>> |     http://shop.oreilly.com/product/0636920032519.do
>>> |  Teaching at: http://python.pro.br
>>> |  Twitter: @ramalhoorg
>>>
>>