PEP 269 – Pgen Module for Python

Author: Jonathan Riehl <jriehl at spaceship.com>
Status: Deferred
Type: Standards Track
Created: 24-Aug-2001
Python-Version: 2.2
Post-History:

Table of Contents

Abstract
Rationale
Specification
Implementation Plan
Limitations
Reference Implementation
References
Copyright

Abstract

Much like the parser module exposes the Python parser, this PEP proposes that the parser generator used to create the Python parser, pgen, be exposed as a module in Python.

Rationale

Through the course of Pythonic history, there have been numerous discussions about the creation of a Python compiler [1]. These have resulted in several implementations of Python parsers, most notably the parser module currently provided in the Python standard library [2] and Jeremy Hylton’s compiler module [3]. However, while multiple language changes have been proposed [4] [5], experimentation with the Python syntax has lacked the benefit of a Python binding to the actual parser generator used to build Python.

By providing a Python wrapper analogous to Fred Drake Jr.’s parser wrapper, but targeted at the pgen library, the following assertions are made:

1. Reference implementations of syntax changes will be easier to develop. Currently, a reference implementation of a syntax change would require the developer to use the pgen tool from the command line. The resulting parser data structure would then either have to be reworked to interface with a custom CPython implementation, or wrapped as a C extension module.
2. Reference implementations of syntax changes will be easier to distribute. Since the parser generator will be available in Python, it should follow that the resulting parser will be accessible from Python. Therefore, reference implementations should be available as pure Python code, rather than as custom versions of the existing CPython distribution or as compilable extension modules.
3. Reference implementations of syntax changes will be easier to discuss with a larger audience. This somewhat falls out of the second assertion, since the community of Python users is most likely larger than the community of CPython developers.
4. Development of small languages in Python will be further enhanced, since the additional module will be a fully functional LL(1) parser generator.

Specification

The proposed module will be called pgen. The pgen module will contain the following functions:

`parseGrammarFile (fileName) -> AST`

The parseGrammarFile() function will read the file pointed to by fileName and create an AST object. The AST nodes will carry the numeric nonterminal codes of the parser generator meta-grammar. The output AST will be an instance of the AST extension class as provided by the parser module. Syntax errors in the input file will cause the SyntaxError exception to be raised.

`parseGrammarString (text) -> AST`

The parseGrammarString() function will follow the semantics of the parseGrammarFile(), but accept the grammar text as a string for input, as opposed to the file name.

`buildParser (grammarAst) -> DFA`

The buildParser() function will accept an AST object for input and return a DFA (deterministic finite automaton) data structure. The DFA data structure will be a C extension class, much like the AST structure provided in the parser module. If the input AST does not conform to the nonterminal codes defined for the pgen meta-grammar, buildParser() will raise a ValueError exception.
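The conformance check described above can be pictured with a short sketch. It assumes the AST is available in the nested-tuple form that the parser module can produce, where each node is a tuple whose first element is its numeric code. The function name `check_node_codes` and the codes used in the usage lines are hypothetical and are not part of this proposal.

```python
def check_node_codes(tree, allowed_codes):
    """Raise ValueError if any node in a nested-tuple tree has an unknown code.

    Assumes each node is a tuple (code, child, ...), with terminal nodes
    carrying a string instead of child tuples.
    """
    code = tree[0]
    if code not in allowed_codes:
        raise ValueError("unexpected node code: %r" % (code,))
    for child in tree[1:]:
        # Only tuples are nodes; strings are the text of terminal tokens.
        if isinstance(child, tuple):
            check_node_codes(child, allowed_codes)


# Hypothetical codes: 256 and 257 stand in for meta-grammar nonterminals,
# 1 stands in for a NAME token.
META_GRAMMAR_CODES = {256, 257, 1}
good_tree = (256, (257, (1, "expr")))
check_node_codes(good_tree, META_GRAMMAR_CODES)  # returns without raising
```

A tree containing a code outside `META_GRAMMAR_CODES` would raise ValueError, which mirrors the behaviour specified for buildParser().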

`parseFile (fileName, dfa, start) -> AST`

The parseFile() function will essentially be a wrapper for the PyParser_ParseFile() C API function. The wrapper code will accept an instance of the DFA C extension class and the file name. The output will be an AST instance that conforms to the lexical values in the token module and the nonterminal values contained in the DFA.

`parseString (text, dfa, start) -> AST`

The parseString() function will operate in a similar fashion to the parseFile() function, but accept the parse text as an argument. Much like parseFile() will wrap the PyParser_ParseFile() C API function, parseString() will wrap the PyParser_ParseString() function.
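Taken together, the functions specified so far suggest a three-step pipeline: parse a grammar, build a DFA from it, then parse input text with that DFA. The sketch below is illustrative only. Because the pgen module is only a proposal, the function takes the module object as a parameter; the helper name `parse_with_grammar` and the toy grammar are assumptions, and since this specification does not describe the `start` argument, it is passed through unchanged.

```python
# A toy grammar written in the notation of CPython's Grammar file
# (illustrative only; it is not taken from this PEP).
GRAMMAR_TEXT = """\
start: expr NEWLINE* ENDMARKER
expr: term ('+' term)*
term: NAME | NUMBER
"""


def parse_with_grammar(pgen, grammar_text, source_text, start):
    """Run the grammar -> DFA -> parse pipeline from the specification.

    `pgen` is any object that provides the proposed functions. `start` is
    forwarded as-is, because the specification does not define its type.
    """
    grammar_ast = pgen.parseGrammarString(grammar_text)  # SyntaxError on bad grammar
    dfa = pgen.buildParser(grammar_ast)                  # ValueError on bad AST
    return pgen.parseString(source_text, dfa, start)
```

With an implementation of the proposed module in hand, a call might look like `parse_with_grammar(pgen, GRAMMAR_TEXT, "a + 1\n", start)`.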

`symbolToStringMap (dfa) -> dict`

The symbolToStringMap() function will accept a DFA instance and return a dictionary object that maps from the DFA’s numeric values for its nonterminals to the string names of the nonterminals as found in the original grammar specification for the DFA.

`stringToSymbolMap (dfa) -> dict`

The stringToSymbolMap() function will output a dictionary mapping the nonterminal names of the input DFA to their corresponding numeric values.
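The two maps are inverses of each other, which a small sketch makes concrete. The dictionary contents below are hypothetical (this PEP does not fix a numbering scheme), and `invert_symbol_map` is an illustrative helper rather than part of the proposed module.

```python
# Hypothetical result of symbolToStringMap(dfa) for a three-rule grammar.
symbol_to_string = {256: "start", 257: "expr", 258: "term"}


def invert_symbol_map(symbol_to_string):
    """Build the name -> number map from the number -> name map."""
    string_to_symbol = {name: number for number, name in symbol_to_string.items()}
    # A duplicate name would silently drop an entry, so check the sizes.
    if len(string_to_symbol) != len(symbol_to_string):
        raise ValueError("nonterminal names must be unique")
    return string_to_symbol


# Plays the role of stringToSymbolMap(dfa) for the same DFA.
string_to_symbol = invert_symbol_map(symbol_to_string)
```

Looking a name up in one map and feeding the result to the other returns the original name, which is the property a caller would rely on when translating between grammar rule names and node codes.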

Extra credit will be awarded if the map generation functions and parsing functions are also methods of the DFA extension class.
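One way to picture the method-based interface is a thin facade that binds a DFA to the module-level functions. This is only a sketch: the class name `DFAMethods` is hypothetical, the module object is passed in because the pgen module is only proposed here, and an actual implementation would place these methods on the C extension class itself.

```python
class DFAMethods:
    """Hypothetical facade exposing the module functions as methods."""

    def __init__(self, pgen, dfa):
        self._pgen = pgen  # object providing the proposed pgen functions
        self._dfa = dfa    # DFA as returned by buildParser()

    def parseFile(self, fileName, start):
        return self._pgen.parseFile(fileName, self._dfa, start)

    def parseString(self, text, start):
        return self._pgen.parseString(text, self._dfa, start)

    def symbolToStringMap(self):
        return self._pgen.symbolToStringMap(self._dfa)

    def stringToSymbolMap(self):
        return self._pgen.stringToSymbolMap(self._dfa)
```

The benefit of this shape is that callers no longer thread the DFA through every call; each method supplies it on their behalf.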

Implementation Plan

A cunning plan has been devised to accomplish this enhancement:

1. Rename the pgen functions to conform to the CPython naming standards. This action may involve adding some header files to the Include subdirectory.
2. Move the pgen C modules in the Makefile.pre.in from unique pgen elements to the Python C library.
3. Make any needed changes to the parser module so the AST extension class understands that there are AST types it may not understand. Cursory examination of the AST extension class shows that it keeps track of whether the tree is a suite or an expression.
4. Code an additional C module in the Modules directory. The C extension module will implement the DFA extension class and the functions outlined in the previous section.
5. Add the new module to the build process. Black magic, indeed.

Last modified: 2023-09-09 17:39:29 GMT