[Python-checkins] gh-98831: Move DSL documentation here from ideas repo (#101629)

Tue Feb 7 00:04:05 EST 2023

https://github.com/python/cpython/commit/694e346a01f407c7f78138331db006dea79f82a8
commit: 694e346a01f407c7f78138331db006dea79f82a8
branch: main
author: Guido van Rossum <guido at python.org>
committer: gvanrossum <gvanrossum at gmail.com>
date: 2023-02-06T21:03:58-08:00
summary:

gh-98831: Move DSL documentation here from ideas repo (#101629)

files:
A Tools/cases_generator/interpreter_definition.md
M Tools/cases_generator/README.md

diff --git a/Tools/cases_generator/README.md b/Tools/cases_generator/README.md
index dc055ead1941..c595a932ac44 100644
--- a/Tools/cases_generator/README.md
+++ b/Tools/cases_generator/README.md
@@ -1,5 +1,8 @@
 # Tooling to generate interpreters
 
+Documentation for the instruction definitions in `Python/bytecodes.c`
+("the DSL") is [here](interpreter_definition.md).
+
 What's currently here:
 
 - `lexer.py`: lexer for C, originally written by Mark Shannon
@@ -7,10 +10,10 @@ What's currently here:
 - `parser.py`: Parser for instruction definition DSL; main class `Parser`
 - `generate_cases.py`: driver script to read `Python/bytecodes.c` and
   write `Python/generated_cases.c.h`
+- `test_generator.py`: tests, require manual running using `pytest`
 
-The DSL for the instruction definitions in `Python/bytecodes.c` is described
-[here](https://github.com/faster-cpython/ideas/blob/main/3.12/interpreter_definition.md).
-Note that there is some dummy C code at the top and bottom of the file
+Note that there is some dummy C code at the top and bottom of
+`Python/bytecodes.c`
 to fool text editors like VS Code into believing this is valid C code.
 
 ## A bit about the parser
diff --git a/Tools/cases_generator/interpreter_definition.md b/Tools/cases_generator/interpreter_definition.md
new file mode 100644
index 000000000000..c7bd38d32ff4
--- /dev/null
+++ b/Tools/cases_generator/interpreter_definition.md
@@ -0,0 +1,409 @@
+# A higher level definition of the bytecode interpreter
+
+## Abstract
+
+The CPython interpreter is defined in C, meaning that the semantics of the
+bytecode instructions, the dispatching mechanism, error handling, and
+tracing and instrumentation are all intermixed.
+
+This document proposes defining a custom C-like DSL for defining the 
+instruction semantics and tools for generating the code deriving from
+the instruction definitions.
+
+These tools would be used to:
+* Generate the main interpreter (done)
+* Generate the tier 2 interpreter
+* Generate documentation for instructions
+* Generate metadata about instructions, such as stack use (done).
+
+Having a single definition file ensures that there is a single source
+of truth for bytecode semantics.
+
+Other tools that operate on bytecodes, like `frame.setlineno`
+and the `dis` module, will be derived from the common semantic
+definition, reducing errors.
+
+## Motivation
+
+The bytecode interpreter of CPython has traditionally been defined as standard
+C code, but with a lot of macros.
+The presence of these macros and the nature of bytecode interpreters means
+that the interpreter is effectively defined in a domain specific language (DSL).
+
+Rather than using an ad-hoc DSL embedded in the C code for the interpreter,
+a custom DSL should be defined and the semantics of the bytecode instructions,
+and the instructions defined in that DSL.
+
+Generating the interpreter decouples low-level details of dispatching
+and error handling from the semantics of the instructions, resulting
+in more maintainable code and a potentially faster interpreter.
+
+It also provides the ability to create and check optimizers and optimization
+passes from the semantic definition, reducing errors.
+
+## Rationale
+
+As we improve the performance of CPython, we need to optimize larger regions
+of code, use more complex optimizations and, ultimately, translate to machine
+code. 
+
+All of these steps introduce the possibility of more bugs, and require more code
+to be written. One way to mitigate this is through the use of code generators.
+Code generators decouple the debugging of the code (the generator) from checking
+the correctness (the DSL input).
+
+For example, we are likely to want a new interpreter for the tier 2 optimizer
+to be added in 3.12. That interpreter will have a different API, a different
+set of instructions and potentially different dispatching mechanism.
+But the instructions it will interpret will be built from the same building
+blocks as the instructions for the tier 1 (PEP 659) interpreter.
+
+Rewriting all the instructions is tedious and error-prone, and changing the
+instructions is a maintenance headache as both versions need to be kept in sync.
+
+By using a code generator and using a common source for the instructions, or 
+parts of instructions, we can reduce the potential for errors considerably.
+
+
+## Specification
+
+This specification is at an early stage and is likely to change considerably.
+
+Syntax
+------
+
+Each op definition has a kind, a name, a stack and instruction stream effect,
+and a piece of C code describing its semantics::
+ 
+```
+  file:
+    (definition | family)+
+
+  definition:
+    "inst" "(" NAME ["," stack_effect] ")" "{" C-code "}"
+    |
+    "op" "(" NAME "," stack_effect ")" "{" C-code "}"
+    |
+    "macro" "(" NAME ")" "=" uop ("+" uop)* ";"
+    |
+    "super" "(" NAME ")" "=" NAME ("+" NAME)* ";"
+ 
+  stack_effect:
+    "(" [inputs] "--" [outputs] ")"
+
+  inputs:
+    input ("," input)*
+
+  outputs:
+    output ("," output)*
+
+  input:
+    object | stream | array
+
+  output:
+    object | array
+
+  uop:
+    NAME | stream
+
+  object:
+    NAME [":" type] [ "if" "(" C-expression ")" ]
+
+  type:
+    NAME
+
+  stream:
+    NAME "/" size
+
+  size:
+    INTEGER
+
+  array:
+    object "[" C-expression "]"
+
+  family:
+    "family" "(" NAME ")" = "{" NAME ("," NAME)+ "}" ";"
+```
+
+The following definitions may occur:
+
+* `inst`: A normal instruction, as previously defined by `TARGET(NAME)` in `ceval.c`.
+* `op`: A part instruction from which macros can be constructed.
+* `macro`: A bytecode instruction constructed from ops and cache effects.
+* `super`: A super-instruction, such as `LOAD_FAST__LOAD_FAST`, constructed from
+  normal or macro instructions.
+
+`NAME` can be any ASCII identifier that is a C identifier and not a C or Python keyword.
+`foo_1` is legal. `$` is not legal, nor is `struct` or `class`.
+
+The optional `type` in an `object` is the C type. It defaults to `PyObject *`.
+The objects before the "--" are the objects on top of the the stack at the start
+of the instruction. Those after the "--" are the objects on top of the the stack
+at the end of the instruction.
+
+An `inst` without `stack_effect` is a transitional form to allow the original C code
+definitions to be copied. It lacks information to generate anything other than the
+interpreter, but is useful for initial porting of code.
+
+Stack effect names may be `unused`, indicating the space is to be reserved
+but no use of it will be made in the instruction definition.
+This is useful to ensure that all instructions in a family have the same
+stack effect.
+
+The number in a `stream` define how many codeunits are consumed from the
+instruction stream. It returns a 16, 32 or 64 bit value.
+If the name is `unused` the size can be any value and that many codeunits
+will be skipped in the instruction stream.
+
+By convention cache effects (`stream`) must precede the input effects.
+
+The name `oparg` is pre-defined as a 32 bit value fetched from the instruction stream.
+
+The C code may include special functions that are understood by the tools as
+part of the DSL.
+
+Those functions include:
+
+* `DEOPT_IF(cond, instruction)`. Deoptimize if `cond` is met.
+* `ERROR_IF(cond, label)`. Jump to error handler if `cond` is true.
+* `DECREF_INPUTS()`. Generate `Py_DECREF()` calls for the input stack effects.
+
+Variables can either be defined in the input, output, or in the C code.
+Variables defined in the input may not be assigned in the C code.
+If an `ERROR_IF` occurs, all values will be removed from the stack;
+they must already be `DECREF`'ed by the code block.
+If a `DEOPT_IF` occurs, no values will be removed from the stack or
+the instruction stream; no values must have been `DECREF`'ed or created.
+
+These requirements result in the following constraints on the use of
+`DEOPT_IF` and `ERROR_IF` in any instruction's code block:
+
+1. Until the last `DEOPT_IF`, no objects may be allocated, `INCREF`ed,
+   or `DECREF`ed.
+2. Before the first `ERROR_IF`, all input values must be `DECREF`ed,
+   and no objects may be allocated or `INCREF`ed, with the exception
+   of attempting to create an object and checking for success using
+   `ERROR_IF(result == NULL, label)`. (TODO: Unclear what to do with
+   intermediate results.)
+3. No `DEOPT_IF` may follow an `ERROR_IF` in the same block.
+
+Semantics
+---------
+
+The underlying execution model is a stack machine.
+Operations pop values from the stack, and push values to the stack.
+They also can look at, and consume, values from the instruction stream.
+
+All members of a family must have the same stack and instruction stream effect.
+
+Examples
+--------
+
+(Another source of examples can be found in the [tests](test_generator.py).)
+
+Some examples:
+
+### Output stack effect
+```C
+    inst ( LOAD_FAST, (-- value) ) {
+        value = frame->f_localsplus[oparg];
+        Py_INCREF(value);
+    }
+```
+This would generate:
+```C
+    TARGET(LOAD_FAST) {
+        PyObject *value;
+        value = frame->f_localsplus[oparg];
+        Py_INCREF(value);
+        PUSH(value);
+        DISPATCH();
+    }
+```
+
+### Input stack effect
+```C
+    inst ( STORE_FAST, (value --) ) {
+        SETLOCAL(oparg, value);
+    }
+```
+This would generate:
+```C
+    TARGET(STORE_FAST) {
+        PyObject *value = PEEK(1);
+        SETLOCAL(oparg, value);
+        STACK_SHRINK(1);
+        DISPATCH();
+    }
+```
+
+### Super-instruction definition
+
+```C
+    super ( LOAD_FAST__LOAD_FAST ) = LOAD_FAST + LOAD_FAST ;
+```
+This might get translated into the following:
+```C
+    TARGET(LOAD_FAST__LOAD_FAST) {
+        PyObject *value;
+        value = frame->f_localsplus[oparg];
+        Py_INCREF(value);
+        PUSH(value);
+        NEXTOPARG();
+        next_instr++;
+        value = frame->f_localsplus[oparg];
+        Py_INCREF(value);
+        PUSH(value);
+        DISPATCH();
+    }
+```
+
+### Input stack effect and cache effect
+```C
+    op ( CHECK_OBJECT_TYPE, (owner, type_version/2 -- owner) ) {
+        PyTypeObject *tp = Py_TYPE(owner);
+        assert(type_version != 0);
+        DEOPT_IF(tp->tp_version_tag != type_version);
+    }
+```
+This might become (if it was an instruction):
+```C
+    TARGET(CHECK_OBJECT_TYPE) {
+        PyObject *owner = PEEK(1);
+        uint32 type_version = read32(next_instr);
+        PyTypeObject *tp = Py_TYPE(owner);
+        assert(type_version != 0);
+        DEOPT_IF(tp->tp_version_tag != type_version);
+        next_instr += 2;
+        DISPATCH();
+    }
+```
+
+### More examples
+
+For explanations see "Generating the interpreter" below.)
+```C
+    op ( CHECK_HAS_INSTANCE_VALUES, (owner -- owner) ) {
+        PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(owner);
+        DEOPT_IF(!_PyDictOrValues_IsValues(dorv));
+    }
+```
+```C
+    op ( LOAD_INSTANCE_VALUE, (owner, index/1 -- null if (oparg & 1), res) ) {
+        res = _PyDictOrValues_GetValues(dorv)->values[index];
+        DEOPT_IF(res == NULL);
+        Py_INCREF(res);
+        null = NULL;
+        Py_DECREF(owner);
+    }
+```
+```C
+    macro ( LOAD_ATTR_INSTANCE_VALUE ) =
+        counter/1 + CHECK_OBJECT_TYPE + CHECK_HAS_INSTANCE_VALUES +
+        LOAD_INSTANCE_VALUE + unused/4 ;
+```
+```C
+    op ( LOAD_SLOT, (owner, index/1 -- null if (oparg & 1), res) ) {
+        char *addr = (char *)owner + index;
+        res = *(PyObject **)addr;
+        DEOPT_IF(res == NULL);
+        Py_INCREF(res);
+        null = NULL;
+        Py_DECREF(owner);
+    }
+```
+```C
+    macro ( LOAD_ATTR_SLOT ) = counter/1 + CHECK_OBJECT_TYPE + LOAD_SLOT + unused/4;
+```
+```C
+    inst ( BUILD_TUPLE, (items[oparg] -- tuple) ) {
+        tuple = _PyTuple_FromArraySteal(items, oparg);
+        ERROR_IF(tuple == NULL, error);
+    }
+```
+```C
+    inst ( PRINT_EXPR ) {
+        PyObject *value = POP();
+        PyObject *hook = _PySys_GetAttr(tstate, &_Py_ID(displayhook));
+        PyObject *res;
+        if (hook == NULL) {
+            _PyErr_SetString(tstate, PyExc_RuntimeError,
+                                "lost sys.displayhook");
+            Py_DECREF(value);
+            goto error;
+        }
+        res = PyObject_CallOneArg(hook, value);
+        Py_DECREF(value);
+        ERROR_IF(res == NULL);
+        Py_DECREF(res);
+    }
+```
+
+### Define an instruction family
+These opcodes all share the same instruction format):
+```C
+    family(load_attr) = { LOAD_ATTR, LOAD_ATTR_INSTANCE_VALUE, LOAD_SLOT } ;
+```
+
+Generating the interpreter
+==========================
+
+The generated C code for a single instruction includes a preamble and dispatch at the end
+which can be easily inserted. What is more complex is ensuring the correct stack effects
+and not generating excess pops and pushes.
+
+For example, in `CHECK_HAS_INSTANCE_VALUES`, `owner` occurs in the input, so it cannot be
+redefined. Thus it doesn't need to written and can be read without adjusting the stack pointer.
+The C code generated for `CHECK_HAS_INSTANCE_VALUES` would look something like:
+
+```C
+    {
+        PyObject *owner = stack_pointer[-1];
+        PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(owner);
+        DEOPT_IF(!_PyDictOrValues_IsValues(dorv));
+    }
+```
+
+When combining ops together to form instructions, temporary values should be used,
+rather than popping and pushing, such that `LOAD_ATTR_SLOT` would look something like:
+
+```C
+    case LOAD_ATTR_SLOT: {
+        PyObject *s1 = stack_pointer[-1];
+        /* CHECK_OBJECT_TYPE */
+        {
+            PyObject *owner = s1;
+            uint32_t type_version = read32(next_instr + 1);
+            PyTypeObject *tp = Py_TYPE(owner);
+            assert(type_version != 0);
+            if (tp->tp_version_tag != type_version) goto deopt;
+        }
+        /* LOAD_SLOT */
+        {
+            PyObject *owner = s1;
+            uint16_t index = *(next_instr + 1 + 2);
+            char *addr = (char *)owner + index;
+            PyObject *null;
+            PyObject *res = *(PyObject **)addr;
+            if (res == NULL) goto deopt;
+            Py_INCREF(res);
+            null = NULL;
+            Py_DECREF(owner);
+            if (oparg & 1) {
+                stack_pointer[0] = null;
+                stack_pointer += 1;
+            }
+            s1 = res;
+        }   
+        next_instr += (1 + 1 + 2 + 1 + 4);
+        stack_pointer[-1] = s1;
+        DISPATCH();
+    }
+```
+
+Other tools
+===========
+
+From the instruction definitions we can generate the stack marking code used in `frame.set_lineno()`,
+and the tables for use by disassemblers.
+