[Python-checkins] peps: Add PEP 445: The Argument Clinic DSL
brett.cannon
python-checkins at python.org
Mon Feb 25 17:40:03 CET 2013
http://hg.python.org/peps/rev/7aa92fb33436
changeset: 4776:7aa92fb33436
user: Brett Cannon <brett at python.org>
date: Mon Feb 25 11:39:56 2013 -0500
summary:
Add PEP 445: The Argument Clinic DSL
files:
pep-0445.txt | 481 +++++++++++++++++++++++++++++++++++++++
1 files changed, 481 insertions(+), 0 deletions(-)
diff --git a/pep-0445.txt b/pep-0445.txt
new file mode 100644
--- /dev/null
+++ b/pep-0445.txt
@@ -0,0 +1,481 @@
+PEP: 445
+Title: The Argument Clinic DSL
+Version: $Revision$
+Last-Modified: $Date$
+Author: Larry Hastings <larry at hastings.org>
+Discussions-To: Python-Dev <python-dev at python.org>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 22-Feb-2013
+
+
+Abstract
+========
+
+This document proposes "Argument Clinic", a DSL designed
+to facilitate argument processing for built-in functions
+in the implementation of CPython.
+
+Rationale and Goals
+===================
+
+The primary implementation of Python, "CPython", is written in
+a mixture of Python and C. One of the implementation details
+of CPython is what are called "built-in" functions--functions
+available to Python programs but written in C. When a
+Python program calls a built-in function and passes in
+arguments, those arguments must be translated from Python
+values into C values. This process is called "parsing arguments".
+
+As of CPython 3.3, arguments to functions are primarily
+parsed with one of two functions: the original
+``PyArg_ParseTuple()``, [1]_ and the more modern
+``PyArg_ParseTupleAndKeywords()``. [2]_
+The former function only handles positional parameters; the
+latter also accommodates keyword and keyword-only parameters,
+and is preferred for new code.
+
+``PyArg_ParseTuple()`` was a reasonable approach when it was
+first conceived. The programmer specified the translation for
+the arguments in a "format string": [3]_ each parameter was matched to
+a "format unit", a one-or-two character sequence telling
+``PyArg_ParseTuple()`` what Python types to accept and how
+to translate them into the appropriate C value for that
+parameter. There were only a dozen or so of these "format
+units", and each one was distinct and easy to understand.
+
+Over the years the ``PyArg_Parse`` interface has been extended in
+numerous ways. The modern API is quite complex, to the point
+that it is somewhat painful to use. Consider:
+
+ * There are now forty different "format units"; a few are
+   even three characters long.
+   This overload of symbology makes it difficult to understand
+   what the format string says without constantly cross-indexing
+   it with the documentation.
+ * There are also six meta-format units that may be buried
+   in the format string. (They are: ``"()|$:;"``.)
+ * The more format units are added, the less likely it is the
+   implementor can pick an easy-to-use mnemonic for the format
+   unit, because the character of choice is probably already in
+   use. In other words, the more format units we have, the more
+   obscure the format units become.
+ * Several format units are nearly identical to others, having
+   only subtle differences. This makes understanding the exact
+   semantics of the format string even harder.
+ * The docstring is specified as a static C string,
+   which is mildly bothersome to read and edit.
+ * When adding a new parameter to a function using
+   ``PyArg_ParseTupleAndKeywords()``, it's necessary to
+   touch six different places in the code: [4]_
+
+   * Declaring the variable to store the argument.
+   * Passing in a pointer to that variable in the correct
+     spot in ``PyArg_ParseTupleAndKeywords()``, also passing
+     in any "length" or "converter" arguments in the correct
+     order.
+   * Adding the name of the argument in the correct spot
+     of the "keywords" array passed in to
+     ``PyArg_ParseTupleAndKeywords()``.
+   * Adding the format unit to the correct spot in the
+     format string.
+   * Adding the parameter to the prototype in the
+     docstring.
+   * Documenting the parameter in the docstring.
+
+ * There is currently no mechanism for builtin functions
+   to provide their "signature" information (see
+   ``inspect.getfullargspec`` and ``inspect.Signature``).
+   Adding this information using a mechanism similar to
+   the existing ``PyArg_Parse`` functions would require
+   repeating ourselves yet again.
+
+The goal of Argument Clinic is to replace this API with a
+mechanism inheriting none of these downsides:
+
+ * You need specify each parameter only once.
+ * All information about a parameter is kept together in one place.
+ * For each parameter, you specify its type in C;
+   Argument Clinic handles the translation from
+   Python value into C value for you.
+ * Argument Clinic also allows for fine-tuning
+   of argument processing behavior with
+   highly-readable "flags", both per-parameter
+   and applying across the whole function.
+ * Docstrings are written in plain text.
+ * From this, Argument Clinic generates for you all
+   the mundane, repetitious code and data structures
+   CPython needs internally. Once you've specified
+   the interface, the next step is simply to write your
+   implementation using native C types. Every detail
+   of argument parsing is handled for you.
+
+Future goals of Argument Clinic include:
+
+ * providing signature information for builtins, and
+ * speed improvements to the generated code.
+
+DSL Syntax Summary
+==================
+
+The Argument Clinic DSL is specified as a comment
+embedded in a C file, as follows. The "Example" column on the
+right shows you sample input to the Argument Clinic DSL,
+and the "Section" column on the left specifies what each line
+represents in turn.
+
+::
+
+ +-----------------------+-----------------------------------------------------+
+ | Section | Example |
+ +-----------------------+-----------------------------------------------------+
+ | Clinic DSL start | /*[clinic] |
+ | Function declaration | module.function_name -> return_annotation |
+ | Function flags | flag flag2 flag3=value |
+ | Parameter declaration | type name = default |
+ | Parameter flags | flag flag2 flag3=value |
+ | Parameter docstring | Lorem ipsum dolor sit amet, consectetur |
+ | | adipisicing elit, sed do eiusmod tempor |
+ | Function docstring | Lorem ipsum dolor sit amet, consectetur adipisicing |
+ | | elit, sed do eiusmod tempor incididunt ut labore et |
+ | Clinic DSL end | [clinic]*/ |
+ | Clinic output | ... |
+ | Clinic output end | /*[clinic end output:<checksum>]*/ |
+ +-----------------------+-----------------------------------------------------+
+
+
+General Behavior Of the Argument Clinic DSL
+-------------------------------------------
+
+All lines support ``#`` as a line comment delimiter *except* docstrings.
+Blank lines are always ignored.
+
+Like Python itself, leading whitespace is significant in the Argument Clinic
+DSL. The first line of the "function" section is the declaration;
+all subsequent lines at the same indent are function flags. Once you indent,
+the first line is a parameter declaration; subsequent lines at that indent
+are parameter flags. Indent one more time for the lines of the parameter
+docstring. Finally, outdent back to the same level as the function
+declaration for the function docstring.
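+
+The indentation rules above can be illustrated with a short Python
+sketch. It is not the prototype's parser: the function ``classify()``,
+the sample input, and the rule that arriving at the parameter indent
+from a different indent starts a new parameter are all assumptions
+made for illustration. Comment handling is omitted.
+
+::
+
+    # Hypothetical Clinic input; names, types and flags are examples only.
+    SAMPLE = """\
+    example.greet -> str
+    basename=example_greet
+        char *name
+        encoding=utf-8
+            Name of the person to greet.
+        int count = 1
+            How many times to repeat the greeting.
+    Return a greeting for the given name.
+    """
+
+    def classify(text):
+        """Label each non-blank line of a Clinic block by its section."""
+        labels = []
+        seen_params = False
+        prev_indent = None
+        param_indent = None
+        for line in text.splitlines():
+            if not line.strip():
+                continue  # blank lines are always ignored
+            indent = len(line) - len(line.lstrip(" "))
+            if indent == 0:
+                if prev_indent is None:
+                    kind = "function declaration"
+                elif seen_params:
+                    kind = "function docstring"
+                else:
+                    kind = "function flags"
+            elif param_indent is None or indent == param_indent:
+                # Assumption: reaching the parameter indent from another
+                # indent level begins a new parameter declaration.
+                if prev_indent != indent:
+                    kind = "parameter declaration"
+                else:
+                    kind = "parameter flags"
+                param_indent = indent
+                seen_params = True
+            else:
+                kind = "parameter docstring"
+            labels.append((kind, line.strip()))
+            prev_indent = indent
+        return labels
+
+    for kind, text in classify(SAMPLE):
+        print("%-22s| %s" % (kind, text))
+
+Each printed row pairs a section name from the summary table with the
+line of input that belongs to it.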
+
+Function Declaration
+--------------------
+
+The return annotation is optional. If skipped, the arrow ("``->``") must also be omitted.
+
+Parameter Declaration
+---------------------
+
+The "type" is a C type. If it's a pointer type, you must specify
+a single space between the type and the "``*``", and zero spaces between
+the "``*``" and the name. (e.g. "``PyObject *foo``", not "``PyObject* foo``")
+
+The "name" must be a legal C identifier.
+
+The "default" is a Python value. Default values are optional;
+if not specified you must omit the equals sign too. Parameters
+which don't have a default are implicitly required. The default
+value is dynamically assigned, "live" in the generated C code,
+and although it's specified as a Python value, it's translated
+into a native C value in the generated C code.
+
+It's explicitly permitted to end the parameter declaration line
+with a semicolon, though the semicolon is optional. This is
+intended to allow directly cutting and pasting in declarations
+from C code. However, the preferred style is without the semicolon.
+
+
+Flags
+-----
+
+"Flags" are like "``make -D``" arguments. They're unordered. Flags lines
+are parsed much like the shell (specifically, using ``shlex.split()`` [5]_ ).
+You can have as many flag lines as you like. Specifying a flag twice
+is currently an error.
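+
+As a rough illustration of these rules, the following Python sketch
+splits flag lines with ``shlex.split()`` and rejects duplicates. The
+helper ``parse_flags()`` and its choice to store bare flags as ``True``
+are assumptions, not the prototype's code.
+
+::
+
+    import shlex
+
+    def parse_flags(lines):
+        """Collect the flags from one or more flag lines into a dict."""
+        flags = {}
+        for line in lines:
+            # comments=True lets "#" start a line comment.
+            for token in shlex.split(line, comments=True):
+                name, sep, value = token.partition("=")
+                if name in flags:
+                    raise ValueError("flag specified twice: " + name)
+                # A bare flag carries no value; record it as True.
+                flags[name] = value if sep else True
+        return flags
+
+    print(parse_flags(["nullable encoding=utf-8", 'types="str bytes"']))
+
+Because the lines are split shell-style, a value containing spaces
+(such as the list given to ``types``) can be quoted.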
+
+Supported flags for functions:
+
+``basename``
+ The basename to use for the generated C functions.
+ By default this is the name of the function from
+ the DSL, only with periods replaced by underscores.
+
+``positional-only``
+ This function only supports positional parameters,
+ not keyword parameters. See `Functions With
+ Positional-Only Parameters`_ below.
+
+Supported flags for parameters:
+
+``bitwise``
+ If the Python integer passed in is signed, copy the
+ bits directly even if it is negative. Only valid
+ for unsigned integer types.
+
+``converter``
+ Backwards-compatibility support for parameter "converter"
+ functions. [6]_ The value should be the name of the converter
+ function in C. Only valid when the type of the parameter
+ is ``void *``.
+
+``default``
+ The Python value to use in place of the parameter's actual
+ default in Python contexts. Specifically, when specified,
+ this value will be used for the parameter's default in the
+ docstring, and in the ``Signature``. (TBD: If the string is a
+ valid Python expression, renderable into a Python value
+ using ``eval()``, then the result of ``eval()`` on it will be used
+ as the default in the ``Signature``.) Ignored if there is no
+ default.
+
+``encoding``
+ Encoding to use when encoding a Unicode string to a ``char *``.
+ Only valid when the type of the parameter is ``char *``.
+
+``group=``
+ This parameter is part of a group of options that must either
+ all be specified or none specified. Parameters in the same
+ "group" must be contiguous. The value of the group flag
+ is the name used for the group variable, and therefore must
+ be legal as a C identifier. Only valid for functions
+ marked "``positional-only``"; see `Functions With
+ Positional-Only Parameters`_ below.
+
+``immutable``
+ Only accept immutable values.
+
+``keyword-only``
+ This parameter (and all subsequent parameters) is
+ keyword-only. Keyword-only parameters must also be
+ optional parameters. Not valid for positional-only functions.
+
+``length``
+ This is an iterable type, and we also want its length. The
+ DSL will generate a second ``Py_ssize_t`` variable;
+ its name will be this parameter's name appended with
+ "``_length``".
+
+``nullable``
+ ``None`` is a legal argument for this parameter. If ``None`` is
+ supplied on the Python side, the equivalent C argument will be
+ ``NULL``. Only valid for pointer types.
+
+``required``
+ Normally any parameter that has a default value is
+ automatically optional. A parameter that has "required"
+ set will be considered required (non-optional) even if
+ it has a default value. The generated documentation
+ will also not show any default value.
+
+``types``
+ Space-separated list of acceptable Python types for this
+ object. There are also four special-case types which
+ represent Python protocols:
+
+ * buffer
+ * mapping
+ * number
+ * sequence
+
+``zeroes``
+ This parameter is a string type, and its value should be
+ allowed to have embedded zeroes. Not valid for all
+ varieties of string parameters.
+
+
+Python Code
+-----------
+
+Argument Clinic also permits embedding Python code inside C files,
+which is executed in-place when Argument Clinic processes the file.
+Embedded code looks like this:
+
+::
+
+ /*[python]
+
+ # this is python code!
+ print("/" + "* Hello world! *" + "/")
+
+ [python]*/
+
+Any Python code is valid. Python code sections in Argument Clinic
+can also be used to modify Clinic's behavior at runtime; for example,
+see `Extending Argument Clinic`_.
+
+
+Output
+======
+
+Argument Clinic writes its output in-line in the C file, immediately after
+the section of Clinic code. For "python" sections, the output is
+everything printed using ``builtins.print``. For "clinic" sections, the
+output is valid C code, including:
+
+ * a ``#define`` providing the correct ``methoddef`` structure for the
+   function
+ * a prototype for the "impl" function--this is what you'll write to
+   implement this function
+ * a function that handles all argument processing, which calls your
+   "impl" function
+ * the definition line of the "impl" function
+ * and a comment indicating the end of output.
+
+The intention is that you will write the body of your impl function
+immediately after the output--as in, you write a left-curly-brace
+immediately after the end-of-output comment and write the implementation
+of the builtin in the body there. (It's a bit strange at first--but oddly
+convenient.)
+
+Argument Clinic will define the parameters of the impl function for you.
+The function will take the "self" parameter passed in originally, all
+the parameters you define, and possibly some extra generated parameters
+("length" parameters; also "group" parameters, see next section).
+
+Argument Clinic also writes a checksum for the output section. This
+is a valuable safety feature: if you modify the output by hand, Clinic
+will notice that the checksum doesn't match, and will refuse to
+overwrite the file. (You can force Clinic to overwrite with the "``-f``"
+command-line argument; Clinic will also ignore the checksums when
+using the "``-o``" command-line argument.)
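+
+The check can be pictured with the Python sketch below. This document
+does not say how the checksum is computed, so the use of SHA-1 here is
+purely an assumption, as are the helper names ``checksum()`` and
+``output_is_unmodified()``.
+
+::
+
+    import hashlib
+    import re
+
+    # Matches the end-of-output comment shown in the syntax summary.
+    END_MARKER = re.compile(r"/\*\[clinic end output:([0-9a-f]+)\]\*/")
+
+    def checksum(output):
+        # Assumption: SHA-1 over the UTF-8 bytes of the output text.
+        return hashlib.sha1(output.encode("utf-8")).hexdigest()
+
+    def output_is_unmodified(output, end_marker_line):
+        """Return True if the recorded checksum matches the output."""
+        match = END_MARKER.search(end_marker_line)
+        if match is None:
+            return False
+        return match.group(1) == checksum(output)
+
+    generated = "static PyObject *\nexample_impl(PyObject *self)\n"
+    marker = "/*[clinic end output:%s]*/" % checksum(generated)
+    print(output_is_unmodified(generated, marker))
+    print(output_is_unmodified(generated + "/* hand edit */", marker))
+
+A tool following this scheme would rewrite the output section only
+when the first kind of result is seen, or when told to force it.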
+
+
+Functions With Positional-Only Parameters
+=========================================
+
+A significant fraction of Python builtins implemented in C use the
+older positional-only API for processing arguments (``PyArg_ParseTuple()``).
+In some instances, these builtins parse their arguments differently
+based on how many arguments were passed in. This can provide some
+bewildering flexibility: there may be groups of optional parameters,
+which must either all be specified or none specified. And occasionally
+these groups are on the *left!* (For example: ``curses.window.addch()``.)
+
+Argument Clinic supports these legacy use-cases with a special set
+of flags. First, set the flag "``positional-only``" on the entire
+function. Then, for every group of parameters that is collectively
+optional, add a "``group=``" flag with a unique string to all the
+parameters in that group. Note that these groups are permitted on
+the right *or left* of any required parameters! However, all groups
+(including the group of required parameters) must be contiguous.
+
+The impl function generated by Clinic will add an extra parameter for
+every group, "``int <group>_group``". This argument will be nonzero if
+the group was specified on this call, and zero if it was not.
+
+Note that when operating in this mode, you cannot specify default
+arguments. You can simulate defaults by putting parameters in
+individual groups and detecting whether or not they were
+specified--but generally speaking it's better to simply not
+use "positional-only" where it isn't absolutely necessary. (TBD: It
+might be possible to relax this restriction. But adding default
+arguments into the mix of groups would seemingly make calculating which
+groups are active a good deal harder.)
+
+Also, note that it's possible--even easy--to specify a set of groups
+to a function such that there are several valid mappings from the number
+of arguments to a valid set of groups. If this happens, Clinic will exit
+with an error message. This should not be a problem, as positional-only
+operation is only intended for legacy use cases, and all the legacy
+functions using this quirky behavior should have unambiguous mappings.
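+
+The ambiguity check can be sketched in Python as follows. The helper
+``resolve_groups()`` is hypothetical, and it assumes that any
+combination of groups is a valid set; the prototype may be stricter.
+The group names ``yx`` and ``attr`` are invented for the example.
+
+::
+
+    from itertools import combinations
+
+    def resolve_groups(required, groups):
+        """Map each legal argument count to the groups it activates.
+
+        required is the number of required parameters; groups maps
+        each group name to the number of parameters in that group.
+        """
+        table = {}
+        names = sorted(groups)
+        for r in range(len(names) + 1):
+            for active in combinations(names, r):
+                nargs = required + sum(groups[name] for name in active)
+                if nargs in table:
+                    raise ValueError(
+                        "%d arguments could mean groups %r or %r"
+                        % (nargs, table[nargs], active))
+                table[nargs] = active
+        return table
+
+    # A layout modelled on curses.window.addch(): an optional
+    # two-parameter group on the left, one required parameter, and
+    # an optional one-parameter group on the right.
+    print(resolve_groups(1, {"yx": 2, "attr": 1}))
+
+Here every argument count from one to four selects exactly one set of
+groups. Two optional groups of the same size would raise the error,
+since the argument count alone could not tell them apart.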
+
+
+Current Status
+==============
+
+As of this writing, there is a working prototype implementation of
+Argument Clinic available online. [7]_ The prototype implements
+the syntax above, and generates code using the existing ``PyArg_Parse``
+APIs. It supports translating to all current format units except ``"w*"``.
+Sample functions using Argument Clinic exercise all major features,
+including positional-only argument parsing.
+
+Extending Argument Clinic
+-------------------------
+
+The prototype also currently provides an experimental extension mechanism
+that allows support for new types to be added on-the-fly. See
+``Modules/posixmodule.c`` in the prototype for an example of its use.
+
+
+Notes / TBD
+===========
+
+* Guido proposed having the "function docstring" be hand-written inline,
+  in the middle of the output, something like this:
+
+  ::
+
+    /*[clinic]
+    ... prototype and parameters (including parameter docstrings) go here
+    [clinic]*/
+    ... some output ...
+    /*[clinic docstring start]*/
+    ... hand-edited function docstring goes here <-- you edit this by hand!
+    /*[clinic docstring end]*/
+    ... more output
+    /*[clinic output end]*/
+
+  I tried it this way and don't like it--I think it's clumsy. I prefer that
+  everything you write goes in one place, rather than having an island of
+  hand-edited stuff in the middle of the DSL output.
+
+* Do we need to support tuple unpacking? (The "``(OOO)``" style format string.)
+  Boy I sure hope not.
+
+* What about Python functions that take no arguments? This syntax doesn't
+  provide for that. Perhaps a lone indented "None" should mean "no arguments"?
+
+* This approach removes some dynamism / flexibility. With the existing
+  syntax one could theoretically pass in different encodings at runtime for
+  the "``es``"/"``et``" format units. AFAICT CPython doesn't do this itself,
+  however it's possible external users might do this. (Trivia: there are no
+  uses of "``es``" exercised by regrtest, and all the uses of "``et``"
+  exercised are in socketmodule.c, except for one in _ssl.c. They're all
+  static, specifying the encoding ``"idna"``.)
+
+* Right now the "basename" flag on a function changes the ``#define methoddef`` name
+  too. Should it, or should the #define'd methoddef name always be
+  ``{module_name}_{function_name}`` ?
+
+
+References
+==========
+
+.. [1] ``PyArg_ParseTuple()``:
+ http://docs.python.org/3/c-api/arg.html#PyArg_ParseTuple
+
+.. [2] ``PyArg_ParseTupleAndKeywords()``:
+ http://docs.python.org/3/c-api/arg.html#PyArg_ParseTupleAndKeywords
+
+.. [3] ``PyArg_`` format units:
+ http://docs.python.org/3/c-api/arg.html#strings-and-buffers
+
+.. [4] Keyword parameters for extension functions:
+ http://docs.python.org/3/extending/extending.html#keyword-parameters-for-extension-functions
+
+.. [5] ``shlex.split()``:
+ http://docs.python.org/3/library/shlex.html#shlex.split
+
+.. [6] ``PyArg_`` "converter" functions, see ``"O&"`` in this section:
+ http://docs.python.org/3/c-api/arg.html#other-objects
+
+.. [7] Argument Clinic prototype:
+ https://bitbucket.org/larry/python-clinic/
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ coding: utf-8
+ End:
--
Repository URL: http://hg.python.org/peps