[Python-checkins] peps: PEP 436, The Argument Clinic DSL, Larry Hastings

barry.warsaw python-checkins at python.org
Mon Feb 25 15:50:43 CET 2013


http://hg.python.org/peps/rev/85758f0f93bc
changeset:   4774:85758f0f93bc
user:        Barry Warsaw <barry at python.org>
date:        Mon Feb 25 09:50:32 2013 -0500
summary:
  PEP 436, The Argument Clinic DSL, Larry Hastings

files:
  pep-0436.txt |  480 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 480 insertions(+), 0 deletions(-)


diff --git a/pep-0436.txt b/pep-0436.txt
new file mode 100644
--- /dev/null
+++ b/pep-0436.txt
@@ -0,0 +1,480 @@
+PEP: 436
+Title: The Argument Clinic DSL
+Version: $Revision$
+Last-Modified: $Date$
+Author: Larry Hastings <larry at hastings.org>
+Discussions-To: Python-Dev <python-dev at python.org>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 22-Feb-2013
+
+
+Abstract
+========
+
+This document proposes "Argument Clinic", a DSL designed to facilitate
+argument processing for built-in functions in the implementation of
+CPython.
+
+
+Rationale and Goals
+===================
+
+The primary implementation of Python, "CPython", is written in a
+mixture of Python and C.  One of the implementation details of CPython
+is what are called "built-in" functions -- functions available to
+Python programs but written in C.  When a Python program calls a
+built-in function and passes in arguments, those arguments must be
+translated from Python values into C values.  This process is called
+"parsing arguments".
+
+As of CPython 3.3, arguments to functions are primarily parsed with
+one of two functions: the original ``PyArg_ParseTuple()``, [1]_ and
+the more modern ``PyArg_ParseTupleAndKeywords()``. [2]_ The former
+function only handles positional parameters; the latter also
+accommodates keyword and keyword-only parameters, and is preferred for
+new code.
+
+``PyArg_ParseTuple()`` was a reasonable approach when it was first
+conceived.  The programmer specified the translation for the arguments
+in a "format string": [3]_ each parameter matched to a "format unit",
+a one-or-two character sequence telling ``PyArg_ParseTuple()`` what
+Python types to accept and how to translate them into the appropriate
+C value for that parameter.  There were only a dozen or so of these
+"format units", and each one was distinct and easy to understand.
+
+Over the years the ``PyArg_Parse`` interface has been extended in
+numerous ways.  The modern API is quite complex, to the point that it
+is somewhat painful to use.  Consider:
+
+  * There are now forty different "format units"; a few are even three
+    characters long.  This makes it difficult to understand what the
+    format string says without constantly cross-indexing it with the
+    documentation.
+  * There are also six meta-format units that may be buried in the
+    format string.  (They are: ``"()|$:;"``.)
+  * The more format units are added, the less likely it is that the
+    implementer can pick an easy-to-use mnemonic for the format unit,
+    because the character of choice is probably already in use.  In
+    other words, the more format units we have, the more obscure the
+    format units become.
+  * Several format units are nearly identical to others, having only
+    subtle differences.  This makes understanding the exact semantics
+    of the format string even harder.
+  * The docstring is specified as a static C string, which is mildly
+    bothersome to read and edit.
+  * When adding a new parameter to a function using
+    ``PyArg_ParseTupleAndKeywords()``, it's necessary to touch six
+    different places in the code: [4]_
+
+      * Declaring the variable to store the argument.
+      * Passing in a pointer to that variable in the correct spot in
+        ``PyArg_ParseTupleAndKeywords()``, also passing in any
+        "length" or "converter" arguments in the correct order.
+      * Adding the name of the argument in the correct spot of the
+        "keywords" array passed in to
+        ``PyArg_ParseTupleAndKeywords()``.
+      * Adding the format unit to the correct spot in the format
+        string.
+      * Adding the parameter to the prototype in the docstring.
+      * Documenting the parameter in the docstring.
+
+  * There is currently no mechanism for builtin functions to provide
+    their "signature" information (see ``inspect.getfullargspec`` and
+    ``inspect.Signature``).  Adding this information using a mechanism
+    similar to the existing ``PyArg_Parse`` functions would require
+    repeating ourselves yet again.
+
+The goal of Argument Clinic is to replace this API with a mechanism
+inheriting none of these downsides:
+
+  * You need to specify each parameter only once.
+  * All information about a parameter is kept together in one place.
+  * For each parameter, you specify its type in C; Argument Clinic
+    handles the translation from Python value into C value for you.
+  * Argument Clinic also allows for fine-tuning of argument processing
+    behavior with highly-readable "flags", both per-parameter and
+    applying across the whole function.
+  * Docstrings are written in plain text.
+  * From this, Argument Clinic generates for you all the mundane,
+    repetitious code and data structures CPython needs internally.
+    Once you've specified the interface, the next step is simply to
+    write your implementation using native C types.  Every detail of
+    argument parsing is handled for you.
+
+Future goals of Argument Clinic include:
+
+  * providing signature information for builtins, and
+  * speed improvements to the generated code.
+
+
+DSL Syntax Summary
+==================
+
+The Argument Clinic DSL is specified as a comment embedded in a C
+file, as follows.  The "Example" column on the right shows you sample
+input to the Argument Clinic DSL, and the "Section" column on the left
+specifies what each line represents in turn.
+
+::
+
+ +-----------------------+-----------------------------------------------------+
+ | Section               | Example                                             |
+ +-----------------------+-----------------------------------------------------+
+ | Clinic DSL start      | /*[clinic]                                          |
+ | Function declaration  | module.function_name -> return_annotation           |
+ | Function flags        | flag flag2 flag3=value                              |
+ | Parameter declaration |       type name = default                           |
+ | Parameter flags       |       flag flag2 flag3=value                        |
+ | Parameter docstring   |           Lorem ipsum dolor sit amet, consectetur   |
+ |                       |           adipisicing elit, sed do eiusmod tempor   |
+ | Function docstring    | Lorem ipsum dolor sit amet, consectetur adipisicing |
+ |                       | elit, sed do eiusmod tempor incididunt ut labore et |
+ | Clinic DSL end        | [clinic]*/                                          |
+ | Clinic output         | ...                                                 |
+ | Clinic output end     | /*[clinic end output:<checksum>]*/                  |
+ +-----------------------+-----------------------------------------------------+
+
+
+General Behavior Of the Argument Clinic DSL
+-------------------------------------------
+
+All lines support ``#`` as a line comment delimiter *except*
+docstrings.  Blank lines are always ignored.
+
+Like Python itself, leading whitespace is significant in the Argument
+Clinic DSL.  The first line of the "function" section is the
+declaration; all subsequent lines at the same indent are function
+flags.  Once you indent, the first line is a parameter declaration;
+subsequent lines at that indent are parameter flags.  Indent one more
+time for the lines of the parameter docstring.  Finally, dedent back
+to the same level as the function declaration for the function
+docstring.
+
+
+Function Declaration
+--------------------
+
+The return annotation is optional.  If skipped, the arrow ("``->``")
+must also be omitted.
+
+
+Parameter Declaration
+---------------------
+
+The "type" is a C type.  If it's a pointer type, you must specify a
+single space between the type and the "``*``", and zero spaces between
+the "``*``" and the name.  (e.g. "``PyObject *foo``", not "``PyObject*
+foo``")
+
+The "name" must be a legal C identifier.
+
+The "default" is a Python value.  Default values are optional; if not
+specified you must omit the equals sign too.  Parameters which don't
+have a default are implicitly required.  The default value is
+dynamically assigned, "live" in the generated C code, and although
+it's specified as a Python value, it's translated into a native C
+value in the generated C code.
+
+It's explicitly permitted to end the parameter declaration line with a
+semicolon, though the semicolon is optional.  This is intended to
+allow directly cutting and pasting in declarations from C code.
+However, the preferred style is without the semicolon.
+
+
+Flags
+-----
+
+"Flags" are like "``make -D``" arguments.  They're unordered.  Flags
+lines are parsed much like the shell (specifically, using
+``shlex.split()`` [5]_ ).  You can have as many flag lines as you
+like.  Specifying a flag twice is currently an error.
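+
+As a rough sketch of that shell-like splitting, the Python snippet
+below parses flag lines with ``shlex.split()``.  The helper
+``parse_flags`` and the sample flag lines are hypothetical, and the
+treatment of ``name=value`` pairs and of duplicates is an assumption
+rather than the prototype's actual code:
+
+::
+
+    import shlex
+
+    def parse_flags(lines):
+        """Hypothetical helper: collect flags from several lines."""
+        flags = {}
+        for line in lines:
+            # comments=True drops a trailing "# ..." comment.
+            for token in shlex.split(line, comments=True):
+                name, sep, value = token.partition("=")
+                if name in flags:
+                    raise ValueError("flag specified twice: " + name)
+                # Assumption: a bare flag is stored as True.
+                flags[name] = value if sep else True
+        return flags
+
+    print(parse_flags(['nullable encoding="utf-8"', "length  # x"]))
+    # {'nullable': True, 'encoding': 'utf-8', 'length': True}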
+
+Supported flags for functions:
+
+``basename``
+  The basename to use for the generated C functions.  By default this
+  is the name of the function from the DSL, only with periods replaced
+  by underscores.
+
+``positional-only``
+  This function only supports positional parameters, not keyword
+  parameters.  See `Functions With Positional-Only Parameters`_ below.
+
+Supported flags for parameters:
+
+``bitwise``
+  If the Python integer passed in is signed, copy the bits directly
+  even if it is negative.  Only valid for unsigned integer types.
+
+``converter``
+  Backwards-compatibility support for parameter "converter"
+  functions. [6]_ The value should be the name of the converter
+  function in C.  Only valid when the type of the parameter is
+  ``void *``.
+
+``default``
+  The Python value to use in place of the parameter's actual default
+  in Python contexts.  Specifically, when specified, this value will
+  be used for the parameter's default in the docstring, and in the
+  ``Signature``.  (TBD: If the string is a valid Python expression
+  which can be rendered into a Python value using ``eval()``, then the
+  result of ``eval()`` on it will be used as the default in the
+  ``Signature``.)  Ignored if there is no default.
+
+``encoding``
+  Encoding to use when encoding a Unicode string to a ``char *``.
+  Only valid when the type of the parameter is ``char *``.
+
+``group=``
+  This parameter is part of a group of options that must either all be
+  specified or none specified.  Parameters in the same "group" must be
+  contiguous.  The value of the group flag is the name used for the
+  group variable, and therefore must be legal as a C identifier.  Only
+  valid for functions marked "``positional-only``"; see `Functions
+  With Positional-Only Parameters`_ below.
+
+``immutable``
+  Only accept immutable values.
+
+``keyword-only``
+  This parameter (and all subsequent parameters) is keyword-only.
+  Keyword-only parameters must also be optional parameters.  Not valid
+  for positional-only functions.
+
+``length``
+  This is an iterable type, and we also want its length.  The DSL will
+  generate a second ``Py_ssize_t`` variable; its name will be this
+  parameter's name appended with "``_length``".
+
+``nullable``
+  ``None`` is a legal argument for this parameter.  If ``None`` is
+  supplied on the Python side, the equivalent C argument will be
+  ``NULL``.  Only valid for pointer types.
+
+``required``
+  Normally any parameter that has a default value is automatically
+  optional.  A parameter that has "required" set will be considered
+  required (non-optional) even if it has a default value.  The
+  generated documentation will also not show any default value.
+
+``types``
+  Space-separated list of acceptable Python types for this object.
+  There are also four special-case types which represent Python
+  protocols:
+
+    * buffer
+    * mapping
+    * number
+    * sequence
+
+``zeroes``
+  This parameter is a string type, and its value should be allowed to
+  have embedded zeroes.  Only some varieties of string parameters
+  support this flag.
+
+
+Python Code
+-----------
+
+Argument Clinic also permits embedding Python code inside C files,
+which is executed in-place when Argument Clinic processes the file.
+Embedded code looks like this:
+
+::
+
+    /*[python]
+
+    # this is python code!
+    print("/" + "* Hello world! *" + "/")
+
+    [python]*/
+
+Any Python code is valid.  Python code sections in Argument Clinic can
+also be used to modify Clinic's behavior at runtime; for example, see
+`Extending Argument Clinic`_.
+
+
+Output
+======
+
+Argument Clinic writes its output in-line in the C file, immediately
+after the section of Clinic code.  For "python" sections, the output
+is everything printed using ``builtins.print``.  For "clinic"
+sections, the output is valid C code, including:
+
+  * a ``#define`` providing the correct ``methoddef`` structure for the
+    function
+  * a prototype for the "impl" function -- this is what you'll write
+    to implement this function
+  * a function that handles all argument processing, which calls your
+    "impl" function
+  * the definition line of the "impl" function
+  * and a comment indicating the end of output.
+
+The intention is that you will write the body of your impl function
+immediately after the output -- as in, you write a left-curly-brace
+immediately after the end-of-output comment and write the
+implementation of the builtin in the body there.  (It's a bit strange
+at first, but oddly convenient.)
+
+Argument Clinic will define the parameters of the impl function for
+you.  The function will take the "self" parameter passed in
+originally, all the parameters you define, and possibly some extra
+generated parameters ("length" parameters; also "group" parameters,
+see next section).
+
+Argument Clinic also writes a checksum for the output section.  This
+is a valuable safety feature: if you modify the output by hand, Clinic
+will notice that the checksum doesn't match, and will refuse to
+overwrite the file.  (You can force Clinic to overwrite with the
+"``-f``" command-line argument; Clinic will also ignore the checksums
+when using the "``-o``" command-line argument.)
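+
+The checksum guard can be pictured with the short Python sketch
+below.  It is only an illustration: the PEP does not name a hash
+algorithm, so the use of SHA-1 is an assumption, and the helper names
+``checksum`` and ``safe_to_overwrite`` are hypothetical:
+
+::
+
+    import hashlib
+
+    def checksum(output):
+        # Assumption: SHA-1 over the UTF-8 text of the output.
+        return hashlib.sha1(output.encode("utf-8")).hexdigest()
+
+    def safe_to_overwrite(output, recorded, force=False):
+        """True if the output section may be regenerated."""
+        # force=True mimics the "-f" command-line argument.
+        return force or checksum(output) == recorded
+
+    generated = "static PyObject *\nexample_impl(PyObject *self)\n"
+    recorded = checksum(generated)
+    edited = generated + "/* hand edit */\n"
+
+    print(safe_to_overwrite(generated, recorded))         # True
+    print(safe_to_overwrite(edited, recorded))            # False
+    print(safe_to_overwrite(edited, recorded, force=True))  # True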
+
+
+Functions With Positional-Only Parameters
+=========================================
+
+A significant fraction of Python builtins implemented in C use the
+older positional-only API for processing arguments
+(``PyArg_ParseTuple()``).  In some instances, these builtins parse
+their arguments differently based on how many arguments were passed
+in.  This can provide some bewildering flexibility: there may be
+groups of optional parameters, which must either all be specified or
+none specified.  And occasionally these groups are on the *left!* (For
+example: ``curses.window.addch()``.)
+
+Argument Clinic supports these legacy use-cases with a special set of
+flags.  First, set the flag "``positional-only``" on the entire
+function.  Then, for every group of parameters that is collectively
+optional, add a "``group=``" flag with a unique string to all the
+parameters in that group.  Note that these groups are permitted on the
+right *or left* of any required parameters!  However, all groups
+(including the group of required parameters) must be contiguous.
+
+The impl function generated by Clinic will add an extra parameter for
+every group, "``int <group>_group``".  This argument will be nonzero
+if the group was specified on this call, and zero if it was not.
+
+Note that when operating in this mode, you cannot specify default
+arguments.  You can simulate defaults by putting parameters in
+individual groups and detecting whether or not they were specified,
+but generally speaking it's better to simply not use
+"``positional-only``" where it isn't absolutely necessary.  (TBD: It
+might be possible to relax this restriction.  But adding default
+arguments into the mix of groups would seemingly make calculating
+which groups are active a good deal harder.)
+
+Also, note that it's possible to specify a set of groups to a function
+such that there are several valid mappings from the number of
+arguments to a valid set of groups.  If this happens, Clinic will exit
+with an error message.  This should not be a problem, as
+positional-only operation is only intended for legacy use cases, and
+all the legacy functions using this quirky behavior should have
+unambiguous mappings.
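+
+One way to picture the ambiguity check is the Python sketch below.
+It is not the prototype's algorithm: it simply assumes that any
+subset of the optional groups may be active, and reports every
+argument count that more than one subset could explain.  The function
+``find_ambiguities`` and the group names are hypothetical:
+
+::
+
+    from itertools import combinations
+
+    def find_ambiguities(required, groups):
+        """Map each ambiguous argument count to its candidates.
+
+        required: number of required parameters.
+        groups: dict of optional group name -> parameter count.
+        """
+        by_count = {}
+        names = sorted(groups)
+        for r in range(len(names) + 1):
+            for active in combinations(names, r):
+                total = required + sum(groups[g] for g in active)
+                by_count.setdefault(total, []).append(active)
+        return {n: s for n, s in by_count.items() if len(s) > 1}
+
+    # curses.window.addch([y, x,] ch[, attr]) is unambiguous.
+    print(find_ambiguities(1, {"yx": 2, "attr": 1}))
+    # {}
+
+    # Two one-parameter groups: a two-argument call is ambiguous.
+    print(find_ambiguities(1, {"left": 1, "right": 1}))
+    # {2: [('left',), ('right',)]}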
+
+
+Current Status
+==============
+
+As of this writing, there is a working prototype implementation of
+Argument Clinic available online. [7]_ The prototype implements the
+syntax above, and generates code using the existing ``PyArg_Parse``
+APIs.  It supports translating to all current format units except
+``"w*"``.  Sample functions using Argument Clinic exercise all major
+features, including positional-only argument parsing.
+
+
+Extending Argument Clinic
+-------------------------
+
+The prototype also currently provides an experimental extension
+mechanism, allowing adding support for new types on-the-fly.  See
+``Modules/posixmodule.c`` in the prototype for an example of its use.
+
+
+Notes / TBD
+===========
+
+* Guido proposed having the "function docstring" be hand-written inline,
+  in the middle of the output, something like this:
+
+  ::
+
+   /*[clinic]
+     ... prototype and parameters (including parameter docstrings) go here
+   [clinic]*/
+   ... some output ...
+   /*[clinic docstring start]*/
+   ... hand-edited function docstring goes here   <-- you edit this by hand!
+   /*[clinic docstring end]*/
+   ... more output
+   /*[clinic output end]*/
+
+  I tried it this way and don't like it -- I think it's clumsy.  I
+  prefer that everything you write goes in one place, rather than
+  having an island of hand-edited stuff in the middle of the DSL
+  output.
+
+* Do we need to support tuple unpacking?  (The "``(OOO)``" style
+  format string.)  Boy I sure hope not.
+
+* What about Python functions that take no arguments?  This syntax
+  doesn't provide for that.  Perhaps a lone indented "None" should
+  mean "no arguments"?
+
+* This approach removes some dynamism / flexibility.  With the
+  existing syntax one could theoretically pass in different encodings
+  at runtime for the "``es``"/"``et``" format units.  As far as I can
+  tell CPython doesn't do this itself; however, it's possible
+  external users might.  (Trivia: there are no uses of "``es``"
+  exercised by regrtest, and all the uses of "``et``" exercised are
+  in ``socketmodule.c``, except for one in ``_ssl.c``.  They're all
+  static, specifying the encoding ``"idna"``.)
+
+* Right now the "basename" flag on a function changes the ``#define
+  methoddef`` name too.  Should it, or should the #define'd methoddef
+  name always be ``{module_name}_{function_name}`` ?
+
+
+References
+==========
+
+.. [1] ``PyArg_ParseTuple()``:
+   http://docs.python.org/3/c-api/arg.html#PyArg_ParseTuple
+
+.. [2] ``PyArg_ParseTupleAndKeywords()``:
+   http://docs.python.org/3/c-api/arg.html#PyArg_ParseTupleAndKeywords
+
+.. [3] ``PyArg_`` format units:
+   http://docs.python.org/3/c-api/arg.html#strings-and-buffers
+
+.. [4] Keyword parameters for extension functions:
+   http://docs.python.org/3/extending/extending.html#keyword-parameters-for-extension-functions
+
+.. [5] ``shlex.split()``:
+   http://docs.python.org/3/library/shlex.html#shlex.split
+
+.. [6] ``PyArg_`` "converter" functions, see ``"O&"`` in this section:
+   http://docs.python.org/3/c-api/arg.html#other-objects
+
+.. [7] Argument Clinic prototype:
+   https://bitbucket.org/larry/python-clinic/
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:

-- 
Repository URL: http://hg.python.org/peps

