[Python-checkins] peps: PEP 292 inspired competitor to PEP 498
nick.coghlan
python-checkins at python.org
Sat Aug 8 11:20:41 CEST 2015
https://hg.python.org/peps/rev/73eee55cff6d
changeset: 5932:73eee55cff6d
user: Nick Coghlan <ncoghlan at gmail.com>
date: Sat Aug 08 19:20:33 2015 +1000
summary:
PEP 292 inspired competitor to PEP 498
files:
pep-0500.txt | 401 +++++++++++++++++++++++++++++++++++++++
1 files changed, 401 insertions(+), 0 deletions(-)
diff --git a/pep-0500.txt b/pep-0500.txt
new file mode 100644
--- /dev/null
+++ b/pep-0500.txt
@@ -0,0 +1,401 @@
+PEP: 500
+Title: Translation ready string interpolation
+Version: $Revision$
+Last-Modified: $Date$
+Author: Nick Coghlan <ncoghlan at gmail.com>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 08-Aug-2015
+Python-Version: 3.6
+Post-History: 08-Aug-2015
+
+Abstract
+========
+
+PEP 498 proposes new syntactic support for string interpolation that is
+transparent to the compiler, allowing name references from the interpolation
+operation full access to containing namespaces (as with any other expression),
+rather than being limited to explicit name references.
+
+This PEP agrees with the basic motivation of PEP 498, but proposes to focus
+both the syntax and the implementation on the i18n use case, drawing on the
+previous proposals in PEP 292 (which added string.Template) and its predecessor
+PEP 215 (which proposed syntactic support, rather than a runtime string
+manipulation based approach). The text of this PEP currently assumes that the
+reader is familiar with these three previous related proposals.
+
+The interpolation syntax proposed for this PEP is that of PEP 292, but expanded
+to allow arbitrary expressions and format specifiers when using the ``${ref}``
+interpolation syntax. The suggested new string prefix is "i" rather than "f",
+with the intended mnemonics being either "interpolated string" or
+"i18n string"::
+
+ >>> import datetime
+ >>> name = 'Jane'
+ >>> age = 50
+ >>> anniversary = datetime.date(1991, 10, 12)
+ >>> i'My name is $name, my age next year is ${age+1}, my anniversary is ${anniversary:%A, %B %d, %Y}.'
+ 'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
+ >>> i'She said her name is ${name!r}.'
+ "She said her name is 'Jane'."
+
+This PEP also proposes the introduction of three new builtin functions,
+``__interpolate__``, ``__interpolateb__`` and ``__interpolateu__``, which
+implement key aspects of the interpolation process, and may be overridden in
+accordance with the usual mechanisms for shadowing builtin functions.
+
+This PEP does not propose to remove or deprecate any of the existing
+string formatting mechanisms, as those will remain valuable when formatting
+strings that are present directly in the source code of the application.
+
+The key aim of this PEP that isn't inherited from PEP 498 is to help ensure
+that future Python applications are written in a "translation ready" way, where
+many interface strings that may need to be translated to allow an application
+to be used in multiple languages are flagged as a natural consequence of the
+development process, even though they won't be translated by default.
+
+
+Rationale
+=========
+
+PEP 498 makes interpolating values into strings with full access to Python's
+lexical namespace semantics simpler, but it does so at the cost of introducing
+yet another string interpolation syntax.
+
+The interpolation syntax devised for PEP 292 is deliberately simple so that the
+template strings can be extracted into an i18n message catalog, and passed to
+translators who may not themselves be developers. For these use cases, it is
+important that the interpolation syntax be as simple as possible, as the
+translators are responsible for preserving the substitution markers, even as
+they translate the surrounding text. The PEP 292 syntax is also a common message
+catalog syntax already supported by many commercial software translation
+support tools.
+
+PEP 498 correctly points out that the PEP 292 syntax isn't as flexible as that
+introduced for general purpose string formatting in PEP 3101, so this PEP adds
+that flexibility to the ``${ref}`` construct in PEP 292, and allows translation
+tools the option of rejecting usage of that more advanced syntax at runtime,
+rather than categorically rejecting it at compile time. The proposed permitted
+expressions inside ``${ref}`` are exactly as defined in PEP 498.
+
+
+Specification
+=============
+
+In source code, i-strings are string literals that are prefixed by the
+letter 'i'. The string will be parsed into its components at compile time,
+which will then be passed to the new ``__interpolate__`` builtin at runtime.
+
+The 'i' prefix may be combined with 'b', where the 'i' must appear first, in
+which case ``__interpolateb__`` will be called rather than ``__interpolate__``.
+Similarly, 'i' may also be combined with 'u' to call ``__interpolateu__``
+rather than ``__interpolate__``.
+
+The 'i' prefix may also be combined with 'r', with or without 'b' or 'u', to
+produce raw i-strings. This disables backslash escape sequences in the string
+literal as usual, but has no effect on the runtime interpolation behaviour.
+
+In all cases, the only permitted location for the 'i' prefix is before all other
+prefix characters - it indicates a runtime operation, which is largely
+independent of the compile time prefixes (aside from calling different
+interpolation functions when combined with 'b' or 'u').
+
+i-strings are parsed into literals and expressions. Expressions
+appear as either identifiers prefixed with a single "$" character, or
+surrounded by a leading '${' and a trailing '}'. The parts of the format string
+that are not expressions are separated out as string literals.
+
+While parsing the string, any doubled ``$$`` is replaced with a single ``$``
+and is considered part of the literal text, rather than as introducing an
+expression.
+
+These components are then organised into 3 parallel tuples:
+
+* parsed format string fields
+* expression text
+* expression values
+
+And then passed to the ``__interpolate__`` builtin at runtime::
+
+ __interpolate__(fields, expressions, values)
+
+The format string field tuple is inspired by the interface of
+``string.Formatter.parse``, and consists of a series of 4-tuples each containing
+a leading literal, together with a trailing field number, format specifier,
+and conversion specifier. If a given substitution field has no leading literal
+section, format specifier or conversion specifier, then the corresponding
+elements in the tuple are the empty string. If the final part of the string
+has no trailing substitution field, then the field number, format specifier
+and conversion specifier will all be ``None``.
+
+The expression text is simply the text of each interpolated expression, as it
+appeared in the original string, but without the leading and/or surrounding
+expression markers.
+
+The expression values are the result of evaluating the interpolated expressions
+in the exact runtime context where the i-string appears in the source code.
+
+For the following example i-string::
+
+    i'abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl'
+
+the fields tuple would be::
+
+    (
+        ('abc', 0, 'spec1', ''),
+        ('', 1, 'spec2', 'r'),
+        ('def', 2, '', 's'),
+        ('ghi ', 3, '', ''),
+        (' $jkl', None, None, None)
+    )
+
+For the same example, the expression text and value tuples would be::
+
+    ('expr1', 'expr2', 'expr3', 'ident')  # Expression text
+    (expr1, expr2, expr3, ident)          # Expression values
+
+The fields and expression text tuples can be constant folded at compile time,
+while the expression values tuple will always need to be constructed at runtime.
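+
+As a rough illustration of the parsing step described above, the following
+sketch splits template text into the fields and expression text tuples. The
+``parse_istring`` helper and its regular expression are hypothetical and not
+part of this proposal. The sketch assumes that braced expressions contain no
+``}``, ``!`` or ``:`` characters, and it leaves an unmatched ``$`` in the
+literal text rather than reporting an error, so it is much simpler than a
+real compile time parser would need to be.
+

```python
import re

# Hypothetical helper, not part of the PEP: a simplified regex-based parser.
# Assumption: braced expressions contain no '}', '!' or ':' characters.
_FIELD = re.compile(
    r"\$(?:"
    r"(?P<escaped>\$)"                        # $$ is an escaped dollar sign
    r"|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)"     # $identifier
    r"|\{(?P<expr>[^}!:]+)"                   # ${expression
    r"(?:!(?P<conv>[rsa]))?"                  #   optional !conversion
    r"(?::(?P<spec>[^}]*))?\}"                #   optional :format_spec }
    r")"
)


def parse_istring(template):
    """Split template text into (fields, expressions) tuples."""
    fields = []
    expressions = []
    literal = ""
    pos = 0
    for match in _FIELD.finditer(template):
        # Collect the literal text that precedes this match.
        literal += template[pos:match.start()]
        pos = match.end()
        if match.group("escaped"):
            literal += "$"
            continue
        expr = match.group("ident") or match.group("expr")
        fields.append((literal, len(expressions),
                       match.group("spec") or "",
                       match.group("conv") or ""))
        expressions.append(expr)
        literal = ""
    # Any trailing literal text becomes a final field with no substitution.
    literal += template[pos:]
    if literal:
        fields.append((literal, None, None, None))
    return tuple(fields), tuple(expressions)


fields, expressions = parse_istring(
    'abc${expr1:spec1}${expr2!r:spec2}def${expr3!s}ghi $ident $$jkl')
```

+
+The expression values tuple is deliberately absent from the sketch: it can
+only be produced by evaluating each expression in the scope where the
+i-string appears.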
+
+The default ``__interpolate__`` implementation would have the following
+semantics, with field processing being defined in terms of the ``format``
+builtin and ``str.format`` conversion specifiers::
+
+    import string
+
+    _converter = string.Formatter().convert_field
+
+    def __interpolate__(fields, expressions, values):
+        template_parts = []
+        for leading_text, field_num, format_spec, conversion in fields:
+            template_parts.append(leading_text)
+            if field_num is not None:
+                value = values[field_num]
+                if conversion:
+                    value = _converter(value, conversion)
+                field_text = format(value, format_spec)
+                template_parts.append(field_text)
+        return "".join(template_parts)
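+
+Since the proposed syntax does not exist in any released interpreter, the
+default semantics can only be exercised today by building the tuples by hand.
+The following sketch does that under the name ``interpolate`` (a stand-in
+chosen for this example, not a name from this proposal), with the fields
+written out as the compiler would be expected to produce them.
+

```python
import string

_converter = string.Formatter().convert_field


def interpolate(fields, expressions, values):
    # Stand-in for the default implementation described in the text.
    parts = []
    for leading_text, field_num, format_spec, conversion in fields:
        parts.append(leading_text)
        if field_num is not None:
            value = values[field_num]
            if conversion:
                # 'r', 's' and 'a' map to repr(), str() and ascii().
                value = _converter(value, conversion)
            parts.append(format(value, format_spec))
    return "".join(parts)


name = 'Jane'
age = 50

# Hand-built equivalent of: i'$name is ${age+1:03d} next year (${name!r})'
fields = (
    ('', 0, '', ''),
    (' is ', 1, '03d', ''),
    (' next year (', 2, '', 'r'),
    (')', None, None, None),
)
expressions = ('name', 'age+1', 'name')
values = (name, age + 1, name)

result = interpolate(fields, expressions, values)
print(result)
```

+
+Note that the expression text tuple is not used by the default behaviour at
+all; it is passed along so that alternative interpolation functions can
+inspect it.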
+
+The default ``__interpolateu__`` implementation would be the
+``__interpolate__`` builtin.
+
+The default ``__interpolateb__`` implementation would be defined in terms of
+the binary mod-formatting reintroduced in PEP 461::
+
+    def __interpolateb__(fields, expressions, values):
+        template_parts = []
+        for leading_data, field_num, format_spec, conversion in fields:
+            template_parts.append(leading_data)
+            if field_num is not None:
+                if conversion:
+                    raise ValueError("Conversion specifiers not supported "
+                                     "in default binary interpolation")
+                value = values[field_num]
+                # Build the format as bytes so the result can be joined below
+                field_data = (b"%" + format_spec) % (value,)
+                template_parts.append(field_data)
+        return b"".join(template_parts)
+
+This definition permits examples like the following::
+
+    >>> data = 10
+    >>> ib'$data'
+    b'10'
+    >>> ib'${data:4x}'
+    b'   a'
+    >>> ib'${data:#4x}'
+    b' 0xa'
+    >>> ib'${data:04X}'
+    b'000A'
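+
+The binary case can be approximated in the same hand-built way on Python 3.5
+or later, where PEP 461 formatting is available. This sketch assumes that the
+compiler would supply the literal text and format specifiers as bytes
+objects, which the text above does not state explicitly, and the name
+``interpolate_bytes`` is a stand-in chosen for this example.
+

```python
def interpolate_bytes(fields, expressions, values):
    # Stand-in for the default binary implementation described in the text.
    # Assumption: literal data and format specifiers arrive as bytes.
    parts = []
    for leading_data, field_num, format_spec, conversion in fields:
        parts.append(leading_data)
        if field_num is not None:
            if conversion:
                raise ValueError("Conversion specifiers not supported "
                                 "in default binary interpolation")
            # PEP 461 %-formatting applied to a bytes format string.
            parts.append((b"%" + format_spec) % (values[field_num],))
    return b"".join(parts)


data = 10

# Hand-built equivalent of: ib'hex=${data:#4x} HEX=${data:04X}'
fields = (
    (b'hex=', 0, b'#4x', b''),
    (b' HEX=', 1, b'04X', b''),
)
result = interpolate_bytes(fields, ('data', 'data'), (data, data))
print(result)
```

+
+A field with an empty format specifier is left out of the sketch on purpose,
+because a bare ``%`` is not a complete format.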
+
+
+Expression evaluation
+---------------------
+
+The expressions that are extracted from the string are evaluated in
+the context where the i-string appeared. This means the expression has
+full access to local, nonlocal and global variables. Any valid Python
+expression can be used inside ``${}``, including function and method calls.
+References without the surrounding braces are limited to looking up single
+identifiers.
+
+Because the i-strings are evaluated where the string appears in the
+source code, there is no additional expressiveness available with
+i-strings. There are also no additional security concerns: you could
+have also just written the same expression, not inside of an
+i-string::
+
+ >>> bar=10
+ >>> def foo(data):
+ ... return data + 20
+ ...
+ >>> i'input=$bar, output=${foo(bar)}'
+ 'input=10, output=30'
+
+Is equivalent to::
+
+ >>> 'input={}, output={}'.format(bar, foo(bar))
+ 'input=10, output=30'
+
+Format specifiers
+-----------------
+
+Format specifiers are not interpreted by the i-string parser - that is
+handled at runtime by the called interpolation function.
+
+Concatenating strings
+---------------------
+
+As i-strings are shorthand for a runtime builtin function call, implicit
+concatenation is a syntax error (similar to attempting implicit concatenation
+between bytes and str literals)::
+
+ >>> i"interpolated" "not interpolated"
+ File "<stdin>", line 1
+ SyntaxError: cannot mix interpolation call with plain literal
+
+Error handling
+--------------
+
+Either compile time or run time errors can occur when processing
+i-strings. Compile time errors are limited to those errors that can be
+detected when parsing an i-string into its component tuples. These errors all
+raise SyntaxError.
+
+Unmatched braces::
+
+ >>> i'x=${x'
+ File "<stdin>", line 1
+ SyntaxError: missing '}' in interpolation expression
+
+Invalid expressions::
+
+ >>> i'x=${!x}'
+ File "<fstring>", line 1
+ !x
+ ^
+ SyntaxError: invalid syntax
+
+Run time errors occur when evaluating the expressions inside an
+i-string. See PEP 498 for some examples.
+
+Different interpolation functions may also impose additional runtime
+constraints on acceptable interpolated expressions and other formatting
+details, which will be reported as runtime exceptions.
+
+Leading whitespace in expressions is not skipped
+------------------------------------------------
+
+Unlike PEP 498, leading whitespace in expressions doesn't need to be skipped -
+'$' is not a legal character in Python's syntax, so it can't appear inside
+a ``${}`` field except as part of another string, whether interpolated or not.
+
+
+Internationalising interpolated strings
+=======================================
+
+So far, this PEP has said nothing practical about internationalisation - only
+formatting text using either str.format or bytes.__mod__ semantics depending
+on whether or not a str or bytes object is being interpolated.
+
+Internationalisation enters the picture by overriding the ``__interpolate__``
+builtin on a module-by-module basis. For example, the following implementation
+would delegate interpolation calls to string.Template::
+
+    import string
+
+    def _interpolation_fields_to_template(fields, expressions):
+        if not all(expr.isidentifier() for expr in expressions):
+            raise ValueError("Only variable substitutions permitted for i18n")
+        template_parts = []
+        for literal_text, field_num, format_spec, conversion in fields:
+            if format_spec:
+                raise ValueError("Format specifiers not permitted for i18n")
+            if conversion:
+                raise ValueError("Conversion specifiers not permitted for i18n")
+            template_parts.append(literal_text)
+            if field_num is not None:
+                template_parts.append("${" + expressions[field_num] + "}")
+        return "".join(template_parts)
+
+    def __interpolate__(fields, expressions, values):
+        catalog_str = _interpolation_fields_to_template(fields, expressions)
+        # _ is assumed to be the translation function (e.g. from gettext)
+        translated = _(catalog_str)
+        values = dict(zip(expressions, values))
+        return string.Template(translated).safe_substitute(values)
+
+If a module were to import that definition of __interpolate__ into the
+module namespace, then:
+
+* Any i"translated & interpolated" strings would be translated
+* Any iu"untranslated & interpolated" strings would not be translated
+* Any ib"untranslated & interpolated" strings would not be translated
+* Any other string and bytes literals would not be translated unless explicitly
+ passed to the relevant translation machinery at runtime
+
+This shifts the behaviour from the status quo, where translation support needs
+to be added explicitly to each string requiring translation, to one where
+opting *in* to translation is done on a module by module basis, and
+individual interpolated strings can then be opted *out* of translation by
+adding the "u" prefix to the string literal in order to call
+``__interpolateu__`` instead of ``__interpolate__``.
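+
+To make the translation flow concrete, the following sketch runs a condensed
+form of that override against a toy message catalog. The ``CATALOG`` dict,
+the ``_`` lookup function and the name ``translating_interpolate`` are all
+hypothetical stand-ins; a real application would more likely obtain ``_``
+from ``gettext``. The fields tuple is again built by hand, since the "i"
+prefix is only a proposal.
+

```python
import string

# Hypothetical catalog standing in for a real gettext lookup.
CATALOG = {
    "Hello ${name}, you have ${count} new messages":
        "Bonjour ${name}, vous avez ${count} nouveaux messages",
}


def _(message):
    # Fall back to the untranslated text when no entry exists.
    return CATALOG.get(message, message)


def translating_interpolate(fields, expressions, values):
    # Rebuild the PEP 292 style catalog string from the parsed fields.
    if not all(expr.isidentifier() for expr in expressions):
        raise ValueError("Only variable substitutions permitted for i18n")
    parts = []
    for literal_text, field_num, format_spec, conversion in fields:
        if format_spec or conversion:
            raise ValueError("Only plain substitutions permitted for i18n")
        parts.append(literal_text)
        if field_num is not None:
            parts.append("${" + expressions[field_num] + "}")
    translated = _("".join(parts))
    mapping = dict(zip(expressions, values))
    return string.Template(translated).safe_substitute(mapping)


# Hand-built equivalent of: i'Hello $name, you have $count new messages'
fields = (
    ('Hello ', 0, '', ''),
    (', you have ', 1, '', ''),
    (' new messages', None, None, None),
)
result = translating_interpolate(fields, ('name', 'count'), ('Jane', 3))
print(result)
```

+
+Because the lookup key is rebuilt from the parsed fields, the translator sees
+only ``${name}`` style markers, never arbitrary Python expressions.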
+
+
+Discussion
+==========
+
+Refer to PEP 498 for additional discussion, as several of the points there
+also apply to this PEP.
+
+Preserving the unmodified format string
+---------------------------------------
+
+A lot of the complexity in the i18n example is actually in recreating the
+original format string from its component parts. It may make sense to preserve
+and pass that entire string to the interpolation function, in addition to
+the broken down field definitions.
+
+This approach would also allow translators to more consistently benefit from
+the simplicity of the PEP 292 approach to string formatting (in the example
+above, surrounding braces are added to the catalog strings even for cases that
+don't need them).
+
+
+References
+==========
+
+.. [#] %-formatting
+ (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)
+
+.. [#] str.format
+ (https://docs.python.org/3/library/string.html#formatstrings)
+
+.. [#] string.Template documentation
+ (https://docs.python.org/3/library/string.html#template-strings)
+
+.. [#] PEP 215: String Interpolation
+ (https://www.python.org/dev/peps/pep-0215/)
+
+.. [#] PEP 292: Simpler String Substitutions
+ (https://www.python.org/dev/peps/pep-0292/)
+
+.. [#] PEP 3101: Advanced String Formatting
+ (https://www.python.org/dev/peps/pep-3101/)
+
+.. [#] PEP 498: Literal string formatting
+ (https://www.python.org/dev/peps/pep-0498/)
+
+.. [#] string.Formatter.parse
+ (https://docs.python.org/3/library/string.html#string.Formatter.parse)
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ coding: utf-8
+ End:
--
Repository URL: https://hg.python.org/peps