PEP 292 - Simpler String Substitutions

Barry Warsaw barry at python.org
Sun Aug 22 23:47:12 EDT 2004


Attached is the latest version of PEP 292, which describes simpler
string substitutions.  This PEP is slated for inclusion in Python 2.4
and will likely be checked in before the next alpha release.

The canonical location of the PEP is here:

http://www.python.org/peps/pep-0292.html

Cheers,
-Barry


-------------- next part --------------
PEP: 292
Title: Simpler String Substitutions
Version: $Revision: 1.12 $
Last-Modified: $Date: 2004/08/23 03:31:45 $
Author: barry at python.org (Barry A. Warsaw)
Status: Draft
Type: Standards Track
Created: 18-Jun-2002
Python-Version: 2.4
Post-History: 18-Jun-2002, 23-Mar-2004, 22-Aug-2004


Abstract

    This PEP describes a simpler string substitution feature, also
    known as string interpolation.  This PEP is "simpler" in two
    respects:

    1. Python's current string substitution feature
       (i.e. %-substitution) is complicated and error prone.  This PEP
       is simpler at the cost of some expressiveness.

    2. PEP 215 proposed an alternative string interpolation feature,
       introducing a new `$' string prefix.  PEP 292 is simpler than
       this because it involves no syntax changes and has much simpler
       rules for what substitutions can occur in the string.


Rationale

    Python currently supports a string substitution syntax based on
    C's printf() '%' formatting character[1].  While quite rich,
    %-formatting codes are also error prone, even for
    experienced Python programmers.  A common mistake is to leave off
    the trailing format character, e.g. the `s' in "%(name)s".

    In addition, the rules for what can follow a % sign are fairly
    complex, while the usual application rarely needs such complexity.
    Most scripts need to do some string interpolation, but most of
    those use simple `stringification' formats, i.e. %s or %(name)s.
    This form should be made simpler and less error prone.
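
    The pitfall is easy to reproduce.  The following is a minimal
    sketch, assuming a current Python 3 interpreter rather than the
    Python 2.4 this PEP targets; the mapping and format strings are
    made up for illustration.

```python
# Illustrative only: the mapping and format strings are invented for this sketch.
mapping = {"name": "Guido"}

# Correct: the trailing 's' conversion character is present.
good = "Hello, %(name)s" % mapping

# Mistake: the trailing 's' was left off, so the format is incomplete
# and the % operator raises ValueError instead of substituting.
try:
    "Hello, %(name)" % mapping
    error = None
except ValueError as exc:
    error = exc

print(good)
print(type(error).__name__)
```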


A Simpler Proposal

    We propose the addition of a new class -- called 'Template', which
    will live in the string module -- derived from the built-in
    unicode type.  The Template class supports new rules for string
    substitution; its value contains placeholders, introduced with the
    $ character.  The following rules for $-placeholders apply:

    1. $$ is an escape; it is replaced with a single $

    2. $identifier names a substitution placeholder matching a mapping
       key of "identifier".  By default, "identifier" must spell a
       Python identifier as defined in [2].  The first non-identifier
       character after the $ character terminates this placeholder
       specification.

    3. ${identifier} is equivalent to $identifier.  It is required
       when valid identifier characters follow the placeholder but are
       not part of the placeholder, e.g. "${noun}ification".

    If the $ character appears at the end of the line, or is followed
    by any character other than those described above, it is treated
    as if it had been escaped, appearing in the resulting string
    unchanged.  NOTE: see open issues below.
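
    The three rules, and the lenient treatment of a bare $, can be
    sketched with a single regular expression.  This is not the
    reference implementation: the names PLACEHOLDER and substitute
    are hypothetical, and identifiers are assumed to be ASCII-only.

```python
import re

# Hypothetical pattern for this sketch; the reference implementation may differ.
PLACEHOLDER = re.compile(r"""
    \$(?:
        (?P<escaped>\$)                         |  # rule 1: escaped dollar sign
        (?P<named>[_a-zA-Z][_a-zA-Z0-9]*)       |  # rule 2: bare identifier
        \{(?P<braced>[_a-zA-Z][_a-zA-Z0-9]*)\}     # rule 3: braced identifier
    )
""", re.VERBOSE)


def substitute(template, mapping):
    """Apply the three placeholder rules to a plain string."""
    def replace(match):
        if match.group("escaped") is not None:
            return "$"
        key = match.group("named") or match.group("braced")
        return mapping[key]  # a missing key raises KeyError

    # A $ that matches none of the alternatives is simply not matched,
    # so it appears in the result unchanged.
    return PLACEHOLDER.sub(replace, template)


print(substitute("$who likes ${what}ing", {"who": "tim", "what": "hik"}))
print(substitute("$$5 and $100 and $", {}))
```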

    No other characters have special meaning; however, it is possible
    to derive from the Template class to define different rules for
    the placeholder.  For example, a derived class could allow for
    periods in the placeholder (e.g. to support a kind of dynamic
    namespace and attribute path lookup).

    Once the Template has been created, substitutions can be performed
    using traditional Python syntax.  For example:

        >>> from string import Template
        >>> mapping = dict(name='Guido', country='the Netherlands')
        >>> s = Template('${name} was born in ${country}')
        >>> print s % mapping
        Guido was born in the Netherlands

    Another class is provided which derives from Template.  This class
    is called 'SafeTemplate' and supports rules identical to those
    above.  The difference between Template instances and SafeTemplate
    instances is that in SafeTemplate if a placeholder is missing from
    the interpolation mapping, no KeyError is raised.  Instead, the
    original placeholder is included in the result string unchanged.
    For example:

        >>> from string import Template, SafeTemplate
        >>> mapping = dict(name='Guido', country='the Netherlands')
        >>> s = Template('$who was born in $country')
        >>> print s % mapping
        Traceback (most recent call last):
          [...traceback omitted...]
        KeyError: u'who'
        >>> s = SafeTemplate('$who was born in $country')
        >>> print s % mapping
        $who was born in the Netherlands
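
    One way the two classes could be written is sketched below.  It
    is an illustration under stated assumptions, not the code from
    the patch: it subclasses str (standing in for Python 2's unicode
    type), and the lookup() hook is a hypothetical name introduced
    only for this sketch.

```python
import re


class Template(str):
    """Sketch of the proposed class; str stands in for Python 2's unicode."""

    # Groups: escaped dollar, bare identifier, braced identifier.
    pattern = re.compile(
        r"\$(?:(\$)|([_a-zA-Z][_a-zA-Z0-9]*)|\{([_a-zA-Z][_a-zA-Z0-9]*)\})"
    )

    def __mod__(self, mapping):
        def replace(match):
            escaped, named, braced = match.groups()
            if escaped is not None:
                return "$"
            return self.lookup(mapping, named or braced, match.group(0))

        return self.pattern.sub(replace, self)

    def lookup(self, mapping, key, original):
        # Hypothetical hook: a missing key propagates as KeyError.
        return mapping[key]


class SafeTemplate(Template):
    def lookup(self, mapping, key, original):
        # Leave the placeholder text untouched when the key is missing.
        try:
            return mapping[key]
        except KeyError:
            return original


mapping = dict(name="Guido", country="the Netherlands")
print(Template("${name} was born in ${country}") % mapping)
print(SafeTemplate("$who was born in $country") % mapping)
```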


Why `$' and Braces?

    The BDFL said it best[3]: The $ means "substitution" in so many
    languages besides Perl that I wonder where you've been. [...]
    We're copying this from the shell.


Comparison to PEP 215

    PEP 215 describes an alternate proposal for string interpolation.
    Unlike that PEP, this one does not propose any new syntax for
    Python.  All the proposed new features are embodied in a new
    library module.  PEP 215 proposes a new string prefix
    representation such as $"" which signals to Python that a new type
    of string is present.  $-strings would have to interact with the
    existing r-prefixes and u-prefixes, essentially doubling the
    number of string prefix combinations.

    PEP 215 also allows for arbitrary Python expressions inside the
    $-strings, so that you could do things like:

        import sys
        print $"sys = $sys, sys = $sys.modules['sys']"

    which would print

        sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>

    It's generally accepted that the rules in PEP 215 are safe in the
    sense that they introduce no new security issues (see PEP 215,
    "Security Issues" for details).  However, the rules are still
    quite complex, and make it more difficult to see the substitution
    placeholder in the original $-string.

    The interesting thing is that the Template class defined in this
    PEP has nothing to say about the values that are substituted for
    the placeholders.  Thus, with a little extra work, it's possible
    to support PEP 215's functionality using existing Python syntax.

    For example, one could define subclasses of Template and dict that
    allowed for a more complex placeholder syntax and a mapping that
    evaluated those placeholders.
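
    A restricted version of this idea, covering attribute paths but
    not arbitrary expressions, might look like the sketch below.  For
    brevity it uses a plain function instead of a Template subclass;
    NamespaceDict, PATTERN and substitute are hypothetical names, and
    the code assumes a current Python 3 interpreter.

```python
import os
import re

IDENT = r"[_a-zA-Z][_a-zA-Z0-9]*"
PATH = IDENT + r"(?:\." + IDENT + r")*"  # identifiers joined by periods

# Hypothetical pattern: like the identifier pattern, but accepting dotted paths.
PATTERN = re.compile(r"\$(?:(\$)|(" + PATH + r")|\{(" + PATH + r")\})")


class NamespaceDict(dict):
    """Hypothetical mapping that resolves 'a.b.c' by attribute lookup."""

    def __getitem__(self, key):
        head, *rest = key.split(".")
        value = dict.__getitem__(self, head)
        for attr in rest:
            value = getattr(value, attr)
        return str(value)


def substitute(template, mapping):
    def replace(match):
        escaped, named, braced = match.groups()
        if escaped is not None:
            return "$"
        return mapping[named or braced]

    return PATTERN.sub(replace, template)


namespace = NamespaceDict(os=os)
print(substitute("The path separator is $os.path.sep", namespace))
```

    The pattern only recognizes the path; all of the namespace lookup
    happens in the mapping.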


Internationalization

    The implementation supports internationalization magic by keeping
    the original string value intact.  In fact, all the work of the
    special substitution rules is implemented by overriding the
    __mod__() operator.  However, the string value of a Template (or
    SafeTemplate) is the string that was passed to its constructor.

    This approach allows a gettext-based internationalized program to
    use the Template instance as a lookup into the catalog; in fact
    gettext doesn't care that the catalog key is a Template.  Because
    the value of the Template is the original $-string, translators
    also never need to use %-strings.  The right thing will happen at
    run-time.
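
    The catalog lookup can be illustrated with a plain dict standing
    in for a gettext catalog.  Everything here is an assumption made
    for the sketch: the catalog contents, the cut-down Template class
    (a str subclass handling only $$ and $identifier), and the German
    message.

```python
import re


class Template(str):
    """Cut-down stand-in for the proposed class, for illustration only."""

    def __mod__(self, mapping):
        def replace(match):
            key = match.group(1)
            return "$" if key is None else mapping[key]

        # Matches an escaped dollar sign or $identifier; group 1 is the identifier.
        return re.sub(r"\$(?:\$|([_a-zA-Z][_a-zA-Z0-9]*))", replace, self)


# Hypothetical catalog: a real program would ask gettext for the translation.
catalog = {"$name was born in $country": "$name wurde in $country geboren"}

source = Template("$name was born in $country")

# The Template hashes and compares like the string it was built from,
# so it works directly as the catalog key.
translated = Template(catalog[source])

result = translated % {"name": "Guido", "country": "den Niederlanden"}
print(result)
```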


Reference Implementation

    A SourceForge patch[4] is available which implements this
    proposal, including unit tests and documentation changes.


Open Issues

    - Should the Template and SafeTemplate classes convert mapping
      values to strings (or unicodes)?  I.e. what should this code do:

      >>> from string import Template
      >>> Template('The cost was $amount euros') % {'amount': 7}

      Should this raise an exception such as TypeError, or should this
      return the string 'The cost was 7 euros'?

      Proposed resolution: no automatic stringification.

    - The pattern for placeholders in the Template and SafeTemplate
      classes matches Python identifiers.  Some people want to match
      Python attribute paths, e.g. "$os.path.sep".  This can be useful
      in some applications, however note that it is up to the
      interpolation mapping to provide namespace lookup for the
      attribute paths.

      Should we include AttributeTemplate and SafeAttributeTemplate in
      the standard library?  What about more complex patterns such as
      Python expressions?

      Proposed resolution: No, we don't include them for now.  Such
      classes are easily derived, and besides, we're not proposing to
      include any interpolation mappings, and without such a
      specialized mapping, a pattern matching attribute paths or
      expressions isn't useful.

    - Where do the Template and SafeTemplate classes live?  Some
      people have suggested creating a stringtools or stringlib module
      to house these two classes.  The PEP author has proposed a
      re-organization of the existing string module, turning it into a
      string package.

      Proposed resolution: There seems little consensus around either
      suggestion, and since the classes are just a few lines of
      Python, I propose no string module re-organization, but to add
      these two classes to string.py.

    - Should the $-placeholder rules be more strict?  Specifically,
      objections have been raised about 'magically' escaping $'s at
      the end of strings, or in strings like '$100'.  The suggestion
      was that we add another matching group which matches bare $'s,
      raising a ValueError if we find such a match.

      Proposed resolution: There seems to be consensus for strictness
      on the grounds of explicit is better than implicit.
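
      The suggested extra matching group could be sketched as
      follows.  The group name 'bogus' and the function name
      strict_substitute are hypothetical, and the error message is
      made up; only the idea of a final catch-all group that raises
      ValueError comes from the discussion above.

```python
import re

# Hypothetical strict pattern: the last alternative matches the empty
# string, so it catches any $ that the first three alternatives reject.
STRICT = re.compile(r"""
    \$(?:
        (?P<escaped>\$)                         |
        (?P<named>[_a-zA-Z][_a-zA-Z0-9]*)       |
        \{(?P<braced>[_a-zA-Z][_a-zA-Z0-9]*)\}  |
        (?P<bogus>)
    )
""", re.VERBOSE)


def strict_substitute(template, mapping):
    def replace(match):
        if match.group("bogus") is not None:
            raise ValueError("bare $ at position %d" % match.start())
        if match.group("escaped") is not None:
            return "$"
        return mapping[match.group("named") or match.group("braced")]

    return STRICT.sub(replace, template)


print(strict_substitute("$$100 for $who", {"who": "you"}))
```

      Under this pattern '$100' and a trailing '$' both raise
      ValueError, and the author must write '$$100' instead.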


References

    [1] String Formatting Operations
        http://www.python.org/doc/current/lib/typesseq-strings.html

    [2] Identifiers and Keywords
        http://www.python.org/doc/current/ref/identifiers.html

    [3] Guido's python-dev posting from 21-Jul-2002
        http://mail.python.org/pipermail/python-dev/2002-July/026397.html

    [4] Reference Implementation
        http://sourceforge.net/tracker/index.php?func=detail&aid=1014055&group_id=5470&atid=305470

Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: