[Python-Dev] PEP 3101: Advanced String Formatting

Zachary Pincus zpincus at stanford.edu
Sun Apr 30 19:52:20 CEST 2006


I'm not sure about introducing a special syntax for accessing  
dictionary entries, array elements and/or object attributes *within a  
string formatter*... much less an overloaded one that differs from  
how these elements are accessed in "regular python".

>      Compound names are a sequence of simple names separated by
>      periods:
>
>          "My name is {0.name} :-\{\}".format(dict(name='Fred'))
>
>      Compound names can be used to access specific dictionary entries,
>      array elements, or object attributes.  In the above example, the
>      '{0.name}' field refers to the dictionary entry 'name' within
>      positional argument 0.

Barring ambiguity about whether .name would mean the "name" attribute  
or the "name" dictionary entry if both were defined, I'm not sure I  
really see the point. How is:
   d = {'last':'foo', 'first':'bar'}
   "My last name is {0.last}, my first name is {0.first}.".format(d)

really that big a win over:
   d = {'last':'foo', 'first':'bar'}
   "My last name is {0}, my first name is {1}.".format(d['last'], d['first'])
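
For comparison, a minimal sketch of the same output using the existing
'%' operator with a dictionary as its right-hand argument (the
dictionary contents are just the made-up values from above):

```python
# Made-up values from the example above.
d = {'last': 'foo', 'first': 'bar'}

# '%(key)s' looks up d['key'] in the mapping on the right-hand side.
s = "My last name is %(last)s, my first name is %(first)s." % d
print(s)
```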

Plus, the in-string syntax is limited -- e.g. what if I want to call  
a function on an attribute? Unless you want to re-implement all  
python syntax within the formatters, someone will always be able to  
level this sort of complaint. Better, IMO, to provide none of that
than a restricted subset of the language -- especially if the syntax  
looks and works differently from real python.
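
To make that concrete, here is a sketch of doing the work in ordinary
Python and handing only the finished value to the formatter. It assumes
format() handles a plain positional field as the PEP describes; the
Person class and its values are hypothetical.

```python
class Person:
    """Hypothetical object with a plain attribute."""
    def __init__(self, name):
        self.name = name

p = Person('fred')

# The method call happens in regular Python, so the format string only
# needs a simple positional field.
s = "My name is {0}".format(p.name.capitalize())
print(s)
```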

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine

On Apr 29, 2006, at 11:24 AM, Talin wrote:

> PEP: 3101
> Title: Advanced String Formatting
> Version: $Revision$
> Last-Modified: $Date$
> Author: Talin <talin at acm.org>
> Status: Draft
> Type: Standards
> Content-Type: text/plain
> Created: 16-Apr-2006
> Python-Version: 3.0
> Post-History:
>
>
> Abstract
>
>      This PEP proposes a new system for built-in string formatting
>      operations, intended as a replacement for the existing '%' string
>      formatting operator.
>
>
> Rationale
>
>      Python currently provides two methods of string interpolation:
>
>      - The '%' operator for strings. [1]
>
>      - The string.Template module. [2]
>
>      The scope of this PEP will be restricted to proposals for
>      built-in string formatting operations (in other words, methods
>      of the built-in string type).
>
>      The '%' operator is primarily limited by the fact that it is a
>      binary operator, and therefore can take at most two arguments.
>      One of those arguments is already dedicated to the format string,
>      leaving all other variables to be squeezed into the remaining
>      argument.  The current practice is to use either a dictionary or
>      a tuple as the second argument, but as many people have commented
>      [3], this lacks flexibility.  The "all or nothing" approach
>      (meaning that one must choose between only positional arguments,
>      or only named arguments) is felt to be overly constraining.
>
>      While there is some overlap between this proposal and
>      string.Template, it is felt that each serves a distinct need,
>      and that one does not obviate the other.  In any case,
>      string.Template will not be discussed here.
>
>
> Specification
>
>      The specification will consist of 4 parts:
>
>      - Specification of a set of methods to be added to the built-in
>        string class.
>
>      - Specification of a new syntax for format strings.
>
>      - Specification of a new set of class methods to control the
>        formatting and conversion of objects.
>
>      - Specification of an API for user-defined formatting classes.
>
>
> String Methods
>
>      The built-in string class will gain a new method, 'format',
>      which takes an arbitrary number of positional and keyword
>      arguments:
>
>          "The story of {0}, {1}, and {c}".format(a, b, c=d)
>
>      Within a format string, each positional argument is identified
>      with a number, starting from zero, so in the above example, 'a'
>      is argument 0 and 'b' is argument 1.  Each keyword argument is
>      identified by its keyword name, so in the above example, 'c' is
>      used to refer to the third argument.
>
>      The result of the format call is an object of the same type
>      (string or unicode) as the format string.
>
>
> Format Strings
>
>      Brace characters ('curly braces') are used to indicate a
>      replacement field within the string:
>
>          "My name is {0}".format('Fred')
>
>      The result of this is the string:
>
>          "My name is Fred"
>
>      Braces can be escaped using a backslash:
>
>          "My name is {0} :-\{\}".format('Fred')
>
>      Which would produce:
>
>          "My name is Fred :-{}"
>
>      The element within the braces is called a 'field'.  Fields
>      consist of a name, which can either be simple or compound, and an
>      optional 'conversion specifier'.
>
>      Simple names are either names or numbers.  If numbers, they must
>      be valid decimal numbers; if names, they must be valid Python
>      identifiers.  A number is used to identify a positional argument,
>      while a name is used to identify a keyword argument.
>
>      Compound names are a sequence of simple names separated by
>      periods:
>
>          "My name is {0.name} :-\{\}".format(dict(name='Fred'))
>
>      Compound names can be used to access specific dictionary entries,
>      array elements, or object attributes.  In the above example, the
>      '{0.name}' field refers to the dictionary entry 'name' within
>      positional argument 0.
>
>      Each field can also specify an optional set of 'conversion
>      specifiers'.  Conversion specifiers follow the field name, with a
>      colon (':') character separating the two:
>
>          "My name is {0:8}".format('Fred')
>
>      The meaning and syntax of the conversion specifiers depend on the
>      type of object that is being formatted; however, many of the
>      built-in types will recognize a standard set of conversion
>      specifiers.
>
>      The conversion specifier consists of a sequence of zero or more
>      characters, each of which can consist of any printable character
>      except for a non-escaped '}'.  The format() method does not
>      attempt to interpret the conversion specifiers in any way; it
>      merely passes all of the characters between the first colon ':'
>      and the matching right brace ('}') to the various underlying
>      formatters (described later.)
>
>
> Standard Conversion Specifiers
>
>      For most built-in types, the conversion specifiers will be the
>      same or similar to the existing conversion specifiers used with
>      the '%' operator.  Thus, instead of '%02.2x', you will say
>      '{0:2.2x}'.
>
>      There are a few differences, however:
>
>      - The trailing letter is optional - you don't need to say '2.2d',
>        you can instead just say '2.2'.  If the letter is omitted, the
>        value will be converted into its 'natural' form (that is, the
>        form that it would take if str() or unicode() were called on it)
>        subject to the field length and precision specifiers (if
>        supplied).
>
>      - Variable field width specifiers use a nested version of the {}
>        syntax, allowing the width specifier to be either a positional
>        or keyword argument:
>
>          "{0:{1}.{2}d}".format(a, b, c)
>
>        (Note: It might be easier to parse if these used a different
>        type of delimiter, such as parens - avoiding the need to create
>        a regex that handles the recursive case.)
>
>      - The support for length modifiers (which are ignored by Python
>        anyway) is dropped.
>
>      For non-built-in types, the conversion specifiers will be
>      specific to that type.  An example is the 'datetime' class, whose
>      conversion specifiers are identical to the arguments to the
>      strftime() function:
>
>          "Today is: {0:%x}".format(datetime.now())
>
>
> Controlling Formatting
>
>      A class that wishes to implement a custom interpretation of its
>      conversion specifiers can implement a __format__ method:
>
>      class AST:
>          def __format__(self, specifiers):
>              ...
>
>      The 'specifiers' argument will be either a string object or a
>      unicode object, depending on the type of the original format
>      string.  The __format__ method should test the type of the
>      specifiers parameter to determine whether to return a string or
>      unicode object.  It is the responsibility of the __format__
>      method to return an object of the proper type.
>
>      string.format() will format each field using the following steps:
>
>       1) See if the value to be formatted has a __format__ method.  If
>          it does, then call it.
>
>       2) Otherwise, check the internal formatter within string.format
>          that contains knowledge of certain builtin types.
>
>       3) Otherwise, call str() or unicode() as appropriate.
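
(A minimal sketch of step 1, assuming format() passes the text after
the colon to __format__ unchanged, as the draft describes. The class
body and the 'upper' specifier are invented for illustration.)

```python
class AST:
    """Hypothetical class; only the __format__ hook comes from the PEP."""
    def __init__(self, name):
        self.name = name

    def __format__(self, specifiers):
        # 'upper' is an invented specifier; anything else falls back
        # to the plain name.
        if specifiers == 'upper':
            return self.name.upper()
        return self.name

shouted = "{0:upper}".format(AST('node'))
plain = "{0}".format(AST('node'))
print(shouted, plain)
```
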
>
>
> User-Defined Formatting Classes
>
>      The code that interprets format strings can be called explicitly
>      from user code.  This allows the creation of custom formatter
>      classes that can override the normal formatting rules.
>
>      The string and unicode classes will have a class method called
>      'cformat' that does all the actual work of formatting; the
>      format() method is just a wrapper that calls cformat.
>
>      The parameters to the cformat function are:
>
>          -- The format string (or unicode; the same function handles
>             both.)
>          -- A field format hook (see below)
>          -- A tuple containing the positional arguments
>          -- A dict containing the keyword arguments
>
>      The cformat function will parse all of the fields in the format
>      string, and return a new string (or unicode) with all of the
>      fields replaced with their formatted values.
>
>      For each field, the cformat function will attempt to call the
>      field format hook with the following arguments:
>
>         field_hook(value, conversion, buffer)
>
>      The 'value' field corresponds to the value being formatted, which
>      was retrieved from the arguments using the field name.  (The
>      field_hook has no control over the selection of values, only
>      how they are formatted.)
>
>      The 'conversion' argument is the conversion spec part of the
>      field, which will be either a string or unicode object, depending
>      on the type of the original format string.
>
>      The 'buffer' argument is a Python array object, either a byte
>      array or unicode character array.  The buffer object will contain
>      the partially constructed string; the field hook is free to
>      modify the contents of this buffer if needed.
>
>      The field_hook will be called once per field. The field_hook may
>      take one of two actions:
>
>          1) Return False, indicating that the field_hook will not
>             process this field and the default formatting should be
>             used.  This decision should be based on the type of the
>             value object, and the contents of the conversion string.
>
>          2) Append the formatted field to the buffer, and return True.
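
(A rough sketch of the hook protocol described above, not the PEP's
implementation. 'mini_cformat', 'upper_hook' and the 'U' conversion are
hypothetical names; a plain list stands in for the array buffer; only
simple positional fields are parsed, with no escapes, compound names or
nested fields.)

```python
import re

# Matches only simple fields such as {0} or {0:spec}; escapes, names and
# nesting are deliberately left out of this sketch.
_FIELD = re.compile(r"\{(\d+)(?::([^}]*))?\}")

def mini_cformat(template, field_hook, args):
    """Hypothetical stand-in for cformat, positional arguments only."""
    buffer = []  # a plain list stands in for the PEP's array object
    pos = 0
    for match in _FIELD.finditer(template):
        buffer.append(template[pos:match.start()])
        value = args[int(match.group(1))]
        conversion = match.group(2) or ""
        # The hook either appends to the buffer and returns True, or
        # returns False to request default formatting.
        if not field_hook(value, conversion, buffer):
            buffer.append(str(value))
        pos = match.end()
    buffer.append(template[pos:])
    return "".join(buffer)

def upper_hook(value, conversion, buffer):
    """Handle only strings with the made-up conversion 'U'."""
    if isinstance(value, str) and conversion == "U":
        buffer.append(value.upper())
        return True
    return False

result = mini_cformat("{0:U} is {1}", upper_hook, ("fred", 42))
print(result)
```

Here the hook claims the first field and declines the second, so the
second falls through to str().
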
>
>
> Alternate Syntax
>
>      Naturally, one of the most contentious issues is the syntax of
>      the format strings, and in particular the markup conventions used
>      to indicate fields.
>
>      Rather than attempting to exhaustively list all of the various
>      proposals, I will cover the ones that are most widely used
>      already.
>
>      - Shell variable syntax: $name and $(name) (or in some variants,
>        ${name}).  This is probably the oldest convention out there,
>        and is used by Perl and many others.  When used without the
>        braces, the length of the variable is determined by lexically
>        scanning until an invalid character is found.
>
>        This scheme is generally used in cases where interpolation is
>        implicit - that is, in environments where any string can
>        contain interpolation variables, and no special substitution
>        function need be invoked.  In such cases, it is important to
>        prevent the interpolation behavior from occurring accidentally,
>        so the '$' (which is otherwise a relatively uncommonly-used
>        character) is used to signal when the behavior should occur.
>
>        It is the author's opinion, however, that in cases where the
>        formatting is explicitly invoked, less care needs to be
>        taken to prevent accidental interpolation, in which case a
>        lighter and less unwieldy syntax can be used.
>
>      - Printf and its cousins ('%'), including variations that add a
>        field index, so that fields can be interpolated out of order.
>
>      - Other bracket-only variations.  Various MUDs (Multi-User
>        Dungeons) such as MUSH have used brackets (e.g. [name]) to do
>        string interpolation.  The Microsoft .Net libraries use braces
>        ({}), and a syntax which is very similar to the one in this
>        proposal, although the syntax for conversion specifiers is
>        quite different. [4]
>
>      - Backquoting.  This method has the benefit of minimal
>        syntactical clutter; however, it lacks many of the benefits of
>        a function call syntax (such as complex expression arguments,
>        custom formatters, etc.).
>
>      - Other variations include Ruby's #{}, PHP's {$name}, and so
>        on.
>
>
> Sample Implementation
>
>      A rough prototype of the underlying 'cformat' function has been
>      coded in Python; however, it needs much refinement before being
>      submitted.
>
>
> Backwards Compatibility
>
>      Backwards compatibility can be maintained by leaving the existing
>      mechanisms in place.  The new system does not collide with any of
>      the method names of the existing string formatting techniques, so
>      both systems can co-exist until it comes time to deprecate the
>      older system.
>
>
> References
>
>      [1] Python Library Reference - String formatting operations
>      http://docs.python.org/lib/typesseq-strings.html
>
>      [2] Python Library References - Template strings
>      http://docs.python.org/lib/node109.html
>
>      [3] [Python-3000] String formating operations in python 3k
>          http://mail.python.org/pipermail/python-3000/2006-April/000285.html
>
>      [4] Composite Formatting - [.Net Framework Developer's Guide]
>          http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp?frame=true
>
>
> Copyright
>
>      This document has been placed in the public domain.
>
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8
> End: