[Python-3000] Format specifier proposal

Tue Aug 14 05:53:26 CEST 2007

On 8/13/07, Talin <talin at acm.org> wrote:
> So I sat down with Guido and as I expected he has simplified my thoughts
> greatly. Based on the conversation we had, I think we both agree on what
> should be done:
>
>
> 1) There will be a new built-in function "format" that formats a single
> field. This function takes two arguments, a value to format, and a
> format specifier string.
>
> The "format" function does exactly the following:
>
>     def format(value, spec):
>        return value.__format__(spec)
>
> (I believe this even works if value is 'None'.)

Yes, assuming the definition of object.__format__ you give later.
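
To make that concrete, here is a minimal sketch of the delegation. The
names format_field, Base and Point are hypothetical stand-ins (so the
sketch does not shadow any real built-in), and Base only mimics the
object.__format__ fallback proposed in point 3:

```python
def format_field(value, spec):
    """Hypothetical stand-in for the proposed built-in format()."""
    # All parsing of spec is left to the value's own __format__.
    return value.__format__(spec)


class Base:
    """Stand-in for object, using the fallback proposed in point 3."""

    def __format__(self, spec):
        # Coerce to a string, then let str interpret the spec.
        return str(self).__format__(spec)


class Point(Base):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        return "({}, {})".format(self.x, self.y)


padded = format_field(Point(1, 2), ">10")  # right-aligned in 10 columns
none_text = format_field(None, "")         # None is formattable too
print(repr(padded), repr(none_text))
```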

> In other words, any type conversion or fallbacks must be done by
> __format__; any interpretation or parsing of the format specifier is
> also done by __format__.
>
> "format" does not, however, handle the "!r" specifier. That is done by
> the caller of this function (usually the Formatter class.)
>
>
> 2) The various type-specific __format__ methods are allowed to know
> about other types - so 'int' knows about 'float' and so on.
>
> Note that other than the special case of int <--> float, this knowledge
> is one way only, meaning that the dependency graph is acyclic.

Though we don't necessarily care (witness the exception for
int<->float -- other types could know about each other too, if it's
useful).

> For most types, if they see a type letter that they don't recognize,
> they should coerce to their nearest built-in type (int, float, etc.) and
> re-invoke __format__.

Make that "for numeric types".

My favorite examples of non-numeric types are the date, time
and datetime types from the datetime module; here I propose that their
__format__ be defined like this:

  def __format__(self, spec):
      return self.strftime(spec)
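
A small sketch of how that would look in use. StampedDate is a
hypothetical subclass, used only so the proposed method can be shown
without touching the real datetime.date:

```python
import datetime


class StampedDate(datetime.date):
    """Hypothetical subclass carrying the proposed __format__."""

    def __format__(self, spec):
        # The whole format spec is handed to strftime unchanged.
        return self.strftime(spec)


d = StampedDate(2007, 8, 14)
iso = format(d, "%Y-%m-%d")
slashed = "{0:%Y/%m/%d}".format(d)
print(iso, slashed)
```

With this definition an empty spec would be passed to strftime as-is, so
a real implementation might want to fall back to str(self) in that case.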

> 3) In addition to int.__format__, float.__format__, and str.__format__,
> there will also be object.__format__, which simply coerces the object to
> a string, and calls __format__ on the result.
>
>    class object:
>       def __format__(self, spec):
>          return str(self).__format__(spec)
>
> So in other words, all objects are formattable if they can be converted
> to a string.
>
>
> 4) Explicit type coercion is a separate field from the format spec:
>
>      {name[:format_spec][!coercion]}

Over lunch we discussed putting !coercion first. IMO {foo!r:20} reads
more naturally from left to right: take foo, call repr() on it, then
call format(_, '20') on the resulting string.
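
That left-to-right reading can be spelled out as below. This is a
sketch that assumes a Python where str.format accepts the
!coercion-before-colon order; the variable names are illustrative only:

```python
foo = "abc"

# Step 1: apply the coercion (repr). Step 2: format the result with '20'.
by_hand = format(repr(foo), "20")

# The same thing written as a single replacement field.
in_field = "{foo!r:20}".format(foo=foo)

print(repr(by_hand))
print(by_hand == in_field)
```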

> Where 'coercion' can be 'r' (to convert to repr()) or 's' (to convert to
> string). Other letters may be added later based on need.
>
> The coercion field causes the formatter class to attempt to coerce the
> value to the specified type before calling format(value, format_spec).
>
>
> 5) Mini-language for format specifiers:
>
> So I do like your (Ron's) latest proposal, and I am thinking about it
> quite a bit.
>
> Guido suggested (and I am favorable to the idea) that we simply keep the
> 2.5 format syntax, or the slightly more advanced variation that's in the
> PEP now.
>
> This has a couple of advantages:
>
> -- It means that Python programmers won't have to learn a new syntax.
> -- It makes the 2to3 conversion of format strings trivial. (Although
> there are some other difficulties with automatic conversion of '%', but
> they are unrelated to format specifiers.)
>
> Originally I liked the idea of putting the type letter at the front,
> instead of at the back like it is in 2.5. However, when you think about
> it, it actually makes sense to have it at the back. Because the type
> letter is now optional, it won't need to be there most of the time. The
> type letter is really just an optional modifier flag, not a "type" at all.
>
> Two features of your proposal that aren't supported in the old syntax are:
>
>    -- Arbitrary fill characters, as opposed to just '0' and ' '.
>    -- Taking the string value from the left or right.
>
> I'm not sure how much we need the first. The second sounds kind of
> useful though.

The second could be added to the mini-language for strings
(str.__format__); I don't see how it would make sense for numbers. (If
you want the last N digits of an int x, by all means use x%10**N.)
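
For instance (a sketch; last_digits is a hypothetical helper name, and
it assumes a non-negative x):

```python
def last_digits(x, n):
    """Return the last n decimal digits of a non-negative int x, as an int."""
    return x % 10**n


tail = last_digits(1234567, 3)

# Leading zeros are lost in the int, so pad when formatting if they matter.
padded_tail = format(last_digits(1200045, 3), "03d")
print(tail, padded_tail)
```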

> I'm thinking that we might be able to take your ideas and simply extend
> the old 2.5 syntax, so that it would be backwards compatible. On the
> other hand, it seems to me that once we have a *real* implementation
> (which we will soon), it will be relatively easy for people to
> experiment with new features and syntactical innovations.
>
>
> 6) Finally, Guido stressed that he wants to make sure that the
> implementation supports fields within fields, such as:
>
>     {0:{1}.{2}}
>
> Fortunately, the 'format' function doesn't have to handle this (it only
> formats a single value.) This would be done by the higher-level code.

Yup. Great summary overall!
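
A rough sketch of that division of labor, with the higher-level code
expanding the inner fields first and only then formatting the single
value. The helper name format_nested is hypothetical, and the comparison
with str.format assumes a Python that supports nested fields:

```python
def format_nested(value, width, precision):
    """Hypothetical higher-level step for a field like {0:{1}.{2}}."""
    # Expand the inner fields {1} and {2} into a plain spec string first.
    spec = "{0}.{1}".format(width, precision)
    # Only now is the single-value formatting step invoked.
    return format(value, spec)


text = format_nested("abcdefgh", 10, 3)         # truncate to 3, pad to 10
same = "{0:{1}.{2}}".format("abcdefgh", 10, 3)  # the one-field spelling
print(repr(text), text == same)
```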

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)