[Python-3000] Format specifier proposal

Ron Adam rrr at ronadam.com
Tue Aug 14 07:49:11 CEST 2007



Talin wrote:
> Ron Adam wrote:
>>
>>>     :f<+015.5 # Floating point, left aligned, always show sign,
>>>               # leading zeros, field width 15 (min), 5 decimal places.
>>
>> Which has precedence... left alignment or zero padding?
>>
>> Or should this be an error?
> 
> The answer is: Just ignore that proposal entirely :)

Ok :)

> ------
> 
> So I sat down with Guido and as I expected he has simplified my thoughts 
> greatly. Based on the conversation we had, I think we both agree on what 
> should be done:
> 
> 
> 1) There will be a new built-in function "format" that formats a single 
> field. This function takes two arguments, a value to format, and a 
> format specifier string.
> 
> The "format" function does exactly the following:
> 
>    def format(value, spec):
>       return value.__format__(spec)
> 
> (I believe this even works if value is 'None'.)
> 
> In other words, any type conversion or fallbacks must be done by 
> __format__; Any interpretation or parsing of the format specifier is 
> also done by __format__.
> 
> "format" does not, however, handle the "!r" specifier. That is done by 
> the caller of this function (usually the Formatter class.)
> 
> 
> 2) The various type-specific __format__ methods are allowed to know 
> about other types - so 'int' knows about 'float' and so on.
> 
> Note that other than the special case of int <--> float, this knowledge 
> is one way only, meaning that the dependency graph is acyclic.
> 
> For most types, if they see a type letter that they don't recognize, 
> they should coerce to their nearest built-in type (int, float, etc.) and 
> re-invoke __format__.

If it coerces the value first, it can then just call format(value, spec).
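
A minimal sketch of that coerce-then-delegate pattern. The Celsius class is
hypothetical and exists only for illustration; the built-in format() is
assumed to behave as described in point 1.

```python
class Celsius:
    """Hypothetical type, used only to illustrate the fallback."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        return float(self.degrees)

    def __format__(self, spec):
        # Coerce to the nearest built-in type first, then hand the
        # unchanged spec back to format(), which calls float.__format__.
        return format(float(self), spec)


print(format(Celsius(21.5), "8.2f"))  # prints "   21.50"
```

The type never parses the spec itself; all of that work stays in
float.__format__.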


> 3) In addition to int.__format__, float.__format__, and str.__format__, 
> there will also be object.__format__, which simply coerces the object to 
> a string, and calls __format__ on the result.
> 
>   class object:
>      def __format__(self, spec):
>         return str(self).__format__(spec)
> 
> So in other words, all objects are formattable if they can be converted 
> to a string.
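
A small sketch of how that fallback might behave. It uses a hypothetical
base class (Formattable) rather than object itself, so the names here are
assumptions and not part of the proposal.

```python
class Formattable:
    """Hypothetical stand-in for the proposed object.__format__."""

    def __format__(self, spec):
        # Convert to a string, then let str.__format__ handle the spec.
        return format(str(self), spec)


class Point(Formattable):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0}, {1})".format(self.x, self.y)


print(format(Point(1, 2), ">10"))  # prints "    (1, 2)"
```

Point defines only __str__, yet it picks up alignment and width handling
from the string type.
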
> 
> 
> 4) Explicit type coercion is a separate field from the format spec:
> 
>     {name[:format_spec][!coercion]}
> 
> Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to 
> string.) Other letters may be added later based on need.
> 
> The coercion field causes the formatter class to attempt to coerce the 
> value to the specified type before calling format(value, format_spec).

So the !letters refer to actual types, whereas the format specifier letters 
are output format designators that mean whatever the object interprets them 
as.

Hmmm... ok, I see why Guido leans towards putting it before the colon.  In 
a way it's more like a function call and not related to the format 
specifier type at all.

     {repr(name):format_spec}

Heck, it could even be first...

     {r!name:format_spec}

Or maybe because it's closer to name.__repr__ he prefers the name!r ordering?
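
Whichever ordering wins, the step itself is small. Here is a rough sketch
of what the caller (the Formatter class, say) might do with the coercion
field. The helper name format_field and its signature are made up for this
example.

```python
def format_field(value, spec="", coercion=None):
    """Hypothetical helper: apply the coercion, then format the result."""
    if coercion == "r":
        value = repr(value)
    elif coercion == "s":
        value = str(value)
    elif coercion is not None:
        raise ValueError("unknown coercion: " + repr(coercion))
    # Only after coercion does the format spec come into play.
    return format(value, spec)


print(format_field("abc", ">7", "r"))  # prints "  'abc'"
print(format_field(42, "<5", "s"))     # prints "42   "
```

Note that the spec is interpreted by the coerced value (a string in both
calls above), not by the original object.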


A wilder idea I was thinking about somewhat related to this was to be able 
to chain format specifiers, but I haven't worked out the details yet.


> 5) Mini-language for format specifiers:
> 
> So I do like your (Ron's) latest proposal, and I am thinking about it 
> quite a bit.

I'm actually testing them before I post them.  That filters out most of the 
really bad ideas.  ;-)

Although I'd also like to see a few more people agree with it before 
committing to something new.

> Guido suggested (and I am favorable to the idea) that we simply keep the 
> 2.5 format syntax, or the slightly more advanced variation that's in the 
> PEP now.
> 
> This has a couple of advantages:
> 
> -- It means that Python programmers won't have to learn a new syntax.
> -- It makes the 2to3 conversion of format strings trivial. (Although 
> there are some other difficulties with automatic conversion of '%', but 
> they are unrelated to format specifiers.)

Yes, the 2 to 3 conversion will be a challenge with a new syntax, but as 
long as the new syntax is richer than the old one, it shouldn't be that 
much trouble.  If we remove things we could do before, then it gets much 
harder.


> Originally I liked the idea of putting the type letter at the front, 
> instead of at the back like it is in 2.5. However, when you think about 
> it, it actually makes sense to have it at the back. Because the type 
> letter is now optional, it won't need to be there most of the time. The 
> type letter is really just an optional modifier flag, not a "type" at all.

The reason it's in the back for % formatting is it serves as the closing 
bracket.  With the {}'s we can put it anywhere that makes the most 
sense.


> Two features of your proposal that aren't supported in the old syntax are:
> 
>   -- Arbitrary fill characters, as opposed to just '0' and ' '.
>   -- Taking the string value from the left or right.
> 
> I'm not sure how much we need the first. The second sounds kind of 
> useful though.

The fill characters are already implemented in the strings rjust, ljust, 
and center methods.

  |  center(...)
  |      S.center(width[, fillchar]) -> string
  |
  |      Return S centered in a string of length width. Padding is
  |      done using the specified fill character (default is a space)

So adding it is just a matter of calling these with the fillchar.
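
Something along these lines, as a rough sketch. The pad() helper and the
'<', '>', '^' alignment letters are assumptions for the example, not a
settled syntax.

```python
def pad(text, width, align="<", fill=" "):
    """Hypothetical helper mapping an alignment flag to a str method."""
    methods = {"<": str.ljust, ">": str.rjust, "^": str.center}
    # Each of these methods accepts an optional single-character fillchar.
    return methods[align](text, width, fill)


print(pad("abc", 7, "^", "*"))  # prints "**abc**"
print(pad("42", 6, ">", "0"))   # prints "000042"
print(pad("abc", 6, "<", "."))  # prints "abc..."
```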

And as Guido also pointed out... the taking of string values from the left 
and right should work on strings and not numbers.
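
For strings that amounts to slicing from one end or the other. A sketch,
with a hypothetical take() helper that rejects non-strings:

```python
def take(text, count, side="left"):
    """Hypothetical helper: keep `count` characters from one end."""
    if not isinstance(text, str):
        raise TypeError("take() only works on strings")
    if count <= 0:
        # Guard needed because text[-0:] would return the whole string.
        return ""
    return text[:count] if side == "left" else text[-count:]


print(take("abcdefgh", 3))           # prints "abc"
print(take("abcdefgh", 3, "right"))  # prints "fgh"
```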


> I'm thinking that we might be able to take your ideas and simply extend 
> the old 2.5 syntax, so that it would be backwards compatible. On the 
> other hand, it seems to me that once we have a *real* implementation 
> (which we will soon), it will be relatively easy for people to 
> experiment with new features and syntactical innovations.

I'm looking forward to that.  :-)


> 6) Finally, Guido stressed that he wants to make sure that the 
> implementation supports fields within fields, such as:
> 
>    {0:{1}.{2}}

I've been thinking about this also for the use of dynamically formatting 
strings.  Is that the use case he is after?

     "{0:{1},{2}}".format(value, '^40', 'f(20.2)')

Which would first insert {1} and {2} into the string before formatting 0.

      {0:^40,f(20.2)}   Use your favorite syntax of course. ;-)

Items 1 and 2 would probably not be string literals in this case, but 
would come from a data source associated with the value.

And of course what actually gets inserted in inner fields can be anything.
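
The two-step expansion can be sketched like this. The spec used here (a
plain width.precision followed by an 'f' type letter) is an assumption for
the example, not one of the proposed syntaxes, and format_nested is a
made-up name.

```python
def format_nested(value, width, precision):
    """Hypothetical helper showing the two passes of {0:{1}.{2}f}."""
    # Pass 1: expand the inner fields to build the format spec.
    spec = "{0}.{1}f".format(width, precision)
    # Pass 2: format the outer value with the finished spec.
    return format(value, spec)


print(format_nested(3.14159, 10, 3))  # prints "     3.142"
```

The width and precision can then come from data at run time instead of
being baked into the format string.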


> Fortunately, the 'format' function doesn't have to handle this (it only 
> formats a single value.) This would be done by the higher-level code.

Looks like this is moving along nicely now. :-)

Cheers,
    Ron




