[Python-ideas] new format spec for iterable types

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Wed Sep 9 15:41:56 CEST 2015


Thanks for all the feedback!

Just to summarize ideas and to clarify what I had in mind when proposing 
this:

1)
Yes, I would like to have this work with any (or at least most) 
iterables, not just with my own custom type that I used for illustration.
So having this handled by the format method rather than each object's 
__format__ method could make sense. It was just simple to implement it 
in Python through the __format__ method.

Why did I propose * as the first character of the new format spec string?
Because I think you really need some token to state unambiguously[1] 
that what follows is a format specification that involves going through 
the elements of the iterable instead of working on the container object 
itself. I thought that * is most intuitive to understand because of its 
use in unpacking.

[1] Unfortunately, in my original proposal the leading * can still be 
ambiguous, because *<, *>, *= and *^ could mean either joining the 
elements with <, >, = or ^ as separators, or aligning the container's 
formatted string representation using * as the fill character.
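
To make the ambiguity concrete, here is a small sketch of what these 
specs already mean in current Python. It only uses the built-in format() 
function; the variable names are made up for illustration.

```python
# In the existing format-spec mini-language, "*<", "*>", "*^" and "*="
# mean "use * as the fill character with this alignment", so a leading *
# cannot unambiguously mean "join the elements" as well.
left = format("abc", "*<7")      # fill with *, left-align
right = format("abc", "*>7")     # fill with *, right-align
centered = format("abc", "*^7")  # fill with *, center
padded = format(42, "*=6")       # fill with * between sign and digits

print(left, right, centered, padded)
```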


Ideally, the * should be the very first thing inside a replacement field 
- pretty much as suggested by Oscar - and should not be part of the 
format spec. This is not feasible with a format spec handled by the 
__format__ method, but it is with a modified str.format method, which 
is another argument for this approach. Examples:

'foo {*name:<sep>} bar'.format(name=<expr>)
'foo {*0:<sep>} bar {1}'.format(x, y)
'foo {*:<sep>} bar'.format(x)


2)
As for including an additional format spec to apply to the elements of 
the iterable:
I decided against including this in the original proposal to keep it 
simple and to get feedback on the general idea first.
The problem here is that any solution requires an additional token to 
indicate the boundary between the <separator> part and the element 
format spec. Since you would not want to have anyone's custom format 
spec broken by this, this boils down to disallowing one reserved 
character in the <separator> part, like in Oscar's example:

'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)

where <sep> cannot contain a colon.

So that character would have to be chosen carefully (both : and | are 
quite readable, but also relatively common element separators I guess).
In addition, the <separator> part should be non-optional (though the 
empty string should be allowed) to guarantee the presence of the 
delimiter token, which avoids accidentally splitting a lone element 
format spec into a <sep> and a <fmt> part:

# format the elements of name using <fmt>, join them using <sep>
'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)
# format the elements of name using <fmt>, join them using ''
'foo {*name::<fmt>} bar'.format(name=<expr>)
# a syntax error
'foo {*name:<fmt>} bar'.format(name=<expr>)

On the other hand, these restrictions do not look too dramatic given the 
flexibility gained in most situations.

So to sum up how this could work:
If str.format encounters a leading * in a replacement field, it splits 
the format spec (i.e. everything after the first colon) on the first 
occurrence of the <sep>|<fmt> separator (possibly ':' or '|') and does, 
essentially:

<sep>.join(format(e, <fmt>) for e in iterable)

Without the *, it just works the current way.
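
The splitting and joining step summarized above could be sketched as a 
plain function. This is only an illustration under assumptions: the name 
format_elements and its delimiter parameter are hypothetical, ':' is 
picked as the <sep>|<fmt> delimiter, and a missing delimiter is reported 
as a ValueError to mirror the "syntax error" case.

```python
def format_elements(iterable, spec, delimiter=":"):
    """Hypothetical helper: emulate what str.format might do for {*name:spec}."""
    # Split on the FIRST occurrence of the delimiter only, so the element
    # format spec itself may still contain the delimiter character.
    sep, found, fmt = spec.partition(delimiter)
    if not found:
        # Assumption: a spec without the delimiter is rejected, so a lone
        # element format spec is never mistaken for a separator.
        raise ValueError("expected '<sep>%s<fmt>' in %r" % (delimiter, spec))
    return sep.join(format(e, fmt) for e in iterable)


print(format_elements([1.2, 3.4, 5.6], ", :.2f"))  # join with ', '
print(format_elements([1, 2, 3], ":03d"))          # join with ''
```

An empty <fmt> part falls back to format(e, ''), which for most types is 
the same as str(e).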


3)
Finally, the alternative idea of having the new functionality handled by 
a new !converter, like:

"List: {0!j:,}".format([1.2, 3.4, 5.6])

I considered this idea before posting the original proposal, but, in 
addition to requiring a change to str.format (which would need to 
recognize the new token), this approach would need either:

- a new special method (e.g., __join__) to be implemented for every type 
that should support it, which is worse than my original proposal, or

- the str.format method to react directly to the converter flag, which 
is then no different from the above solution except that it uses !j 
instead of *. Personally, I find the * syntax more readable; in 
addition, the !j syntax would suggest that this is a regular converter 
(calling a special method of the object) when, in fact, it is not.

Please correct me if I misunderstood something about this alternative 
proposal.
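
For comparison, the second variant (the formatter itself reacting to the 
converter flag) can be approximated today with a string.Formatter 
subclass. This is a rough sketch, not the proposed implementation: the 
names JoinFormatter and _Joiner are hypothetical, and it assumes the 
whole format spec is used as the separator and elements are converted 
with str().

```python
import string


class _Joiner:
    """Hypothetical wrapper whose __format__ treats the spec as a separator."""

    def __init__(self, iterable):
        self.iterable = iterable

    def __format__(self, spec):
        return spec.join(str(e) for e in self.iterable)


class JoinFormatter(string.Formatter):
    """Hypothetical formatter that handles !j itself, without a special method."""

    def convert_field(self, value, conversion):
        if conversion == "j":
            # The formatter reacts to the flag directly; the object being
            # formatted does not need to implement anything new.
            return _Joiner(value)
        return super().convert_field(value, conversion)


result = JoinFormatter().format("List: {0!j:,}", [1.2, 3.4, 5.6])
print(result)
```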

Best,
Wolfgang


