[Python-Dev] PEP-498: Literal String Formatting

Eric V. Smith eric at trueblade.com
Mon Aug 17 16:13:04 CEST 2015


On 08/16/2015 03:37 PM, Guido van Rossum wrote:
> On Sun, Aug 16, 2015 at 8:55 PM, Eric V. Smith <eric at trueblade.com
> <mailto:eric at trueblade.com>> wrote:
> 
>     Thanks, Paul. Good feedback.
> 
> 
> Indeed, I smiled when I saw Paul's post.
>  
> 
>     Triple-quoted and raw strings work like you'd expect, but you're
>     right: the PEP should make this clear.
> 
>     I might drop the leading spaces, for a technical reason having to
>     do with passing the strings to str.format. But I agree it's not a
>     big deal one way or the other.
> 
> 
> Hm. I rather like allowing optional leading/trailing spaces. Given that we
> support arbitrary expressions, we have to support internal spaces; I
> think that some people would really like to use leading/trailing spaces,
> especially when there's text immediately against the other side of the
> braces, as in
> 
>   f'Stuff{ len(self.busy) }more stuff'
> 
> I also expect it might be useful to allow leading/trailing newlines, if
> they are allowed at all (i.e. inside triple-quoted strings). E.g.
> 
>   f'''Stuff{
>       len(self.busy)
>       }more stuff'''

Okay, I'm sold. This works in my current implementation:

>>> f'''foo
... { 3 }
... bar'''
'foo\n3\nbar'

And since this currently works, there's no implementation-specific
reason to disallow leading and trailing whitespace:

>>> '\n{\n3 + \n 1\t\n}\n'.format_map({'\n3 + \n 1\t\n':4})
'\n4\n'

My current plan is to replace an f-string with a call to .format_map:
>>> foo = 100
>>> bar = 20
>>> f'foo: {foo} bar: { bar+1}'

Would become:
'foo: {foo} bar: { bar+1}'.format_map({'foo': 100, ' bar+1': 21})

The string on which format_map is called is identical to the string in
the source code. With the exception noted in PEP 498, I think this
satisfies the principle of least surprise.
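
To make that concrete, here's a rough pure-Python sketch of the
equivalence. fake_f and its eval-based lookup are just an illustration
I'm making up here; the real transformation would happen at compile
time and wouldn't involve eval:

import string

def fake_f(template, namespace):
    # Hypothetical stand-in for the compile-time transformation:
    # collect each replacement field's text, evaluate it in the given
    # namespace, and pass the results to str.format_map, keyed by the
    # verbatim expression text.
    fields = [field for _, field, _, _ in string.Formatter().parse(template)
              if field is not None]
    # Parenthesize each expression so leading/trailing whitespace and
    # embedded newlines are legal in eval().
    values = {field: eval('(' + field + ')', namespace) for field in fields}
    return template.format_map(values)

foo = 100
bar = 20
fake_f('foo: {foo} bar: { bar+1}', globals())
# -> 'foo: 100 bar: 21'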

As I've said elsewhere, we could then have some i18n function look up
and replace the string before format_map is called on it. As long as it
leaves the expression text alone, everything will work out fine. There
are some quirks when the same expression appears twice and has side
effects, but I'm not too worried about that.
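
As a rough sketch of what such a lookup could look like (translate and
CATALOG are names I'm inventing for illustration):

CATALOG = {
    'foo: {foo} bar: { bar+1}': 'fou : {foo} barre : { bar+1}',
}

def translate(template):
    # Look up a translated template; it must keep the expression text
    # between the braces byte-for-byte identical.
    return CATALOG.get(template, template)

translate('foo: {foo} bar: { bar+1}').format_map({'foo': 100, ' bar+1': 21})
# -> 'fou : 100 barre : 21'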

> Here's another thing for everybody's pondering: when tokenizing an
> f-string, I think the pieces could each become tokens in their own
> right. Then the rest of the parsing (and rules about whitespace etc.)
> would become simpler because the grammar would deal with them. E.g. the
> string above would be tokenized as follows:
> 
> f'Stuff{
> len
> (
> self
> .
> busy
> )
> }more stuff'
> 
> The understanding here is that there are these new types of tokens:
> F_STRING_OPEN for f'...{, F_STRING_MIDDLE for }...{, F_STRING_END for
> }...', and I suppose we also need F_STRING_OPEN_CLOSE for f'...' (i.e.
> not containing any substitutions). These token types can then be used in
> the grammar. (A complication would be different kinds of string quotes;
> I propose to handle that in the lexer, otherwise the number of
> open/close token types would balloon out of proportions.)

This would save a few hundred lines of C code. But from a quick glance
at the lexer, I can't see how to make the opening quotes agree with the
closing quotes.
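
For anyone who wants to play with the idea, here's a toy pure-Python
model of that token stream, using the token names from your sketch.
It's purely illustrative: it assumes single quotes and ignores escaped
braces, nested quotes, and braces inside the expressions:

import re

def fstring_tokens(source):
    # Toy model only; a real lexer would also have to handle the
    # different quote styles, which is where it gets hairy.
    body = source[2:-1]              # strip the f' prefix and closing '
    parts = re.split(r'[{}]', body)  # alternating literal, expr, literal, ...
    if len(parts) == 1:
        return [('F_STRING_OPEN_CLOSE', source)]
    tokens = [('F_STRING_OPEN', "f'" + parts[0] + '{')]
    for i, part in enumerate(parts[1:-1], 1):
        if i % 2:  # odd entries are expression text, tokenized normally
            tokens.append(('EXPR', part))
        else:
            tokens.append(('F_STRING_MIDDLE', '}' + part + '{'))
    tokens.append(('F_STRING_END', '}' + parts[-1] + "'"))
    return tokens

fstring_tokens("f'Stuff{ len(self.busy) }more stuff'")
# [('F_STRING_OPEN', "f'Stuff{"),
#  ('EXPR', ' len(self.busy) '),
#  ('F_STRING_END', "}more stuff'")]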

I think the i18n case (if we choose to support it) is better served by
having the entire, unaltered source string available at run time. PEP
501 comes to a similar conclusion
(http://legacy.python.org/dev/peps/pep-0501/#preserving-the-unmodified-format-string).

Eric.


