[Python-Dev] Subtle difference between f-strings and str.format()

Thu Mar 29 13:33:16 EDT 2018

On 3/29/2018 12:13 PM, Nick Coghlan wrote:
> On 29 March 2018 at 21:50, Eric V. Smith <eric at trueblade.com> wrote:
>> #1 seems so complex as to not be worth it, given the likely small overall
>> impact of the optimization to a large program. If the speedup really is
>> sufficiently important for a particular piece of code, I'd suggest just
>> rewriting the code to use f-strings, and the author could then determine if
>> the transformation breaks anything. Maybe write a 2to3-like tool that would
>> identify places where str.format or %-formatting could be replaced by
>> f-strings? I know I'd run it on my code, if it existed. Because the
>> optimization can only work on code with literals, I think manually modifying
>> the source code is an acceptable solution if the possible change in
>> semantics implied by #3 is unacceptable.
> 
> While more projects are starting to actively drop Python 2.x support,
> there are also quite a few still straddling the two different
> versions. The "rewrite to f-strings" approach requires explicitly
> dropping support for everything below 3.6, whereas implicit
> optimization of literal-based formatting will work even for folks
> preserving backwards compatibility with older versions.

Sure. But 3.6 will be 3 years old before this optimization is released. 
I've been seeing 3.4 support dropping off, and expect to see 3.5 follow 
suit by the time 3.8 is released. Although maybe the thought is to do 
this in a bug-fix release? If we're changing semantics at all, that 
seems like a non-starter.

> As far as the semantics go, perhaps it would be possible to explicitly
> create a tuple as part of the implementation to ensure that the
> arguments are still evaluated in order, and everything gets calculated
> exactly once? This would have the benefit that even format strings
> that used numbered references could be optimised in a fairly
> straightforward way.
> 
>      '{}{}'.format(a, b)
> 
> would become:
> 
>      _hidden_ref = (a, b)
>      f'{_hidden_ref[0]}{_hidden_ref[1]}'
> 
> while:
> 
>      '{1}{0}'.format(a, b)
> 
> would become:
> 
>      _hidden_ref = (a, b)
>      f'{_hidden_ref[1]}{_hidden_ref[0]}'
> 
> This would probably need to be implemented as Serhiy's option 1
> (generating a distinct AST node), which in turn leads to 2a: adding
> extra stack manipulation opcodes in order to more closely replicate
> str.format semantics.
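
To make the semantic difference concrete, here is a minimal sketch of the
ordering issue as I understand it in CPython: str.format() evaluates all of
its arguments before formatting any of them, while an f-string evaluates and
formats each field in turn. The Tracked class and the trace() helper are
hypothetical names invented for this illustration, and the tuple version is a
hand-written imitation of the rewrite proposed above, not something any
compiler emits today.

```python
events = []  # shared log of evaluation and formatting steps


class Tracked:
    """Hypothetical helper: logs when it is created and when it is formatted."""

    def __init__(self, name):
        self.name = name
        events.append("eval " + name)

    def __format__(self, spec):
        events.append("format " + self.name)
        return self.name


def trace(fn):
    """Run fn with a fresh log and return (result, copy of the log)."""
    events.clear()
    result = fn()
    return result, list(events)


def with_format():
    # All arguments are evaluated first, then formatted.
    return '{}{}'.format(Tracked('a'), Tracked('b'))


def with_fstring():
    # Each field is evaluated and formatted before the next one is touched.
    return f"{Tracked('a')}{Tracked('b')}"


def with_format_reordered():
    return '{1}{0}'.format(Tracked('a'), Tracked('b'))


def with_tuple_reordered():
    # Hand-written imitation of the proposed rewrite of '{1}{0}'.format(a, b).
    _hidden_ref = (Tracked('a'), Tracked('b'))
    return f'{_hidden_ref[1]}{_hidden_ref[0]}'


format_result, format_log = trace(with_format)
fstring_result, fstring_log = trace(with_fstring)
reordered_result, reordered_log = trace(with_format_reordered)
tuple_result, tuple_log = trace(with_tuple_reordered)

for label, log in [("str.format", format_log),
                   ("f-string", fstring_log),
                   ("'{1}{0}'.format", reordered_log),
                   ("tuple rewrite", tuple_log)]:
    print(label, "->", log)
```

The results are the same strings in each pair, but the logs for the first two
differ, which is the "subtle difference" in the subject line. The tuple
rewrite reproduces the str.format() ordering, including for numbered
references, at the cost of building the extra tuple.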

I still think the complexity isn't worth it, but maybe I'm a lone voice 
on this.

Eric.