[Python-ideas] String interpolation for all literal strings

Eric V. Smith eric at trueblade.com
Fri Aug 7 13:52:19 CEST 2015


On 8/7/2015 6:13 AM, Guido van Rossum wrote:
> On Fri, Aug 7, 2015 at 11:50 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
>     On 7 August 2015 at 19:03, Nathaniel Smith <njs at pobox.com> wrote:
>     > On Fri, Aug 7, 2015 at 1:49 AM, Guido van Rossum <guido at python.org> wrote:
>     >> On Fri, Aug 7, 2015 at 9:33 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>     >>>
>     >>> We could potentially make f-strings translation friendly by
>     >>> introducing a bit of indirection into the f-string design: an
>     >>> __interpolate__ builtin, along the lines of __import__.
>     >>
>     >>
>     >> This seems interesting, but doesn't it require sys._getframe() or similar
>     >> again? Translations may need to reorder variables. (Or even change the
>     >> expressions? E.g. to access odd plurals?)
>     >>
>     >> The sys._getframe() requirement (if true) would kill this idea thoroughly
>     >> for me.
>     >
>     > AFAICT sys._getframe is unneeded -- I understand Nick's suggestion to
>     > be that we desugar f"..." to:
>     >
>     >    __interpolate__("...", locals(), globals())
>     >
>     > with the reference to __interpolate__ resolved using the usual lookup
>     > rules (locals -> globals -> builtins).
> 
>     Not quite. While I won't be entirely clear on Eric's latest proposal
>     until the draft PEP is available, my understanding is that an f-string
>     like:
> 
>         f"This interpolates \{a} and \{b}"
> 
>     would currently end up effectively being syntactic sugar for a
>     formatting operation like:
> 
>         "This interpolates " + format(a) + " and " + format(b)
> 
>     While str.format itself probably doesn't provide a good signature for
>     __interpolate__, the essential information to be passed in to support
>     lossless translation would be an ordered series of:
> 
>         * string literals
>         * (expression_str, value, format_str) substitution triples
> 
>     Since the fastest string formatting operation we have is actually
>     still mod-formatting, let's suppose the default implementation of
>     __interpolate__ was semantically equivalent to:
> 
>         def __interpolate__(target, expressions, values, format_specs):
>             return target % tuple(map(format, values, format_specs))
> 
>     With that definition for default interpolation, the f-string above
>     would be translated at compile time to the runtime call:
> 
>         __interpolate__("This interpolates %s and %s", ("a", "b"), (a, b), ("", ""))
> 
>     All of those except for the __interpolate__ lookup and the (a, b)
>     tuple would then be stored on the function object as constants.
> 
>     An opt-in translation interpolator might then look like:
> 
>         def __interpolate__(target, expressions, values, format_specs):
>             if not all(expr.isidentifier() for expr in expressions):
>                 raise ValueError("Only variable substitutions are permitted "
>                                  "for i18n interpolation")
>             if any(spec for spec in format_specs):
>                 raise ValueError("Format specifications are not permitted "
>                                  "for i18n interpolation")
>             catalog_str = target % tuple("${%s}" % expr for expr in expressions)
>             translated = _(catalog_str)
>             values = {k: v for k, v in zip(expressions, values)}
>             return string.Template(translated).safe_substitute(values)
> 
>     The string extractor for the i18n library providing that
>     implementation would also need to know to do the transformation from
>     f-string formatting to string.Template formatting when generating the
>     catalog strings.
> 
> 
> OK, that sounds reasonable, except that translators need to control
> substitution order, so s % tuple(...) doesn't work. However, if we use
> s.format(...) we can use "This interpolates {0} and {1}", and then I'm
> satisfied. (Further details of the signature of __interpolate__ TBD.)

The example from C# is interesting. Look at IFormattable:

https://msdn.microsoft.com/en-us/library/Dn961160.aspx
https://msdn.microsoft.com/en-us/library/system.iformattable.aspx

From http://roslyn.codeplex.com/discussions/570292:
"""
When it is converted to the type IFormattable, the result of the string
interpolation is an object that stores a compiler-constructed format
string along with an array storing the evaluated expressions. The
object's implementation of

IFormattable.ToString(string format, IFormatProvider formatProvider)

is an invocation of

String.Format(IFormatProviders provider, String format, params object
args[])

By taking advantage of the conversion from an interpolated string
expression to IFormattable, the user can cause the formatting to take
place later in a selected locale. See the section
System.Runtime.CompilerServices.FormattedString for details.
"""

So (reverting to Python syntax, with the f-string syntax), in addition
to converting directly to a string, there's a way to go from:

f'abc{expr1:spec1}def{expr2:spec2}ghi'

to:

('abc{0:spec1}def{1:spec2}ghi', (value-of-expr1, value-of-expr2))

The general idea is that you now have access to an i18n-able string, and
the values of the embedded expressions as they were evaluated "in situ"
where the f-string literal was present in the source code.
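
As a rough illustration of that transformation, the sketch below rewrites an
f-string-style template into a numbered format string plus the list of
expression source strings. It leans on string.Formatter().parse from the
standard library; split_template is a hypothetical helper name, not part of
any proposal. Because the parser splits each field at the first ':' or '!',
expressions containing those characters are not handled, and eval() over an
explicit dict only stands in for the compiler evaluating each expression where
the literal appears.

```python
import string


def split_template(template):
    """Return (numbered_format_string, expression_strings) for a template."""
    parts = []
    expressions = []
    for literal, field, spec, conversion in string.Formatter().parse(template):
        # parse() un-escapes '{{' and '}}', so escape them again on output.
        parts.append(literal.replace("{", "{{").replace("}", "}}"))
        if field is None:
            continue  # trailing literal text with no replacement field
        index = len(expressions)
        expressions.append(field)
        conv_part = "!" + conversion if conversion else ""
        spec_part = ":" + spec if spec else ""
        parts.append("{" + str(index) + conv_part + spec_part + "}")
    return "".join(parts), expressions


fmt_str, exprs = split_template("abc{expr1:spec1}def{expr2:spec2}ghi")
print(fmt_str)  # abc{0:spec1}def{1:spec2}ghi
print(exprs)    # ['expr1', 'expr2']

# Stand-in for "in situ" evaluation: an explicit namespace instead of a frame.
namespace = {"width": 7, "price": 3.14159}
fmt_str2, exprs2 = split_template("w={width:03d} p={price:.2f}")
values2 = tuple(eval(expr, {}, namespace) for expr in exprs2)
print(fmt_str2.format(*values2))  # w=007 p=3.14
```
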
You can imagine the f-string above evaluating to a call to:

__interpolate__('abc{0:spec1}def{1:spec2}ghi', (value-of-expr1,
value-of-expr2))

The default implementation of __interpolate__ would be:

def __interpolate__(fmt_str, values):
    return fmt_str.format(*values)

Then you could hook this on a per-module (or global, I guess) basis to
do the i18n of fmt_str.
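
A minimal sketch of such a hook follows, with the calls written out by hand
because the compiler support is only hypothetical. CATALOG is a toy stand-in
for a real message catalog lookup such as gettext, and both function names are
invented for the example. Since the format string uses numbered fields, the
translated entry is free to reorder the values.

```python
def default_interpolate(fmt_str, values):
    """Default behaviour: format the evaluated values into fmt_str."""
    return fmt_str.format(*values)


# Toy catalog (assumption): maps a source format string to its translation.
CATALOG = {
    "{0} sent {1:d} messages": "{1:d} Nachrichten von {0}",
}


def translating_interpolate(fmt_str, values):
    """Translate fmt_str first, then format with the same evaluated values."""
    translated = CATALOG.get(fmt_str, fmt_str)  # fall back to the original
    return translated.format(*values)


name, count = "Eric", 3
# What f'{name} sent {count:d} messages' would be turned into:
call_args = ("{0} sent {1:d} messages", (name, count))

print(default_interpolate(*call_args))      # Eric sent 3 messages
print(translating_interpolate(*call_args))  # 3 Nachrichten von Eric
```

A module would opt in by binding its own __interpolate__ name to something
like translating_interpolate; how that lookup would work is left open here.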

I don't see the need to separate out the format specifiers (spec1 and
spec2) from the generated format string. They belong to the type of
values of the evaluated expressions, so you can just embed them in the
generated fmt_str.
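
To illustrate why the specs can stay inside fmt_str: str.format hands each
spec to the __format__ method of the corresponding value, so the hook never
needs to interpret them. The Celsius class and its "F" spec below are made up
for the example.

```python
from datetime import date


class Celsius:
    """Toy type whose __format__ understands a custom 'F' spec."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        if spec == "F":
            # Custom spec: render the temperature in Fahrenheit.
            return "{:.1f}F".format(self.degrees * 9 / 5 + 32)
        # Any other spec is treated as an ordinary float spec.
        return format(float(self.degrees), spec) + "C"


fmt_str = "{0:%Y-%m-%d}: {1:F} ({1:.0f})"
values = (date(2015, 8, 7), Celsius(100))

# Each spec is dispatched to the value's own type by str.format.
print(fmt_str.format(*values))  # 2015-08-07: 212.0F (100C)
```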


Eric.

