[Python-ideas] Draft PEP on string interpolation

Mon Aug 24 02:35:17 CEST 2015

On 08/22/2015 09:37 PM, Nick Coghlan wrote:
> On 23 August 2015 at 08:50, Guido van Rossum <guido at python.org> wrote:
>> OTOH this topic is rich enough that I have no problem spending a few more
>> PEP numbers on it. If Mike asks for a PEP number I am not going to withhold
>> it.
> 
> Aye, agreed - at the very least, we want to preserve his survey of
> interpolation in other languages, as I found that to be an incredibly
> valuable contribution.
> 
>>>> 2.  Have I died and gone to Perl?
>>>
>>> That's my question in relation to PEP 498 - it seems to introduce lots
>>> of line noise for people to learn to read for little to no benefit (my
>>> perspective is heavily influenced by the fact that most of the code I
>>> write myself these days consists of network API calls + logging
>>> messages + UI template rendering, with only very occasional direct
>>> calls to str.format that use anything more complicated than "{}" or
>>> "{!r}" as the substitution field).
>>>
>>> As a result, I'd be a lot more comfortable with PEP 498 if it had more
>>> examples of potential practical use cases, akin to the examples
>>> section from PEP 343 for context managers.
>>
>> Since you accept "!r", you must be asking about the motivation for including
>> ":spec", right?
> 
> Sorry, I wasn't clear - PEP 501 also retains the field formatting
> capabilities, and is hence strictly "noisier" than PEP 498 (especially
> the ! prefix version of the syntax). It's just that it solves enough
> *other* problems for it to seem worth the cost to me. When the benefit
> is "str.format is prettier, all other forms of interpolation remain
> repetitively verbose", it seems a very invasive change just to
> replace:
> 
>     print("Chopped {} onions in {:.3f} seconds.".format(n, t1-t0))
> 
> with:
> 
>     print(f"Chopped {n} onions in {t1-t0:.3f} seconds.")
>
>>> While the second draft of PEP 501 is even more line-noisy than PEP 498
>>> due to the use of both "!" and "$", it at least generalises the
>>> underlying semantics of compiler-assisted interpolation to apply to
>>> additional use cases like logging, i18n (including compatibility
>>> with Mozilla's l20n syntax), safe SQL interpolation, safe shell
>>> command interpolation, HTML template rendering, etc.
>>
>>
>> That's perhaps a bit *too* ambitious. The claim of "safety" for PEP 498 is
>> simple -- it does not provide a way for a dynamically generated string to
>> access values in the current scope (and it does this by not supporting
>> dynamically generated strings). For most domains you mention, safety is much
>> more complex, and in fact mostly orthogonal -- code injection attacks rely
>> on the value of the interpolated variables, so PEP 498's "safety" does not
>> help at all.
> 
> Right, but that's where I came to the conclusion that the lack of
> arbitrary interpolation support ends up making PEP 498 actively
> dangerous, as string interpolation based substitution ends up being so
> much prettier than doing things right. Compare:
> 
>     os.system(f"echo {filename}")
>     subprocess.call(f"echo {filename}")
>     subprocess.call(["echo", filename])
> 
> Even in that simple case, the two unsafe approaches are much nicer to
> read, and as the command line gets more complex, the safe version gets
> harder and harder to read relative to the unsafe ones.
> 
> With the latest PEP 501 draft (which switched the proposed syntax and
> semantics to behave more like a traditional binary operator), we could
> make invoking a subprocess *safely* look like:
> 
>     subprocess.call $"echo $filename"
> 
> However, I'm now coming full circle back to the idea of making this a
> string prefix, so that would instead look like:
> 
>     subprocess.call($"echo $filename")
> 
> The trick would be to make interpolation lazy *by default* (preserving
> the triple of the raw template string, the parsed fields, and the
> expression values), and put the default rendering in the resulting
> object's *__str__* method.

At this point, I think PEPs 498 and 501 have converged, except for the
delayed string interpolation object (which I realize is important) and
how expressions are identified in the strings (which I consider less
important).

I think the string interpolation object is interesting. It's basically
what Petr Viktorin and Chris Angelico discussed and suggested here:
https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

My suggestion would be to add both f-strings (PEP 498) and i-strings (as
they're currently called in PEP 501), but with the exact same syntax to
identify and evaluate expressions. I don't particularly care what the
prefixes are. I'd add the plain f-strings first, then i-strings maybe
later. There are definitely some issues with delayed interpolation we
need to think about. An f-string would be shorthand for str(i-string).
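
To make "shorthand for str(i-string)" concrete, here is a rough sketch of
the kind of object I have in mind. It is not an implementation of either
PEP: the class name InterpolationTemplate and its attributes are made up
for illustration, string.Formatter().parse stands in for the compiler's
parsing, and eval() stands in for the compiler evaluating each expression
in the enclosing scope.

```python
import string


class InterpolationTemplate:
    """Hypothetical stand-in for the object an i-string might produce."""

    def __init__(self, raw_template, namespace):
        # Keep the raw template, the parsed fields, and the expression values.
        self.raw_template = raw_template
        # Each entry is (literal_text, field_name, format_spec, conversion).
        self.parsed_fields = list(string.Formatter().parse(raw_template))
        # Assumption: eval() plays the role of compile-time expression
        # evaluation; expressions containing ':' or '!' are not handled.
        self.values = [
            eval(name, {}, dict(namespace)) if name is not None else None
            for _, name, _, _ in self.parsed_fields
        ]

    def __str__(self):
        # Default rendering: apply __format__ with each field's format_spec.
        # Conversions (!s, !r, !a) are ignored in this sketch.
        parts = []
        for (literal, name, spec, _conv), value in zip(
            self.parsed_fields, self.values
        ):
            parts.append(literal)
            if name is not None:
                parts.append(format(value, spec))
        return "".join(parts)


template = InterpolationTemplate(
    "Chopped {n} onions in {t1-t0:.3f} seconds.",
    {"n": 5, "t1": 2.5, "t0": 1.0},
)
print(str(template))
```

Under this sketch the f-string form would simply be str() applied to the
same object, while other consumers could inspect parsed_fields and values
before anything is rendered.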

I think it's hyperbolic to refer to f-strings as a new string formatting
language. With one small difference (detailed in PEP 498, and with zero
usage I could find in the stdlib outside of tests), f-strings are a
strict superset of str.format() strings (but not the arguments to
.format of course). I think f-strings are no more different from
str.format strings than PEP 501 i-strings are from string.Template strings.

From what I can tell in the stdlib and in the wild, str.format() has
hundreds or thousands of times more usage than string.Template. I
realize that the reasons are not necessarily related to the syntax of
the replacement strings, but you can't say most people aren't familiar
with str.format().

> That description is probably as clear as mud, though, so back to the
> PEP I go! :)

Thanks for PEP 501. Maybe I'll add delayed interpolation to PEP 498!

On a more serious note, I'm thinking of adding i-strings to my f-string
implementation. I have some ideas that the format_spec (the :.3f stuff)
could be used by the code that eventually does the string interpolation.
For example, sql(i-string) might want to interpret this expression using
__sql__, instead of how str(i-string) would use __format__. Then the
sql() machinery could look at the format_spec and pass it to the value's
__sql__ method.

For example:
    sql(i'select {date:as_date} from {tablename}')

might call date.__sql__('as_date'), which would know how to cast to the
right datatype (this happens to me all the time).
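
A very rough sketch of that dispatch follows. Everything here is
hypothetical: there is no i'' prefix yet, so sql() takes a plain template
plus a mapping of values, and SqlDate, SqlIdentifier and the __sql__
signature are names I made up to show the idea. Real code would more
likely emit bind parameters than inline literals.

```python
import string


class SqlDate:
    """Hypothetical value type that knows how to render itself for SQL."""

    def __init__(self, iso_text):
        self.iso_text = iso_text

    def __sql__(self, format_spec):
        # The format_spec from the template decides how the value is cast.
        if format_spec == "as_date":
            return "CAST('{}' AS DATE)".format(self.iso_text)
        return "'{}'".format(self.iso_text)


class SqlIdentifier:
    """Hypothetical wrapper for a table or column name."""

    def __init__(self, name):
        if not name.isidentifier():
            raise ValueError("not a plain identifier: {!r}".format(name))
        self.name = name

    def __sql__(self, format_spec):
        # The format_spec is ignored for identifiers in this sketch.
        return '"{}"'.format(self.name)


def sql(template, values):
    """Render a template by calling __sql__ instead of __format__."""
    parts = []
    for literal, name, spec, _conv in string.Formatter().parse(template):
        parts.append(literal)
        if name is None:
            continue
        value = values[name]
        hook = getattr(type(value), "__sql__", None)
        if hook is None:
            # Refuse values that have not opted in to SQL rendering.
            raise TypeError("{!r} has no __sql__ method".format(name))
        parts.append(hook(value, spec))
    return "".join(parts)


query = sql(
    "select {date:as_date} from {tablename}",
    {"date": SqlDate("2015-08-24"), "tablename": SqlIdentifier("events")},
)
print(query)
```

The point is only that the format_spec travels with the field and the
consumer, not str.format, decides what it means.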

This is one reason I'm thinking of ditching !s, !r, and !a, at least for
the first implementation of PEP 498: they're not needed, and are not
generally applicable if we add the hooks I'm considering into i-strings.

Eric.