[Python-ideas] Draft PEP on string interpolation

Fri Aug 21 18:35:42 CEST 2015

> On 08/21/2015 07:49 AM, Nick Coghlan wrote:
>> On 21 August 2015 at 21:06, Nathaniel Smith <njs at pobox.com> wrote:
>>> On Aug 20, 2015 23:40, "Nick Coghlan" <ncoghlan at gmail.com> wrote:
>> [...]
>>>    myquery = i"SELECT $column FROM $table;"
>>>    mycommand = i"cat $filename"
>>>    mypage = i"<html><body>$content</body></html>"
>>> 
>>> It's the opposite of the "interpolating untrusted strings that may
>>> contain arbitrary expressions" problem - what happens when the
>>> variables being *substituted* are untrusted? It's easy to say "don't
>>> do that", but if doing the right thing incurs all the repetition
>>> currently involved in calling str.format, we're going to see a *lot*
>>> of people doing the wrong thing. At that point, the JavaScript
>>> backticks-with-arbitrary-named-callable solution starts looking very
>>> attractive:
>>> 
>>>    myquery = sql`SELECT $column FROM $table;`
>>>    mycommand = sh`cat $filename`
>>>    mypage = html`<html><body>$content</body></html>`
>> 
>> Surely if using backticks we would drop the ugly prefix syntax and just make
>> it a function call?
> 
> Not really, no, as `obj` already means repr(obj) in Python 2, and we
> can't silently make it do something else in Python 3 (although we can
> break it noisily and thus strongly encourage folks to switch to using
> the builtin instead).
> 
> The attractiveness of "little bobby tables" [1] vulnerabilities with
> an interpolation syntax that *doesn't* support custom interpolation
> engines has switched me from being mildly interested in the idea of
> good support for SQL, shell command and HTML generation to considering
> it a necessary capability, though.

The various string interpolation proposals are conflating two things:

1: extracting the expressions from the source string, and evaluating
them in the correct context, and

2: taking the source string and the evaluated values, and building the
resulting string.
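
As a rough illustration of that split (a sketch only, not any of the
proposals under discussion): the explicit `namespace` dict below stands
in for what the compiler would do in #1, and the function names and the
`$name` pattern are assumptions made up for this example.

```python
import re

# Matches simple $name references only; real proposals allow more.
_NAME = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def extract_and_evaluate(source, namespace):
    """Step 1: find each $name in the source and look up its value.

    A plain function has to be handed the namespace explicitly; this
    is the part the compiler would have to do for us.
    """
    return {name: namespace[name] for name in _NAME.findall(source)}


def build(source, values):
    """Step 2: combine the source string and the evaluated values."""
    return _NAME.sub(lambda m: str(values[m.group(1)]), source)


filename = "notes.txt"
source = "cat $filename"
values = extract_and_evaluate(source, {"filename": filename})
result = build(source, values)
```

Note that step 2 needs nothing special: it is an ordinary function of
the source string and the values.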

The problem is that in #1, the compiler has to be in on what's going on.
That's because this problem can't be solved with normal function calls:
a called function receives only the string itself, not the caller's
scope in which the embedded expressions have to be evaluated. So if
normal function calls can't do it, what choices do we have? Either
syntax, or special function names known to the compiler. I think syntax
is clearly the right choice here.

The only syntax changes that anyone has come up with so far are string
prefixes, maybe suffixes, and back-ticks (ick). Of those, prefixes make
the most sense. I'm interested in other suggestions, though. (Since I
wrote this, I see Barry's import-based approach, but it's similar:
instructions to the compiler.)

Yuri's proposal was to implement #1 by having _any_ string prefix
trigger the compiler to get involved to extract the source string and
then compute the values. Then for #2, he invoked normal function calls,
derived from the string prefix. He also loosened the restriction that
strings would be the result: because any function could be invoked with
the source string and the values, that function could return anything.
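
To make that concrete, here is a minimal sketch of what such a
prefix-derived function might look like. Everything in it is an
assumption for illustration: the `sql` name, the `(source, values)`
signature, and the `$name` pattern are not taken from Yuri's actual
proposal. The `?` placeholders follow the style used by the stdlib
sqlite3 module.

```python
import re

_NAME = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def sql(source, values):
    """Hypothetical handler for an sql"..." prefix.

    Returns a (query, params) pair rather than a string, so the
    values are never spliced into the SQL text.
    """
    params = []

    def placeholder(match):
        # Record the value and leave a placeholder in the query.
        params.append(values[match.group(1)])
        return "?"

    query = _NAME.sub(placeholder, source)
    return query, tuple(params)


# Roughly what the compiler might generate (an assumption) for:
#     sql"SELECT name FROM users WHERE id = $user_id"
user_id = "42; DROP TABLE users"
query, params = sql(
    "SELECT name FROM users WHERE id = $user_id",
    {"user_id": user_id},
)
```

The untrusted value ends up in `params`, and the result could be passed
on as `cursor.execute(query, params)`.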

If you really want string interpolation to be extensible to domains such
as SQL and HTML, then I think an approach like Yuri's is the only way to
do it: some syntax to tell the compiler to treat a string differently,
coupled with some user-specifiable function that gets called to do the
real work, and no need for the result to be a string.

Eric.