[Python-ideas] Draft PEP on string interpolation

Wed Aug 26 01:55:24 CEST 2015

On 26 August 2015 at 07:19, Paul Moore <p.f.moore at gmail.com> wrote:
> Of course, having two functions one of which is e-string aware and
> safe, and one of which isn't, and is unsafe, is a pretty bad API. Or
> is it? Developers will for quite a long time have to deal with
> providing compatibility for versions of Python with and without
> e-strings. Consider pyinvoke - invoke.run() runs shell commands, much
> like os.system. Suppose version X of pyinvoke adds e-string support.
> If I write a program using e-strings and invoke.run, it's safe for
> people with pyinvoke version X installed, but unsafe if my users have
> version X-1 installed. That's a pretty nasty bug.
>
> I honestly have no idea how significant this risk is. But it's
> something that should be considered when claiming that the proposal
> makes it "hard to do the wrong thing".

Right, injection is number 1 on the OWASP top 10 list for a reason:
https://www.owasp.org/index.php/Top_10_2013-A1-Injection

The problem is that "things you want to make easy for a developer to
do" often necessarily translates to "things you make easy for a
developer to do with untrusted user supplied data".

Unfortunately, it isn't generally viable to make the paranoid
behaviour the default if "empowering and easy to learn" are two of
your language design goals, as it means you end up not trusting the
*developer*, and make them jump through annoying hoops just to get
things done on their own local system. There *are* languages that work
that way, but "we're protecting you from problems you don't know you
have yet" is generally a poor sales pitch when someone is just trying
to write their first "Hello World!" app (it's still a good goal, but
it needs to be unobtrusive).

Thus, the trick you want to pull off is:

1. Make the wrong thing relatively easy for a security scanner (or the
mark 1 human eyeball) to detect
2. Make the right thing a simple mechanical change away from the wrong thing
3. Make the right thing just as easy to read as the wrong thing so
folks don't resent having to switch

That's the line I now want to walk with f-strings vs i-strings: given
a static analyser with a list of APIs that it deems to be security
sensitive, it can say "passing an f-string here is wrong, and a plain
string is dubious, but an i-string is OK".

Hiding the difference between eager interpolation and deferred
interpolation from the developer is a non-goal from my perspective -
it makes it too hard to glance at a piece of code and say "yes, that's
a security sensitve API, but it's using deferred interpolation, so
it's likely OK (and if not, that's a bug in the security sensitive
API)" or "hmm, that's using eager interpolation with a sensitive API,
that could be an issue, we should look closer and consider switching
to deferred interpolation here".

I'm also going to switch to using completely made up API names, since
folks otherwise anchor on "but that's not the way that API currently
works" without accounting for the fact that APIs can be updated to
dispatch to different behaviours based on the types of their arguments
:)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia