[Python-ideas] Draft PEP on string interpolation

Tue Aug 25 00:32:25 CEST 2015

On Mon, Aug 24, 2015 at 2:28 PM, Nikolaus Rath <Nikolaus at rath.org> wrote:
> On Aug 24 2015, Mike Miller <python-ideas-9N9vo3BbZlHk1uMJSBkQmQ at public.gmane.org> wrote:
>> Also, 2) a bit of responsibility is pushed to stdlib/pypi.  In a
>> handful of sensitive places, the object is checked beforehand and
>> escaped when needed:
>>
>>     def os_system(command):   # imagine os.system, subprocess, dbapi, etc.
>>         if isinstance(command, estr):
>>             command = command.escape(shlex.quote)  # each chooses its own rules
>>         do_something(command)
>>
>> This means a billion lines of code using e-strings won't have to care
>> about them, only a handful of places.  What is easiest to type is now
>> safe as well:
>>
>>     os.system(e'cat {filename}')  # sleep easy
>
> *shudder*. After years of efforts to get people not to do this, you want
> to change course by 180 degrees and start telling people this is ok if
> they add an additional single character in front of the string?
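For concreteness, the mechanism in the quoted proposal can be approximated in today's Python. This is a minimal sketch under stated assumptions: there is no `e''` prefix, so the hypothetical `EStr` class stands in for an e-string by keeping the template and the interpolated values apart. The hypothetical `build_shell_command` plays the role of a sensitive sink such as `os.system`, but it returns the escaped command instead of running it.

```python
import shlex


class EStr:
    """Hypothetical stand-in for the proposed e-string type.

    It keeps the template and the interpolated values separate, so a sink
    can decide how the values should be escaped.
    """

    def __init__(self, template, **values):
        self.template = template
        self.values = values

    def escape(self, quote):
        # Apply the sink's quoting function to every interpolated value,
        # leaving the literal template text untouched.
        return self.template.format(
            **{name: quote(str(value)) for name, value in self.values.items()}
        )

    def __str__(self):
        # Unescaped rendering, equivalent to plain str.format().
        return self.template.format(**self.values)


def build_shell_command(command):
    # Hypothetical sink: e-strings get escaped with shell rules, plain
    # strings pass through unchanged, as in the quoted os_system sketch.
    if isinstance(command, EStr):
        command = command.escape(shlex.quote)
    return command
```

With `filename = "notes; rm -rf ~"`, `build_shell_command(EStr("cat {filename}", filename=filename))` produces `cat 'notes; rm -rf ~'`, which the shell treats as a single argument. A plain string still passes through untouched, and that unescaped path is exactly what worries the reply above.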

The problem is that despite years of effort trying to get people not
to do things like this, it's still the case that if you look at, say,
MITRE's ranked list of the "top 25 most dangerous software errors":

    https://cwe.mitre.org/top25/index.html

then #1, #2, and #4 (SQL injection, OS command injection, and cross-site scripting) are all improper quoting. (#3 is buffer overflows.)
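To show what "improper quoting" means in practice, here is a small sketch contrasting naive `str.format` interpolation with escaping for the target language. The input values are made up, and `shlex.quote` and `html.escape` are only one reasonable choice of escaper for each context.

```python
import html
import shlex

# Hypothetical attacker-controlled input.
filename = "notes.txt; rm -rf ~"
comment = '<script>alert("hi")</script>'

# Improper quoting: the values are pasted in verbatim, so a shell would
# run a second command and a browser would execute the <script> tag.
unsafe_cmd = "cat {}".format(filename)
unsafe_html = "<p>{}</p>".format(comment)

# Proper quoting: each value is escaped for the language it is embedded in.
safe_cmd = "cat {}".format(shlex.quote(filename))
safe_html = "<p>{}</p>".format(html.escape(comment))
```

The two versions differ only by the escaping call, and nothing forces anyone to remember it. That is why these bugs keep appearing near the top of the list.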

Or if you look at the OWASP consensus list on the most critical web
application security risks ("based on 8 datasets from 7 firms that
specialize in application security, including 4 consulting companies
and 3 tool/SaaS vendors (1 static, 1 dynamic, and 1 with both). This
data spans over 500,000 vulnerabilities..."), then #1 (Injection) and #3
(Cross-Site Scripting) are both improper quoting:

    https://www.owasp.org/index.php/Top_10_2013-Top_10
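The usual advice for SQL is to avoid quoting altogether and hand values to the driver separately from the query text. The sketch below uses the standard-library `sqlite3` module with an in-memory database. The `users` table and the input string are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Hypothetical attacker-controlled input.
name = "nobody' OR '1'='1"

# Improper quoting: the input becomes part of the SQL text, and the
# injected OR clause makes the WHERE condition true for every row.
unsafe_query = "SELECT name FROM users WHERE name = '{}'".format(name)
unsafe_rows = conn.execute(unsafe_query).fetchall()

# Parameter binding: the value travels separately from the SQL text,
# so it is only ever compared as data and matches nothing.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (name,)
).fetchall()
```

Here `unsafe_rows` contains every user, while `safe_rows` is empty.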

I mean, it's great that the rise of languages like Python that have
easy range-checked string manipulation has knocked buffer overflows
out of the #1 spot, but... :-)

Guido is right that the nice thing about classic string interpolation
is that its use in many languages gives us tons of data about how it
works in practice. But one of the things that data tells us is that it
actually causes a lot of problems! Do we actually want to continue the
status quo, where one set of people keep designing language features
to make it easier and easier to slap strings together, and then
another set of people spend increasing amounts of energy trying to
educate all the users about why they shouldn't actually use those
features? It wouldn't be the end of the world (that's why we call it
"the status quo" ;-)), and trying to design something new and better
is always difficult and risky, but this seems like a good moment to
think very hard about whether there's a better way.

(And possibly about whether that better way is something we could put
up on PyPI now while the 3.6 freeze is still a year out...)

-n

-- 
Nathaniel J. Smith -- http://vorpus.org