[Python-ideas] Draft PEP on string interpolation

Mike Miller python-ideas at mgmiller.net
Tue Aug 25 21:06:55 CEST 2015


TL;DR:  (Version 2, hopefully more clear)

Let's discuss whether to make "doing the right thing as easy as doing the wrong 
thing" a desired goal for string interpolation.

Details -- we could:

     1) Automatically escape potentially dangerous input variables passed to
        sensitive functions, or
     2) Make developers do it the hard way, leaving them fully responsible for
        safety every time (knowing that they often don't do it), or
     3) Some combination of the two.

A trivial implementation of 1) is below.  Instead of rendering the string 
immediately, rendering is deferred until use, with the template and parameters 
stashed inside an object, which lets the receiver specify escaping/quoting rules.

---------------------------------

Let's call these e-strings (for expression), since a letter is easier to refer 
to than the proposals' three-digit numbers.

So, an e-string looks like an f-string, though at compile-time, it is converted 
to an object instead (like an i-string):

     print(e'Hello {friend}, filename: {filename}.')   # converts to ==>

     print(estr('Hello {friend}, filename: {filename}.', friend=friend,
                                                         filename=filename))
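
To make the conversion above concrete, here is a rough runtime emulation of 
what the compiler would do.  The helper name e() and the demo() function are 
made up for illustration, it only handles plain names (no expressions or 
attribute lookups), and sys._getframe() is CPython-specific, so treat it as a 
sketch rather than the proposed mechanism:

```python
import string
import sys


def e(template):
    """Hypothetical stand-in for the e'...' prefix (illustration only).

    Returns the template plus the keyword arguments that the compiler
    would pass along to estr().
    """
    caller = sys._getframe(1)  # CPython-specific: frame of the caller
    scope = dict(caller.f_globals)
    scope.update(caller.f_locals)  # locals shadow globals
    # Formatter.parse yields (literal, field_name, format_spec, conversion).
    names = [field for _, field, _, _ in string.Formatter().parse(template)
             if field]
    return template, {name: scope[name] for name in names}


def demo():
    friend = "John"
    filename = "report.txt"
    return e("Hello {friend}, filename: {filename}.")


template, params = demo()
print(template)
print(params)
```

A real implementation would do this at compile time, so no frame inspection 
would be needed.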

An estr is a subclass of str, so it can do the nice things a string can do.  
Rendering is deferred until the value is used.  It also has a .raw attribute 
and escape() and translate() methods:

     class estr(str):
         # init: saves self.raw, args, kwargs for later
         # methods, ops render it
         # def escape(self, escape_func):  # handles escaping
         # def translate(self, template, safe=True): # optional i18n support
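
A minimal sketch of that outline might look like the following.  It is not the 
sample script: because str is immutable, this version fixes the unescaped 
rendering when the object is created and keeps the template and parameters 
around so escape() can re-render later.  The attribute names args and kwargs 
are my assumption, and translate() is left out:

```python
import html


class estr(str):
    """Sketch of an e-string: a str that remembers its template and parameters."""

    def __new__(cls, template, *args, **kwargs):
        # str is immutable, so the default (unescaped) value is fixed here.
        self = super().__new__(cls, template.format(*args, **kwargs))
        self.raw = template      # original template, untouched
        self.args = args
        self.kwargs = kwargs
        return self

    def escape(self, escape_func):
        """Re-render the template with every parameter passed through escape_func."""
        args = [escape_func(str(a)) for a in self.args]
        kwargs = {k: escape_func(str(v)) for k, v in self.kwargs.items()}
        return self.raw.format(*args, **kwargs)


friend = "John"
filename = "somefile; rm -rf ~ 'foo' <html>"
msg = estr("Hello {friend}, filename: {filename}.",
           friend=friend, filename=filename)

print(msg.raw)                  # the template
print(msg)                      # rendered, unescaped
print(msg.escape(html.escape))  # rendered with HTML escaping
```

Since it is a real str, methods such as upper() and encode() work without any 
extra code.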

To make it as simple as possible for end-developers to use:

     1) It doesn't require str() to be run explicitly; it renders itself when
        needed via its various methods and operators.  Use .raw if you need
        the original template.

     2) A bit of responsibility is pushed to stdlib/pypi.  In a handful of
        sensitive places, the object is checked beforehand and escaped when
        needed:

         import shlex

         # imagine html, db, subprocess input etc.
         def sensitive_func_that_escapes(input):
             if isinstance(input, estr):
                 input = input.escape(shlex.quote)  # each chooses its own rules
             do_something(input)  # placeholder for the real work

This means the many callers using e-strings won't have to do explicit escaping; 
only a handful of callee libraries will, which is already common with database 
APIs, for example.  What is easiest to type is now safe as well:

     sensitive_func_that_escapes(e'user input: {input}')  # sleep easy
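
Here is a small runnable version of the caller/callee split, as a sketch only.  
The estr class is a cut-down stand-in for the proposed type, 
build_shell_command() is a hypothetical callee, and the command is only 
returned, never executed:

```python
import shlex


class estr(str):
    # Minimal stand-in for the proposed type: keeps template and parameters.
    def __new__(cls, template, **kwargs):
        self = super().__new__(cls, template.format(**kwargs))
        self.raw, self.kwargs = template, kwargs
        return self

    def escape(self, escape_func):
        escaped = {k: escape_func(str(v)) for k, v in self.kwargs.items()}
        return self.raw.format(**escaped)


def build_shell_command(command):
    """Hypothetical callee: quotes e-string parameters before use."""
    if isinstance(command, estr):
        command = command.escape(shlex.quote)  # this callee picks shell rules
    return command  # a real callee would pass this on to a shell


filename = "notes.txt; rm -rf ~"

# The caller does no escaping at all; the callee takes care of it.
safe = build_shell_command(estr("cat {filename}", filename=filename))
print(safe)

# A plain str carries no parameters, so it passes through unchanged.
plain = build_shell_command("cat " + filename)
print(plain)
```

The second call shows the limit of the approach: the callee can only escape 
what arrives as an estr, so plain strings are still the caller's problem.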

This could enable the safety and features we'd like, without burdening the 
everyday user.  I've created a sample script to demonstrate at:

     https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_example.py

Here is the output:

     # consider:   e'Hello {friend}, filename: {filename}.'
     friend:       'John'
     filename:     "somefile; rm -rf ~ 'foo' <html>"

     original:     Hello {friend}, filename: {filename}.
     w/ print():   Hello John, filename: somefile; rm -rf ~ 'foo' <html>.

     shell escape:
         Hello John, filename: 'somefile; rm -rf ~ '"'"'foo'"'"' <html>'.
     html escape:
         Hello John, filename: somefile; rm -rf ~ &#x27;foo&#x27; &lt;html&gt;.
     sql escape:   Hello "John", filename: "somefile; rm -rf ~ 'foo' <html>".
     logger DEBUG  Hello John, filename: somefile; rm -rf ~ 'foo' <html>.

     upper+encode: b"HELLO JOHN, FILENAME: SOMEFILE; RM -RF ~ 'FOO' <HTML>."
     translated?:  Hola John, archivo: somefile; rm -rf ~ 'foo' <html>.
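
For anyone who wants to check the shell and html lines without the sample 
script, the same template can be rendered with the stdlib escapers directly.  
The render() helper below is made up for this illustration and is not part of 
the proposal:

```python
import html
import shlex

template = "Hello {friend}, filename: {filename}."
params = {"friend": "John", "filename": "somefile; rm -rf ~ 'foo' <html>"}


def render(escape_func=str):
    """Fill the template after passing each parameter through escape_func."""
    return template.format(**{k: escape_func(v) for k, v in params.items()})


plain = render()                 # no escaping
shell = render(shlex.quote)      # shell quoting rules
markup = render(html.escape)     # HTML escaping rules

print(plain)
print(shell)
print(markup)
```

Note that shlex.quote leaves "John" alone because it contains no characters 
that are unsafe for a shell, which is why only the filename gets quoted.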


Is this automatic escaping desired?  Or should we continue to make the 
end-developer fully responsible for escaping input?

-Mike


