Draft PEP: string interpolation with backquotes
Oren Tirosh
oren-py-l at hishome.net
Sun Dec 2 08:06:26 EST 2001
PEP: XXXX
Title: String interpolation with backquotes
Author: oren at hishome.net (Oren Tirosh)
Created: 2-Dec-2001
Abstract
This document proposes a string interpolation feature for Python
to allow easier string formatting. The suggested syntax change is
the introduction of a new 'i' prefix for strings that triggers the
special interpretation of the backquote "`" character within a
string.
Example:
i"X=`x`, Y=`calc_y(x)`."
Copyright
This document is in the public domain.
Specification
A new character prefix "i" is defined for strings. This prefix
precedes the "u" and "r" prefixes, if present. The prefix "i"
stands for "interpolation" or "in-line". Within a string with an
"i" prefix an expression enclosed in backquotes is converted into
its string representation and embedded into the string. An empty
interpolation ("``") is not allowed. The expression may be any
valid Python expression not containing the backquote character.
Since a backquoted expression can always be written as a call to
the repr() function, this restriction places no actual limitation
on embedded expressions.
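As an illustration of the rules above, the example from the Abstract
would give the same result as a hand-written "%" expression. The
"i" syntax does not run on any released interpreter, so it appears
only in comments here. The names x, calc_y and name are placeholders
chosen for this sketch and are not part of the proposal.

```python
# Placeholder definitions, for illustration only.
x = 3

def calc_y(value):
    return value * 2

# Proposed syntax (not accepted by any released Python):
#     i"X=`x`, Y=`calc_y(x)`."
# Hand-written equivalent using the "%" operator:
result = "X=%s, Y=%s." % (x, calc_y(x))

# An embedded expression may not contain a backquote, so an inner
# backquoted expression is spelled with repr() instead:
#     i"name=`repr(name)`"
name = "spam"
quoted = "name=%s" % (repr(name),)
```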
Rationale
A similar proposal was made by Marnix Klooster in a python-list
posting [1] without the "i" prefix. Marnix noted that this is
the way it is done in Python's ancestor ABC from which it inherits
many features and design decisions.
The most apparent difference between this proposal and a previous
proposal by Ka-Ping Yee [2] is the use of backquotes rather than
the '$' character. Backquotes are familiar to Pythoneers as
equivalent to the repr() function whereas the $ notation is alien
to Python. With backquotes there is only one interpolation format
with simple rules compared to $ interpolation which uses a tricky
algorithm to detect the end of the interpolation or the use of an
alternative format with braces.
A more significant but less apparent difference is that with this
proposal the embedded expressions are not characters in a string -
they are real Python expressions compiled into byte code and the
validity of the interpolation syntax and the syntax of embedded
expressions is fully checked at compile-time.
This design does not sneak runtime evaluation or lazy evaluation
into the language through the back door. To get lazy-evaluated string
interpolation the programmer may explicitly use a lambda function
consisting of a single interpolated string.
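A minimal sketch of the lambda idiom follows. Because the "i" prefix
is not available, the body of the lambda uses the equivalent "%"
expression; the function name demo and the variable x are assumptions
made for this example.

```python
def demo():
    x = 1
    # Proposed spelling:  lazy = lambda: i"X=`x`"
    # The expression is compiled once but evaluated on every call.
    lazy = lambda: "X=%s" % (x,)
    first = lazy()    # evaluated while x is 1
    x = 2
    second = lazy()   # evaluated again, now that x is 2
    return first, second

first, second = demo()
```

Without the lambda, the interpolated string would be evaluated once,
at the point where it appears.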
Implementation notes
Most of the logic of this proposal is in the tokenizer. The
example above is broken down into the following tokens:
<i"X=`> - INTERPOLATE
<x> - NAME
<`, Y=`> - INTFRAG (interpolation fragment)
<calc_y> - NAME
<(> - LPAR
<x> - NAME
<)> - RPAR
<`."> - DETERPOLATE
An INTERPOLATE token instructs the compiler to start a tuple. An
INTFRAG is similar to a comma separating items in a tuple and the
DETERPOLATE token terminates the tuple.
The code generated by this form of interpolation may be the same
as that generated by the "%" operator. The INTERPOLATE, INTFRAG,
and DETERPOLATE tokens are concatenated together, any "%"
characters in the string are replaced with "%%", and the now-empty
backquotes "``" are replaced with "%s". Finally, the "%" operator
is applied to the resulting string and the tuple.
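The rewriting steps in this paragraph can be sketched as a small
function working on token text. This is only an illustration under
assumptions: the helper name tokens_to_format is hypothetical, and
the sketch ignores escape sequences and the "u" and "r" prefixes.

```python
import ast

def tokens_to_format(tokens):
    """Build the source text of a "%" format literal from the
    INTERPOLATE, INTFRAG and DETERPOLATE tokens of one interpolation."""
    joined = "".join(tokens)             # e.g. i"X=``, Y=``."
    joined = joined.replace("%", "%%")   # protect literal percent signs
    joined = joined.replace("``", "%s")  # one slot per expression
    return joined[1:]                    # drop the leading "i" prefix

# Tokens of the example from the Abstract.
format_literal = tokens_to_format(['i"X=`', '`, Y=`', '`."'])

# Placeholder values standing in for x and calc_y(x).
rendered = ast.literal_eval(format_literal) % (3, 6)

# A literal "%" in the string survives the round trip.
percent_literal = tokens_to_format(['i"100% of `', '`"'])
percent_rendered = ast.literal_eval(percent_literal) % ("x",)
```

The "%" characters must be doubled before "%s" is introduced,
otherwise the new format slots would be escaped as well.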
Correctly generating INTFRAG and DETERPOLATE tokens requires some
stateful logic in the tokenizer. If the INTERPOLATE token started
with a single quote, an INTFRAG may not contain an unescaped
single quote and the DETERPOLATE ends with a single quote. Similar
rules apply to interpolations with double quotes, triple single
quotes and triple double quotes.
State 0 - Default state. A backquote is a single character
BACKQUOTE token.
State 1 - backquote starts an INTFRAG or DETERPOLATE that ends
with "'".
State 2 - backquote starts an INTFRAG or DETERPOLATE that ends
with '"'.
State 3 - backquote starts an INTFRAG or DETERPOLATE that ends
with "'''".
State 4 - backquote starts an INTFRAG or DETERPOLATE that ends
with '"""'.
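The states above can be sketched as follows. This is a simplified
illustration, not the reference implementation: the function name
split_interpolation and the EXPR chunk are assumptions, embedded
expressions are returned as one piece rather than being tokenized,
and backslash escapes and the "u" and "r" prefixes are not handled.

```python
# Closing quote expected in each state (State 0 is the default state).
TERMINATOR = {1: "'", 2: '"', 3: "'''", 4: '"""'}

def split_interpolation(source):
    """Split one i-string into (kind, text) pairs."""
    if not source.startswith("i"):
        raise SyntaxError("missing 'i' prefix")
    rest = source[1:]
    # Enter State 1-4 from the opening quote; try triple quotes first.
    state = 0
    for candidate in (3, 4, 1, 2):
        if rest.startswith(TERMINATOR[candidate]):
            state = candidate
            break
    if state == 0:
        raise SyntaxError("missing opening quote")
    quote = TERMINATOR[state]

    pieces = source.split("`")
    if len(pieces) < 3 or len(pieces) % 2 == 0:
        raise SyntaxError("expected balanced backquotes")
    literals, expressions = pieces[0::2], pieces[1::2]
    if any(not expr.strip() for expr in expressions):
        raise SyntaxError("empty interpolation is not allowed")

    # The DETERPOLATE must end with the quote that opened the string,
    # and no fragment may contain that quote anywhere else.
    if not literals[-1].endswith(quote):
        raise SyntaxError("string is not terminated by its opening quote")
    head = literals[0][1 + len(quote):]
    tail = literals[-1][:-len(quote)]
    for text in [head] + literals[1:-1] + [tail]:
        if quote in text:
            raise SyntaxError("unescaped quote inside a fragment")

    tokens = [("INTERPOLATE", literals[0] + "`")]
    for index, expr in enumerate(expressions):
        tokens.append(("EXPR", expr))
        if index == len(expressions) - 1:
            tokens.append(("DETERPOLATE", "`" + literals[-1]))
        else:
            tokens.append(("INTFRAG", "`" + literals[index + 1] + "`"))
    return tokens

tokens = split_interpolation('i"X=`x`, Y=`calc_y(x)`."')
```

For the example from the Abstract this yields the same INTERPOLATE,
INTFRAG and DETERPOLATE texts as listed earlier, with each
expression kept as a single EXPR chunk.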
Reference implementation
A reference implementation in the form of a preprocessor for Python
source files is available at:
http://www.tothink.com/python/interpp
This preprocessor is based on a modified version of Ka-Ping Yee's
tokenize.py module.
Security
String interpolation involving actual run-time parsing of a string
opens many potential security holes. This form of interpolation should
be secure against this class of attacks.
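A small sketch of why this holds, using the equivalent "%" form: the
expressions are fixed when the source is compiled, so a value that
happens to contain backquotes is inserted as plain text and is never
parsed again. The variable names are placeholders for this example.

```python
# Untrusted data that looks like an interpolation.
user_input = "`__import__('os').getcwd()`"

# Proposed spelling:  i"Hello `user_input`"
# The value is substituted as data; its contents are not evaluated.
message = "Hello %s" % (user_input,)
```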
References
[1] 1996/11/07 python-list posting (Marnix Klooster)
http://groups.google.com/groups?group=comp.lang.python&selm=328195a1.1211700%40news.worldonline.nl
[2] PEP 215, String Interpolation (Ka-Ping Yee)
http://www.python.org/peps/pep-0215.html