[Python-Dev] PEP 292-related: why string substitution is not the same operation as data formatting

Guido van Rossum guido@python.org
Fri, 12 Jul 2002 15:07:06 -0400


> The syntax rules of PEP 292 are likely to cause confusion for
> newbies who have never used sh or perl. They will ask why Python
> has two syntaxes for doing string substitutions. Why not always
> spell the substitution string with ${identifier} or %(identifier)?
> The third rule of PEP 292 in particular looks like a patch to fix a
> kludge when an unanticipated exception was discovered.
> 
>    3. ${identifier} is equivalent to $identifier.  It is required for
>           when valid identifier characters follow the placeholder but are
>           not part of the placeholder, e.g. "${noun}ification".
> 
> > On Sunday 23 June 2002 02:16 pm, Lalo Martins wrote:
> > > More, I'm completely opposed to "<<name>> is <<age:.0d>> years
> > > old" because it's still cryptic and invasive. This should
> > > instead read similar to "<<name>> is <<age>> years
> > > old".sub({'name': x.name, 'age': x.age.format(None, 0)})
> >
> > > Guido, can you please, for our enlightenment, tell us what are the
> > > reasons you feel %(foo)s was a mistake?
> >
> > Because of the trailing 's'.  It's very easy to leave it out by
> > mistake, and because the definition of printf formats skips over
> > spaces (don't ask me why), the first character of the following word
> > is used as the type indicator.
> 
> It's easy to leave it out by mistake, but the error is almost always
> immediately obvious. In the interest of keeping the language as
> simple as possible, I hope no changes are made. If a method-based
> .sub() capability is to be added, why not reuse the %(identifier)
> syntax instead of introducing $ and ${} syntax? The .sub() string
> method would use the %(identifier) syntax without the 's' to spell
> the new substitution format. Instead of the proposed:
> 
> 	'$name was born in ${country}'.sub()
> 
> the phrase would be spelled:
> 
> 	'%(name) was born in %(country)'.sub()
> 
> This approach would introduce one new string method with a small
> variation on the existing '%' substitution syntax.

An argument can be made that since this works rather differently from
the current % operator, it's better to avoid confusion by using a
different character.  One can also argue that many Perl and shell
programmers are migrating to Python, for whom this would be helpful --
for others, $ or % makes little difference (DOS batch file programmers
aren't that common; most Windows users never get to this).
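
To make the $-style spelling concrete, here is a minimal sketch.  It
assumes the string.Template class found in later versions of the
standard library as a stand-in for the proposed .sub() method (which
is not what this code calls), and the sample values are made up.

```python
from string import Template

# Simple placeholders: $name and ${country} are both accepted.
sentence = Template('$name was born in ${country}').substitute(
    name='Alice', country='France')

# Braces are needed when identifier characters follow the placeholder.
word = Template('${noun}ification').substitute(noun='Python')

print(sentence)
print(word)
```

Without the braces, '$nounification' would be read as one placeholder
named "nounification".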

But the exact syntax to use in the template is a relatively trivial
detail IMO.  Whether to pick `name`, <<name>>, $name, $(name),
${name}, %name, %{name}, or %(name) is a choice we can make later.
Ditto for whether to allow full expressions, dotted names only, or
simple names only, and whether to allow leaving off the brackets for
simple names (or even for dotted names, as in PEP 215).  User testing
would be good.

User testing has already shown that the current %(name)s notation
causes too many mistakes, because of the odd trailing 's'.  These
errors may be immediately obvious when you run the code, but
constructs that are easily mistyped should still be avoided if
possible.  Also, I believe that the error has actually been puzzling
for many people (e.g. sometimes no error is raised but on close
inspection a few characters appear to be omitted from the output).
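
A small sketch of both failure modes with the existing % operator
follows.  The dictionary contents are made-up sample values; the
behavior shown is that of the % operator on a mapping.

```python
values = {'count': 3, 'name': 'Alice'}

# Intended: '%(count)s items'.  With the 's' left out, the space is
# taken as a flag and the 'i' of "items" as the conversion type, so
# no error is raised and the first letter of the word disappears.
silent = '%(count) items' % values

# Intended: '%(name)s was born'.  Here 'w' is not a valid conversion
# type, so the mistake is reported as a ValueError.
try:
    '%(name) was born' % values
    loud = None
except ValueError as exc:
    loud = str(exc)

print(repr(silent))
print(loud)
```

The first case is the puzzling one: the output is ' 3tems' rather
than '3 items', and nothing points at the format string.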

The real issues are IMO:

- Compile-time vs. run-time parsing.  I've become convinced that the
  compiler should do the parsing: this is the only way to make access
  to variables in nested scopes work, it avoids security issues, and
  it makes it easier to diagnose errors (e.g. in PyChecker).

- How to support translation.  Here the template must be replaced at
  run-time, but it is still desirable that the collection of available
  names is known at compile time (to avoid the security issues).

- Optional formatting specifiers.  I agree with Lalo that these should
  not be part of the interpolation syntax but need to be dealt with at
  a different level.  I think these are only relevant for numeric
  data.  Funny, there's still a (now-deprecated) module fpformat.py
  that supports arbitrary floating point formatting, and
  string.zfill() supports a bit of integer formatting.
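
The last two points can be sketched together.  This is only an
illustration under stated assumptions: it uses string.Template from
later versions of the standard library for the interpolation step,
and the catalog dictionary, its keys, and the sample values are
hypothetical.  Numbers are formatted before interpolation, and the
template is chosen at run time while the set of names stays fixed in
the code.

```python
from string import Template

age = 36.6
member_id = 42

# Format numeric data first, at a different level from interpolation.
values = {
    'name': 'Alice',
    'age': '%.0f' % age,              # round to a whole number
    'id': str(member_id).zfill(6),    # zero-pad to six digits
}

# Hypothetical translation catalog: the template text may change at
# run time, but only the names in `values` are ever available to it.
catalog = {
    'en': Template('$name is $age years old (id $id)'),
    'fr': Template('$name a $age ans (id $id)'),
}

english = catalog['en'].substitute(values)
french = catalog['fr'].substitute(values)

print(english)
print(french)
```

A template that refers to a name outside the mapping fails with a
KeyError instead of reaching into arbitrary variables.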

--Guido van Rossum (home page: http://www.python.org/~guido/)