[issue12014] str.format parses replacement field incorrectly

Sat Jun 4 02:36:31 CEST 2011

Ben Wolfson <wolfson at gmail.com> added the comment:

"""
>From the PEP: "Format strings consist of intermingled character data and markup."
"""

I know. Here is an example of a format string:

"hello, {0}"

Here is the character data from that format string:

"hello, "

Here is the markup:

"{0}"

This follows *directly* from the definition of "character data", which I've quoted several times now. In the following expression:

"{0}".format(1)

there is NO character data, because there is NOTHING which is "which is transferred unchanged from the format string to the output string".

The "{0}" doesn't appear in the output string at all. And the 1 isn't transferred unchanged: it has str() called on it. Since there is nothing which meets the definition of character data, there is nothing which *is* character data in the string, regarded as a format string. It is pure markup---it consists solely of a replacement field delimited by curly braces. I really don't see why this matters at all, but, nevertheless, I apologize if I'm explaining it poorly.

"""
Again, I'm not sure what you're getting at. The inner "{0}" is not interpreted (per the PEP). So the entire string is replaced by d['{0}'], or 'spam'.

Let me try to explain it again. str.format() parses the string, looking for matched sets of braces. In your last example above, the very first character '{' is matched to the very last character '}'. They match, in sense that all of the nested ones inside match. Once the markup is separated from the character data, the interpretation of what's inside the markup is then done. In this example, there is no character data.
"""

Yes, there is no character data. And I understand perfectly what is happening. Here's the problem: your description of what the implementation does is incorrect. You say that 

"""
The current implementation of str.format() finds matched pairs of braces and call what's inside "markup", then parse that markup.
"""

Now, the only reason for thinking that this:

"{0[}]}"

should be treated differently from this:

"{0[a]}"

is that inside square brackets curly brackets indicate replacement fields. If you want to justify what the current implementation does as an implementation of the PEP and an interpretation of what the PEP says, you *have* to think that. But if you think that, then the current implementation should *not* treat this:

"{0[{0}]}"

the way it does, because it does *not* treat the interior curly braces as indications of a replacement field---or rather, it does at one point in the source (in MarkupIterator_next) and it doesn't at another (in FieldNameIterator). I agree that what the current implementation does in the last example is in fact correct. But if it's correct in the one case, it's incorrect in the other, and vice versa. There is no justification, in terms of the PEP, for the present behavior.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12014>
_______________________________________