[Python-ideas] Briefer string format

Steven D'Aprano steve at pearwood.info
Thu Jul 23 16:22:14 CEST 2015


On Wed, Jul 22, 2015 at 09:28:19PM -0700, Bruce Leban wrote:
> On Wed, Jul 22, 2015 at 8:31 PM, Steven D'Aprano <steve at pearwood.info>
> wrote:
> 
> >
> > Constant-folding 'a' + 'b' to 'ab' is an optimization, it doesn't change
> > the semantics of the concat. But constant-folding f'{a}' + '{b}' would
> > change the semantics of the concatenation, because f strings aren't
> > constants, they only look like them.
> >
> 
> It doesn't have to change semantics and it shouldn't. This is a strawman
> argument. 

If I had a dollar for every time somebody on the Internet misused 
"strawman argument", I would be a rich man. Just because you disagree 
with me or think I'm wrong doesn't make my argument a strawman. It just 
makes me wrong-headed, or wrong :-)

I'm having trouble understanding what precisely you are disagreeing with. 
The example I gave, which you quote, involves explicit concatenation with 
the + operator, but your examples below use implicit concatenation with 
no operator at all.

Putting aside the question of implementation, I think:

(1) Explicit concatenation with the + operator should be treated as 
occurring after the f strings are evaluated, *as if* the following 
occurs:

    f'{spam}' + '{eggs}'
    => compiles to format(spam) + '{eggs}'

If you can come up with a clever optimization that avoids the need to 
*actually* build two temporary strings and then concatenate them, I 
don't have a problem with that. I'm only talking about the semantics. I 
don't want this:

    f'{spam}' + '{eggs}'
    => compiles to format(spam) + format(eggs)  # not this!

Do you agree with those semantics for explicit + concatenation? If not, 
what behaviour do you want?
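
To make the two readings of case (1) concrete, here is a minimal sketch 
that spells out by hand what each compiled form would produce. Since f 
strings are only a proposal at this point, it uses plain format() calls, 
and the values bound to spam and eggs are made up for illustration.

```python
# Hypothetical values, purely for illustration.
spam = 42
eggs = 'green'

# Semantics (1): only the f-string fragment is formatted; the plain
# string on the right of the + operator is left untouched.
wanted = format(spam) + '{eggs}'

# The rejected reading: the f prefix leaks across the + operator.
not_wanted = format(spam) + format(eggs)

print(wanted)      # 42{eggs}
print(not_wanted)  # 42green
```

The two results differ, which is why this is a question of semantics and 
not merely of optimization.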


(2) Implicit concatenation should occur as early as possible, before 
the format. Take the easy case first: both fragments are f-strings.

    f'{spam}' f'{eggs}'
    => behaves as if you wrote f'{spam}{eggs}'
    => which compiles to format(spam) + format(eggs)

Do you agree with those semantics for implicit concatenation?


(3) The hard case, when you mix f and non-f strings.

    f'{spam}' '{eggs}'

Notwithstanding raw strings, the behaviour which makes sense to me is 
that the implicit string concatenation occurs first, followed by format. 
So, semantically, if the parser sees the above, it should concat the 
string:

    => f'{spam}{eggs}'

then transform it to a call to format:

    => format(spam) + format(eggs)


I described that as the f "infecting" the other string. Guido has said 
he doesn't like this, but I'm not sure what behaviour he wants instead.


I don't think I want this behaviour:

    f'{spam}' '{eggs}'
    => format(spam) + '{eggs}'

for two reasons. Firstly, I already have (at least!) one way of getting 
that behaviour, such as explicit + concatenation as above.

Secondly, it feels that this does the concatenation in the wrong order. 
Implicit concatenation occurs as early as possible in every other case. 
But here, we're delaying the concatenation until after the format. So 
this feels wrong to me.
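
The two candidate orderings for the mixed case can be sketched as a pair 
of small functions. Everything here is hypothetical: the function names, 
the (is_f, body) fragment representation, and the use of str.format as a 
stand-in for whatever the f-string machinery ends up doing (it only 
handles simple names, which is enough for this example).

```python
def concat_then_format(fragments, namespace):
    """Join every fragment first; format the whole if any has the f prefix."""
    text = ''.join(body for _, body in fragments)
    if any(is_f for is_f, _ in fragments):
        return text.format(**namespace)
    return text


def format_then_concat(fragments, namespace):
    """Format only the f-prefixed fragments, then join the results."""
    return ''.join(
        body.format(**namespace) if is_f else body
        for is_f, body in fragments
    )


# Models the source text  f'{spam}' '{eggs}'
fragments = [(True, '{spam}'), (False, '{eggs}')]
namespace = {'spam': 42, 'eggs': 'green'}

print(concat_then_format(fragments, namespace))  # 42green
print(format_then_concat(fragments, namespace))  # 42{eggs}
```

The first function models the "infecting" behaviour; the second models 
the behaviour that explicit + concatenation already provides.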

(Again, I'm talking semantics, not implementation. Clever tricks with 
escaping the brackets don't matter.)

If there's no consensus on the behaviour of mixed f and non-f strings 
with implicit concatenation, rather than pick one and frustrate and 
surprise half the users, we should make it an error:

    f'{spam}' '{eggs}'
    => raises SyntaxError

and require people to be explicit about what they want, e.g.:

    f'{spam}' + '{eggs}'  # concatenation occurs after the format()
    f'{spam}' f'{eggs}'  # implicit concatenation before format()


(For the avoidance of doubt, I don't care whether the concatenation 
*actually* occurs after the format, I'm only talking about semantics, 
not implementation, sorry to keep beating this dead horse).


> > I would go further and allow all the f prefixes apart from the first to
> > be optional. To put it another way, the first f prefix "infects" all the
> > other string fragments:
> >
> I'd call that a bug. I suppose one person's bug is another person's
> feature. It violates the principle of least surprise. When I look at a line
> in isolation and it starts and ends with a quote, I would not expect that
> to not just be a plain string.

I don't think we can look at strings in isolation line-by-line.

s = r'''This is a long \raw s\tring
that goes over mul\tiple lines and contains

"\backslashes"

okay?
'''
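
As a quick check of that point, here is a small sketch (with a shortened, 
made-up literal) showing that the r prefix on the first line governs a 
line further down, so that line cannot be understood in isolation.

```python
s = r'''first line
mul\tiple lines
'''

second_line = s.splitlines()[1]

# Read in isolation, the second line appears to contain a tab escape,
# but the r prefix one line up makes it a backslash followed by 't'.
print('\t' in second_line)   # False
print('\\t' in second_line)  # True
```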


> > (Implicit concatenation is a compile-time operation, the format(...)
> > stuff is run-time, so there is a clear and logical order of operations.)
> >
> 
> To you, maybe. To the average developer, I doubt it. 

I'm not sure if you are complimenting me on being a genius, or putting 
the average developer down for being even more dimwitted than me :-)


> I view the compile
> time evaluation of implicit concatenation as a compiler implementation
> detail as it makes essentially no difference to the semantics of the
> program.

But once you bring f strings into the picture, then it DOES make a very 
large semantic difference. f'{spam}' '{eggs}' is very different 
depending on whether that is semantically the same as:

    - concat '{spam}' and '{eggs}', then format

    - format spam alone, then concat '{eggs}'

We can't just say that when the concatenation actually occurs is an 
optimization, as we can with raw and cooked string literals, because the 
f string is not a literal, it's actually a function call in disguise. So 
we have to pick one or the other (or refuse to guess and raise a syntax 
error).

You're right that it doesn't have to occur at compile time. (Although 
that has been the case all the way back to at least Python 1.5.) But it 
is a syntactic feature:

"Note that this feature is defined at the syntactical level, but 
implemented at compile time. The ‘+’ operator must be used to 
concatenate string expressions at run time."

https://docs.python.org/3/reference/lexical_analysis.html#string-literal-concatenation

which suggests to me that *semantically* it should occur as early as 
possible, before the format() operation. That is, it should be 
equivalent to:

    - concat '{spam}' and '{eggs}', then format

and not format followed by concat.
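
One way to see that implicit concatenation is resolved before any 
operator is involved is to look at the parse tree. The sketch below 
relies on how CPython's ast module behaves, and has_binop is a 
hypothetical helper written only for this example.

```python
import ast


def has_binop(source):
    """Return True if the parsed expression contains a binary operator node."""
    tree = ast.parse(source, mode='eval')
    return any(isinstance(node, ast.BinOp) for node in ast.walk(tree))


# Implicit concatenation: the parser hands back a single string literal.
print(has_binop("'a' 'b'"))    # False

# Explicit concatenation: a + operation is present in the tree.
print(has_binop("'a' + 'b'"))  # True

print(ast.literal_eval("'a' 'b'"))  # ab
```

By the time anything like format() could run, the adjacent literals have 
already become one string, whereas the + version is still an operation 
on two operands.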

You mentioned the principle of least surprise. I think it would be very 
surprising to have implicit concatenation behave *as if* it were 
occurring after the format, which is what you get if you escape the 
{{eggs}}.
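
The escaping trick can be illustrated with str.format, used here only as 
a stand-in for the proposed f-string machinery (an assumption, since the 
final rules are not settled). The value of spam is made up.

```python
spam = 42

# Implicit concatenation joins the fragments into '{spam}{{eggs}}'.
template = '{spam}' '{{eggs}}'

# Formatting the joined text turns the doubled braces back into literal
# ones, so the result matches "format first, then concatenate".
result = template.format(spam=spam)

print(result)  # 42{eggs}
```

So the escaped form is observably the same as format(spam) + '{eggs}', 
even though the concatenation really happened first.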

But YMMV. If we (the community) cannot reach consensus, perhaps the 
safest thing would be to just refuse to guess and raise an error on 
implicit concat of f and non-f strings.



-- 
Steve

