[Python-ideas] Implicit string literal concatenation considered harmful (options)

Mon May 20 18:46:28 CEST 2013

On 05/19/2013 05:33 PM, Nick Coghlan wrote:
> If it's based on the contents of these threads, be aware that at least one
> core developer (me) and probably more have already mostly tuned out on the
> grounds that the feature is obviously in wide enough use that changing it
> will break the world without adequate gain. We don't even have to speculate
> on what others might be doing, we know it would break *our* code.

Ok, so is it your opinion that, in order to remove implicit string joining, 
an explicit replacement must be put in at the same time?

> For example, porting Fedora to Python 3 is already going to be a pain.
> Breaking implicit string concatenation would be yet another road block
> making that transition more difficult.

This sounds more like a general request not to make any changes, rather 
than something about the specific item itself.

To be clear, this is going to need a long removal schedule.  Probably nothing 
will actually be removed before 3.7 or later.  Maybe two years from now?

How about this:

First, let's please differentiate string continuation from string 
concatenation.  A string continuation would be a pre-run-time alteration.  A 
string concatenation would be a run-time operation.

By documenting them that way, it will help make them easier to discuss and 
teach to new users.
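
To illustrate the distinction, here is a minimal sketch using the standard 
ast module.  The helper name rhs_node is made up for this example.  It shows 
that adjacent literals already arrive from the parser as a single string, 
while '+' is parsed as an operation.  (CPython may still fold a constant '+' 
later in the compiler, so this only shows how the two forms are parsed, not 
what the final bytecode looks like.)

```python
import ast

def rhs_node(source):
    """Return the AST node on the right-hand side of a one-line assignment."""
    return ast.parse(source).body[0].value

continued = rhs_node("x = 'a' 'b'")       # implicit joining of adjacent literals
concatenated = rhs_node("x = 'a' + 'b'")  # explicit '+' operator

# The adjacent literals are already one string literal in the parse tree.
continued_is_single_literal = not isinstance(continued, ast.BinOp)
continued_value = ast.literal_eval(continued)

# The '+' form is parsed as a binary operation on two separate literals.
concatenated_is_operation = isinstance(concatenated, ast.BinOp)

print(continued_is_single_literal, repr(continued_value), concatenated_is_operation)
```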

Redefine a line continuation character to be strictly a \+\n sequence. 
That removes the "character after line continuation" errors because a '\' 
without a newline after it isn't technically a line continuation character.

Then use a '\' anywhere except at the end of a line as the explicit 
string continuation character.

This should be easy to do also.

We could add this in sooner rather than later.  I don't think it would be a 
difficult patch, and I also don't think it would break anything.  Implicit 
string continuations could be deprecated at the same time, with the 
recommendation to start using the more explicit variation.

*But don't remove implicit string continuations until Python 4.0.*
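
On the "would not break anything" point, a quick check, as a sketch only: as 
far as I know, a '\' followed by anything other than a newline is rejected by 
the current compiler, so no working code should be using the proposed 
spelling.  The helper name compiles is just for this example.

```python
def compiles(source):
    """Return True if the current interpreter accepts the source text."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

proposed_form = "x = 'foo' \\ 'bar'\n"   # a '\' between two literals on one line
implicit_form = "x = 'foo' 'bar'\n"      # today's implicit joining

proposed_accepted = compiles(proposed_form)
implicit_accepted = compiles(implicit_form)

print(proposed_accepted, implicit_accepted)
```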

String continuations are a similar concept to line continuations, so the 
reuse of '\' for it is an easy concept to learn and remember.  It's also 
easy to explain.  This does not change a '\' used inside a string.  String 
escape codes have their own rules.

Examples:

     foo('a' 'b')   # This won't cause an error until Python 4.0

     x = 'foo\n' \ 'bar\n' \ 'baz\n'

     x = ( 'foo\n'      # easy to see trailing commas here.
         \ 'bar\n'
         \ 'baz\n'
         )

     x = 'foo\n' \
       \ 'bar\n' \
       \ 'baz\n'

If we allow \+newline to work as both a string continuation and line 
continuation, this could be...

     x = 'foo\n' \
         'bar\n' \
         'baz\n'

This is probably the least disruptive way to do this, and the '\' as a 
string continuation is consistent with the \+\n as a line continuation.
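
For what it's worth, that last spelling is already valid today, because the 
\+newline joins the lines and the implicit joining then merges the literals.  
A small sketch comparing it with the one-line form:

```python
# Each line ends with a backslash-newline, so the three literals end up on
# one logical line and are then joined implicitly.
x = 'foo\n' \
    'bar\n' \
    'baz\n'

# The same three literals written on a single line.
y = 'foo\n' 'bar\n' 'baz\n'

print(x == y)
```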

A final note ...

I think we can easily allow comments after line continuations if there is 
no space between the '\' and the '#'.

     x = 'foo\n' \# This comment is removed.
         'bar\n' \# The new-line at the end is not removed.
         'baz\n'

When the tokenizer finds a '\' followed by a '#', it could remove 
the comment, back up one, and continue.  What would happen is that the 
\+comment+\n would be converted to \+\n.  No space can be between the '\' 
and '#' for this to work.

Seems like this should already work, but the current check for an invalid 
character after a line continuation raises an error before this can happen.
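
The rewrite described above can be sketched as a text pass that runs before 
the real tokenizer.  This is only an illustration, not how CPython's 
tokenizer is written: the name strip_continuation_comments is hypothetical, 
and the regular expression is deliberately naive (it does not check whether 
the '\#' sits inside a string literal).

```python
import re

# Hypothetical pre-tokenizer pass, not part of CPython.
# Matches a backslash, an immediately following '#', and the rest of the line.
BACKSLASH_COMMENT = re.compile(r"\\#[^\n]*\n")

def strip_continuation_comments(source):
    """Turn each backslash+comment+newline into backslash+newline."""
    return BACKSLASH_COMMENT.sub(lambda match: "\\\n", source)

# The example from the proposal, kept verbatim in a raw string.
proposed = r"""x = 'foo\n' \# This comment is removed.
    'bar\n' \# The new-line at the end is not removed.
    'baz\n'
"""

rewritten = strip_continuation_comments(proposed)

# After the rewrite, the text is ordinary line-continued source that the
# current compiler accepts.
namespace = {}
exec(rewritten, namespace)
print(repr(namespace["x"]))
```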

Cheers,
    Ron