Indented multi-line strings (was: "Data blocks" syntax specification draft)

Peter J. Holzer hjp-python at hjp.at
Wed May 30 18:47:33 EDT 2018


On 2018-05-29 07:57:18 -0600, Ian Kelly wrote:
> On Tue, May 29, 2018 at 3:19 AM, Peter J. Holzer <hjp-python at hjp.at> wrote:
> > On 2018-05-23 11:08:48 -0600, Ian Kelly wrote:
> >>

[...]
> > What if I want all lines to start with some white space?
[...]
> 
> Fair points.
[...]

> >> Also, how about using a string prefix character instead of making
> >> quad-quote meaningful? Apart from being hard to visually distinguish
> >> from triple-quote, this would break existing triple-quote strings that
> >> happen to start with the quote character, e.g. ''''What?' she asked.'''
> >
> > No confusion here, since in my proposal there is always a newline after
> > the leading delimiter, since otherwise the first line wouldn't line up
> > with the rest. So the parser would notice that this is a triple-quote
> > and not a quad-quote as soon as it sees the "W".
> 
> Then how about a triple-quote string that starts with a quote
> character followed by a newline?

Collateral damage. Seriously, if you write something like this

    s = ''''
            A quoted
            multiline
            string
''''

instead of this

    s = """'
            A quoted
            multiline
            string
'"""

outside of an obfuscation contest, you get what you deserve.
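
For reference, this is how current Python reads a triple-quoted string
whose content starts with a quote character followed by a newline. It is
only an illustration of the existing behaviour that a quad-quote syntax
would change; the variable name is made up for the example.

```python
# Today the fourth quote is simply the first character of the string.
existing = ''''
What?' she asked.'''

print(repr(existing))
```

Under the quad-quote proposal the same source text would be read as an
indented string instead, which is the collateral damage mentioned above.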

> >> b = i'''
> >>     Here is a multi-line string
> >>     with indentation, which is
> >>     determined from the second
> >>     line.'''
> >
> > Visually, that letter doesn't look like a part of the quote, so I would
> > like to pull the contents of the string over to align with the quote:
> >
> > b = i'''
> >      Here is a multi-line string
> >      with indentation, which is
> >      determined from the second
> >      line.'''
> >
> > But that creates an ambiguity: Is the whole string now indented one
> > space or not? Where is the left edge?
> 
> I don't follow. In the first case you have a multi-line string where
> every line is indented four spaces, so four spaces are to be removed
> from every line. In the second case you have a multi-line string where
> every line is indented by five spaces, so five spaces are to be
> removed from every line.

Nope. Remember that I want to be able to have *all* lines start with
white space.

So I can't simply strip all the common whitespace.
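
A small illustration with the standard library's textwrap.dedent(),
which does exactly that kind of common-whitespace stripping. The sample
text is an assumption chosen to match the "Py-/thon" example used in
this message.

```python
import textwrap

# Intended content: "   Py-" (3 leading spaces) and "  thon" (2 leading spaces),
# written with 4 extra spaces of source indentation.
raw = """
       Py-
      thon
"""

# dedent() removes the longest common leading whitespace (6 spaces here),
# so the information about the intended left edge is lost.
dedented = textwrap.dedent(raw)
print(repr(dedented))
```

The second line ends up with no leading white space at all, so a string
in which every line starts with white space cannot be expressed this way.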

This is the reason why I want the quote (leading or trailing, preferably
both) to indicate where the left edge of the string is.

For a quad quote this is IMHO unambiguous:

    b = ''''
           Py-
          thon
        ''''

The first line starts with 3 spaces, the second one with 2.

But the prefix makes it visually ambiguous: Is the left edge where the
prefix is or where the first quote character is?

This is of course not a problem if the *trailing* quote determines the
indentation:

    a_multi_line_string = i'''
           Py-
          thon
        '''


> What about the second string would make the algorithm think that four
> spaces are to be removed from every line, leaving one?

Because you have to skip 4 spaces to get below the "i" which starts the
quote.

> Why not remove three, leaving two? Or remove one, leaving
> four? And why is the first string safe from this?

Not sure what you mean by "first string". The way you wrote it, the
string either has no leading whitespace (if the "i" signals the left
edge) or is a syntax error (if the first "'" signals the left edge,
because then non-whitespace characters would have to be discarded).

> In any case, Chris made a good point that I agree with. This doesn't
> really need to be syntax at all, but could just be implemented as a
> new string method.

Depending on the details, not quite. A method wouldn't get the
horizontal position of the leading quote. It could infer the position of
the trailing quote, though.

So, yes, it would be possible. And the optimizer might call the method
at compile time instead of runtime. There is still the visual noise,
though.
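
A minimal sketch of such a helper, written as a plain function rather
than a real str method. The name dedent_to_closing_quote is hypothetical,
and several details are assumptions of the sketch, not part of any
proposal: the string must start with a newline, the closing quote must
sit on a line of its own, and no trailing newline is kept.

```python
def dedent_to_closing_quote(text: str) -> str:
    """Use the white space before the closing quote as the left edge."""
    lines = text.split("\n")
    if len(lines) < 2 or lines[0] != "":
        raise ValueError("expected a newline right after the opening quote")

    # Whatever precedes the closing quote on the last line is the margin.
    margin = lines[-1]
    if margin.strip(" \t"):
        raise ValueError("closing quote must be on a line of its own")

    result = []
    for line in lines[1:-1]:
        if line.startswith(margin):
            result.append(line[len(margin):])
        elif not line.strip():
            result.append("")  # tolerate short white-space-only lines
        else:
            raise ValueError(f"line is left of the closing quote: {line!r}")
    return "\n".join(result)


b = dedent_to_closing_quote('''
       Py-
      thon
    ''')
print(repr(b))
```

With the closing quote at column 4, the first line keeps 3 leading
spaces and the second keeps 2. The explicit function call is the visual
noise referred to above.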

This is a bit of a pet peeve of mine: It is common to indent code in all
languages invented in the last 40 years or so (and even older languages
like Fortran have adopted that convention). But *none*[1] of them has
syntax to let the programmer write multiline strings that are properly
aligned with the rest of the code. Not even Python, where indentation is
part of the syntax.

[1] I exaggerate. SPL is the one exception I know. And there are
    probably a few other similarly obscure languages.

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp at hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>