Indented multi-line strings (was: "Data blocks" syntax specification draft)

Peter J. Holzer hjp-python at hjp.at
Tue May 29 05:19:19 EDT 2018


On 2018-05-23 11:08:48 -0600, Ian Kelly wrote:
> On Wed, May 23, 2018 at 10:25 AM, Peter J. Holzer <hjp-python at hjp.at> wrote:
> > How about this?
> >
> >     x = ''''
> >         Here is a multi-line string
> >             with
> >           indentation.
> >         ''''
> >
> > This would be equivalent to
> >
> >     x = 'Here is a multi-line string\n    with\n  indentation.'
> >
> > Rules:
> >
> >  * The leading and trailing '''' must be aligned vertically.
> 
> Ick, why?

To create an unambiguous left edge.
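To make the rules concrete, here is a minimal sketch of them as a plain Python function. The rules it follows are: the delimiters are aligned, the content is indented at least that far with the same tabs/spaces, that prefix is stripped while any further indentation is kept, and the newlines after the opening and before the closing delimiter are dropped. The name `parse_quad_quoted` and its input format (the physical source lines of the literal, opening line first, closing line last) are made up for illustration. It also assumes the delimiter lines are indented with spaces, so that a character index equals a screen column.

```python
def parse_quad_quoted(lines: list[str]) -> str:
    """Hypothetical sketch of the proposed '''' rules.

    `lines` are the physical source lines of the literal: the first one
    contains the opening '''' (code such as "x = " may precede it), the
    last one holds only indentation plus the closing ''''.
    Assumes spaces, so a character index is also a screen column.
    """
    opening, content, closing = lines[0], lines[1:-1], lines[-1]
    open_col = opening.index("''''")
    if opening[open_col + 4:].strip():
        raise SyntaxError("the opening '''' must end its line")

    # Rule 1: the closing '''' stands alone and starts in the same column.
    edge = closing[:open_col]
    if edge.strip() or closing[open_col:] != "''''":
        raise SyntaxError("the closing '''' must be aligned with the opening one")

    # Rule 2: every content line starts with the same indentation as the
    # closing delimiter (this sketch applies the rule literally, so even an
    # empty line has to carry that indentation).
    for line in content:
        if not line.startswith(edge):
            raise SyntaxError(f"line not indented far enough: {line!r}")

    # Rules 2 and 3: strip the left edge, keep any further indentation.
    # Rule 4: joining with "\n" drops the newline after the opening and
    # before the closing delimiter but keeps all the others.
    return "\n".join(line[len(edge):] for line in content)


example = [
    "x = ''''",
    "    Here is a multi-line string",
    "        with",
    "      indentation.",
    "    ''''",
]
print(repr(parse_quad_quoted(example)))
# 'Here is a multi-line string\n    with\n  indentation.'
```

Comparing the indentation prefix textually, rather than counting columns, is one simple way to enforce the "consistent tabs/spaces" requirement.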

> What's wrong with letting the trailing delimiter be at the
> end of a line, or the beginning with no indentation?

If "no indentation" means "its indentation defines the left edge", so
that 

    a_very_long_variable_name = ''''
        A string.
      ''''

is equivalent to "  A string.", I could live with that. The downside is
that the parser has to scan to the end of the string before it knows how
much whitespace to strip from each line. OTOH it makes consistent
indentation with tabs easier:

    a_very_long_variable_name = ''''
    »···»·······»·······»·······  A string.
    »···»·······»·······»·······''''

Are the quad-quotes aligned? It depends on how wide a tab is. (I used
»··· to visualize a tab.)
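A sketch of that alternative, with the hypothetical name `strip_to_closing_indent`: the left edge is whatever whitespace precedes the closing delimiter, so nothing can be stripped until the whole literal has been read. Because the prefix is compared character by character, the tab-indented example works without knowing the tab width. Whether the two delimiters *look* aligned, however, does depend on it, which `str.expandtabs` makes visible. The example lines are taken literally as shown above, including the four spaces before the tabs.

```python
def strip_to_closing_indent(lines: list[str]) -> str:
    """Hypothetical variant: the closing delimiter's indentation is the edge.

    `lines` are the physical lines after the opening '''' line; the last
    one holds only indentation plus the closing ''''. The edge is only
    known once that last line has been read.
    """
    *content, closing = lines
    body = closing.lstrip()
    edge = closing[: len(closing) - len(body)]
    if body != "''''":
        raise SyntaxError("the closing line must contain only indentation and ''''")
    for line in content:
        # Compare the whitespace prefix textually, so tabs and spaces must match.
        if not line.startswith(edge):
            raise SyntaxError(f"line starts left of the closing delimiter: {line!r}")
    return "\n".join(line[len(edge):] for line in content)


# Spaces: the closing '''' sits two columns left of the text.
print(repr(strip_to_closing_indent(["        A string.", "      ''''"])))
# Tabs: works without knowing how wide a tab is.
print(repr(strip_to_closing_indent(["    \t\t\t\t  A string.", "    \t\t\t\t''''"])))

# Are the two quad-quotes aligned? Only for one particular tab width.
opening_line = "    a_very_long_variable_name = ''''"
closing_line = "    \t\t\t\t''''"
alignment = {
    tabsize: closing_line.expandtabs(tabsize).index("''''") == opening_line.index("''''")
    for tabsize in (2, 4, 8)
}
print(alignment)  # {2: False, 4: False, 8: True}
```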

If "no indentation" literally means no indentation, like this:

    a = foo()
    b = ''''
          A string.
''''
    c = bar()

then the reason for not allowing this is that it subverts the purpose of
the feature (to have multiline strings which align nicely with the
indentation of the code and don't stick out to the left like a sore
thumb).

Similarly, if "no indentation" means "no additional indentation
relative to the surrounding code", then the reason is that in a multiline
statement, the continuation lines should be indented more than the first
line (see PEP 8).

The trailing delimiter could be at the end of the line to signify that
there is no newline at the end of the string:

    s = ''''
          A string.''''
    t = ''''
          A string.
        ''''

would then be equivalent to 

    s = '  A string.'
    t = '  A string.\n'

Then the indentation of the first delimiter alone determines how much
white space is stripped. I think this looks untidy, though, and my rule
4 (ignore the newlines after the opening and before the closing
delimiter) is more symmetrical.
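A minimal sketch of this alternative, assuming space indentation and a made-up function name: only the column of the opening delimiter is used, so the text between the delimiters can be dedented line by line. A closing delimiter on its own line (indented no further than the opening one) simply turns into the final newline.

```python
def parse_eol_variant(source: str) -> str:
    """Hypothetical sketch: only the opening '''' sets the left edge.

    `source` runs from the line with the opening '''' to the closing ''''.
    The closing '''' may end the last text line (no trailing newline) or
    stand on a line of its own (the newline before it is kept).
    """
    first, _, rest = source.partition("\n")
    edge = first.index("''''")  # column of the opening delimiter (spaces assumed)
    if first[edge + 4:].strip():
        raise SyntaxError("the opening '''' must end its line")
    if not rest.endswith("''''"):
        raise SyntaxError("missing closing ''''")
    lines = rest[:-4].split("\n")
    for line in lines:
        if line[:edge].strip():
            raise SyntaxError(f"text left of the opening delimiter: {line!r}")
    # The last "line" is whatever precedes the closing '''': either text
    # (no trailing newline) or just indentation, which dedents to "".
    return "\n".join(line[edge:] for line in lines)


print(repr(parse_eol_variant("s = ''''\n      A string.''''")))      # '  A string.'
print(repr(parse_eol_variant("t = ''''\n      A string.\n    ''''")))  # '  A string.\n'
```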


> >  * The contents of the string must be indented at least as far as the
> >    delimiters (and with consistent tabs/spaces).
> >    This leading white space is ignored.
> >  * All the leading white space beyond this 'left edge' is preserved.
> >  * The newlines after the leading '''' and before the trailing '''' are
> >    ignored, all the others preserved. (I thought about preserving the
> >    trailing newline, but it is easier to add one than remove one.)
> 
> How about we instead just use the rules from PEP 257 so that there
> aren't two different sets of multi-line string indentation rules to
> have to remember?
> 
> https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation

These rules are nice for a specific application, but I think they are
too ad-hoc and not general enough for a language feature which should be
able to represent arbitrary strings.

In particular:

| will strip a uniform amount of indentation from the second and further
| lines of the docstring, equal to the minimum indentation of all
| non-blank lines after the first line

What if I want all lines to start with some white space? 

|  Any indentation in the first line of the docstring (i.e., up to the
|  first newline) is insignificant and removed.

What if I want the string to start with white space?

|  Blank lines should be removed from the beginning and end of the
|  docstring.

What if I want leading or trailing blank lines?
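The standard library's `inspect.cleandoc` applies essentially these PEP 257 trimming rules, so it can show what gets lost in each of the three cases. The input strings are made up for illustration:

```python
import inspect

# Whatever indentation was intended, the smallest common indentation
# (six spaces here) is removed in full, so no line keeps leading white space.
indented = inspect.cleandoc("\n      A\n      B\n    ")

# The string should start with white space, but the first line's
# indentation is discarded.
leading_space = inspect.cleandoc("   x")

# Leading and trailing blank lines should be kept, but they are removed.
blank_lines = inspect.cleandoc("\n\n    A\n\n")

print(repr(indented))       # 'A\nB'
print(repr(leading_space))  # 'x'
print(repr(blank_lines))    # 'A'
```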


> Also, how about using a string prefix character instead of making
> quad-quote meaningful? Apart from being hard to visually distinguish
> from triple-quote, this would break existing triple-quote strings that
> happen to start with the quote character, e.g. ''''What?' she asked.'''

No confusion here: in my proposal the leading delimiter is always
followed by a newline, since otherwise the first line wouldn't line up
with the rest. So the parser would know that this is a triple-quote and
not a quad-quote as soon as it sees the "W".
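The lookahead can be sketched as a tiny lexer helper. The name `opening_delimiter` and its return values are hypothetical. Since a quad-quote must be followed by a newline, the one character after the fourth quote decides between the two readings:

```python
def opening_delimiter(source: str, pos: int = 0) -> str:
    """Classify the string opener at source[pos] (hypothetical lexer step).

    Assumes Unix line endings. A quad-quote counts only if the fourth
    quote is immediately followed by a newline; otherwise the usual
    triple-quote rule applies and the fourth quote is string content.
    """
    if source.startswith("''''", pos) and source[pos + 4:pos + 5] == "\n":
        return "quad"
    if source.startswith("'''", pos):
        return "triple"
    if source.startswith("'", pos):
        return "single"
    raise ValueError(f"no string literal at position {pos}")


print(opening_delimiter("''''What?' she asked.'''"))  # triple
print(opening_delimiter("''''\n    text\n    ''''"))  # quad
```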

A prefix might still be a good idea, but ... see below.


> I don't know if 'i' would be the right prefix character for this, but
> it's unused and is short for 'indented':

'i' is fine by me.

> b = i'''
>     Here is a multi-line string
>     with indentation, which is
>     determined from the second
>     line.'''

Visually, that letter doesn't look like a part of the quote, so I would
like to pull the contents of the string over to align with the quote:

b = i'''
     Here is a multi-line string
     with indentation, which is
     determined from the second
     line.'''

But that creates an ambiguity: Is the whole string now indented one
space or not? Where is the left edge?
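The ambiguity is easy to demonstrate: the same source lines give two different strings depending on whether the left edge is taken from the column of the prefix letter or from the column of the first quote. `dedent_to` is a made-up helper, and spaces are assumed:

```python
def dedent_to(lines: list[str], edge: int) -> str:
    """Strip `edge` columns of leading spaces from every line (hypothetical helper)."""
    for line in lines:
        if line[:edge].strip():
            raise SyntaxError(f"text left of column {edge}: {line!r}")
    return "\n".join(line[edge:] for line in lines)


opening = "b = i'''"
# The text lines of the example, with the closing ''' left off.
content = [
    "     Here is a multi-line string",
    "     with indentation, which is",
    "     determined from the second",
    "     line.",
]

prefix_edge = opening.index("i'''")  # column of the 'i' (4)
quote_edge = opening.index("'''")    # column of the first quote (5)

print(repr(dedent_to(content, prefix_edge).splitlines()[0]))  # ' Here is a multi-line string'
print(repr(dedent_to(content, quote_edge).splitlines()[0]))   # 'Here is a multi-line string'
```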

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp at hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>