Indented multi-line strings (was: "Data blocks" syntax specification draft)

Ian Kelly ian.g.kelly at gmail.com
Tue May 29 09:57:18 EDT 2018


On Tue, May 29, 2018 at 3:19 AM, Peter J. Holzer <hjp-python at hjp.at> wrote:
> On 2018-05-23 11:08:48 -0600, Ian Kelly wrote:
>>
>> How about we instead just use the rules from PEP 257 so that there
>> aren't two different sets of multi-line string indentation rules to
>> have to remember?
>>
>> https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation
>
> These rules are nice for a specific application, but I think they are
> too ad-hoc and not general enough for a language feature which should be
> able to represent arbitrary strings.
>
> In particular:
>
> | will strip a uniform amount of indentation from the second and further
> | lines of the docstring, equal to the minimum indentation of all
> | non-blank lines after the first line
>
> What if I want all lines to start with some white space?
>
> |  Any indentation in the first line of the docstring (i.e., up to the
> |  first newline) is insignificant and removed.
>
> What if I want the string to start with white space?
>
> |  Blank lines should be removed from the beginning and end of the
> |  docstring.
>
> What if I want leading or trailing blank lines?

Fair points. I still dislike reinventing the wheel here. Note that
even as I proposed reusing the single existing indentation-removal
scheme in the language, I misremembered a few things about how it
works.
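
For reference, the standard library's inspect.cleandoc() applies the PEP 257 trimming rules, so the behaviour Peter objects to can be seen directly. This is only an illustration; the sample text is made up for the purpose.

```python
import inspect

# A made-up docstring-like value: indented first line, indented body,
# and trailing blank lines.
raw = (
    "   first line\n"
    "        second line\n"
    "          third line\n"
    "\n"
    "    "
)

cleaned = inspect.cleandoc(raw)

# The first line loses all of its indentation, the later lines lose their
# common 8-space margin, and the trailing blank lines are dropped.
assert cleaned == "first line\nsecond line\n  third line"
```

None of the three things Peter asks for (uniform leading white space, a leading-space first line, surrounding blank lines) survive this treatment.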

>> Also, how about using a string prefix character instead of making
>> quad-quote meaningful? Apart from being hard to visually distinguish
>> from triple-quote, this would break existing triple-quote strings that
>> happen to start with the quote character, e.g. ''''What?' she asked.'''
>
> No confusion here, since in my proposal there is always a newline after
> the leading delimiter, since otherwise the first line wouldn't line up
> with the rest. So the parser would notice that this is a triple-quote
> and not a quad-quote as soon as it sees the "W".

Then how about a triple-quote string that starts with a quote
character followed by a newline?
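
To make that concrete, here is such a string, which is valid Python today. Under a quad-quote rule that only checks for a newline after the delimiter, I would assume it gets read differently.

```python
# Today this is a triple-quoted string whose first character is a quote,
# immediately followed by a newline.
s = ''''
What?' she asked.'''

assert s == "'\nWhat?' she asked."
```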

>> b = i'''
>>     Here is a multi-line string
>>     with indentation, which is
>>     determined from the second
>>     line.'''
>
> Visually, that letter doesn't look like a part of the quote, so I would
> like to pull the contents of the string over to align with the quote:
>
> b = i'''
>      Here is a multi-line string
>      with indentation, which is
>      determined from the second
>      line.'''
>
> But that creates an ambiguity: Is the whole string now indented one
> space or not? Where is the left edge?

I don't follow. In the first case you have a multi-line string where
every line is indented four spaces, so four spaces are to be removed
from every line. In the second case you have a multi-line string where
every line is indented by five spaces, so five spaces are to be
removed from every line. What about the second string would make the
algorithm think that four spaces are to be removed from every line,
leaving one? Why not remove three, leaving two? Or remove one, leaving
four? And why is the first string safe from this?
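
Here is a minimal sketch of the rule as I understand it, assuming the indentation is taken from the first line after the opening delimiter. The name strip_indent is hypothetical and stands in for whatever the i prefix would do; it is not an existing function.

```python
def strip_indent(text):
    """Hypothetical stand-in for the i'''...''' prefix.

    Assumes a newline directly follows the opening delimiter and removes
    the second line's leading spaces from every line.
    """
    lines = text.split("\n")
    if len(lines) < 2 or lines[0] != "":
        return text  # this sketch only handles the newline-first form
    second = lines[1]
    prefix = second[: len(second) - len(second.lstrip(" "))]
    return "\n".join(
        line[len(prefix):] if line.startswith(prefix) else line
        for line in lines[1:]
    )


four = '''
    Here is a multi-line string
    with indentation, which is
    determined from the second
    line.'''

five = '''
     Here is a multi-line string
     with indentation, which is
     determined from the second
     line.'''

# Four spaces come off in one case and five in the other; the results match.
assert strip_indent(four) == strip_indent(five)
assert strip_indent(five).startswith("Here is")
```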

In any case, Chris made a good point that I agree with. This doesn't
really need to be syntax at all, but could just be implemented as a
new string method.
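
As a rough idea of what that could look like, textwrap.dedent() in the standard library already does this as a plain function. A string method would presumably behave similarly, though that is my assumption and not something specified in this thread.

```python
import textwrap

# The backslash after the opening quotes suppresses the leading newline.
b = textwrap.dedent('''\
    Here is a multi-line string
    with indentation, which is
    determined from the common
    leading white space.''')

assert b.splitlines()[0] == "Here is a multi-line string"
assert not any(line.startswith(" ") for line in b.splitlines())
```

Note that dedent() uses the white space common to all lines, not the indentation of one particular line, so it differs slightly from the second-line rule above.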


