Raw string statement (proposal)

Mikhail V mikhailwas at gmail.com
Fri May 25 00:34:33 EDT 2018


Hi.
I've put some thoughts together, and
need some feedback on this proposal.
Main question is:  Is it convincing?
Is there any flaw?
My own opinion - there IS something to chase.
Still the justification for such syntax is hard.



Raw string statement
--------------

Issue
---------

Vast majority of tasks include operations with text in
various grades of complexity. It is relevant even by
simple ubiquitous tasks, like for example defining file paths.
String literals are interpreted - i.e. special character "\" may
change the contents of a string.
Python raw string r"" has the least amount of such cases :
namely the inclusion of the quote character requires escaping.
There is still no string type which is totally uninterpreted.
As a result, any text piece that contains a quote must
be *edited* before it can be used in sources.

This may seem a minor problem, but if we count all
cases, then the cumulative long-term impact may be significant.
Also this problem may become more acute in cases related to:
- development of text/code generators
- and, in general, all text processing with a lot of
  literal data definition
- proofreading

Such applications may *require* a lot of embedded text definitions and
this may even lead to frustration by proofreading of 'escaped' pieces
and it adds necessity for keeping track of changes in these pieces.
Using external resources for these tasks could help, but it may lead to
even worse experience because of spread definitions
and increased maintenance times.

The most common solution in existing syntax - triple quoted
strings and raw strings has some additional issues:

- Data is parsed including indents. This may be a benefit for some
cases (e.g. start lines always without any indent) but it also may
become confusing for the readers when not aligned with
containing block. So-called "de-denting" is also needed.

- Triple quotes cause visual ambiguity in some edge-cases,
e.g. when a string starts or ends with a quote. Also in many fonts
a pair of single quotes is visually identical to one double quote.


Proposal
-----------

Current proposal suggests adding syntax for the "raw text" statement.
This should enable the possibility to define text pieces in source
code without the need for interpreted characters.
Thereby it should solve the mentioned issues.
Additionally it should solve some issues with visual appearance.


Specification
---------

Raw string statement has the following form:

    name >>> "condition_string"
    ... text ...

in example:

    data >>> "  "
      begin
      end
    #rest

will parse the block by comparing each next
line part with the string "  " (2 spaces here).
This will return: "  begin\n  end"

-- Additional option: parse and remove:

    data >>> !"  "
      begin
      end
    #rest

Will parse by the same rule but also remove
the string from the result: "begin\nend"

- Additional option: parse *until* condition:

    data >>> ?"#eof"
    begin
    end
    #eof

Will parse up to character sequence "#eof" (if it is on the
same level) and returns: "begin\nend".
The benefit of last option - the data can be put at zero
level. It may be also prefered due to explicit terminator.


General rules:
--------
- parsing is aware of the indent of containing
  block, i.e. no de-dention needed.
- single line assignment may be allowed with
  some restrictions.

Difficulties:
--------
- change of core parsing rules
- backward compatibility broken
- syntax highlighting may not work



More information about the Python-list mailing list