[Python-ideas] PEP 8: raw strings & regular expressions

Thu Oct 22 01:44:00 EDT 2015

Nathaniel Smith <njs at pobox.com> writes:

> On Wed, Oct 21, 2015 at 7:44 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
> > [Automatically interpreting every raw string as a regex pattern] is
> > evidently a simple mistake. Merely knowing that a token is a raw
> > string does not justify the assumption that the string is a regular
> > expression, or a filesystem entry name, or a line in a network
> > protocol, or anything except plain text.
>
> This isn't necessarily true, just as a matter of like... epistemology.

Well, yes, if you like. Epistemically, a syntax highlighter cannot know
that a raw string is definitely a regular expression pattern merely
because it's a raw string.

We have a definition of the language which allows syntax highlighters to
know with certainty what is and is not a particular element of the
language. So if the highlighter shows a sequence of characters as being
what the Python language definition says it is, then it will not be
wrong in any case.
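
As a minimal sketch of that certainty (the sample source lines are made
up for illustration), Python's standard-library tokenizer reports every
raw string literal as the same token type, whatever the text is for:

```python
import io
import tokenize

# Hypothetical source: three raw strings written for three different purposes.
SOURCE = (
    "pattern = r'\\d+\\s*px'\n"    # intended as a regular expression
    "path = r'C:\\temp\\new'\n"    # intended as a filesystem path
    "latex = r'\\frac{a}{b}'\n"    # intended as LaTeX markup
)

def string_tokens(source):
    """Return (token type name, token text) for each string literal in source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.STRING
    ]

for kind, text in string_tokens(SOURCE):
    print(kind, text)
```

The tokenizer says "this is a string literal" for all three, and nothing
about whether any of them is a pattern, a path, or markup.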

We do not have a definition which allows syntax highlighters to decide
that a raw string is or is not a regular expression, merely because it's
a raw string. So if a highlighter shows a sequence of characters in a
Python program as being a regular expression, it will be wrong for some
cases.
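
To illustrate, here is a rough sketch with made-up example strings. The
helper name `compiles_as_regex` is hypothetical, and it only asks
whether the `re` module accepts the text, which is not the same as
knowing what the author meant:

```python
import re

def compiles_as_regex(text):
    """Return True if re.compile accepts text; this says nothing about intent."""
    try:
        re.compile(text)
    except re.error:
        return False
    return True

# Hypothetical raw strings, each written for a different purpose.
windows_path = r'C:\temp\new'        # a path; as a regex, \t and \n mean tab and newline
latex_source = r'\begin{itemize}['   # LaTeX-like text; as a regex, '[' is left unterminated
digits = r'\d+'                      # actually meant as a regular expression

for label, text in [
    ("windows_path", windows_path),
    ("latex_source", latex_source),
    ("digits", digits),
]:
    print(label, compiles_as_regex(text))
```

The path happens to compile as a pattern even though it was never meant
as one, so even a "does it parse as a regex?" check cannot recover the
author's intention from the raw string alone.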

> For example, if hypothetically it turned out that 99% of raw strings
> are in fact regular expressions, then knowing something is a raw
> string would give you quite a bit of evidence that it's a regular
> expression -- quite possibly enough to justify treating it as such for
> something like code highlighting.

Highlighting code to show particular semantics is a binary decision: the
code is either shown as (for example) a regular expression, or it is
not. The reader only gets to see what the highlighter decided, not how
certain the highlighter was in deciding it.

How is the person viewing it to know whether the highlighter is wrong
about the intention of the code in any particular case, or if the
highlighter is right and the code doesn't match the author's intention?

If the reader has to second-guess the highlighter (am I wrong here, or
is the highlighter wrong, or both?) every time it doesn't match
expectations, that's a poor syntax highlighter which should never have
made such a binary decision on uncertain data.

-- 
 \           “Never express yourself more clearly than you are able to |
  `\                                               think.” —Niels Bohr |
_o__)                                                                  |
Ben Finney