Unrecognized escape sequences in string literals

Douglas Alan darkwater42 at gmail.com
Tue Aug 11 17:29:43 EDT 2009


Steven D'Aprano wrote:

> Because the cost isn't zero. Needing to write \\ in a string
> literal when you want \ is a cost,

I need to preface this entire post with the fact that I've
already used ALL of the arguments that you've provided on my
friend before I ever even came here with the topic, and my
own arguments on why Python can be considered to be doing
the right thing on this issue didn't even convince ME, much
less him. When I can't even convince myself with an argument
I'm making, then you know there's a problem with it!

Now back to our regularly scheduled debate:

I think that the total cost of all of that extra typing for
all the Python programmers in the entire world is now
significantly less than the time it took to have this
debate. Which would have never happened if Python did things
the right way on this issue to begin with. Meaning that
we're now at LESS than zero cost for doing things right!

And we haven't even yet included all the useless heat that
is going to be generated during code reviews and in-house coding
standard debates.

That's why I stand by Python's motto:

   THERE SHOULD BE ONE-- AND PREFERABLY ONLY ONE --OBVIOUS
   WAY TO DO IT.

> and having to read \\ in source code and mentally
> translate that to \ is also a cost.

For me that has no mental cost. What does have a mental cost
is remembering whether "\b" is an "unrecognized escape
sequence" or not.
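
To make the "\b" point concrete, here is a minimal sketch, assuming a Python 3 interpreter. The helper name literal_value is made up for illustration. It evaluates the literals from raw source text and silences warnings, since recent interpreters may warn about unrecognized escapes.

```python
import warnings

def literal_value(source):
    """Evaluate a string literal given as source text (hypothetical helper).

    Warnings are silenced because recent interpreters may complain
    about unrecognized escapes at compile time.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)

recognized = literal_value(r'"\b"')    # \b is a real escape: backspace
unrecognized = literal_value(r'"\d"')  # \d is not: the backslash is kept

print(len(recognized), repr(recognized))      # 1 '\x08'
print(len(unrecognized), repr(unrecognized))  # 2 '\\d'
```

The two literals look alike in source but produce strings of different lengths, which is exactly the thing one has to remember.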

> By all means argue that it's a cost that is worth paying,
> but please stop pretending that it's not a cost.

I'm not "pretending". I'm pwning you with logic and common
sense!

> Having to remember that \n is a "special" escape and \y
> isn't is also a cost, but that's a cost you pay in C++ too,
> if you want your code to compile.

Ummm, no I don't! I just always use "\\" when I want a
backslash to appear, and I only think about the more obscure
escape sequences if I actually need them, or some code that
I am reading has used them.

> By the way, you've stated repeatedly that \y will compile
> with a warning in g++. So what precisely do you get if you
> ignore the warning?

A program with undefined behavior. That's typically what a
warning means from a C++ compiler. (Sometimes it means
use of a deprecated feature, though.)

> What do other C++ compilers do?

The Microsoft compilers also consider it to be incorrect
code, as I documented in a different post.

> Apart from the lack of warning, what actually is the
> difference between Python's behavior and C++'s behavior?

That question makes just about as much sense as, "Apart
from the lack of a fatal error, what actually is the
difference between Python's behavior and C++'s?"

Sure, warnings aren't fatal errors, but if you ignore them,
then you are almost always doing something very
wrong. (Unless you're building legacy code.)

> > Furthermore, Python's strategy here is SPECIFICALLY
> > DESIGNED, according to the reference manual, to catch
> > bugs. I.e., from the original posting on this issue:
>
> >      Unlike Standard C, all unrecognized escape sequences
> >      are left in the string unchanged, i.e., the backslash
> >      is left in the string.  (This behavior is useful when
> >      debugging: if an escape sequence is mistyped, the
> >      resulting output is more easily recognized as
> >      broken.)
>
> You need to work on your reading comprehension. It doesn't
> say anything about the motivation for this behaviour, let
> alone that it was "SPECIFICALLY DESIGNED" to catch bugs. It
> says it is useful for debugging. My shoe is useful for
> squashing poisonous spiders, but it wasn't designed as a
> poisonous-spider squashing device.

As I have a BS from MIT in BS-ology, I can readily set aside
your aspersions to my intellect, and point out the gross
errors of your ways: Natural language does not work the way
you claim. It is much more practical, implicit, and
elliptical.

More specifically, if your shoe came with a reference manual
claiming that it was useful for squashing poisonous spiders,
then you may now validly assume poisonous spider squashing
was a design requirement of the shoe. (Or at least it has
become one, even if ipso facto.) Furthermore, if it turns out
that the shoe is deficient at poisonous spider squashing,
and consequently causes you to get bitten by a poisonous
spider, then you now have grounds for a lawsuit.

> > Because in the former cases it can't catch the bug,
> > and in the latter case, it can.
>
> I'm not convinced this is a bug that needs catching, but if
> you think it is, then that's a reasonable argument.

All my arguments are reasonable.

> >> Perhaps it can catch *some* errors of that type, but
> >> only at the cost of extra effort required to defeat the
> >> compiler (forcing the programmer to type \\d to prevent
> >> the compiler complaining about \d). I don't think the
> >> benefit is worth the cost. You and your friend do. Who
> >> is to say you're right?
>
> > Well, Bjarne Stroustrup, for one.
>
> Then let him design his own language *wink*

Oh, I'm not sure that's such a good idea. He might come up
with a language as crazy as C++.

> >> In C++, if you see an escape you don't recognize, do you
> >> care?
>
> > Yes, of course I do. If I need to know what the program
> > does.
>
> Precisely the same as in Python.

Not so at all!

In C++ I have to run for the manual only when someone
actually puts a *real* escape sequence in their code. With
Python, I have to run for the manual (or at least the REPL),
every time some lame-brained person who thinks they should be
allowed near a keyboard programs using "unrecognized escape
sequences" because they can't be bothered to hit the "\" key
twice.
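
As a rough sketch of what saving that trip to the manual could look like, the hypothetical helper below (unrecognized_escapes is not a standard tool) scans the raw text of a literal and lists backslash pairs that are not recognized escapes. The RECOGNIZED set is an assumption based on Python 3 str literals; bytes literals and older versions differ.

```python
import re

# Characters that may follow a backslash in a Python 3 str literal and still
# start a recognized escape (assumed set: line continuation, quotes, the
# single-letter escapes, octal digits, and the \x, \N, \u, \U prefixes).
RECOGNIZED = set("\n\\'\"abfnrtv01234567xNuU")

def unrecognized_escapes(literal_body):
    """Return unrecognized escapes in the raw source text of a literal.

    `literal_body` is the text between the quotes as typed in the source,
    not the evaluated string.
    """
    # finditer scans left to right without overlapping, so a doubled
    # backslash is consumed as one pair and never misread.
    return [m.group(0)
            for m in re.finditer(r"\\(.)", literal_body, re.DOTALL)
            if m.group(1) not in RECOGNIZED]

print(unrecognized_escapes(r"C:\new\data\yes"))  # ['\\d', '\\y']
print(unrecognized_escapes(r"\\d+\.\d+"))        # ['\\.', '\\d']
```

In the first call, "\n" is skipped because it is a real escape, which is precisely the kind of thing a reader cannot tell at a glance.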

> Seems to me that the answer is "It's not worse than C++,
> it's the same" -- in both cases, you have to memorize the
> "special" escape sequences, and in both cases, if you see
> an escape you don't recognize, you need to look it up.

The answer is that in this particular case, C++ causes me
far fewer woes! And if C++ is causing me fewer woes than
Language X, then you've got to know that Language X has a
problem.

> I disagree with your sense of aesthetics. I think that
> having to write \\y when I want \y just to satisfy a
> bondage-and-discipline compiler is ugly. That's not to deny
> that B&D isn't useful on occasion, but in this case I
> believe the benefit is negligible, and so even a tiny cost
> is not worth the pain.

EXPLICIT IS BETTER THAN IMPLICIT.

> > (2) That argument disagrees with the Python reference
> > manual, which explicitly states that "unrecognized escape
> > sequences are left in the string unchanged", and that the
> > purpose for doing so is because it "is useful when
> > debugging".
>
> How does it disagree? \y in the source code mapping to \y in
> the string object is the sequence being left unchanged. And
> the usefulness of doing so is hardly a disagreement over the
> fact that it does so.

Because you've stated that "\y" is a legal escape sequence,
while the Python Reference Manual explicitly states that it
is an "unrecognized escape sequence", and that such
"unrecognized escape sequences" are sources of bugs.

> > What makes it "illegal"? As far as I can tell, it's just
> > another "unrecognized escape sequence".
>
> No, it's recognized, because \x is the prefix for a
> hexadecimal escape code. And it's illegal, because it's
> missing the actual hexadecimal digits.

So? Why does that make it "illegal" rather than merely
"unrecognized?"

SIMPLE IS BETTER THAN COMPLEX.
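
The different treatment of "\x" and "\y" can be seen with a small sketch, assuming Python 3. The function name classify is made up for illustration, and warnings are silenced because recent interpreters may warn about "\y".

```python
import warnings

def classify(source):
    """Report how the interpreter treats a string literal (hypothetical helper)."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            value = eval(source)
    except (SyntaxError, ValueError):
        # Python 3 raises SyntaxError for a truncated \x escape;
        # ValueError is caught as well in case an older interpreter differs.
        return "rejected at compile time"
    return "accepted as %r" % value

print(classify(r'"\x"'))    # rejected at compile time
print(classify(r'"\y"'))    # accepted as '\\y'
print(classify(r'"\x41"'))  # accepted as 'A'
```

So one malformed escape stops compilation while the other passes through with the backslash intact.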

> All joking aside, syntax varies from one language to
> another. What counts as a legal escape sequence in
> Javascript and what counts as a legal escape sequence in
> Python are different. What makes you think I'm talking
> about Javascript?

Because anyone with common sense will agree that "\y" is an
illegal escape sequence. The only disagreement should then
be how illegal escape sequences should be handled. Python is
not currently handling them in a way that makes the most
sense.

ERRORS SHOULD NEVER PASS SILENTLY.

> But the morass only exists in the first place because you
> have adopted C++'s approach instead of Python's approach --
> and (possibly) not even a standard part of the C++ approach,
> but a non-standard warning provided by one compiler out of
> many.

Them's fighting words! I rarely adopt the C++ approach to
anything! In this case, (1) C++ just coincidentally happens
to be right, and (2) as far as I can tell, g++ implements
the C++ standard correctly here.

> > It may not be a complex form of DWIMing, but it's still
> > DWIMing a bit.  Python is figuring that if I typed "\z",
> > then either I must have really meant to type "\\z",
>
> Nope, not in the least. Python NEVER EVER EVER tries to
> guess what you mean.

Neither does Perl. That doesn't mean that Perl isn't often
DWIMy.

> This is *exactly* like C++, except that in Python the
> semantics of \y and \\y are identical. Python doesn't
> guess what you mean, it *imposes* a meaning on the escape
> sequence. You just don't like that meaning.

That's because I don't like things that are ill-conceived.
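
For reference, the claim that \y and \\y have identical semantics can be checked with a minimal sketch, assuming Python 3. The literals are evaluated from raw source text with warnings silenced, since recent interpreters may warn about "\y". The variable names are only illustrative.

```python
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    single = eval(r'"\y"')           # unrecognized escape, backslash kept
    double = eval(r'"\\y"')          # explicit escaped backslash, then y
    newline_single = eval(r'"\n"')   # recognized escape: one newline char
    newline_double = eval(r'"\\n"')  # backslash followed by the letter n

print(single == double)                  # True
print(newline_single == newline_double)  # False
```

The equivalence holds only for escapes Python does not recognize; for a recognized one such as \n, the single and doubled forms mean different things.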

> > I.e., more or less like continuing on in the face of
> > what the Python Reference manual refers to as an
> > "unrecognized escape sequence".

> The wording could be better, I accept. It would be better
> to talk about "special escapes" (e.g. \n) and "any
> non-special escape" (e.g. \y).

Or maybe the wording is just fine, and it's the treatment of
unrecognized escape sequences that could be better.

|>ouglas




