Unrecognized escape sequences in string literals

Mon Aug 10 03:32:30 EDT 2009

On Aug 10, 2:03 am, Steven D'Aprano
<ste... at REMOVE.THIS.cybersource.com.au> wrote:

> On Sun, 09 Aug 2009 17:56:55 -0700, Douglas Alan wrote:

> > Because in Python, if my friend sees the string "foo\xbar\n", he has no
> > idea whether the "\x" is an escape sequence, or if it is just the
> > characters "\x", unless he looks it up in the manual, or tries it out in
> > the REPL, or what have you.
>
> Fair enough, but isn't that just another way of saying that if you look
> at a piece of code and don't know what it does, you don't know what it
> does unless you look it up or try it out?

Not really. It's more like saying that easy things should be easy, and
hard things should be possible. But in this case, Python is making
something that should be really easy a bit harder and more error-prone
than it should be.

In C++, if I know that the code I'm looking at compiles, then I never
need worry that I've misinterpreted what a string literal means. At
least not if it doesn't have any escape characters in it that I'm not
familiar with. But in Python, if I see, "\f\o\o\b\a\z", I'm not really
sure what I'm seeing, as I surely don't have committed to memory some
of the more obscure escape sequences. If I saw this in C++, and I knew
that it was in code that compiled, then I'd at least know that there
are some strange escape codes that I have to look up. Unlike with
Python, it would never be the case in C++ code that the programmer who
wrote the code was just too lazy to type in "\\f\\o\\o\\b\\a\\z"
instead.
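
To make the ambiguity concrete, here is a minimal sketch, assuming Python 3
(the variable names are only illustrative). It feeds the literal's source
text to ast.literal_eval and lists the resulting characters. Warnings are
silenced because newer Python versions complain about unrecognized escapes
such as \o and \z, and a future version may reject them outright.

```python
import ast
import warnings

# The literal exactly as it would be typed in a program, held in a raw string.
source = r'"\f\o\o\b\a\z"'

with warnings.catch_warnings():
    # Assumption: the interpreter still accepts unknown escapes (with a warning).
    warnings.simplefilter("ignore")
    value = ast.literal_eval(source)

# \f, \b and \a each collapse to one control character;
# \o and \z each stay as two characters (a backslash plus the letter).
print(len(value))                     # 9
print([hex(ord(c)) for c in value])
```

So half of the backslashes in that literal are real escapes and half are
just backslashes, and nothing in the source text tells you which is which.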

> > My friend is adamant that it would be better
> > if he could just look at the string literal and know. He doesn't want to
> > be bothered to have to store stuff like that in his head. He wants to be
> > able to figure out programs just by looking at them, to the maximum
> > degree that that is feasible.
>
> I actually sympathize strongly with that attitude. But, honestly, your
> friend is a programmer (or at least pretends to be one *wink*).

Actually, he's probably written more code than you, me, and ten other
random decent programmers put together. As he can slap out massive
amounts of code very quickly, he'd prefer not to have crap getting in
his way. In the time it takes him to look something up, he might have
written another page of code.

He's perfectly capable of dealing with crap, as years of writing large
programs in Perl and PHP prove, but his whole reason for learning
Python, I take it, is so that he will be bothered with less crap and
therefore write code even faster.

> You can't be a programmer without memorizing stuff: syntax, function
> calls, modules to import, quoting rules, blah blah blah. Take C as
> an example -- there's absolutely nothing about () that says "group
> expressions or call a function" and {} that says "group a code
> block".

I don't really think that this is a good analogy. It's like the
difference between remembering rules of grammar and remembering
English spelling. As a kid, I was the best in my school at grammar,
and one of the worst at speling.

> You just have to memorize it. If you don't know what a backslash
> escape is going to do, why would you use it?

(1) You're looking at code that someone else wrote, or (2) you forget
to type "\\" instead of "\" in your code (or get lazy sometimes), as
that is okay most of the time, and you inadvertently get a subtle bug.

> This is especially important when reading (as opposed to writing) code.
> You read somebody else's code, and see "foo\xbar\n". Let's say you know
> it compiles without warning. Big deal -- you don't know what the escape
> codes do unless you've memorized them. What does \n resolve to? chr(13)
> or chr(97) or chr(0)? Who knows?

It *is* a big deal. Or at least a non-trivial deal. It means that you
can tell just by looking at the code that there are funny characters
in the string, and not just backslashes. You don't have to go
running for the manual every time you see code with backslashes, where
the upshot might be that the programmer was merely saving themselves
some typing.

> > In comparison to Python, in C++, he can just look "foo\xbar\n" and know
> > that "\x" is a special character. (As long as it compiles without
> > warnings under g++.)
>
> So what you mean is, he can just look at "foo\xbar\n" AND COMPILE IT
> USING g++, and know whether or not \x is a special character.

I'm not sure that your comments are paying due diligence to full
life-cycle software development issues that involve multiple
programmers (or even just your own program that you wrote a year ago,
and you don't remember all the details of what you did) combined with
maintaining and modifying existing code, etc.

> Aside:
> \x isn't a special character:
>
> >>> "\x"
>
> ValueError: invalid \x escape

I think that this all just goes to prove my friend's point! Here I've
been programming in Python for more than a decade (not full time, mind
you, as I also program in other languages, like C++), and even I
didn't know that "\xba" was an escape sequence, and I inadvertently
introduced a subtle bug into my argument because it just so happens
that the first two characters of "bar" are legal hexadecimal! If I did
the very same thing in a real program, it might take me a lot of time
to track down the bug.
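
The accidental escape is easy to demonstrate. A minimal sketch, assuming
Python 3, where the literal is a text string (the name s is only
illustrative):

```python
# "\xba" is consumed as one hexadecimal escape, so the "ba" of "bar" vanishes.
s = "foo\xbar\n"

print(len(s))                         # 6, not the 10 a literal reading suggests
print(s == "foo" + chr(0xBA) + "r\n") # True
```

The string compiles without complaint, which is exactly why a slip like
this could go unnoticed in a real program.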

Also, it seems that Python is being inconsistent here. Python knows
that the string "\x" doesn't contain a full escape sequence, so why
doesn't it treat the string "\x" the same way that it treats the
string "\z"? After all, if you're a Python programmer, you should know
that "\x" doesn't contain a complete escape sequence, and therefore,
you would not be surprised if Python were so kind as to just leave it
alone, rather than raising a ValueError.

I.e., "\z" is not a legal escape sequence, so it gets left as
"\\z". "\x" is not a legal escape sequence. Shouldn't it also get left
as "\\x"?
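
The difference in treatment can be checked side by side. This is a rough
sketch, assuming Python 3; try_literal is a hypothetical helper written
for this illustration. Note that Python 3 reports the truncated "\x" as a
SyntaxError at compile time, rather than the ValueError quoted above, so
the helper catches both.

```python
import ast
import warnings

def try_literal(source):
    """Evaluate a string literal's source text.

    Returns ("ok", value) on success, or ("error", exception name) on failure.
    """
    with warnings.catch_warnings():
        # Assumption: unknown escapes such as \z only warn; silence that here.
        warnings.simplefilter("ignore")
        try:
            return ("ok", ast.literal_eval(source))
        except (SyntaxError, ValueError) as exc:
            return ("error", type(exc).__name__)

z_result = try_literal(r'"\z"')   # unknown escape: left alone as backslash + z
x_result = try_literal(r'"\x"')   # incomplete escape: rejected

print(z_result)
print(x_result)
```

Both literals contain a backslash that does not start a complete escape
sequence, yet only one of them is accepted.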

> > He's particularly annoyed too, that if he types "foo\xbar" at the REPL,
> > it echoes back as "foo\\xbar". He finds that to be some sort of annoying
> > DWIM feature, and if Python is going to have DWIM features, then it
> > should, for example, figure out what he means by "\" and not bother him
> > with a syntax error in that case.
>
> Now your friend is confused. This is a good thing. Any backslash you see
> in Python's default string output is *always* an escape:

Well, I think he's more annoyed that if Python is going to be so
helpful as to put in the missing "\" for you in "foo\zbar", then it
should put in the missing "\" for you in "\". He considers this to be
an inconsistency.

Me, I'd never, ever, EVER want a language to special-case something at
the end of a string, but I can see that from his new-to-Python
perspective, Python seems to be DWIMing in one place and not the
other, and he thinks that it should either do no DWIMing at all, or
consistently DWIM. To not be consistent in this regard is "inelegant",
says he.

And I can see his point that allowing "foo\zbar" and "foo\\zbar" to be
synonymous is a form of DWIMing.
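
A short sketch of that asymmetry, assuming Python 3 (evaluate is a
hypothetical helper, and warnings about the unknown \z escape are silenced
on purpose):

```python
import ast
import warnings

def evaluate(source):
    """Evaluate a string literal's source text, ignoring escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)

single = evaluate(r'"foo\zbar"')    # one backslash in the source
double = evaluate(r'"foo\\zbar"')   # two backslashes in the source

print(single == double)   # True: both spellings yield the same string
print(repr(single))       # 'foo\\zbar', the REPL-style echo with a doubled backslash

# A lone backslash gets no such help: the literal is simply unterminated.
try:
    evaluate(r'"\"')
    lone_backslash = "accepted"
except SyntaxError:
    lone_backslash = "rejected"

print(lone_backslash)     # rejected
```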

> > My point of view is that every language has *some* warts; Python
> > just has a bit fewer than most.  It would have been nice, I should
> > think, if this wart had been "fixed" in Python 3, as I do consider
> > it to be a minor wart.

> And if anyone had cared enough to raise it a couple of years back, it
> possibly might have been.

So, now if only my friend had learned Python years ago, when I told
him to, he possibly might be happy with Python by now!

|>ouglas