Unrecognized backslash escapes in string literals

Chris Angelico rosuav at gmail.com
Sun Feb 22 22:20:38 EST 2015


On Mon, Feb 23, 2015 at 1:41 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
> Right. Text string literals are documented to work that way
> <URL:https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str>,
> which refers the reader to the language reference
> <URL:https://docs.python.org/3/reference/lexical_analysis.html#strings>.

BTW, quoting from that:

"""
Unlike Standard C, all unrecognized escape sequences are left in the
string unchanged, i.e., the backslash is left in the result. (This
behavior is useful when debugging: if an escape sequence is mistyped,
the resulting output is more easily recognized as broken.)
"""

I'm not sure it's more obviously broken. Comparing Python and Pike:

>>> "asdf\qwer"
'asdf\\qwer'

> "asdf\qwer";
(1) Result: "asdfqwer"
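
For anyone wanting to check the Python half outside the REPL, here's a
minimal sketch. It assumes a Python 3 that still accepts an unrecognized
escape (newer releases warn about it, and it may eventually be rejected).
The names literal_source and value are made up for the example.

```python
import warnings

# Raw string holding the *source text* of the literal, so this file itself
# contains no unrecognized escape sequence.
literal_source = r'"asdf\qwer"'

with warnings.catch_warnings():
    # Newer Pythons warn about \q at compile time; silence that here.
    warnings.simplefilter("ignore")
    value = eval(literal_source)

print(repr(value))  # the repr shows the backslash doubled
print(len(value))   # the string itself holds a single backslash
```

The doubled backslash only appears in the repr; the string is nine
characters long, with one backslash between "asdf" and "qwer".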

Which of the two is "more easily recognized as broken" depends on what
the actual intention was. If you wanted a literal backslash (eg in a
path name), then Pike's output is, because you've just run two path
components together. If you wanted some sort of special character
("\n"), then they're about the same - you'd expect to see "\n" in the
output, and one has added a backslash (assuming you're looking at the
repr) while the other has removed it. Likewise if you wanted some
other symbol (eg forward slash): a doubled backslash or a complete
omission, same diff. But if you just fat-fingered a backslash into a
string where it completely doesn't belong, then seeing a doubled
backslash is definitely better than seeing just the following
character (which would mask the error entirely). Since the
interpreter can't know what the intention was, it has to pick one
behaviour and stick with it.

I'm not convinced that leaving the backslash in is really an
advantage. Python has been aiming more and more towards showing
problems immediately, rather than having them depend on your data -
for instance, instead of letting you treat bytes and characters as
identical until you hit something that isn't ASCII, Py3 forces you to
distinguish from the start. That said, there's probably a lot of code
out there that depends on unrecognized escapes keeping their
backslash, so it quite probably can't be changed. But it'd be nice
to be able to turn on a warning for it.
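
For readers on later versions: Python 3.6 and up do emit a warning for
unrecognized escapes when the source is compiled (a DeprecationWarning
at first, a SyntaxWarning from 3.12). Here's a rough sketch of surfacing
it, assuming one of those versions. The helper name escape_warnings is
hypothetical, not anything from the standard library.

```python
import warnings

def escape_warnings(source):
    """Compile `source` and return any warnings raised while doing so."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # don't let default filters hide it
        compile(source, "<example>", "eval")
    return caught

# Raw strings hold the source text of each literal under test.
bad = escape_warnings(r'"asdf\qwer"')    # unrecognized escape \q
good = escape_warnings(r'"asdf\\qwer"')  # properly escaped backslash

for w in bad:
    print(w.category.__name__, w.message)
print(len(good))  # no warnings expected for the escaped form
```

Running the interpreter with -W error should turn the same warning into
a hard failure at compile time, which is about as close to "showing
problems immediately" as this gets.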

ChrisA


