How to split with "\" character, and licence copyleft mirror of ©
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Mon Sep 2 22:06:54 EDT 2013
On Mon, 02 Sep 2013 13:22:37 -0700, Ethan Furman wrote:
> In a raw string, the backslash is buggy (IMNSHO) when it's the last
> character. Given the above error, you might think that to get a
> single-quote in a string delimited by single-quotes that you would use
> r'\'', but no:
>
> --> r'\''
> "\\'"
You get exactly what you asked for. It's a raw string, right, so
backslash has no special powers, and "backslash C" should give you
exactly backslash followed by C, for any character C. Which is exactly
what you do get. So that's working correctly, as far as it goes.
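A quick interactive check makes the point concrete. This is a minimal sketch, assuming a Python 3 interpreter. It shows that r'\'' really is two characters, a backslash followed by a quote, and that it is the same value as the ordinary literal "\\'":

```python
# In a raw string the backslash is kept: r'\'' is backslash + single quote.
s = r'\''

print(len(s))        # 2
print(s == "\\'")    # True: same value as the escaped, non-raw spelling
print(list(s))       # ['\\', "'"]
```

The repr "\\'" that Ethan saw is just how Python displays those two characters. It is not a doubled backslash.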
> you get a backslash and a single-quote. And if you try to escape the
> backslash to get only one?
>
> --> r'\\'
> '\\\\'
>
> You get two. Grrrr.
Again, working as expected. Since backslash has no special powers, if you
enter a string with backslash backslash, you ought to get two
backslashes. Just as you do.
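The same check for r'\\', again a small sketch assuming Python 3. It confirms that the value holds two backslash characters, and that the four backslashes in '\\\\' come from the repr escaping each one:

```python
# r'\\' is two literal backslashes, because nothing is escaped in a raw string.
s = r'\\'

print(len(s))           # 2
print(s == "\\\\")      # True: the non-raw spelling needs four
print(repr(s))          # '\\\\'  (repr doubles each backslash for display)
```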
The *real* mystery is how the first example r'\'' succeeds in the first
place, and that gives you a clue as to why r'\' doesn't. The answer is
discussed in this bug report:
http://bugs.python.org/issue1271
Summarising: when the parser scans a string literal, it treats backslash
as an escape character even in a raw string, so in r'\'' the backslash
stops the inner quote from ending the string. But when Python then builds
the string value, it skips backslash-escape processing, so the backslash
is kept. Since the parser still thinks the backslash escapes the quote,
it fails to parse r'\' and you get a SyntaxError. If you put more stuff
after it on the line, the SyntaxError is reported at a different place:
py> s = r'\'[:] # and more
File "<stdin>", line 1
s = r'\'[:] # and more
^
SyntaxError: EOL while scanning string literal
So the real bug is with the parser.
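You can watch the parser make this decision without typing the broken line at the prompt, by handing source text to the built-in compile(). This is a sketch assuming Python 3. The helper name parses() is hypothetical, made up for this illustration. The exact wording of the SyntaxError message differs between Python versions, so only the accept/reject result is checked:

```python
def parses(source: str) -> bool:
    """Hypothetical helper: True if `source` compiles as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# The outer literals below are raw, so each inner source text is exactly
# what you would type at the prompt.
print(parses(r"r'\''"))   # True: the backslash stops the inner quote closing the string
print(parses(r"r'\'"))    # False: the only closing quote is "escaped", so the literal never ends
```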
It is likely that nobody noticed this bug in the first place because the
current behaviour doesn't matter for regexes, which is the primary
purpose of raw strings. You can't end a regex with an unescaped
backslash, so r'abc\' would be an illegal regex anyway, and it doesn't
matter that you can't create it.
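The regex side of the argument can be checked with the standard re module. This is a minimal sketch assuming Python 3. The helper is_valid_regex() is a hypothetical name introduced only here. A pattern ending in a lone backslash is rejected by re.compile, while an escaped trailing backslash is fine:

```python
import re

def is_valid_regex(pattern: str) -> bool:
    """Hypothetical helper: True if `pattern` compiles as a regex."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(is_valid_regex("abc\\"))     # False: pattern text ends in an unescaped backslash
print(is_valid_regex(r"abc\\"))    # True: the trailing backslash is itself escaped
print(bool(re.fullmatch(r"abc\\", "abc\\")))  # True: matches "abc" plus one backslash
```

So a raw-string regex never needs to end in a single backslash, which is why the parser quirk rarely bites in practice.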
--
Steven