How to split with "\" character, and licence copyleft mirror of ©

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Sep 2 22:06:54 EDT 2013


On Mon, 02 Sep 2013 13:22:37 -0700, Ethan Furman wrote:

> In a raw string, the backslash is buggy (IMNSHO) when it's the last
> character.  Given the above error, you might think that to get a
> single-quote in a string delimited by single-quotes that you would use
> r'\'', but no:
> 
> --> r'\''
> "\\'"

You get exactly what you asked for. It's a raw string, right, so 
backslash has no special powers, and "backslash C" should give you 
exactly backslash followed by C, for any character C. Which is exactly 
what you do get. So that's working correctly, as far as it goes.
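
A quick check of that claim, as a minimal sketch (the variable name s is only for illustration):

```python
# r'\'' is a two-character string: a backslash followed by a single quote.
s = r'\''

print(len(s))    # 2
print(repr(s))   # "\\'"

# The same value spelled as an ordinary (non-raw) string literal.
assert s == "\\'"
assert s[0] == "\\" and s[1] == "'"
```

The repr doubles the backslash only for display; the string itself holds one backslash.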


> you get a backslash and a single-quote.  And if you try to escape the
> backslash to get only one?
> 
> --> r'\\'
> '\\\\'
> 
> You get two.  Grrrr.

Again, working as expected. Since backslash has no special powers, if you 
enter a string with backslash backslash, you ought to get two 
backslashes. Just as you do.
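
Again easy to confirm. The sketch below also shows one way to get a single backslash, assuming you are happy to use an ordinary (non-raw) literal for it; the names two and one are only illustrative:

```python
# In a raw string, two backslashes in the source are two backslashes in the value.
two = r'\\'
print(len(two))    # 2
print(repr(two))   # '\\\\'

# In an ordinary string literal, backslash-backslash is the escape
# for ONE backslash, which is what was wanted in the first place.
one = '\\'
print(len(one))    # 1

assert two == one * 2
```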


The *real* mystery is how the first example r'\'' succeeds in the first 
place, and that gives you a clue as to why r'\' doesn't. The answer is 
discussed in this bug report:

http://bugs.python.org/issue1271


Summarising, the parser understands backslash as an escape character, and 
when it scans the string r'\'' the backslash escapes the inner quote, but 
then when Python generates the string it skips the backslash escape 
mechanism. Since the parser knows that backslash escapes, it fails to 
parse r'\' and you get a SyntaxError. If you stick stuff at the end of 
the line, you get the SyntaxError at another place:

py> s = r'\'[:] # and more
  File "<stdin>", line 1
    s = r'\'[:] # and more
                         ^
SyntaxError: EOL while scanning string literal
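
You can see both behaviours from inside a running program by handing the source text to compile(). This is only a sketch; the helper name parses and the "<demo>" filename are my own inventions, and the exact wording of the SyntaxError message varies between Python versions:

```python
def parses(source):
    """Return True if source compiles as Python code, False on SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True

# The source texts are written as ordinary (non-raw) strings, so each "\\"
# below stands for a single backslash in the code being compiled.
ok_source = "s = r'\\''"                  # the code:  s = r'\''
bad_source = "s = r'\\'"                  # the code:  s = r'\'
bad_with_tail = "s = r'\\'[:] # and more" # the code:  s = r'\'[:] # and more

print(parses(ok_source))       # True
print(parses(bad_source))      # False
print(parses(bad_with_tail))   # False
```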



So the real bug is with the parser.

It is likely that nobody noticed this bug in the first place because the 
current behaviour doesn't matter for regexes, which is the primary 
purpose of raw strings. You can't end a regex with an unescaped 
backslash, so r'abc\' would be an illegal regex anyway, and it doesn't 
matter that you can't create it.
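
The regex side of that is simple to verify with the re module. A small sketch, where is_valid_regex is a hypothetical helper of my own and not anything from the standard library:

```python
import re

def is_valid_regex(pattern):
    """Return True if re can compile the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# Built with an ordinary literal, since a raw string cannot end in a backslash:
# the pattern is the letters abc followed by one unescaped backslash.
trailing = "abc\\"

print(is_valid_regex(trailing))    # False
print(is_valid_regex(r"abc\\"))    # True, the final backslash is escaped
print(is_valid_regex(r"abc\'"))    # True, an escaped quote is a legal pattern
```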


-- 
Steven


