[Python-Dev] \u and \U escapes in raw unicode string literals

Guido van Rossum guido at python.org
Fri May 11 00:11:37 CEST 2007


On 5/10/07, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2007-05-10 20:53, Paul Moore wrote:
> > On 10/05/07, Guido van Rossum <guido at python.org> wrote:
> >> I just discovered that, in all versions of Python as far back as I
> >> have access to (2.0), \uXXXX escapes are interpreted inside raw
> >> unicode strings. Thus:
> > [...]
> >> Does anyone remember why it is done this way? The reference manual
> >> describes this behavior, but doesn't give an explanation:
> >
> > My memory is so dim as to be more speculation than anything else, but
> > I suspect it's simply because there's no other way of including
> > characters outside the ASCII range in a raw string.
>
> This is by design (see PEP 100) and was done for the reason given
> by Paul. The motivation for the chosen approach was to make Python's
> raw Unicode strings compatible with Java's raw Unicode strings:
>
> http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html

I'm not sure what Java compatibility buys us. It is also far from
perfect -- IIUC, in Java if you write \u0022 (that's the " character)
it counts as an opening or closing quote, and if you write \u005c (a
backslash) it can be used to escape the following character. OTOH, in
Python, you can write ur"C:\Program Files\u005c" and voila, a raw
string terminating in a backslash. (In Java this would escape the "
instead.)
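The backslash-terminated path above can be made concrete with a minimal Python 3 sketch. It assumes that the standard `raw_unicode_escape` codec applies the same rule Python 2 used for `ur"..."` literals: only `\uXXXX` and `\UXXXXXXXX` are treated as escapes, and every other backslash is kept as-is. The helper `decode_raw_unicode_literal` is hypothetical. The literal body is held in a Python 3 `r"..."` string, which does not interpret `\u` at all, so it contains the source text verbatim.

```python
def decode_raw_unicode_literal(body: str) -> str:
    """Decode the text between the quotes of a Python 2 ur"..." literal.

    Hypothetical helper: assumes the raw_unicode_escape codec follows the
    same rules the Python 2 parser used for ur"..." literals.
    """
    # The codec works on bytes; latin-1 maps each character to one byte,
    # so this sketch only handles literal bodies in the Latin-1 range.
    return body.encode("latin-1").decode("raw_unicode_escape")


# Source text between the quotes of ur"C:\Program Files\u005c".
body = r"C:\Program Files\u005c"
value = decode_raw_unicode_literal(body)

# "\P" is not an escape and is kept; "\u005c" becomes a single backslash,
# so the decoded string ends in a backslash.
print(repr(value))  # 'C:\\Program Files\\'
```

Because decoding is a single pass, the backslash produced by `\u005c` is never re-read as the start of another escape, which is why the string can end with it.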

However, I understand the other reason (inclusion of non-ASCII
characters in raw strings) and I reluctantly agree with it.
Reluctantly, because it means I can't create a raw string containing a
\ followed by u or U -- I needed one of those today.
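The trade-off in the paragraph above can be sketched the same way. This again assumes that the `raw_unicode_escape` codec stands in for the Python 2 literal parser, and it reuses the hypothetical `decode_raw_unicode_literal` helper. A `\u` followed by four hex digits is always consumed as an escape, and a `\u` followed by anything else is rejected, so the sequence cannot simply be written out and kept as-is.

```python
def decode_raw_unicode_literal(body: str) -> str:
    """Decode the text between the quotes of a Python 2 ur"..." literal.

    Hypothetical helper: assumes the raw_unicode_escape codec follows the
    same rules the Python 2 parser used for ur"..." literals.
    """
    return body.encode("latin-1").decode("raw_unicode_escape")


# The accepted benefit: non-ASCII characters can appear in a raw literal.
accented = decode_raw_unicode_literal(r"caf\u00e9")  # 'café'

# The cost: a backslash followed by u and four hex digits is always consumed,
# giving 'A' rather than a backslash followed by 'u0041'.
escaped = decode_raw_unicode_literal(r"\u0041")

# A backslash followed by u and non-hex characters is rejected outright,
# so a path such as ur"C:\users" cannot be written as-is.
try:
    decode_raw_unicode_literal(r"C:\users")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```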

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

