[Python-Dev] \u and \U escapes in raw unicode string literals

"Martin v. Löwis" martin at v.loewis.de
Fri May 11 07:52:39 CEST 2007


> This is what prompted my question, actually: in Py3k, in the
> str/unicode unification branch, r"\u1234" changes meaning: before the
> unification, this was an 8-bit string, where the \u was not special,
> but now it is a unicode string, where \u *is* special.

That is true for non-raw strings also: the meaning of "\u1234" also
changes.

However, traditionally, there was *no* escaping mechanism in raw strings
in Python, and I feel that this is a good principle, because it is
easy to learn (if you leave out the detail that \ can't be the last
character in a raw string - which should get fixed also, IMO). So I
think in Py3k, r"\u1234" should continue to be a string with 6
characters. Otherwise, people will complain that
os.stat(r"c:\windows\system32\user32.dll") fails. Telling them to write
os.stat(r"c:\windows\system32\u005Cuser32.dll") will just cause puzzled
faces.
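
To make the behaviour argued for above concrete, here is a minimal sketch. It assumes a current Python 3 interpreter, in which raw literals leave \u uninterpreted; the helper name compiles() and the variable names are made up for illustration and are not part of any proposal in this thread.

```python
# Raw literal: the backslash starts no escape, so all six characters survive.
raw = r"\u1234"
# Ordinary literal: \u1234 denotes the single character U+1234.
cooked = "\u1234"

raw_len = len(raw)
cooked_len = len(cooked)

# The Windows path from the message, written as a raw literal.
path = r"c:\windows\system32\user32.dll"
backslashes = path.count("\\")


def compiles(source):
    """Return True if `source` parses as a Python expression (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# The source text checked here is r"abc\" : a raw literal ending in a backslash.
trailing_ok = compiles('r"abc\\"')

print(raw_len, cooked_len, backslashes, trailing_ok)
```

Under that assumption the raw literal has 6 characters, the ordinary one has 1, the path keeps its 3 backslashes, and the literal ending in a backslash is rejected by the compiler, which is the detail mentioned in the parenthesis above.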

Windows path names are one of the two primary applications of raw
strings (the other being regexes).

Regards,
Martin


