python3 raw strings and \u escapes

Wed May 30 13:15:52 EDT 2012

On 05/30/2012 10:46 AM, Terry Reedy wrote:
> On 5/30/2012 2:52 AM, rurpy at yahoo.com wrote:
>> In python2, "\u" escapes are processed in raw unicode
>> strings.  That is, ur'\u3000' is a string of length 1
>> consisting of the IDEOGRAPHIC SPACE unicode character.
>
> That surprised me until I rechecked the fine manual and found:
>
> "When an 'r' or 'R' prefix is present, a character following a backslash
> is included in the string without change, and all backslashes are left
> in the string."
>
> "When an 'r' or 'R' prefix is used in conjunction with a 'u' or 'U'
> prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed
> while all other backslashes are left in the string."
>
> When 'u' was removed in Python 3, a choice had to be made and the first
> must have seemed to be the obvious one, or perhaps the automatic one.
>
> In 3.3, 'u' is being restored. I have inquired on pydev list whether the
> difference above should also be restored, and mentioned this thread.
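For reference, here is a quick Python 3 check of the difference described above. The helper `py2_raw_unicode` is hypothetical, not part of any library. It is a sketch that assumes re-decoding a raw string with the standard `raw_unicode_escape` codec reproduces the Python 2 ur'...' rule: \uXXXX and \UXXXXXXXX are processed and every other backslash is left in the string.

```python
# In Python 3 a raw string keeps the backslash, so r'\u3000' is six
# characters, while the ordinary literal '\u3000' is one character
# (IDEOGRAPHIC SPACE).
raw_len = len(r'\u3000')    # 6
plain_len = len('\u3000')   # 1


def py2_raw_unicode(s: str) -> str:
    # Hypothetical helper, not a library API: emulate Python 2 ur'...'
    # semantics on a Python 3 raw string.  Encoding with the
    # raw_unicode_escape codec writes non-Latin-1 characters as \uXXXX
    # and leaves existing backslashes untouched; decoding then turns
    # \uXXXX and \UXXXXXXXX escapes into characters and leaves all
    # other backslashes (such as the one in \d) as they are.
    return s.encode("raw_unicode_escape").decode("raw_unicode_escape")


converted = py2_raw_unicode(r'\d+\u3000')
print(raw_len, plain_len, len(converted))   # 6 1 4
```

A malformed escape such as r'\user' makes the decode step raise UnicodeDecodeError, so this sketch is only suitable for strings known to contain well-formed escapes.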

As mentioned in a different message, another option might
be to leave raw strings as is (more consistent, since all
backslashes are treated the same) and have the "re" module
un-escape "\uxxxx" (and similar) sequences in regex strings
(also more consistent, since that is what it already does
with '\\n', '\\t', etc.)
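The existing re behavior this proposal builds on can be seen directly. re receives the two characters backslash and 't' from r'\t' and interprets the escape itself. On Python 3.3 and later, re also accepts \uXXXX and \UXXXXXXXX in patterns. The last check below therefore assumes a current Python 3 (re.fullmatch itself needs 3.4 or newer).

```python
import re

# r'\t' is two characters (backslash, 't'); re itself turns the pair
# into a TAB when it parses the pattern.
tab_pattern = r'\t'
tab_match = re.fullmatch(tab_pattern, '\t')

# Since Python 3.3, re parses \uXXXX in a pattern the same way, so a
# raw-string pattern can name a character by its code point.
space_match = re.fullmatch(r'\u3000', '\u3000')

print(len(tab_pattern), tab_match is not None, space_match is not None)
# 2 True True
```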

I do realize, though, that this may have backward-compatibility
problems that make it impossible to do.