[Python-Dev] should we keep the \xnnnn escape in unicode strings?

Tim Peters tim_one@email.msn.com
Sun, 16 Jul 2000 14:14:02 -0400


[/F]
>     for maximum compatibility with 8-bit strings and SRE,
>     let's change "\x" to mean "binary byte" in unicode string
>     literals too.

[MAL]
> Hmm, this is probably not in sync with C9X (see section 6.4.4.4),

The behavior of \x in C9X is nearly incomprehensible -- screw it.

> but then perhaps we should deprecate usage of \xXX in the context
> of Unicode objects altogether. Our \uXXXX notation is far
> superior to what C9X tries to squeeze into \x (IMHO at least).

\x is a hack inherited from the last version of C, put in back when they
knew they had to do *something* to support "big characters" but had no real
idea what.  C9X was not allowed to break anything in the std it built on, so
they kept all the old implementation-defined \x behavior, and made it even
more complicated so it would make some kind of sense with the new C9X character
gimmicks.

Python is stuck trying to make sense out of its ill-considered adoption of
old-C's \x notation too.  Letting it mean "a byte" regardless of context
should make it useless enough that people will eventually learn to avoid it
<wink>.
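
For illustration only, here is a minimal sketch of what a fixed-width \x
looks like in practice. It assumes a Python 3 interpreter, which postdates
this message; there \x consumes exactly two hex digits in both str and bytes
literals. The variable names are made up for the example.

```python
# Assumption: Python 3 semantics, where \x always takes exactly two hex digits.
text = "\x4142"    # "\x41" is "A"; the trailing "42" is ordinary text
data = b"\x4142"   # same rule in a bytes literal: 0x41, then "4" and "2"

print(repr(text))          # 'A42'
print(list(data))          # [65, 52, 50]

# In a str literal, \xNN can only name code points 0..255,
# so it overlaps with the low end of the \uXXXX range.
print("\xe9" == "\u00e9")  # True
```

Because \x stops after two digits, anything above 0xFF has to be spelled
with \u (or \U), which is where the notation is unambiguous anyway.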

Note that C9X also has \u and \U notations, and \u in C9X means what it does
in Python, except that C9X explicitly punts on what happens for \u values in
these (inclusive) ranges:

    \u0000 - \u0020
    \u007f - \u009f
    \ud800 - \udfff

\U is used in C9X for 8-digit (hex) characters, deferring to ISO 10646.
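
As a rough sketch of the two notations side by side, again assuming a
Python 3 interpreter (not the Python of this thread): \u takes four hex
digits and \U takes eight. The names below are hypothetical.

```python
# Assumption: Python 3 string literals.
short_form = "\u20ac"      # four hex digits: U+20AC EURO SIGN
long_form = "\U000020ac"   # eight hex digits, same code point
beyond_bmp = "\U0001f600"  # needs the eight-digit form; \u cannot reach it

print(short_form == long_form)  # True
print(hex(ord(short_form)))     # 0x20ac
print(hex(ord(beyond_bmp)))     # 0x1f600
```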

If C9X didn't *have* to keep \x around, I'm sure they would have tossed it.