[Python-Dev] should we keep the \xnnnn escape in unicode strings?

Sat, 15 Jul 2000 15:57:03 +0200

as tim pointed out in an earlier thread (on SRE), the
\xnn escape code is something of a kludge.

I just noted that the unicode string type supports \x
as well as \u, with slightly different semantics:

    \u -- exactly four hexadecimal characters are read.

    \x -- 1 or more hexadecimal characters are read, and
    the result is cast to a Py_UNICODE character.
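
To make the difference concrete, here is a rough sketch of the two
rules in present-day Python. It is not the interpreter's code: the
function names are made up for illustration, and the 16-bit mask is
an assumption standing in for the cast to Py_UNICODE.

```python
import string

HEX_DIGITS = set(string.hexdigits)

def read_u_escape(text, pos):
    """The \\u rule: read exactly four hex digits starting at pos."""
    digits = text[pos:pos + 4]
    if len(digits) != 4 or not all(c in HEX_DIGITS for c in digits):
        raise ValueError("\\u needs exactly four hex digits")
    return int(digits, 16), pos + 4

def read_x_escape_greedy(text, pos, width_bits=16):
    """The \\x rule as described: read every hex digit, then truncate.

    width_bits=16 is an assumption modelling the cast to Py_UNICODE.
    """
    end = pos
    while end < len(text) and text[end] in HEX_DIGITS:
        end += 1
    if end == pos:
        raise ValueError("\\x needs at least one hex digit")
    value = int(text[pos:end], 16) & ((1 << width_bits) - 1)
    return value, end

# Same digits, different results: \u stops after four digits,
# while \x also swallows the trailing "bcd".
u_value, u_end = read_u_escape("0061bcd", 0)
x_value, x_end = read_x_escape_greedy("0061bcd", 0)
```

Under these assumptions \u yields 0x61 and leaves "bcd" as ordinary
text, while \x consumes all seven digits and keeps only the low 16
bits of the result.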

I'm pretty sure this is not an optimal design, but I'm not sure
how it should be changed:

    1. treat \x as a hexadecimal byte, not a hexadecimal
    character.  or in other words, make sure that

        ord("\xabcd") == ord(u"\xabcd")

    fwiw, this is how it's done in SRE's parser (see the
    python-dev archives for more background).

    2. ignore \x.  after all, \u is much cleaner.

        u"\xabcd" == "\\xabcd"
        u"\u0061" == "\x61" == "\x0061" == "\x00000061"

    3. treat \x as an encoding error.

    4. read no more than 4 characters.  (a comment in the
    code says that \x reads 0-4 characters, but the code
    doesn't match that comment)

        u"\x0061bcd" == "abcd"

    5. leave it as it is (just fix the comment).
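
For comparison, options 1, 4 and 5 can be modelled with one small
helper. This is only a sketch in present-day Python: decode_x and its
parameters are hypothetical, the 8-bit mask assumes that 8-bit strings
keep the low byte of whatever \x reads, and the 16-bit mask assumes a
16-bit Py_UNICODE.

```python
import string

HEX_DIGITS = set(string.hexdigits)

def decode_x(body, max_digits=None, width_bits=16):
    """Decode the text that follows a backslash-x.

    Returns (code point, remaining text). max_digits=None means
    "read every hex digit available".
    """
    end = 0
    while end < len(body) and body[end] in HEX_DIGITS:
        if max_digits is not None and end == max_digits:
            break
        end += 1
    if end == 0:
        raise ValueError("no hex digits after \\x")
    value = int(body[:end], 16) & ((1 << width_bits) - 1)
    return value, body[end:]

# Option 1: \x is a byte, so only the low 8 bits survive (assumption).
option_1 = decode_x("abcd", width_bits=8)

# Option 4: read at most four digits; the rest is ordinary text.
option_4 = decode_x("0061bcd", max_digits=4)

# Option 5: current behaviour, read everything and truncate to 16 bits.
option_5 = decode_x("0061bcd")
```

With option 4 the input 0061bcd decodes to 0x61 followed by the
literal text "bcd", which matches the "abcd" example above. With
option 5 the same input becomes a single character.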

comments?

</F>