[Python-Dev] should we keep the \xnnnn escape in unicode strings?
Fredrik Lundh <effbot@telia.com>
Sat, 15 Jul 2000 15:57:03 +0200
as tim pointed out in an earlier thread (on SRE), the
\xnn escape code is something of a kludge.
I just noted that the unicode string type supports \x
as well as \u, with slightly different semantics:
\u -- exactly four hexadecimal characters are read.
\x -- 1 or more hexadecimal characters are read, and
the result is cast to a Py_UNICODE character.
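to make the difference concrete, here is a rough sketch of the two readers in present-day python. it is not the interpreter's actual C code: the function names are made up for illustration, and the 16-bit width used to model the cast to Py_UNICODE is an assumption.

```python
import string

HEX = set(string.hexdigits)

def read_u(s, pos):
    """\\u style: read exactly four hex digits starting at pos.

    returns (code point, position after the escape).
    """
    digits = s[pos:pos + 4]
    if len(digits) != 4 or any(c not in HEX for c in digits):
        raise ValueError("truncated \\uXXXX escape")
    return int(digits, 16), pos + 4

def read_x_greedy(s, pos, width=16):
    """\\x style as described above: read one or more hex digits,
    then keep only the low `width` bits (assumed 16 here) to model
    the cast to Py_UNICODE.
    """
    end = pos
    while end < len(s) and s[end] in HEX:
        end += 1
    if end == pos:
        raise ValueError("invalid \\x escape")
    return int(s[pos:end], 16) & ((1 << width) - 1), end
```

given the text "0061bcd" following the backslash and letter, read_u stops after "0061", while read_x_greedy consumes all seven digits because "b", "c" and "d" are valid hex digits too.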
I'm pretty sure this is not an optimal design, but I'm not sure
how it should be changed:
1. treat \x as a hexadecimal byte, not a hexadecimal
character. or in other words, make sure that
    ord("\xabcd") == ord(u"\xabcd")
fwiw, this is how it's done in SRE's parser (see the
python-dev archives for more background).
2. ignore \x. after all, \u is much cleaner.
    u"\xabcd" == "\\xabcd"
    u"\u0061" == "\x61" == "\x0061" == "\x00000061"
3. treat \x as an encoding error.
4. read no more than 4 characters. (a comment in the
code says that \x reads 0-4 characters, but the code
doesn't match that comment)
    u"\x0061bcd" == "abcd"
5. leave it as it is (just fix the comment).
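the five alternatives can be compared side by side with a small sketch in present-day python. this is only an illustration under stated assumptions, not a proposed patch: the policy labels ("byte", "ignore", "error", "max4", "greedy") are hypothetical names for options 1 to 5, the 8-bit mask for option 1 assumes 8-bit strings keep the low byte of a long \x escape, and the 16-bit mask assumes a 16-bit Py_UNICODE. only \x escapes are handled; everything else is copied through unchanged.

```python
import string

HEX = set(string.hexdigits)

def decode(text, policy):
    """decode backslash-x escapes in `text` under one policy.

    policy (hypothetical labels): "byte" = option 1, "ignore" = option 2,
    "error" = option 3, "max4" = option 4, "greedy" = option 5.
    """
    out = []
    i = 0
    while i < len(text):
        if not text.startswith("\\x", i):
            out.append(text[i])          # ordinary character
            i += 1
            continue
        if policy == "ignore":
            out.append("\\x")            # keep the escape as literal text
            i += 2
            continue
        if policy == "error":
            raise ValueError("\\x escape not allowed at position %d" % i)
        start = i + 2
        # option 4 reads at most four hex digits, the others read them all
        limit = start + 4 if policy == "max4" else len(text)
        end = start
        while end < min(limit, len(text)) and text[end] in HEX:
            end += 1
        if end == start:
            raise ValueError("invalid \\x escape at position %d" % i)
        # option 1 keeps a byte; the others keep an assumed 16-bit value
        mask = 0xFF if policy == "byte" else 0xFFFF
        out.append(chr(int(text[start:end], 16) & mask))
        i = end
    return "".join(out)
```

with this sketch, decode(r"\x0061bcd", "max4") gives "abcd" as in the example for option 4, and decode(r"\xabcd", "ignore") leaves the six characters of the input untouched as in option 2.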
comments?
</F>