[Python-Dev] should we keep the \xnnnn escape in unicode strings?

M.-A. Lemburg mal@lemburg.com
Sun, 16 Jul 2000 14:13:57 +0200


Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > >     1. treat \x as a hexadecimal byte, not a hexadecimal
> > >     character.  or in other words, make sure that
> > >
> > >         ord("\xabcd") == ord(u"\xabcd")
> > >
> > >     fwiw, this is how it's done in SRE's parser (see the
> > >     python-dev archives for more background).
> ...
> > >     5. leave it as it is (just fix the comment).
> >
> > I'd suggest 5 -- makes converting 8-bit strings using \x
> > to Unicode a tad easier.
> 
> if that's the main argument, you really want alternative 1.
> 
> with alternative 5, the contents of the string may change
> if you add a leading "u".

Ah ok, now I understand what you meant: "\xfffe" will turn
out as "\xfe", while u"\xfffe" results in u"\ufffe".
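
To make the difference concrete, here is a rough sketch that models
the two readings on the hex digits following \x. The function names
are made up for illustration, and this is not the actual tokenizer
code; it only assumes that \x consumes all following hex digits, as
in the examples above.

```python
# Hypothetical model of the \x readings discussed in this thread.
# Each function takes the run of hex digits that follows "\x"
# (assumed non-empty and valid) and returns the resulting ordinal.

def hex_escape_value(digits):
    """Numeric value of all hex digits after \\x, read greedily."""
    return int(digits, 16)

def ord_8bit(digits):
    # 8-bit string literal: only the low byte survives,
    # so "\xfffe" ends up the same as "\xfe".
    return hex_escape_value(digits) & 0xFF

def ord_unicode_alt5(digits):
    # Alternative 5 (leave as is): the whole value is taken as the
    # character, so u"\xfffe" ends up the same as u"\ufffe".
    return hex_escape_value(digits)

def ord_unicode_alt1(digits):
    # Alternative 1: Unicode literals follow the 8-bit rule, so that
    # ord("\xabcd") == ord(u"\xabcd").
    return ord_8bit(digits)

if __name__ == "__main__":
    print(hex(ord_8bit("fffe")))          # 0xfe
    print(hex(ord_unicode_alt5("fffe")))  # 0xfffe
    print(hex(ord_unicode_alt1("fffe")))  # 0xfe
```

With alternative 5 the first two results differ, which is exactly
the "contents change when you add a leading u" effect; with
alternative 1 they agree.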
 
> alternative 1 is also the only reasonable way to make ordinary
> strings compatible with SRE  (see the earlier discussion for why
> SRE has to be strict on this one...)
> 
> so let's change the question into a proposal:
> 
>     for maximum compatibility with 8-bit strings and SRE,
>     let's change "\x" to mean "binary byte" in unicode string
>     literals too.

Hmm, this is probably not in sync with C9X (see section 6.4.4.4),
but then perhaps we should deprecate the use of \xXX in the context
of Unicode objects altogether. Our \uXXXX notation is far
superior to what C9X tries to squeeze into \x (IMHO at least).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/