[Python-Dev] Some thoughts on the codecs...

M.-A. Lemburg mal@lemburg.com
Tue, 16 Nov 1999 09:35:28 +0100


Tim Peters wrote:
> 
> [Guido]
> >> Does '\u0020' (no u prefix) have a meaning?
> 
> [MAL]
> > No, \uXXXX is only defined for u"" strings or strings that are
> > used to build Unicode objects with this encoding:
> 
> I believe your intent is that '\u0020' be exactly those 6 characters, just
> as today.  That is, it does have a meaning, but its meaning differs between
> Unicode string literals and regular string literals.

Right.
 
> > Note that writing \uXX is an error, e.g. u"\u12 " will cause
> > a syntax error.
> 
> Although I believe your intent <wink> is that, just as today, '\u12' is not
> an error.

Right again :-) "\u12" gives a 4-byte string, while u"\u12" raises an
exception.
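
To make the difference concrete, here is a rough sketch in present-day
Python. It assumes the stock "unicode_escape" codec behaves like the
escape handling proposed for u"" literals; the helper name
decode_escapes is made up for illustration and is not part of the
proposal.

```python
import codecs


def decode_escapes(raw):
    """Decode backslash escapes in ASCII bytes (hypothetical helper).

    Assumption: the standard "unicode_escape" codec stands in for the
    u"" literal handling discussed in this thread.
    """
    return codecs.decode(raw, "unicode_escape")


# A complete \uXXXX escape decodes to a single character.
space = decode_escapes(b"\\u0020")

# A truncated escape is rejected rather than passed through.
try:
    decode_escapes(b"\\u12 ")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

# Left undecoded, the same text is just four bytes: \ u 1 2
plain_len = len(b"\\u12")
```

The byte string is never interpreted, so it keeps all four bytes; only
the decoding step objects to the short escape.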
 
> > Aside: I just noticed that '\x2010' doesn't give '\x20' + '10'
> > but instead '\x10' -- is this intended ?
> 
> Yes; see 2.4.1 ("String literals") of the Lang Ref.  Blame the C committee
> for not defining \x in a platform-independent way.  Note that a Python \x
> escape consumes *all* following hex characters, no matter how many -- and
> ignores all but the last two.

Strange definition...
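
For the record, the rule Tim describes can be modelled in a few lines.
This is only a sketch of the behaviour as stated above (swallow every
hex digit after \x, keep the last two); legacy_x_escape is a
hypothetical name and this is not the tokenizer's actual code.

```python
import string


def legacy_x_escape(text):
    """Model the \\x rule described in this thread (hypothetical helper).

    `text` is whatever follows the "\\x". Every leading hex digit is
    consumed, but only the last two decide the resulting character.
    Returns (character, remaining_text).
    """
    end = 0
    while end < len(text) and text[end] in string.hexdigits:
        end += 1
    digits = text[:end]
    if not digits:
        raise ValueError("no hex digits after \\x")
    # Keep only the last two digits (or the single digit, if just one).
    return chr(int(digits[-2:], 16)), text[end:]


# '\x2010' under this rule: all four digits are eaten, "10" wins.
char, rest = legacy_x_escape("2010")
```

With the input "2010" the helper returns chr(0x10) and an empty
remainder, which matches the '\x10' result I noticed.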
 
> > This [raw Unicode strings] can be had via unicode():
> >
> > u = unicode(r'\a\b\c\u0020','unicode-escaped')
> >
> > If that's too long, define a ur() function which wraps up the
> > above line in a function.
> 
> As before, I think that's fine for now, but won't stand forever.

If Guido agrees to ur"", I can put that into the proposal too
-- it's just that things are starting to get a little crowded
for a strawman proposal ;-)
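
In the meantime, the ur() wrapper mentioned above could look roughly
like this in present-day Python. Treat it as a sketch under
assumptions: it uses the "raw_unicode_escape" codec, which is my
stand-in here and not the codec name used in the proposal text, and it
takes an ordinary raw string as input.

```python
def ur(text):
    """Build a 'raw Unicode' string (sketch, not the proposal's API).

    Only \\uXXXX escapes are interpreted; every other backslash
    sequence is left exactly as written.
    Assumption: the "raw_unicode_escape" codec provides that behaviour.
    """
    return text.encode("raw_unicode_escape").decode("raw_unicode_escape")


# \a, \b and \c stay as two characters each; \u0020 becomes a space.
u = ur(r"\a\b\c\u0020")
```

Here u ends up as the six characters \a\b\c followed by one space.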

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    45 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/