[Python-Dev] Some thoughts on the codecs...
M.-A. Lemburg
mal@lemburg.com
Tue, 16 Nov 1999 09:35:28 +0100
Tim Peters wrote:
>
> [Guido]
> >> Does '\u0020' (no u prefix) have a meaning?
>
> [MAL]
> > No, \uXXXX is only defined for u"" strings or strings that are
> > used to build Unicode objects with this encoding:
>
> I believe your intent is that '\u0020' be exactly those 6 characters, just
> as today. That is, it does have a meaning, but its meaning differs between
> Unicode string literals and regular string literals.
Right.
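
A small illustration of that difference, as a sketch in present-day Python 3 rather than the interpreter this thread was written against. It assumes that bytes literals stand in for the "regular" strings discussed here and that str literals play the role of u"" strings; the names plain and text are made up for the example.

```python
# Python 3 sketch: bytes stand in for the thread's "regular" strings,
# str stands in for the proposed u"" strings (an assumption of this example).
plain = rb'\u0020'   # raw bytes literal: backslash, 'u', '0', '0', '2', '0'
text = '\u0020'      # str literal: the escape is interpreted as U+0020

print(len(plain))    # six separate characters, nothing is interpreted
print(len(text))     # a single character
print(text == ' ')   # U+0020 is the space character
```

The raw prefix on the bytes literal is only there to keep newer interpreters from warning about an unrecognised escape; the stored bytes are the same six characters either way.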
> > Note that writing \uXX is an error, e.g. u"\u12 " will
> > cause a syntax error.
>
> Although I believe your intent <wink> is that, just as today, '\u12' is not
> an error.
Right again :-) "\u12" gives a 4-byte string, u"\u12" raises an
exception.
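
The same rule can be checked in present-day Python 3, where str literals behave like the u"" strings proposed here. This is only a sketch under that assumption; compiles() is a hypothetical helper written for the example, not part of any library.

```python
def compiles(source: str) -> bool:
    """Hypothetical helper: True if `source` compiles as a Python expression."""
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False

# The raw strings below hold the *source text* of a literal, so the
# escape is only interpreted when compile() parses it.
print(compiles(r"'\u0020'"))   # full four-digit escape
print(compiles(r"'\u12 '"))    # truncated escape, rejected by the compiler

# Without Unicode interpretation the same text is just four bytes.
print(len(rb'\u12'))
```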
> > Aside: I just noticed that '\x2010' doesn't give '\x20' + '10'
> > but instead '\x10' -- is this intended ?
>
> Yes; see 2.4.1 ("String literals") of the Lang Ref. Blame the C committee
> for not defining \x in a platform-independent way. Note that a Python \x
> escape consumes *all* following hex characters, no matter how many -- and
> ignores all but the last two.
Strange definition...
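
To make the rule Tim describes concrete, here is a rough emulation in Python 3. It is not the interpreter's code, just a sketch of "consume every following hex digit, keep the last two"; old_style_x_escape() is a made-up name, and masking with 0xFF is my reading of "ignores all but the last two".

```python
import re

def old_style_x_escape(literal_body: str) -> str:
    """Emulate the \\x rule described above on the text of a literal.

    Every hex digit after \\x is consumed; only the low 8 bits
    (the last two hex digits) of the value are kept.
    """
    def replace(match: "re.Match[str]") -> str:
        value = int(match.group(1), 16)   # all following hex digits
        return chr(value & 0xFF)          # drop everything but the low byte
    return re.sub(r'\\x([0-9a-fA-F]+)', replace, literal_body)

print(repr(old_style_x_escape(r'\x2010')))   # the case from this thread
print(repr(old_style_x_escape(r'\x20')))     # two digits behave as expected
```

Current Python versions read exactly two hex digits after \x, so there '\x2010' is a space followed by '10'.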
> > This [raw Unicode strings] can be had via unicode():
> >
> > u = unicode(r'\a\b\c\u0020','unicode-escaped')
> >
> > If that's too long, define a ur() function which wraps up the
> > above line in a function.
>
> As before, I think that's fine for now, but won't stand forever.
If Guido agrees to ur"", I can put that into the proposal too
-- it's just that things are starting to get a little crowded
for a strawman proposal ;-)
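
For reference, a ur() wrapper along the lines suggested above could look like this in present-day Python 3. It is a sketch only: the unicode() builtin and the 'unicode-escaped' name belong to the proposal in this thread, so the example swaps in the 'raw_unicode_escape' codec that ships with Python 3 and takes bytes as input.

```python
def ur(raw: bytes) -> str:
    """Sketch of the ur() helper: interpret only \\uXXXX escapes.

    Uses Python 3's 'raw_unicode_escape' codec as a stand-in for the
    'unicode-escaped' encoding named in the proposal (an assumption).
    """
    return raw.decode('raw_unicode_escape')

u = ur(rb'\a\b\c\u0020')
print(repr(u))   # \a, \b and \c stay as written; \u0020 becomes a space
print(len(u))
```

Backslash sequences other than \uXXXX pass through untouched, which is the "raw" behaviour being asked for.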
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 45 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/