[Python-Dev] Some thoughts on the codecs...

Tim Peters tim_one@email.msn.com
Tue, 16 Nov 1999 02:47:08 -0500


[Guido]
>> Does '\u0020' (no u prefix) have a meaning?

[MAL]
> No, \uXXXX is only defined for u"" strings or strings that are
> used to build Unicode objects with this encoding:

I believe your intent is that '\u0020' be exactly those 6 characters, just
as today.  That is, it does have a meaning, but its meaning differs between
Unicode string literals and regular string literals.
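For a modern reader: Python 3 bytes literals still treat \u the way the 1999 8-bit literals did, so the difference can be checked directly. This is a minimal sketch, not anything from the thread. `literal_length` is a hypothetical helper, and bytes literals stand in for the old 8-bit strings. It compiles each literal's source text and measures the result:

```python
import warnings

def literal_length(source: str) -> int:
    """Compile a single literal from source text and return its length.

    literal_length is a hypothetical helper for this illustration.
    """
    with warnings.catch_warnings():
        # Newer Pythons warn about the unrecognized \u escape in b'...';
        # that warning is irrelevant here, so silence it.
        warnings.simplefilter("ignore")
        return len(eval(compile(source, "<literal>", "eval")))

print(literal_length(r"b'\u0020'"))  # 6: backslash, 'u', four digits
print(literal_length(r"u'\u0020'"))  # 1: a single space
```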

> Note that writing \uXX is an error, e.g. u"\u12 " will
> cause a syntax error.

Although I believe your intent <wink> is that, just as today, '\u12' is not
an error.
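The same split can be checked in Python 3. Again this is a sketch: `compiles` is a hypothetical helper, and bytes literals stand in for the old 8-bit strings. A truncated \u escape is rejected at compile time only in a Unicode literal:

```python
import warnings

def compiles(source: str) -> bool:
    """Return True if `source` compiles as an expression (hypothetical helper)."""
    with warnings.catch_warnings():
        # b'\u12' triggers an "invalid escape sequence" warning in newer Pythons.
        warnings.simplefilter("ignore")
        try:
            compile(source, "<literal>", "eval")
        except SyntaxError:
            return False
    return True

print(compiles(r'u"\u12 "'))  # False: truncated \uXXXX escape
print(compiles(r"b'\u12'"))   # True: \u means nothing in a bytes literal
```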

> Aside: I just noticed that '\x2010' doesn't give '\x20' + '10'
> but instead '\x10' -- is this intended?

Yes; see 2.4.1 ("String literals") of the Lang Ref.  Blame the C committee
for not defining \x in a platform-independent way.  Note that a Python \x
escape consumes *all* following hex characters, no matter how many -- and
ignores all but the last two.
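Modern Python no longer does this: \x takes exactly two hex digits, so '\x2010' is '\x20' + '10'. The function below makes the 1999 rule concrete. It is a hypothetical emulation of the rule as stated above (consume every following hex digit, keep only the last two), not the actual tokenizer code:

```python
import string

def parse_x_escape_1999(s: str) -> str:
    r"""Hypothetical emulation of the 1999 rule for \x escapes.

    A backslash-x consumes *every* following hex digit, but only the
    last two digits determine the resulting character.
    """
    out = []
    i = 0
    while i < len(s):
        if s.startswith("\\x", i):
            j = i + 2
            while j < len(s) and s[j] in string.hexdigits:
                j += 1
            digits = s[i + 2:j]
            if digits:
                # Keep only the last two hex digits, as described above.
                out.append(chr(int(digits[-2:], 16)))
                i = j
                continue
        # Ordinary character (or a bare \x with no digits): copy it through.
        out.append(s[i])
        i += 1
    return "".join(out)

print(repr(parse_x_escape_1999(r"\x2010")))  # '\x10'
print(repr("\x2010"))                        # ' 10' in modern Python
```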

> This [raw Unicode strings] can be had via unicode():
>
> u = unicode(r'\a\b\c\u0020','unicode-escaped')
>
> If that's too long, define a ur() function which wraps up the
> above line in a function.

As before, I think that's fine for now, but it won't stand forever.
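In today's Python, the "raw Unicode" behaviour is what the `raw_unicode_escape` codec provides. Only \uXXXX (and \UXXXXXXXX) escapes are decoded, and every other backslash is kept. Below is a minimal sketch of the suggested ur() wrapper built on that codec. `ur` is the hypothetical name from the quoted message. Encoding via Latin-1 assumes the input holds no characters above U+00FF.

```python
def ur(raw: str) -> str:
    """Hypothetical ur() helper: interpret only \\uXXXX escapes in `raw`.

    Assumes `raw` contains only Latin-1 characters (U+0000..U+00FF),
    so that encoding it to bytes is lossless.
    """
    return raw.encode("latin-1").decode("raw_unicode_escape")

print(repr(ur(r"\a\b\c\u0020")))  # '\\a\\b\\c ' : only \u0020 was decoded
```

The plain `unicode_escape` codec would also turn \a and \b into control characters. That is why the raw variant fits a ur() helper better.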