[Python-Dev] Unicode proposal: %-formatting ?
Guido van Rossum
guido@CNRI.Reston.VA.US
Tue, 16 Nov 1999 08:28:42 -0500
> ... hmm, there is a problem there: how should the PyUnicode_Format()
> API deal with '%s' when it sees a Unicode object as argument ?
>
> E.g. what would you get in these cases:
>
> u = u"%s %s" % (u"abc", "abc")
From the user's perspective, it should clearly return u"abc abc".
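
As a rough illustration of that expectation, the quoted expression can be run on a present-day Python 3 interpreter. This rests on an assumption that is not part of the proposal itself: in Python 3 every str is already Unicode and the u"" prefix is accepted only for compatibility, so both arguments are the same type here.

```python
# Sketch only: Python 3 stands in for the proposed behaviour.
# The u"" prefix is accepted (Python 3.3+) and yields an ordinary str.
result = u"%s %s" % (u"abc", "abc")

# Both arguments are inserted as text; the result is one Unicode string.
print(result)
```

The point is only that mixing the two literal forms in one %-format yields a single Unicode result without any new format marker.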
> Perhaps we need a new marker for "insert Unicode object here".
No, please!
BTW, we also need to look at the proposal from JPython's perspective
(where all strings are Unicode; I don't know if they are UTF-16 or
UCS-2). It should be possible to add a small number of dummy things
to JPython so that a CPython program using unicode can be run
unchanged there. A minimal set seems to be:
- u"..." is treated the same as "..."; and ur"..." (if accepted) is r"..."
- unichr(c) is the same as chr(c)
- unicode(s[,encoding]) is added
- s.encode([encoding]) is added
Anything I forgot?
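
The list above could be sketched roughly as follows. This is a guess at the shape of such shims, not JPython's actual code: Python 3 stands in for a runtime where all strings are Unicode, the free function encode() is a hypothetical stand-in for the s.encode() method, and the handling of bytes input is an assumption added so the sketch runs.

```python
# Hypothetical compatibility shims for a runtime whose strings are all Unicode.
# None of these names or behaviours are taken from JPython itself.

def unichr(c):
    """unichr(c) is the same as chr(c) when every string is Unicode."""
    return chr(c)

def unicode(s, encoding=None):
    """Return s as a Unicode string.

    With no encoding given, a string is returned unchanged (the "native"
    default). Decoding bytes is an assumption made for this sketch.
    """
    if isinstance(s, bytes):
        return s.decode(encoding or "utf-8")  # assumed fallback encoding
    return s

def encode(s, encoding=None):
    """Stand-in for s.encode([encoding]); no encoding means "native"."""
    if encoding is None:
        return s  # native: nothing to convert
    return s.encode(encoding)

# u"..." and "..." literals already denote the same type in this runtime.
same = (u"abc" == "abc")
```

With shims like these, unicode(s) and encode(s) hand back s untouched, while an explicit encoding still does real work.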
The default encoding may be tricky; it makes most sense to let the
default encoding be "native" so that unicode(s) and s.encode() can
return s unchanged. This can occasionally cause programs that work in
CPython to fail, e.g. a program that opens a file in binary mode,
reads a string from it, and converts it to unicode using the default
encoding. But such programs are on thin ice already (it's always
better to be explicit about encodings).
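
A minimal sketch of the explicit approach, written for a present-day Python 3 interpreter: the file contents, the temporary file, and the choice of UTF-8 are all assumptions made for the example, not anything from the proposal.

```python
import os
import tempfile

# Write some encoded bytes to a scratch file (example data only).
data = "na\u00efve caf\u00e9".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

try:
    # Binary mode hands back raw bytes, not text.
    with open(path, "rb") as f:
        raw = f.read()
    # Name the encoding explicitly instead of relying on a default.
    text = raw.decode("utf-8")
finally:
    os.remove(path)
```

Because the encoding is named at the point of conversion, the result does not depend on whatever default a given implementation picks.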
--Guido van Rossum (home page: http://www.python.org/~guido/)