[Python-Dev] Unicode proposal: %-formatting ?

Guido van Rossum guido@CNRI.Reston.VA.US
Tue, 16 Nov 1999 08:28:42 -0500


> ... hmm, there is a problem there: how should the PyUnicode_Format()
> API deal with '%s' when it sees a Unicode object as argument ?
> 
> E.g. what would you get in these cases:
> 
> u = u"%s %s" % (u"abc", "abc")

From the user's perspective, it should clearly return u"abc abc".
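
To spell out that expectation, here is a minimal sketch. It assumes an interpreter that accepts the u"..." prefix and allows mixing both kinds of string in %-formatting; the names result and expected are made up for illustration.

```python
# Mixed arguments: one Unicode literal and one plain string literal.
result = u"%s %s" % (u"abc", "abc")

# What the user would expect to get back: a single Unicode string.
expected = u"abc abc"

print(result == expected)
```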

> Perhaps we need a new marker for "insert Unicode object here".

No, please!

BTW, we also need to look at the proposal from JPython's perspective
(where all strings are Unicode; I don't know if they are UTF-16 or
UCS-2).  It should be possible to add a small number of dummy things
to JPython so that a CPython program using Unicode can be run
unchanged there.  A minimal set seems to be:

- u"..." is treated the same as "..."; and ur"..." (if accepted) is r"..."
- unichr(c) is the same as chr(c)
- unicode(s[,encoding]) is added
- s.encode([encoding]) is added

Anything I forgot?
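
The two function items in that list could be sketched roughly as follows. This is only an illustration under assumptions: it uses an ordinary Python runtime whose strings are all Unicode as a stand-in for JPython, the bodies are hypothetical, and the s.encode() method is left out because it cannot be added from plain Python code.

```python
def unichr(c):
    """Same as chr(c) when every string is already Unicode."""
    return chr(c)


def unicode(s, encoding=None):
    """Hypothetical dummy: s is already Unicode, so return it unchanged.

    The encoding argument is accepted so that CPython-style calls still
    run, but it is ignored here (an assumption, not a settled design).
    """
    return s


print(unichr(97))                   # one-character string for code 97
print(unicode("abc", "latin-1"))    # the input comes back as-is
```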

The default encoding may be tricky; in JPython it makes most sense to
let the default encoding be "native" so that unicode(s) and s.encode()
can return s unchanged.  This can occasionally cause programs that work
in CPython to fail there, e.g. a program that opens a file in binary
mode, reads a string from it, and converts it to Unicode using the
default encoding.  But such programs are on thin ice already (it's
always better to be explicit about encodings).
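
A small sketch of why being explicit matters. It assumes a runtime that distinguishes raw bytes from Unicode strings, and it uses an in-memory stream in place of a real binary file; the variable names and the choice of Latin-1 and UTF-8 are illustrative only.

```python
import io

# Bytes as they might sit in a file opened in binary mode (Latin-1 data).
stream = io.BytesIO(u"caf\u00e9".encode("latin-1"))
data = stream.read()

# Explicit: the program states which encoding the bytes are in.
explicit = data.decode("latin-1")

# Implicit: relying on whatever the default happens to be.  Here UTF-8
# stands in for a default that does not match the data.
try:
    guessed = data.decode("utf-8")
except UnicodeDecodeError:
    guessed = None

print(explicit == u"caf\u00e9", guessed)
```

The explicit call gives the same answer on any platform, while the implicit one depends on a setting the program does not control.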

--Guido van Rossum (home page: http://www.python.org/~guido/)