[Python-Dev] unicode inconsistency?

Neil Schemenauer nas at arctrix.com
Wed Mar 9 19:16:30 CET 2005


On Wed, Mar 09, 2005 at 11:10:59AM +0100, M.-A. Lemburg wrote:
> The patch implements the PyObject_Text() idea (an API that
> returns a basestring instance, i.e. string or unicode) and
> then uses this in '%s' (the string version) to properly propagate
> to u'%s' (the unicode version).
> 
> Maybe we should also expose the C API as suggested in the patch,
> e.g. as text(obj).

Perhaps the right thing to do is introduce a new format code that
means "insert text(obj)" instead of str(obj), e.g. %t.  If we do that,
though, then we should make "'%s' % u'xyz'" return a string instead of
a unicode object.  I suspect that would break a lot of code.
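
A rough model of the %t alternative may make the trade-off concrete.  It
is written in Python 3, with bytes standing in for the Python 2 str type
and str standing in for unicode; text(), legacy_str() and format_one()
are hypothetical names, the formatter handles only a single %s or %t, and
the ASCII encoding in legacy_str() is an assumption modelled on Python 2's
default codec.  It shows the cost described above: once %t exists, %s
would turn a unicode argument into a byte string instead of unicode.

```python
# Model only: bytes plays Python 2's `str`, str plays Python 2's `unicode`.
# text(), legacy_str() and format_one() are hypothetical illustrations.

def text(obj):
    """Return obj's string form without forcing unicode down to bytes."""
    if isinstance(obj, (bytes, str)):
        return obj
    # Call __str__ through the type, skipping str()'s return-type check,
    # so a __str__ that returns either string type passes through unchanged.
    result = type(obj).__str__(obj)
    if not isinstance(result, (bytes, str)):
        raise TypeError("__str__ returned a non-string")
    return result


def legacy_str(obj):
    """Model of str(obj): a unicode result is encoded with ASCII (assumed)."""
    value = text(obj)
    return value.encode("ascii") if isinstance(value, str) else value


def format_one(template, arg):
    """Format a byte template containing exactly one %t or %s."""
    if b"%t" in template:
        value = text(arg)
        if isinstance(value, str):
            # Unicode argument: promote the whole result to unicode.
            return template.decode("ascii").replace("%t", value, 1)
        return template.replace(b"%t", value, 1)
    # Under this proposal %s would always mean str(obj), yielding bytes.
    return template.replace(b"%s", legacy_str(arg), 1)


print(format_one(b"<%t>", "xyz"))   # <xyz>     (unicode result)
print(format_one(b"<%s>", "xyz"))   # b'<xyz>'  (byte string, not unicode)
```

Code relying on "'%s' % u'xyz'" producing unicode would see the second
call's result change type, which is the breakage suspected above.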

OTOH, having %s mean text(obj) instead of str(obj) may work just
fine.  People who want it to mean str() generally don't have any
unicode strings floating around, so text() has the same effect for them.
People who are using unicode would probably find text() the more
useful behavior.  I think that's why someone hacked PyString_Format
to sometimes return unicode strings.
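
Under the same modelling assumptions (bytes for the Python 2 str type,
str for unicode, hypothetical helper names), letting %s mean text(obj)
might look like the sketch below.  A byte-string argument gives a
byte-string result, exactly as str() would, while a unicode argument, or
an object whose __str__ returns unicode (the hypothetical Label class),
promotes the result to unicode instead of being encoded.

```python
# Model only: bytes plays Python 2's `str`, str plays Python 2's `unicode`.
# text(), format_s() and Label are hypothetical illustrations, not a real API.

def text(obj):
    """Return obj's string form, keeping unicode results as unicode."""
    if isinstance(obj, (bytes, str)):
        return obj
    # Bypass str()'s return-type check so __str__ may return either type.
    result = type(obj).__str__(obj)
    if not isinstance(result, (bytes, str)):
        raise TypeError("__str__ returned a non-string")
    return result


def format_s(template, arg):
    """Format a byte template with one %s, where %s means text(arg)."""
    value = text(arg)
    if isinstance(value, str):
        # Unicode value: promote the result rather than encoding the value.
        return template.decode("ascii") % value
    return template % value  # bytes %-formatting needs Python 3.5+


class Label:
    """Example object whose __str__ returns non-ASCII unicode."""
    def __str__(self):
        return "caf\u00e9"


print(format_s(b"<%s>", b"plain"))                 # b'<plain>'
print(format_s(b"<%s>", Label()) == "<caf\u00e9>")  # True
```

For callers that never pass unicode the result type is unchanged, which
is the point made above about text() having the same effect as str().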

Regarding the use of __str__ to return a unicode object: we could
introduce a new slot (e.g. __text__) instead.  However, I can't see
any advantage to that.  If someone really wants a str object, they
can call str() or PyObject_Str().

  Neil

