[Python-Dev] str() vs. unicode()

M.-A. Lemburg mal@lemburg.com
Fri, 21 Sep 2001 15:44:59 +0200


I'd like to ask for the common opinion on an issue which I've
run into when trying to resynchronize unicode() and str() in terms
of what happens when you pass arbitrary objects to these constructors
which happen to implement tp_str (or __str__ for instances).

Currently, str() will accept any object which supports the tp_str
interface and fall back to tp_repr in case that slot is not
available.
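
To illustrate the fallback at the Python level, here is a minimal
sketch. The class names WithStr and ReprOnly are made up for this
example; __str__ and __repr__ are the Python-level counterparts of
the tp_str and tp_repr slots:

```python
class WithStr(object):
    """Defines __str__, so str() returns its result."""
    def __str__(self):
        return "via __str__"


class ReprOnly(object):
    """Defines only __repr__, so str() ends up using that instead."""
    def __repr__(self):
        return "via __repr__"


print(str(WithStr()))
print(str(ReprOnly()))
```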

Before yesterday's checkins, unicode() supported strings, character
buffers and instances having a __str__ method.

Now the goal of the checkins was to make str() and unicode()
behave in a more compatible fashion. Both should accept
the same kinds of objects and raise exceptions for all others.

The path I chose was to fix PyUnicode_FromEncodedObject()
to also accept tp_str compatible objects. This API is used
by the unicode_new() constructor (which is exposed as unicode()
in Python) to create a Unicode object from the input object.

str() OTOH uses PyObject_Str() via string_new().

Now there also is a PyObject_Unicode() API which tries to
mimic PyObject_Str(). However, it does not support the additional
encoding and errors arguments which the unicode() constructor
has.

The problem which Guido raised about my checkins was that
the changes to PyUnicode_FromEncodedObject() are seen not
only in unicode(), but also all other instances where this
API is used.

OTOH, PyUnicode_FromEncodedObject() is the most generic constructor
for Unicode objects there currently is in Python.

So the questions are:
- should I revert the change in PyUnicode_FromEncodedObject()
  and instead extend PyObject_Unicode() to support encodings ?
- should we make PyObject_Unicode() use
  PyUnicode_FromEncodedObject() instead of providing its
  own implementation ?
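
To make the first option more concrete, here is a rough sketch of
what a conversion that accepts both arbitrary objects and the
encoding/errors arguments could look like. It is written for
Python 3, where bytes stands in for the 8-bit string type and str
for Unicode. The name to_text, its signature and the "ascii" default
are assumptions for illustration only, not an existing API:

```python
def to_text(obj, encoding="ascii", errors="strict"):
    """Hypothetical model of a unicode()-like constructor."""
    # Text passes through unchanged.
    if isinstance(obj, str):
        return obj
    # Byte strings and buffer-like objects are decoded with the
    # requested encoding and error handling.
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return bytes(obj).decode(encoding, errors)
    # Everything else goes through the object's own string
    # conversion, just like str() does.
    return str(obj)


print(to_text(b"abc"))
print(to_text(b"caf\xc3\xa9", "utf-8"))
print(to_text(42))
```

Whether the encoding should also apply to the result of the object's
own string conversion is exactly the kind of detail that would need
to be agreed on first; the sketch simply ignores it in that branch.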

The overall picture of all this auto-conversion stuff going
on in str() and unicode() is very confusing. Perhaps what
we really need is first to agree on a common understanding
of which auto-conversion should take place and then make
str() and unicode() support exactly the same interface ?!

PS: Also see patch #446754 by Walter Dörwald:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=446754&group_id=5470

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/