[Python-Dev] unicode and __str__

M.-A. Lemburg mal at egenix.com
Tue Aug 31 10:23:33 CEST 2004


Neil Schemenauer wrote:
> With Python 2.4:
> 
>     >>> u = u'\N{WHITE SMILING FACE}'
>     >>> class A:
>     ...   def __str__(self):
>     ...     return u
>     ... 
>     >>> class B:
>     ...   def __unicode__(self):
>     ...     return u
>     ... 
>     >>> u'%s' % A()
>     u'\u263a'
>     >>> u'%s' % B()
>     u'\u263a'
> 
> With Python 2.3:
> 
>     >>> u'%s' % A()
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in
>         position 0: ordinal not in range(128)
>     >>> u'%s' % B()
>     u'<__main__.B instance at 0x401f910c>'
> 
> The only thing I found in the NEWS file that seemed relevant is
> this note:
> 
>   u'%s' % obj will now try obj.__unicode__() first and fallback to
>   obj.__str__() if no __unicode__ method can be found.
> 
> I don't think that describes the behavior difference.  Allowing
> __str__ to return unicode strings seems like a pretty noteworthy
> change (assuming that's what actually happened).

__str__ is indeed allowed to return Unicode objects
(and has been for quite a while).

The reason we added __unicode__ was to provide a hook for
PyObject_Unicode() to try before falling back to __str__. It is
needed because, even though returning Unicode objects from
__str__ is allowed, in most cases it is PyObject_Str() that calls
__str__, and this API always converts a Unicode result to a string
using the default encoding, which can easily fail.
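To make the two lookup paths concrete, here is a simplified Python 3 model of them. It is a sketch, not the actual C code: `py2_str` and `py2_unicode` are hypothetical helper names, Python 3 `bytes` stands in for the Python 2 8-bit `str`, Python 3 `str` stands in for Python 2 `unicode`, and the default encoding is assumed to be ASCII. It also assumes the fallback in PyObject_Unicode() calls the `__str__` slot directly rather than going through PyObject_Str(), so a Unicode result from `__str__` is not encoded first.

```python
# Simplified Python 3 model of two Python 2 C-API calls (assumed mapping:
# bytes ~ Python 2 str, str ~ Python 2 unicode). Helper names are hypothetical.
DEFAULT_ENCODING = "ascii"  # assumed value of Python 2's default encoding


def py2_str(obj):
    """Model of PyObject_Str(): the result is always an 8-bit string."""
    result = type(obj).__str__(obj)
    if isinstance(result, str):
        # A Unicode result is encoded with the default encoding,
        # which raises UnicodeEncodeError for non-ASCII text.
        result = result.encode(DEFAULT_ENCODING)
    return result


def py2_unicode(obj):
    """Model of PyObject_Unicode(): try __unicode__ first, then __str__."""
    hook = getattr(obj, "__unicode__", None)
    if hook is not None:
        return hook()
    # Fallback: call the __str__ slot directly (not py2_str), so a
    # Unicode result from __str__ passes through unchanged.
    result = type(obj).__str__(obj)
    if isinstance(result, bytes):
        # 8-bit results are decoded with the default encoding.
        result = result.decode(DEFAULT_ENCODING)
    return result


class A(object):
    def __str__(self):
        return u"\u263a"  # WHITE SMILING FACE


class B(object):
    def __unicode__(self):
        return u"\u263a"


print(py2_unicode(A()) == u"\u263a")  # True: __str__'s Unicode result survives
print(py2_unicode(B()) == u"\u263a")  # True: __unicode__ is tried first
try:
    py2_str(A())
except UnicodeEncodeError:
    print("py2_str(A()) fails: the ASCII codec cannot encode U+263A")
```

The asymmetry is the point: the same `__str__` works on the Unicode path but fails on the 8-bit path, which is why a separate `__unicode__` hook is useful.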

> Also, I'm a little unclear on the purpose of the __unicode__ method.
> If you can return unicode from __str__ then why would I want to
> provide a __unicode__ method?  Perhaps it is meant for objects that
> can either return a unicode or a string representation depending on
> what the caller prefers.  I have a hard time imagining a use for
> that.

That's indeed the use case. An object might want to return
an approximate string representation of some form when asked for
a string, but a true content representation when asked for
Unicode. Because of the default encoding problems you can
run into with __str__, we need two slots to provide this kind of
functionality.
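A minimal sketch of such a dual-representation object, written so the same source runs under Python 2 and Python 3. The class name `Greeting` and the choice of replacing unencodable characters with '?' are assumptions for illustration. Under Python 2, `str(g)` would reach `__str__` and `unicode(g)` would reach `__unicode__`; under Python 3 the two hooks are simply called directly here.

```python
class Greeting(object):
    """Hypothetical object with two representations (Python 2 style)."""

    def __init__(self, text):
        self.text = text  # the true content, as Unicode

    def __str__(self):
        # Approximate 8-bit form: characters the ASCII codec cannot
        # encode are replaced with '?' instead of raising an error.
        return self.text.encode("ascii", "replace")

    def __unicode__(self):
        # True content representation, returned unchanged.
        return self.text


g = Greeting(u"caf\u00e9")
print(g.__str__() == b"caf?")           # True: lossy approximation
print(g.__unicode__() == u"caf\u00e9")  # True: full content preserved
```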

In Py3k we will probably see __str__ and __unicode__ reunite.

Now back to your original question: the change you see
in %-formatting was actually a bug fix. Python 2.3 should
have exposed the same behavior as 2.4 does now.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 31 2004)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/