[Python-Dev] unicode and __str__

Tim Peters tim.peters at gmail.com
Mon Aug 30 22:41:10 CEST 2004


[Neil Schemenauer]
> ...
> The only thing I found in the NEWS file that seemed relevant is
> this note:
>
>  u'%s' % obj will now try obj.__unicode__() first and fallback to
>  obj.__str__() if no __unicode__ method can be found.
>
> I don't think that describes the behavior difference.  Allowing
> __str__ to return unicode strings seems like a pretty noteworthy
> change (assuming that's what actually happened).

It's confusing.  A __str__ method or tp_str type slot can return
unicode, but what happens after that depends on the caller.
PyObject_Str() and PyObject_Repr() then try to encode it as an 8-bit
string.  But unicode.__mod__ says "oh, cool -- I'm done".
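A rough way to see the caller-dependent handling is a small model written in Python 3, where `str` stands in for Python 2's unicode type and `bytes` for the 8-bit str type. The names `py_object_str` and `unicode_mod_s` are hypothetical, not real CPython APIs. The first approximates PyObject_Str(): a unicode result is encoded with the default codec, assumed here to be 'ascii'. The second approximates how u'%s' % obj gets its text according to the NEWS note: try __unicode__ first, fall back to __str__, and keep a unicode result as-is. This is a sketch of the behavior described, not the actual C implementation.

```python
# Hypothetical Python 3 model of the Python 2 behavior described above.
# Python 3 `str` plays the role of Python 2 `unicode`;
# Python 3 `bytes` plays the role of Python 2's 8-bit `str`.

DEFAULT_ENCODING = "ascii"  # assumption: Python 2's usual default encoding


def py_object_str(obj):
    """Approximate PyObject_Str(): the result is always an 8-bit string."""
    result = type(obj).__str__(obj)  # call the __str__ "slot" directly
    if isinstance(result, str):
        # A unicode result gets encoded; this can raise UnicodeEncodeError.
        result = result.encode(DEFAULT_ENCODING)
    if not isinstance(result, bytes):
        raise TypeError("__str__ returned non-string")
    return result


def unicode_mod_s(obj):
    """Approximate how u'%s' % obj obtains its text (per the NEWS note)."""
    meth = getattr(type(obj), "__unicode__", None)
    if meth is not None:
        result = meth(obj)  # __unicode__ is tried first
    else:
        result = type(obj).__str__(obj)  # fall back to __str__
    if isinstance(result, bytes):
        # An 8-bit result is decoded with the default encoding.
        result = result.decode(DEFAULT_ENCODING)
    return result  # a unicode result is used as-is: "I'm done"


if __name__ == "__main__":
    class Wide:
        def __str__(self):
            return "\u1234"

    print(ascii(unicode_mod_s(Wide())))  # prints '\u1234'
    try:
        py_object_str(Wide())
    except UnicodeEncodeError as exc:
        print("py_object_str failed:", exc)
```

In this model `unicode_mod_s(Wide())` succeeds while `py_object_str(Wide())` raises UnicodeEncodeError, mirroring the `u'%s' % A()` and `print A()` results in the session that follows.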

> Also, I'm a little unclear on the purpose of the __unicode__ method.
> If you can return unicode from __str__ then why would I want to
> provide a __unicode__ method? 

Is the purpose clearer if you purge your mind of the belief that str()
(as opposed to __str__!) can return unicode?  Here w/ current CVS:

>>> class A:
...     def __str__(self): return u'a'
>>> print A()
a
>>> type(str(A()))
<type 'str'>
>>>

>>> class A:
...     def __str__(self): return u'\u1234'
>>> print A()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in position 0: ordinal not in range(128)
>>>

>>> '%s' % A()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in position 0: ordinal not in range(128)

>>> u'%s' % A()
u'\u1234'
>>>

So unicode.__mod__ is what's special here.  But I'm not sure that helps <wink>.

