unicode(obj, errors='foo') raises TypeError - bug?

Wed Feb 23 07:48:10 EST 2005

Steven Bethard wrote:
> Mike Brown wrote:
> 
>> >>> class C:
>> ...   def __str__(self):
>> ...      return 'asdf\xff'
>> ...
>> >>> o = C()
>> >>> unicode(o, errors='replace')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>> TypeError: coercing to Unicode: need string or buffer, instance found
>>
> [snip]
> 
>>
>> What am I doing wrong? Is this a bug in Python?
> 
> 
> No, this is documented behavior[1]:
> 
> """
> unicode([object[, encoding [, errors]]])
>     ...
>     For objects which provide a __unicode__() method, it will call this 
> method without arguments to create a Unicode string. For all other 
> objects, the 8-bit string version or representation is requested and 
> then converted to a Unicode string using the codec for the default 
> encoding in 'strict' mode.
> """
> 
> Note that the documentation basically says that it will call str() on 
> your object, and then convert it in 'strict' mode.  You should either 
> define __unicode__ or call str() manually on the object.

Not a bug, I guess, since it is documented, but it seems a bit bizarre that the encoding and errors
parameters are ignored when the object does not have a __unicode__ method.

Kent

> 
> STeVe
> 
> [1] http://docs.python.org/lib/built-in-funcs.html
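
The two workarounds suggested in the quoted reply (call str() on the object yourself, or define __unicode__) might look roughly like the sketch below. It is only an illustration: the helper name to_unicode, the subclass D, and the 'latin-1' encoding are assumptions, not anything from the thread. The Python 3 fallback branch exists only so the snippet also runs on a modern interpreter, where str is already Unicode text.

```python
# Minimal sketch of the two workarounds from the thread (Python 2 behaviour).
try:
    text_type = unicode          # Python 2
except NameError:
    text_type = str              # Python 3 fallback: str is already text


class C(object):
    # Same idea as the class in the quoted session.
    def __str__(self):
        return 'asdf\xff'


def to_unicode(obj, encoding='ascii', errors='replace'):
    """Hypothetical helper: call str() manually, then decode explicitly."""
    s = str(obj)
    if isinstance(s, text_type):
        return s                 # Python 3: nothing left to decode
    # Python 2: s is an 8-bit string, so encoding and errors are accepted here.
    return text_type(s, encoding, errors)


class D(C):
    # Alternative: define __unicode__, which Python 2's unicode(obj) calls.
    # Treating the bytes as latin-1 is an assumption made for this example.
    def __unicode__(self):
        return to_unicode(self, 'latin-1', 'strict')


result = to_unicode(C())         # on Python 2 this is u'asdf\ufffd'
```

The point of the helper is that the errors argument is applied to a real 8-bit string rather than to the instance, which is the case that raised TypeError in the original session.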