unicode(obj, errors='foo') raises TypeError - bug?

Kent Johnson kent37 at tds.net
Wed Feb 23 17:36:31 EST 2005


Martin v. Löwis wrote:
> Steven Bethard wrote:
> 
>> Yeah, I agree it's weird.  I suspect if someone supplied a patch for 
>> this behavior it would be accepted -- I don't think this should break 
>> backwards compatibility (much).
> 
> 
> Notice that the "right" thing to do would be to pass encoding and errors
> to __unicode__. If the string object needs to be told what encoding it
> is in, why not any other object as well?
> 
> Unfortunately, this apparently was overlooked, and now it is too late
> to change it (or else the existing __unicode__ methods would all break
> if they suddenly get an encoding argument).

Could this be handled with a try / except in unicode()? Something like this:
  >>> class A:
  ...   def u(self):  # __unicode__ with no args
  ...     print 'A.u()'
  ...
  >>> class B:
  ...   def u(self, enc, err):  # __unicode__ with two args
  ...     print 'B.u()', enc, err
  ...
  >>> def convert(obj, enc='ascii', err='strict'): # unicode() function delegates to u()
  ...   try:
  ...     obj.u(enc, err)
  ...   except TypeError:  # u() cannot take (enc, err): call it with no args
  ...     obj.u()
  ...
  >>> a, b = A(), B()
  >>> convert(a)
A.u()
  >>> convert(a, 'utf-8', 'replace')
A.u()
  >>> convert(b)
B.u() ascii strict
  >>> convert(b, 'utf-8', 'replace')
B.u() utf-8 replace
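One caveat with catching TypeError in convert(): a TypeError raised *inside* a two-argument u() is caught as well, and convert() then retries u() with no arguments, which hides the real error. Below is a minimal sketch of an alternative that checks whether u() can accept (enc, err) by using inspect.signature before making the call. It is written for Python 3, not the Python 2 used in this thread. A, B and u() are the same stand-ins as above, except that they return strings instead of printing. C is a hypothetical class added to show the difference.

```python
import inspect

class A:
    def u(self):  # hook that takes no arguments
        return 'A.u()'

class B:
    def u(self, enc, err):  # hook that accepts encoding and errors
        return 'B.u() %s %s' % (enc, err)

class C:
    def u(self, enc, err):  # hypothetical hook with a bug of its own
        raise TypeError('bug inside C.u()')

def convert(obj, enc='ascii', err='strict'):
    """Call obj.u(enc, err) if u() can accept those arguments, else obj.u()."""
    method = obj.u  # bound method, so 'self' is not part of its signature
    try:
        inspect.signature(method).bind(enc, err)
    except TypeError:
        # u() cannot take (enc, err): fall back to the no-argument form
        return method()
    # The call happens outside the try block, so a TypeError raised
    # inside u() propagates instead of triggering the fallback.
    return method(enc, err)

print(convert(A()))                      # A.u()
print(convert(B()))                      # B.u() ascii strict
print(convert(B(), 'utf-8', 'replace'))  # B.u() utf-8 replace
try:
    convert(C())
except TypeError as exc:
    print(exc)                           # bug inside C.u()
```

With the try/except version, convert(C()) would instead call C.u() with no arguments and report a missing-argument TypeError, which masks the real one. Note that inspect.signature() can raise ValueError for some built-in callables that expose no signature.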

> 
> As for using encoding and errors on the result of str() conversion
> of the object: how can the caller know what encoding the result of
> str() is in, reasonably? 

The same way the caller knows the encoding of a byte string, or of the result of str(some_object):
in my experience, usually by careful detective work on the source of the string or object, followed
by attempts to better understand and control the encoding used throughout the application.

> It seems more correct to assume that the
> str() result is in the system default encoding.

Assuming that in the absence of any guidance, sure, that is consistent. But why ignore the guidance
the programmer attempts to provide?
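Here is a minimal Python 3 sketch of that policy: fall back to the default encoding only when the caller gives no encoding, and otherwise honor the caller's encoding and errors. Python 3 has no direct counterpart to a Python 2 str() result that is a byte string. This sketch therefore assumes the object exposes its bytes through __bytes__. The helper name to_text and the class LegacyName are hypothetical.

```python
import sys

class LegacyName:
    """Hypothetical object whose byte form is Latin-1, not the default encoding."""
    def __bytes__(self):
        return b'caf\xe9'  # 'caf\xe9' encoded as Latin-1

def to_text(obj, encoding=None, errors=None):
    """Hypothetical helper: decode obj's bytes using the caller's guidance.

    With no guidance, fall back to sys.getdefaultencoding() ('utf-8' on
    Python 3) and the 'strict' error handler.
    """
    if isinstance(obj, str):
        return obj  # already text, nothing to decode
    if not (isinstance(obj, (bytes, bytearray)) or hasattr(type(obj), '__bytes__')):
        # bytes(5) would silently give five zero bytes, so refuse instead
        raise TypeError('to_text() needs bytes or an object with __bytes__')
    raw = bytes(obj)  # calls __bytes__, or copies a bytes-like object
    return raw.decode(encoding or sys.getdefaultencoding(), errors or 'strict')

print(to_text(LegacyName(), 'latin-1'))         # café (caller's encoding honored)
print(to_text(LegacyName(), errors='replace'))  # caf plus U+FFFD (default UTF-8)
try:
    to_text(LegacyName())  # no guidance: strict default decoding fails
except UnicodeDecodeError:
    print('default encoding cannot decode these Latin-1 bytes')
```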


One thing that hasn't been pointed out in this thread yet is that the OP could just define 
__unicode__() on his class to do what he wants...
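In Python 2, unicode(obj) calls obj.__unicode__() when the class defines it. Python 3 folded that hook into __str__(). Below is a minimal Python 3 sketch that assumes the OP's objects hold raw bytes along with the encoding and error handler he wants. The class Record and its attributes are hypothetical.

```python
class Record:
    """Hypothetical class that decides for itself how its bytes become text."""

    def __init__(self, raw, encoding='latin-1', errors='replace'):
        self.raw = raw            # bytes of known (or guessed) provenance
        self.encoding = encoding  # how those bytes were encoded
        self.errors = errors      # error handler, e.g. 'strict' or 'replace'

    def __str__(self):
        # Python 3 counterpart of a Python 2 __unicode__() method:
        # str(record) decodes with the class's own encoding and errors.
        return self.raw.decode(self.encoding, self.errors)

print(str(Record(b'caf\xe9')))           # café
print(str(Record(b'caf\xe9', 'utf-8')))  # caf followed by U+FFFD
```

The caller no longer passes encoding or errors at all, because that knowledge lives with the class that owns the data.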

Kent


