unicode(obj, errors='foo') raises TypeError - bug?

"Martin v. Löwis" martin at v.loewis.de
Wed Feb 23 15:45:27 EST 2005


Steven Bethard wrote:
> Yeah, I agree it's weird.  I suspect if someone supplied a patch for 
> this behavior it would be accepted -- I don't think this should break 
> backwards compatibility (much).

Notice that the "right" thing to do would be to pass encoding and errors
through to __unicode__. If the string object needs to be told what
encoding it is in, why not any other object as well?
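
For concreteness, here is a minimal sketch of the behaviour under
discussion, as I understand CPython 2.x (the Point class is just an
illustration, not anything from the report):

    class Point:
        def __unicode__(self):
            return u'a point'

    p = Point()
    unicode(p)                    # u'a point' -- __unicode__ is consulted
    unicode(p, 'latin-1')         # TypeError: coercing to Unicode: need
                                  #   string or buffer, instance found
    unicode(p, errors='strict')   # same TypeError -- giving errors alone
                                  #   also selects the decoding code path

As soon as encoding or errors is supplied, the builtin expects a str
(or buffer) to decode and never looks at __unicode__ at all.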

Unfortunately, this apparently was overlooked, and now it is too late
to change it (or else every existing __unicode__ method would break
when it suddenly received an encoding argument).
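
To see why: existing __unicode__ methods take only self, so forwarding
the arguments would turn every current implementation into an error.
A sketch of the hypothetical forwarding (not real behaviour):

    class Legacy:
        def __unicode__(self):      # written against today's protocol
            return u'legacy'

    # If unicode(obj, encoding, errors) started forwarding, the call
    # would effectively become:
    Legacy().__unicode__('utf-8', 'strict')
    # TypeError: __unicode__() takes exactly 1 argument (3 given)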

As for applying encoding and errors to the result of a str() conversion
of the object: how could the caller reasonably know what encoding the
result of str() is in? It seems more correct to assume that the str()
result is in the system default encoding.
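
That is also what the no-argument form effectively does for objects
without __unicode__, if I read CPython 2.x correctly (Plain is again
just an illustration):

    import sys

    class Plain:
        def __str__(self):
            return 'plain bytes'

    obj = Plain()
    # unicode(obj) falls back to str(obj) and decodes it strictly with
    # the system default encoding (usually 'ascii'):
    assert unicode(obj) == str(obj).decode(sys.getdefaultencoding())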

If you can follow so far(*): if it is right to ignore the encoding
argument when the object has been str()-converted, why should the
errors argument not be ignored as well? It is inconsistent to ignore
one parameter of the decoding but not the other.
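
Put differently, honouring errors but not encoding would amount to
something like this hypothetical helper (tounicode is my name for it;
nothing like it exists in the stdlib):

    import sys

    def tounicode(obj, errors='strict'):
        # Proposed semantics: use __unicode__ if available, otherwise
        # decode str(obj) with the *fixed* default encoding, while
        # letting the caller vary the errors policy.
        if hasattr(obj, '__unicode__'):
            return obj.__unicode__()
        return str(obj).decode(sys.getdefaultencoding(), errors)

    tounicode('caf\xe9', errors='replace')   # u'caf\ufffd'

The asymmetry is plain: the second parameter of the decode would be
under the caller's control while the first would not be.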

Regards,
Martin

(*) I admit that the reasoning for ignoring the encoding is somewhat
flawed. There are some types (e.g. numbers) where str() always uses
ASCII, no matter what the system encoding is. There may be types where
the encoding of the str() result is not ASCII, and the caller happens
to know what it is, but I'm not aware of any such type.
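
For example:

    # str() of a number is plain ASCII whatever the locale or default
    # encoding, so the implicit decode can never fail:
    assert str(12345) == '12345'
    assert unicode(3.25) == u'3.25'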


