unicode question

Mon Nov 22 22:53:03 EST 2004

On Tue, 23 Nov 2004 00:24:09 +0100, "Martin v. Löwis" <martin at v.loewis.de> wrote:

>Bengt Richter wrote:
>> So, bottom line, as Wolfgang effectively asked by his example, why does print try to coerce
>> the __str__ return value to ascii on the way to the output encoder, when there is encoding info
>> in the unicode object that it is happy to defer reencoding of for sys.stdout.encoding?
>
>[See my other posting:]
>Because print invokes str() on its argument, unless the argument is
>already a byte string (in which case it prints it directly), or a
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-- effectively an assumption that
bytestring.decode('some_unknown_encoding').encode(sys.stdout.encoding)
has already been done, it seems (I'm not arguing against that).

>Unicode string (in which case it encodes it with the stream encoding).
>It is str(y) that fails, not the printing.
>
Yes, I think my turgid post did demonstrate that, among other things ;-)

So how about changing print so that it doesn't blindly use str(y), but instead
first tries to get y.__str__() in case the latter returns unicode?
Then print y can succeed the way print y.__str__() does now.
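
A minimal sketch of what I mean, written as a Python 3 model of the behaviour
discussed in this thread, so bytes stands in for the byte string and str for
unicode. The names Y, py2_style_str, current_print and proposed_print are made
up for illustration and are not interpreter functions, and the ASCII step is
only my reading of where str(y) fails.

```python
import io


class Y:
    """Hypothetical object whose __str__ returns non-ASCII text."""

    def __str__(self):
        return "L\u00f6wis"


def py2_style_str(obj):
    """Model of str(obj): the result has to be a byte string, so text
    coming back from __str__ is coerced with the ASCII codec (assumption)."""
    result = obj.__str__()
    if isinstance(result, bytes):
        return result
    return result.encode("ascii")  # raises UnicodeEncodeError on non-ASCII


def current_print(obj, stream, encoding):
    """Model of the rule Martin describes: byte strings are written as-is,
    text is encoded with the stream encoding, anything else goes via str()."""
    if isinstance(obj, bytes):
        data = obj
    elif isinstance(obj, str):
        data = obj.encode(encoding)
    else:
        data = py2_style_str(obj)
    stream.write(data + b"\n")


def proposed_print(obj, stream, encoding):
    """Proposed rule: ask __str__ first, and if it hands back text,
    keep it as text until the stream-encoding step."""
    if not isinstance(obj, (bytes, str)):
        obj = obj.__str__()
    current_print(obj, stream, encoding)
```

With an io.BytesIO() as the stream and "utf-8" as the encoding, proposed_print(Y(), ...)
writes the encoded text, while current_print(Y(), ...) raises UnicodeEncodeError
inside the str() step, which is the failure Martin points at.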

The same goes for str.__mod__ -- it apparently knows how to deal with '%s'% unicode(y)
so why shouldn't '%s'%y benefit when y.__str__ returns unicode?
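
The same thing for '%s', again only as a rough Python 3 model (bytes for the
byte string, str for unicode) of the behaviour described in this thread.
Y, current_mod and proposed_mod are hypothetical names, and the ASCII coercion
is my assumption about what the intermediate str() does.

```python
class Y:
    """Hypothetical object whose __str__ returns non-ASCII text."""

    def __str__(self):
        return "L\u00f6wis"


def current_mod(template, arg):
    """Model of '%s' % arg with a byte-string template."""
    if isinstance(arg, str):
        # A text argument promotes the whole result to text.
        return template.decode("ascii") % arg
    if not isinstance(arg, bytes):
        # Other objects go through str(), modelled as an ASCII coercion.
        result = arg.__str__()
        arg = result if isinstance(result, bytes) else result.encode("ascii")
    return template % arg


def proposed_mod(template, arg):
    """Proposed rule: look at what __str__ returns before deciding,
    so text from __str__ gets the same promotion as a text argument."""
    if not isinstance(arg, (bytes, str)):
        arg = arg.__str__()
    return current_mod(template, arg)
```

Here proposed_mod(b"<%s>", Y()) gives a text result, the same as passing the
text directly, while current_mod(b"<%s>", Y()) raises UnicodeEncodeError.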

I.e., str() doesn't know that print and '%s' can use unicode to good effect
when it is available, so having print and str.__mod__ blindly go through str()
as an intermediate step throws away an opportunity to do better, ISTM.

Regards,
Bengt Richter