unicode question
Steve Holden
steve at holdenweb.com
Tue Nov 23 08:26:21 EST 2004
Bengt Richter wrote:
> On Tue, 23 Nov 2004 00:24:09 +0100, =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <martin at v.loewis.de> wrote:
>
>
>>Bengt Richter wrote:
>>
>>>So, bottom line, as Wolfgang effectively asked by his example, why does print try to coerce
>>>the __str__ return value to ascii on the way to the output encoder, when there is encoding info
>>>in the unicode object that it is happy to defer reencoding of for sys.stdout.encoding?
>>
>>[See my other posting:]
>>Because print invokes str() on its argument, unless the argument is
>>already a byte string (in which case it prints it directly), or a
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-- effectively an assumption that
> bytestring.decode('some_unknown_encoding').encode(sys.stdout.encoding)
> has already been done, it seems (I'm not arguing against).
>
>
>>Unicode string (in which case it encodes it with the stream encoding).
>>It is str(y) that fails, not the printing.
>>
>
> Yes, I think my turgid post did demonstrate that, among other things ;-)
>
> So how about changing print so that it doesn't blindly use str(y), but instead
> first tries to get y.__str__() in case the latter returns unicode?
> Then print y can succeed the way print y.__str__() does now.
>
> The same goes for str.__mod__ -- it apparently knows how to deal with '%s'% unicode(y)
> so why shouldn't '%s'%y benefit when y.__str__ returns unicode?
>
> I.e., str doesn't know that printing and '%s' can use unicode to good effect
> if it is available, so for print and str.__mod__ blindly to use str() intermediately
> throws away an opportunity to do better ISTM.
>
> Regards,
> Bengt Richter
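For readers following the thread, here is a rough sketch of the behaviour under discussion. It is a simplified model written in Python 3, not CPython's actual print code. bytes stands in for a Python 2 byte string and str stands in for a Python 2 unicode object. The names py2_str, print_current, print_proposed and Name are hypothetical. print_current follows Martin's description: byte strings are written directly, unicode is encoded with the stream encoding, and everything else goes through str(), which coerces a unicode __str__ result to ASCII. print_proposed follows Bengt's suggestion of calling __str__ first and keeping a unicode result.

```python
# Simplified, hypothetical model of Python 2 print semantics, written in Python 3.
# bytes plays the role of a Python 2 byte string, str the role of a unicode object.

DEFAULT_ENCODING = "ascii"  # assumed: Python 2's default encoding


def py2_str(obj):
    """Model of Python 2 str(obj): a unicode __str__ result is coerced to ASCII."""
    result = obj.__str__()
    if isinstance(result, str):
        # This step raises UnicodeEncodeError for non-ASCII text.
        return result.encode(DEFAULT_ENCODING)
    return result


def print_current(obj, stream_encoding):
    """Return the bytes 'print obj' would write, per the behaviour Martin describes."""
    if isinstance(obj, bytes):         # byte string: written as-is
        return obj
    if isinstance(obj, str):           # unicode: encoded with the stream encoding
        return obj.encode(stream_encoding)
    return py2_str(obj)                # anything else: goes through str()


def print_proposed(obj, stream_encoding):
    """Bengt's suggestion: consult __str__ first and keep a unicode result."""
    if isinstance(obj, (bytes, str)):
        return print_current(obj, stream_encoding)
    result = obj.__str__()
    if isinstance(result, str):        # unicode result: use the stream encoding
        return result.encode(stream_encoding)
    return result


class Name:
    """Hypothetical object whose __str__ returns non-ASCII unicode."""

    def __str__(self):
        return "L\u00f6wis"


y = Name()
print(print_proposed(y, "utf-8"))  # -> b'L\xc3\xb6wis'
try:
    print_current(y, "utf-8")
except UnicodeEncodeError as exc:
    print("str(y) fails:", exc.reason)
```

The model shows the point made above: the failure comes from the str() coercion step, not from the stream encoder, so print_current fails while print_proposed succeeds on the same object and encoding.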
Am I the only person who found it scary that Bengt could apparently
casually drop in a polynomial that would decode to "Martin v. Löwis"?
feel-dumb-just-being-in-the-same-newsgroup-ly y'rs - steve
--
http://www.holdenweb.com
http://pydish.holdenweb.com
Holden Web LLC +1 800 494 3119