unicode question

Tue Nov 23 08:26:21 EST 2004

Bengt Richter wrote:

> On Tue, 23 Nov 2004 00:24:09 +0100, =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <martin at v.loewis.de> wrote:
> 
> 
>>Bengt Richter wrote:
>>
>>>So, bottom line, as Wolfgang effectively asked by his example, why does print try to coerce
>>>the __str__ return value to ascii on the way to the output encoder, when there is encoding info
>>>in the unicode object that it is happy to defer reencoding of for sys.stdout.encoding?
>>
>>[See my other posting:]
>>Because print invokes str() on its argument, unless the argument is
>>already a byte string (in which case it prints it directly), or a
> 
>                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-- effectively an assumption that
> bytestring.decode('some_unknown_encoding').encode(sys.stdout.encoding)
> has already been done, it seems (I'm not arguing against).
> 
> 
>>Unicode string (in which case it encodes it with the stream encoding).
>>It is str(y) that fails, not the printing.
>>
> 
> Yes, I think my turgid post did demonstrate that, among other things ;-)
> 
> So how about changing print so that it doesn't blindly use str(y), but instead
> first tries to get y.__str__() in case the latter returns unicode?
> Then print y can succeed the way print y.__str__() does now.
> 
> The same goes for str.__mod__ -- it apparently knows how to deal with '%s'% unicode(y)
> so why shouldn't '%s'%y benefit when y.__str__ returns unicode?
> 
> I.e., str doesn't know that printing and '%s' can use unicode to good effect
> if it is available, so for print and str.__mod__ blindly to use str() intermediately
> throws away an opportunity to do better ISTM.
> 
> Regards,
> Bengt Richter
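The behaviour Martin describes, and Bengt's proposed alternative, can be modelled roughly in Python 3. This is a sketch under stated assumptions, not how either interpreter is implemented. Python 3 bytes stand in for the Python 2 byte string, and Python 3 str stands in for Python 2 unicode. The helpers py2_str and proposed_print_bytes are hypothetical names. The first mimics Python 2's str() coercing the text returned by __str__ with the default (ASCII) codec. The second mimics the suggestion of encoding that text with the stream encoding instead.

```python
# Hypothetical Python 3 model of the Python 2 behaviour discussed above.
# Python 3 `bytes` plays the role of the Python 2 byte string (str);
# Python 3 `str` plays the role of Python 2 `unicode`.


class Name:
    """An object whose __str__ returns text with a non-ASCII character."""

    def __str__(self):
        return "Martin v. L\u00f6wis"


def py2_str(obj):
    """Model of Python 2 str(): text returned by __str__ is coerced
    with the default codec, ASCII, before anything else sees it."""
    result = obj.__str__()
    if isinstance(result, str):
        result = result.encode("ascii")  # raises UnicodeEncodeError on non-ASCII
    return result


def proposed_print_bytes(obj, stream_encoding):
    """Model of the proposal: call __str__ first and, if it returns text,
    encode that text with the stream's encoding rather than ASCII."""
    result = obj.__str__()
    if isinstance(result, str):
        return result.encode(stream_encoding)
    return result  # already bytes: pass through, as print does for byte strings


y = Name()
try:
    py2_str(y)
except UnicodeEncodeError as exc:
    print("str(y)-style coercion fails:", exc.reason)

print(proposed_print_bytes(y, "utf-8"))
```

Python 3 largely sidesteps the question. There, __str__ must return text, and print() hands that text to sys.stdout, whose own encoding performs the conversion.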

Am I the only person who found it scary that Bengt could apparently 
casually drop in a polynomial that would decode to "Löwis"?
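For the record, the "polynomial" in the attribution line is an RFC 2047 encoded-word. Mail and news software uses this form, =?charset?Q?text?=, to carry non-ASCII characters in headers. In the Q encoding, "_" stands for a space and "=XX" for a byte in hex, so "=22" is a double quote, "=2E" is a period and "=F6" is ö in ISO-8859-1. A minimal sketch of decoding it with the standard library's email.header.decode_header follows. The helper decode_encoded_words is a made-up name for illustration.

```python
from email.header import decode_header

# The raw encoded-word from the quoted attribution line.
raw = "=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?="


def decode_encoded_words(value):
    """Decode any RFC 2047 encoded-words in a header value into text.

    decode_header() yields (chunk, charset) pairs. Encoded-words come back
    as bytes plus their declared charset. Unencoded text has charset None
    (a str when the value contains no encoded-words, otherwise bytes).
    """
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


print(decode_encoded_words(raw))
```

The B variant of encoded-words uses base64 in place of the Q encoding, and decode_header handles both.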

feel-dumb-just-being-in-the-same-newsgroup-ly y'rs  - steve

-- 
http://www.holdenweb.com
http://pydish.holdenweb.com
Holden Web LLC +1 800 494 3119