unicode question

Kent Johnson kent3737 at yahoo.com
Sat Nov 20 21:10:14 EST 2004


Martin v. Löwis wrote:
> wolfgang haefelinger wrote:
> 
>> I wonder especially about case 2. I can see that "print y" makes a 
>> call to
>> Y.__str__() . But Y.__str__() can be printed?? So what is 'print' exactly
>> doing?
> 
> 
> It looks at sys.stdout.encoding. If this is set, and the thing to print
> is a unicode string, it converts it to the stream encoding, and prints
> the result of the conversion.

I hate to contradict an expert, but it seems to me that it is 
sys.getdefaultencoding() ('ascii') that is the problem, not 
sys.stdout.encoding ('cp437').

gamma converts to cp437 just fine:
 >>> import sys
 >>> gamma = u"\N{GREEK CAPITAL LETTER GAMMA}"
 >>> sys.stdout.encoding
'cp437'
 >>> gamma.encode(sys.stdout.encoding)
'\xe2'
 >>> print gamma.encode(sys.stdout.encoding)
Γ
(prints a gamma)

Trying to encode gamma using the 'ascii' codec doesn't work:
 >>> str(gamma)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0393' in position 0: ordinal not in range(128)
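
The two results above can be put side by side in a small script. This is 
only a sketch: try_encode is a hypothetical helper name, and the code is 
written so that it should run on both Python 2 and Python 3.

```python
gamma = u"\N{GREEK CAPITAL LETTER GAMMA}"

def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text.

    try_encode is a hypothetical helper used only for this illustration.
    """
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

# cp437 has a slot for capital gamma (byte 0xE2); ascii does not.
cp437_bytes = try_encode(gamma, "cp437")
ascii_bytes = try_encode(gamma, "ascii")

print(repr(cp437_bytes))
print(repr(ascii_bytes))
```

The codec name is the only thing that differs between the two calls, which 
is why the choice of encoding decides whether the conversion succeeds.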

My guess is that internally, print keeps calling str() on its argument 
until it gets a byte string (str) object. So it calls y.__str__(), 
yielding gamma, then gamma.__str__(), which tries to encode with the 
default 'ascii' codec and raises the error.
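
To make that guess concrete, here is a rough model of it. It is not how 
print is actually implemented; Y is a stand-in for the class from the 
original question, and model_print is a hypothetical function that only 
mimics the guessed behaviour.

```python
gamma = u"\N{GREEK CAPITAL LETTER GAMMA}"

class Y(object):
    # Hypothetical stand-in for the poster's class: __str__ returns text.
    def __str__(self):
        return gamma

def model_print(obj, default_encoding):
    """Model of the guess: take the __str__ result and, if it is still
    text rather than bytes, encode it with the *default* encoding,
    ignoring the output stream's encoding entirely."""
    result = obj.__str__()
    if not isinstance(result, bytes):
        result = result.encode(default_encoding)
    return result

# With a cp437 default the conversion succeeds.
ok = model_print(Y(), "cp437")

# With an ascii default the same call fails, as in the traceback above.
try:
    model_print(Y(), "ascii")
    failed = False
except UnicodeEncodeError:
    failed = True
```

If the guess is right, the stream encoding never enters the picture for an 
object like y, which would explain the difference between the two cases.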

If the default encoding is set to cp437 (for example by calling 
sys.setdefaultencoding() from sitecustomize.py) then it works fine:

 >>> import sys
 >>> sys.getdefaultencoding()
'cp437'
 >>> gamma = u"\N{GREEK CAPITAL LETTER GAMMA}"
 >>> str(gamma)
'\xe2'
 >>> print gamma
Γ
(prints a gamma)

 >>> print str(gamma)
Γ
(prints a gamma)
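
Changing the default encoding affects every implicit conversion in the 
process. A narrower option, in the spirit of the explicit encode() call 
near the top of this message, is to encode just before output. This is a 
sketch only: encode_for_stdout is a hypothetical helper, and the 'ascii' 
fallback and the 'replace' error handler are assumptions chosen for the 
example.

```python
import sys

gamma = u"\N{GREEK CAPITAL LETTER GAMMA}"

def encode_for_stdout(text, fallback="ascii"):
    """Encode text for the current output stream (hypothetical helper)."""
    # sys.stdout.encoding can be missing or None, e.g. when output is piped.
    encoding = getattr(sys.stdout, "encoding", None) or fallback
    # "replace" substitutes "?" for characters the codec cannot represent,
    # so this never raises UnicodeEncodeError.
    return text.encode(encoding, "replace")

encoded = encode_for_stdout(gamma)
```

On a cp437 console this should give the same '\xe2' byte as above; where 
the stream encoding cannot represent gamma, the result is a "?" instead of 
an exception.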

Kent

> 
> Regards,
> Martin


