Q: The `print' statement over Unicode

Sat May 7 08:01:09 EDT 2005

[Thomas Heller]
> François Pinard <pinard at iro.umontreal.ca> writes:

> > [...] given file `question.py' with these contents:

> >    # -*- coding: UTF-8 -*-
> >    texte = unicode("Fran\xe7ois", 'latin1')
> >    print type(texte), repr(texte), texte
> >    print type(texte), repr(texte), str(texte)

> > doing `python question.py' yields:

> >    <type 'unicode'> u'Fran\xe7ois' François
> >    <type 'unicode'> u'Fran\xe7ois'
> >    Traceback (most recent call last):
> >      File "question.py", line 4, in ?
> >        print type(texte), repr(texte), str(texte)
> >    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' \
> >       in position 4: ordinal not in range(128)

> > [...] why does the first `print' succeed on its third argument,
> > while the second fails?  How does `print' convert that Unicode
> > string to an 8-bit string for output, if not through `str()'?  What
> > is missing from the documentation, or from my understanding of it?

> AFAIK, print uses sys.stdout.encoding to encode the unicode string.

Many thanks for this information.

I was not aware of this file attribute.  Looking around, I found a
quick description in the Library Reference, under "2.3.8 File Objects".
However, I did not find in the documentation the rules stating how
or when this attribute receives a value, and in particular here, for
the case of `sys.stdout'.  The Reference Manual, under "6.6 The print
statement", is silent about how Unicode strings are handled.
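
To make the difference concrete, here is a minimal sketch of what seems
to happen, assuming `print' encodes a Unicode string with
`sys.stdout.encoding' and falls back to the default 'ascii' codec when
that attribute is None.  The helper name `encode_for_output' is
hypothetical and only serves the illustration; it is not part of any
library.

```python
# -*- coding: utf-8 -*-
import sys

texte = u"Fran\xe7ois"


def encode_for_output(text, encoding):
    # Hypothetical helper: approximates what `print' appears to do,
    # assuming it falls back to 'ascii' when no encoding is known.
    return text.encode(encoding or "ascii")


# In Python 2, str(texte) encodes with the default codec, normally
# 'ascii', which cannot represent u'\xe7'.
try:
    encode_for_output(texte, None)
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# With a terminal encoding such as Latin-1 or UTF-8, encoding succeeds.
latin1_bytes = encode_for_output(texte, "latin-1")
utf8_bytes = encode_for_output(texte, "utf-8")

# The attribute may be None or absent, for example when output is piped.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print(repr(ascii_ok))
print(repr(latin1_bytes))
print(repr(utf8_bytes))
print(repr(stdout_encoding))
```

If this reading is right, the first `print' works because the terminal
encoding can represent the character, while `str()' never consults
`sys.stdout' at all.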

Am I looking in the wrong places, or should the standard documentation
explain such things more readily?

-- 
François Pinard   http://pinard.progiciels-bpi.ca