Q: The `print' statement over Unicode

Thomas Heller theller at python.net
Wed May 4 11:58:53 EDT 2005


François Pinard <pinard at iro.umontreal.ca> writes:

> Hi, people.  I hope someone would like to enlighten me.
>
> For any application handling Unicode internally, I'm usually careful
> to properly convert those Unicode strings into 8-bit strings before
> writing them out.
>
> However, this morning, I mistakenly forgot to do so before using one
> Unicode string (containing a non-ASCII character) as an argument to
> the `print' statement, and I did _not_ get an error.  This is rather
> surprising to me.  I reread the section of the Python reference manual
> (version 2.3.4, this machine uses 2.3.3 currently), and it does not say
> anything about a special processing for Unicode strings.
>
> In my understanding, when `print' is given an argument which is not
> already a string (I read: 8-bit string), it first gets converted into
> a string (I read: calling __str__).  But if I call `str()' explicitly,
> _then_ I get an error as expected.  The question is, why is there no
> error if I do not call `str()' explicitly?
>
> For example, given file `question.py' with these contents:
>
>    # -*- coding: UTF-8 -*-
>    texte = unicode("Fran\xe7ois", 'latin1')
>    print type(texte), repr(texte), texte
>    print type(texte), repr(texte), str(texte)
>
> doing `python question.py' yields:
>
>    <type 'unicode'> u'Fran\xe7ois' François
>    <type 'unicode'> u'Fran\xe7ois'
>    Traceback (most recent call last):
>      File "question.py", line 4, in ?
>        print type(texte), repr(texte), str(texte)
>    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' \
>       in position 4: ordinal not in range(128)
>
> (last line wrapped for legibility).
>
> So (trying to be crystal clear), why is the first `print' working over
> its third argument, but not the second?  How does `print' convert that
> Unicode string to an 8-bit string for output, if not through `str()'?
> What is missing from the documentation, or from my way of understanding it?

AFAIK, print uses sys.stdout.encoding to encode the unicode string,
whereas str() uses the default encoding (ASCII), hence the error only
in the second case.
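
A minimal sketch of that difference, not the interpreter's actual code.
The helper names `encode_for_output' and `encode_like_str' are made up
for illustration, and the fallback to ASCII when the stream reports no
encoding is an assumption about how `print' behaves:

```python
texte = u"Fran\xe7ois"

def encode_for_output(text, stream_encoding):
    # Hypothetical helper: roughly what `print' is described as doing,
    # i.e. encoding with the stream's encoding (e.g. sys.stdout.encoding).
    # Assumption: fall back to ASCII when the stream has no encoding.
    return text.encode(stream_encoding or "ascii")

def encode_like_str(text):
    # Hypothetical helper: str() on a unicode object in Python 2 goes
    # through the default codec, which is ASCII.
    return text.encode("ascii")

# With a Latin-1 or UTF-8 terminal, the encode step succeeds.
latin1_bytes = encode_for_output(texte, "latin-1")
utf8_bytes = encode_for_output(texte, "utf-8")

# The ASCII path fails on the non-ASCII character, as in the traceback.
try:
    encode_like_str(texte)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Under these assumptions, the first `print' succeeds because the terminal
encoding can represent the character, while the explicit `str()' call
fails because ASCII cannot.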

Thomas


