printing unicode strings

Peter Otten __peter__ at web.de
Tue Jul 24 18:40:35 EDT 2007


7stud wrote:

> Can anyone tell me why I can print out the individual variables in the
> following code, but when I print them out combined into a single
> string, I get an error?
> 
> symbol = u'ibm'
> price = u'4 \xbd'  # 4 1/2
> 
> print "%s" % symbol
> print "%s" % price.encode("utf-8")
> print "%s %s" % (symbol, price.encode("utf-8") )
> 
> --output:--
> ibm
> 4 1/2
> File "pythontest.py", line 6, in ?
>     print "%s %s" % (symbol, price.encode("utf-8") )
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 2: ordinal not in range(128)

In format % args, if the format or any of the args is a unicode string, the
result is unicode, too. Any byte strings involved therefore have to be decoded
first, and Python uses the default ascii codec for that. In your example

> print "%s %s" % (symbol, price.encode("utf-8") )

symbol is a unicode string, so Python tries to decode both "%s %s" and
"4 \xc2\xbd" (the result of price.encode("utf-8")). The latter contains
non-ascii bytes, so the decoding fails.
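
To make the hidden step visible, here is a small sketch that performs the
decode by hand. It is only an illustration of what the % operator attempts
behind the scenes, not the interpreter's actual code; the names encoded, error
and roundtrip are mine, and the b"..." / "except ... as" spelling assumes
Python 2.6 or later.

```python
# Sketch: spell out the decode that "%s %s" % (symbol, encoded) does implicitly.
price = u"4 \xbd"
encoded = price.encode("utf-8")   # byte string '4 \xc2\xbd'

try:
    # The implicit coercion behaves like a decode with the ascii codec.
    encoded.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc                   # byte 0xc2 at position 2 is not ascii

# Decoding with the codec that produced the bytes works fine.
roundtrip = encoded.decode("utf-8")
```

The failure is reported at position 2 because "4" and the space are plain
ascii and the first byte of the encoded fraction sign comes right after them.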

Solution: use unicode throughout and let the print statement do the
encoding.

>>> symbol = u"ibm"
>>> price = u"4 \xbd"
>>> print u"%s %s" % (symbol, price)
ibm 4 ½
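
If some of your data arrives as byte strings, one way to stay "unicode
throughout" is to decode at the point where the data comes in. The following
is just a sketch: to_text is a hypothetical helper, and utf-8 as the input
encoding is an assumption you would have to check for your own data.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding byte strings explicitly."""
    if isinstance(value, bytes):      # bytes is an alias for str in Python 2.6+
        return value.decode(encoding)
    return value

symbol = u"ibm"
price_bytes = u"4 \xbd".encode("utf-8")   # pretend this came from a file

# Everything is unicode by the time it reaches the format operation.
line = u"%s %s" % (to_text(symbol), to_text(price_bytes))
```

With that, no implicit ascii decoding is ever needed, and line can be handed
to print as in the session above.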

Sometimes, e.g. if you redirect stdout, the above can fail because
sys.stdout.encoding is None. Here's a workaround that uses utf8 in such cases.

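import sys
# No encoding is known for stdout, e.g. when it is redirected to a file or pipe.
if sys.stdout.encoding is None:
    import codecs
    # Wrap stdout so that unicode strings are encoded as utf8 on output.
    sys.stdout = codecs.lookup("utf8").streamwriter(sys.stdout)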
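
To see what such a writer does without touching the real stdout, you can wrap
an in-memory byte stream instead. This sketch uses codecs.getwriter("utf8"),
which returns the same StreamWriter class as the lookup above; io.BytesIO is
my stand-in for a redirected stdout and assumes Python 2.6 or later.

```python
import codecs
import io

raw = io.BytesIO()                        # stands in for a redirected stdout
writer = codecs.getwriter("utf8")(raw)    # encodes unicode on every write

writer.write(u"ibm 4 \xbd\n")
data = raw.getvalue()                     # the utf8 bytes that reached the stream
```

The unicode string goes in unchanged and the underlying stream only ever sees
utf8-encoded bytes.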

Peter



