printing unicode strings
Peter Otten
__peter__ at web.de
Tue Jul 24 18:40:35 EDT 2007
7stud wrote:
> Can anyone tell me why I can print out the individual variables in the
> following code, but when I print them out combined into a single
> string, I get an error?
>
> symbol = u'ibm'
> price = u'4 \xbd' # 4 1/2
>
> print "%s" % symbol
> print "%s" % price.encode("utf-8")
> print "%s %s" % (symbol, price.encode("utf-8") )
>
> --output:--
> ibm
> 4 1/2
> File "pythontest.py", line 6, in ?
> print "%s %s" % (symbol, price.encode("utf-8") )
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 2: ordinal not in range(128)
In an expression of the form format % args, if the format or any argument
is a unicode string, the result is unicode, too. This means that every byte
string involved has to be decoded first, and Python uses the default ascii
codec for that. In your example
> print "%s %s" % (symbol, price.encode("utf-8") )
symbol is a unicode string, so Python tries to decode both "%s %s" and
"4 \xc2\xbd" (the result of price.encode("utf-8")). The latter contains
non-ascii bytes, so the decoding fails.
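
The implicit step can be made visible by doing the decoding by hand. This is
only a sketch, assuming Python 2.6 or later (it also runs on Python 3); the
names encoded and message are mine, not from the original post.

```python
price = u"4 \xbd"

# Encoding gives a byte string; U+00BD becomes the two bytes 0xc2 0xbd.
encoded = price.encode("utf-8")

# This is the decoding Python 2 attempts behind the scenes when a byte
# string is mixed with unicode in "format % args".
try:
    encoded.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The byte 0xc2 at index 2 is the first one outside the ascii range, which is
why the traceback points at position 2.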
Solution: use unicode throughout and let the print statement do the
encoding.
>>> symbol = u"ibm"
>>> price = u"4 \xbd"
>>> print u"%s %s" % (symbol, price)
ibm 4 ½
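
If you need the bytes yourself (to write to a file, say), the same idea
applies: format in unicode and encode once, at the very end. A minimal
sketch, assuming utf-8 is the encoding you want on output; line and
encoded_line are illustrative names.

```python
symbol = u"ibm"
price = u"4 \xbd"

# All operands are unicode, so no implicit ascii decoding takes place.
line = u"%s %s" % (symbol, price)

# Encode exactly once, at the boundary where bytes are required.
encoded_line = line.encode("utf-8")
```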
Sometimes, e.g. if you redirect stdout, the above can fail:
sys.stdout.encoding is then None, and print falls back to the ascii codec.
Here's a workaround that uses utf8 in such cases.
import sys
if sys.stdout.encoding is None:
    # No encoding detected (e.g. stdout is redirected): encode as utf8.
    import codecs
    sys.stdout = codecs.lookup("utf8").streamwriter(sys.stdout)
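
The wrapping idea can be tried out without touching sys.stdout. The sketch
below uses codecs.getwriter, which returns the same StreamWriter class as
codecs.lookup(...).streamwriter, and an io.BytesIO object as a stand-in for
a redirected byte stream; both are my choices for illustration, and the
names raw, writer and data are hypothetical.

```python
import codecs
import io

# Stand-in for a byte-oriented stream such as a redirected stdout.
raw = io.BytesIO()

# The writer encodes every unicode string to utf8 before passing it on.
writer = codecs.getwriter("utf8")(raw)
writer.write(u"ibm 4 \xbd\n")

data = raw.getvalue()
```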
Peter