[Tutor] ascii codec cannot encode character

Peter Otten __peter__ at web.de
Fri Jan 28 09:43:55 CET 2011


Alex Hall wrote:

> Hello again:
> I have never seen this message before. I am pulling xml from a site's
> api and printing it, testing the wrapper I am writing for the api. I
> have never seen this error until just now, in the twelfth result of my
> search:
> UnicodeEncodeError: 'ASCII' codec can't encode character u'\u2019' in
> position 42: ordinal not in range(128)
> 
> I tried making the strings Unicode by saying something like
> self.title=unicode(data.find("title").text)
> but the same error appeared. I found the manual chapter on this, but I
> am not sure I want to ignore since I do not know what this character
> (or others) might mean in the string. I am not clear on what 'replace'
> will do. Any suggestions?

You get a UnicodeEncodeError if you print a unicode string containing
non-ASCII characters and Python cannot determine the target's encoding:

$ cat tmp.py
# -*- coding: utf-8 -*-
print u'äöü'

$ python tmp.py
äöü
$ python tmp.py > tmp.txt
Traceback (most recent call last):
  File "tmp.py", line 2, in <module>
    print u'äöü'
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

The error occurs because Python 2 falls back to the ascii codec to convert
unicode into bytes when it does not know the encoding of the output stream,
as is the case when stdout is redirected to a file.
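
As for 'ignore' and 'replace': you can see what these error handlers do by
calling encode() yourself. The following is only a sketch. The sample string
is made up, and u'\u2019' is the character from your error message (RIGHT
SINGLE QUOTATION MARK, a typographic apostrophe).

```python
# Hypothetical sample text; u'\u2019' is the character from the error message.
title = u"It\u2019s a test"

# 'ignore' silently drops every character the codec cannot encode.
ignored = title.encode("ascii", "ignore")

# 'replace' substitutes a question mark for each such character.
replaced = title.encode("ascii", "replace")

# UTF-8 can encode the character; it becomes three bytes.
as_utf8 = title.encode("utf-8")

print(repr(ignored))
print(repr(replaced))
print(repr(as_utf8))
```

With 'ignore' the result reads "Its a test", with 'replace' it reads
"It?s a test". Both handlers lose information, so they hide the error rather
than fix it.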

One approach to tackle this is to check sys.stdout's encoding and, if it's
unknown (None), wrap sys.stdout in a codecs.StreamWriter that can handle all
characters that may occur. UTF-8 is usually a good choice, but other codecs
are possible.

$ cat tmp2.py
# -*- coding: utf-8 -*-
import sys

# sys.stdout.encoding is None when the output is redirected to a file or pipe.
if sys.stdout.encoding is None:
    import codecs
    Writer = codecs.getwriter("utf-8")
    sys.stdout = Writer(sys.stdout)

print u'äöü'
$ python tmp2.py
äöü
$ python tmp2.py > tmp.txt
$ cat tmp.txt
äöü
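
If you want to check what such a writer emits without redirecting anything,
a small sketch along these lines may help. It assumes an in-memory
io.BytesIO buffer as a stand-in for the redirected stdout; the names buf,
writer and raw are mine, not part of any API.

```python
import codecs
import io

# In-memory byte stream standing in for a redirected sys.stdout (assumption).
buf = io.BytesIO()

# codecs.getwriter() returns a StreamWriter class for the given codec;
# instantiating it wraps the byte stream.
writer = codecs.getwriter("utf-8")(buf)

# u"\xe4\xf6\xfc" is the same text as in tmp2.py, written with escapes.
writer.write(u"\xe4\xf6\xfc\n")

# The underlying stream now holds the UTF-8 encoded bytes.
raw = buf.getvalue()
print(repr(raw))
```

Each of the three characters takes two bytes in UTF-8, so raw holds seven
bytes including the newline, and decoding it with "utf-8" gives back the
original text.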



