[Tutor] Assistance with UnicodeDecodeError

James Chapman james at uplinkzero.com
Wed Feb 4 18:01:40 CET 2015


>
> I am trying to scrap text from a website using Python 2.7 in windows 8 and
> i am getting this error "UnicodeDecodeError: 'charmap codec can't encode
> character u'\u2014 in position 11231 character maps to <undefined>"
>
>
For starters, move away from Python 2 unless you have a good reason to use
it. In Python 3 every str is Unicode text and encoded bytes are a separate
type, whereas in Python 2 Unicode support is an afterthought.
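To illustrate the Python 3 point, here is a small sketch (the variable names are only for illustration). Text and encoded bytes are separate types, and you move between them with an explicit encode or decode.

```python
# Python 3: str is Unicode text, bytes is encoded data.
text = "caf\u00e9 \u2014 na\u00efve"   # str: 12 code points
data = text.encode("utf-8")            # bytes: 16 (the accented letters and the em dash need several bytes each)

assert len(text) == 12
assert len(data) == 16
assert data.decode("utf-8") == text    # decoding with the same codec round-trips

# Unlike Python 2, mixing the two types is an error instead of an implicit ASCII conversion.
try:
    text + data
except TypeError as exc:
    print("refused:", exc)
```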

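What's happening is that Python has to convert the Unicode text into a
target character set (in the example below, the Windows console's cp850 code
page). That character set has no mapping for u'\u2014' (an em dash), so the
conversion raises an exception. The message says "can't encode", so despite
the subject line this is really a UnicodeEncodeError rather than a
UnicodeDecodeError. You need to tell Python explicitly which encoding to use
(not all websites are UTF-8):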
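Below is a minimal Python 3 sketch of telling Python which encoding to use for a scraped page. It assumes you already have the raw body bytes and the response's Content-Type header. The function name `decode_page` and the UTF-8 fallback are illustrative assumptions, not a standard API. The standard library's `email.message.Message` is used only to parse the `charset` parameter. If you fetch with `urllib.request`, `response.headers.get_content_charset()` gives you the same information directly.

```python
from email.message import Message


def decode_page(raw: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode an HTTP body using the charset declared in its Content-Type header.

    Hypothetical helper: falls back to `default` when no charset is declared.
    Pages can also declare their encoding in a <meta> tag, which this ignores.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    charset = headers.get_content_charset(default)  # lower-cased charset name, or default
    return raw.decode(charset)


# A page served as Windows-1252: there the em dash is the single byte 0x97.
body = "Price \u2014 10 EUR".encode("cp1252")
text = decode_page(body, "text/html; charset=windows-1252")
assert text == "Price \u2014 10 EUR"

# Decoding the same bytes as UTF-8 fails, because 0x97 cannot start a UTF-8 sequence.
try:
    body.decode("utf-8")
except UnicodeDecodeError as exc:
    print("wrong codec:", exc.reason)
```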


Code example (using Python 2.7):

>>> u = u'\u2014'
>>> print(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014' in position 0: character maps to <undefined>
>>> s = u.encode("utf-8")
>>> print(s)
ÔÇö
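The `ÔÇö` output above is the three UTF-8 bytes of the em dash (E2 80 94) displayed as three cp850 characters. Here is a small Python 3 sketch of the same situation, assuming a cp850 console as in the traceback. It reproduces the mojibake and shows the standard `errors=` handlers for characters a target encoding cannot represent. Which handler is appropriate depends on whether losing or escaping the character is acceptable for your output.

```python
em_dash = "\u2014"

# Strict encoding (the default) fails: cp850 has no em dash.
try:
    em_dash.encode("cp850")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# The mojibake from the transcript: UTF-8 bytes E2 80 94 read back as cp850 text.
mojibake = em_dash.encode("utf-8").decode("cp850")
assert mojibake == "\u00d4\u00c7\u00f6"  # 'ÔÇö'

# Built-in error handlers trade fidelity for never raising.
assert em_dash.encode("cp850", errors="replace") == b"?"
assert em_dash.encode("cp850", errors="xmlcharrefreplace") == b"&#8212;"
assert em_dash.encode("cp850", errors="backslashreplace") == b"\\u2014"
```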



I also strongly suggest you read:
https://docs.python.org/2/howto/unicode.html

There is much cursing to come. Unicode and especially multi-byte character
string processing is a nightmare!
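One concrete reason multi-byte processing hurts is that a single character can span several bytes. A byte buffer cut at an arbitrary point (for example, one network chunk of a page) may therefore end mid-character. Here is a minimal Python 3 sketch using the standard `codecs` incremental decoder, which holds back an incomplete sequence until the rest arrives. The chunk boundaries are made up for illustration.

```python
import codecs

data = "a\u2014b".encode("utf-8")   # b'a\xe2\x80\x94b': 3 characters, 5 bytes
chunks = [data[:2], data[2:]]       # split inside the em dash's 3-byte sequence

# Decoding each chunk on its own fails on the partial sequence.
try:
    chunks[0].decode("utf-8")
except UnicodeDecodeError as exc:
    print("naive:", exc.reason)

# An incremental decoder buffers the incomplete bytes between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
parts = [decoder.decode(chunk) for chunk in chunks]
parts.append(decoder.decode(b"", final=True))  # flush; raises if bytes are left over
text = "".join(parts)
assert text == "a\u2014b"
```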
Good luck ;-)

James

