UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

Thu Feb 26 22:51:30 EST 2009

> (1) what is produced on Anjanesh's machine
>>> sys.getdefaultencoding()
'utf-8'
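
For reference, sys.getdefaultencoding() is not necessarily the encoding that open() uses for text files. As far as I understand the docs, open() without an encoding= argument falls back to the locale's preferred encoding, which can be a Windows code page. A minimal sketch that prints both values (the variable names are only illustrative):

```python
import locale
import sys

# Encoding used for implicit str/bytes conversions; 'utf-8' on Python 3.
default_encoding = sys.getdefaultencoding()

# Encoding that open() is documented to fall back to in text mode
# when no encoding= argument is given; this one is platform dependent.
preferred_encoding = locale.getpreferredencoding(False)

print("sys.getdefaultencoding():", default_encoding)
print("locale.getpreferredencoding(False):", preferred_encoding)
```

If the second value is something like a cp125x code page, that would be consistent with the 'charmap' codec named in the error.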

> (2) it looks like a small snippet from a Python source file!
It's a file containing just JSON data, but it has some Unicode characters
as well, since it has data from the web.

> Anjanesh, Is it a .py file
It's a .json file. I have a bunch of these JSON files, which I'm parsing
using the json library.
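
A minimal sketch of parsing one of these files with the encoding stated explicitly, assuming the files are UTF-8 (an assumption, not something verified for every file). The helper name load_json_utf8 is hypothetical:

```python
import json


def load_json_utf8(path):
    """Hypothetical helper: parse a JSON file, decoding it as UTF-8.

    Passing encoding= explicitly means the result does not depend on
    the platform's default text encoding.
    """
    with open(path, encoding='utf-8') as handle:
        return json.load(handle)
```

Usage would be something like data = load_json_utf8('the_file'). If a file turns out not to be valid UTF-8, this raises UnicodeDecodeError rather than silently producing wrong characters.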

> Instead of "something like", please report exactly what is there:
>
> print(ascii(open('the_file', 'rb').read()[10442-20:10442+21]))
>>> print(ascii(open('the_file', 'rb').read()[10442-20:10442+21]))
b'":42,"query":"0 1\xc2\xbb\xc3\x9d \\u2021 0\\u201a0 \\u2'
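
The dump above can be checked directly: the slice starts at offset 10442-20, so offset 10442 is index 20 of the slice, which is the 0x9d byte. A minimal sketch, assuming the data is UTF-8 and that the 'charmap' codec in the traceback is cp1252 (an assumption; the traceback does not name the code page):

```python
# The bytes reported above, copied verbatim from the dump.
snippet = b'":42,"query":"0 1\xc2\xbb\xc3\x9d \\u2021 0\\u201a0 \\u2'

# Index 20 of the slice corresponds to file offset 10442.
offending_byte = snippet[20]
print(hex(offending_byte))

# Assumption: cp1252 stands in for the unnamed 'charmap' code page.
# 0x9d has no mapping in cp1252, so decoding fails at that byte.
error_position = None
try:
    snippet.decode('cp1252')
except UnicodeDecodeError as exc:
    error_position = exc.start
    print(exc)

# As UTF-8 the same bytes decode cleanly: \xc2\xbb and \xc3\x9d are
# two-byte sequences for U+00BB and U+00DD.
decoded = snippet.decode('utf-8')
print(ascii(decoded[17:19]))
```

So the bytes themselves look like valid UTF-8, and the failure would come from reading them with a single-byte Windows code page.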

> Trouble with cases like this is as soon as they become interesting, the OP often
> snatches somebody's one-liner that "works" (i.e. doesn't raise an exception),
> makes a quick break for the county line, and they're not seen again :-)

Actually, I moved the files to my Ubuntu PC, which has Python 2.5.2, and
it didn't give the encoding issue. I just couldn't spend that much time on
why a couple of these files had encoding issues in Py3, since I had to
parse a whole lot of files.