UTF-8 Encoding Error

subhabangalore at gmail.com
Fri Dec 23 01:38:15 EST 2016


I am getting the error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: invalid start byte

when I try to read some files through TaggedCorpusReader, a corpus reader class
in NLTK.
My files are saved in ANSI format, the MS-Windows default.
I am using Python 2.7 on MS-Windows 7.
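
A minimal sketch of what is probably going on, assuming "ANSI" here means Windows code page 1252, where byte 0x96 is the en dash. The sample line is made up for illustration and is not taken from the real files.

```python
# Hypothetical tagged line as raw cp1252 ("ANSI") bytes.
# The byte 0x96 at index 15 is the en dash in cp1252.
raw = b"word/NN 1990/CD\x961995/CD"

# Decoding as UTF-8 fails: 0x96 is a continuation byte in UTF-8,
# so it cannot start a character.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the bytes were actually saved in works.
text = raw.decode("cp1252")
```

If this is the cause, the failure happens while the reader decodes the file, before any of my own string handling runs.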

I have tried the following options so far:
string.encode('utf-8').strip()
unicode(string)
unicode(str, errors='replace')
unicode(str, errors='ignore')
string.decode('cp1252')
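
One more thing that could be tried is decoding at the point where the file is read, rather than on strings afterwards. The sketch below assumes the files are cp1252 and uses a made-up temporary file (`sample.pos` is a hypothetical name) in place of the real corpus. As far as I can tell, NLTK corpus readers also accept an `encoding` keyword that defaults to 'utf8', so passing `encoding='cp1252'` to TaggedCorpusReader may be the more direct route; that is left out here because it needs NLTK and the real files.

```python
import io
import os
import tempfile

# Hypothetical tagged file written as cp1252 bytes,
# standing in for one of the real corpus files.
path = os.path.join(tempfile.mkdtemp(), "sample.pos")
with io.open(path, "wb") as fh:
    fh.write(b"word/NN 1990/CD\x961995/CD\n")

# io.open decodes while reading, on Python 2.7 and Python 3 alike.
with io.open(path, encoding="cp1252") as fh:
    line = fh.read()

# Optionally re-save as UTF-8, so that a reader which assumes UTF-8
# can load the copy without any extra arguments.
utf8_path = path + ".utf8"
with io.open(utf8_path, "w", encoding="utf-8") as fh:
    fh.write(line)
```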

But none of these has helped.

Any suggestions would be appreciated.

I will keep trying in the meantime.


