[Python-Dev] Unicode decode exception

Sun Nov 30 23:19:44 EST 2014

The default encoding is "UTF-8". It works if I do:

with open("filename", errors="ignore") as f:
    ...

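So I think Python 2, by default, never raises this error (its open() reads
raw bytes without decoding them), whereas Python 3 decodes the file and
fails on bytes that are invalid in the chosen encoding.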
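
For anyone who wants to reproduce this without the OpenSSL source, here is a
minimal sketch. The sample bytes are an assumption on my part: I have not
checked which bytes in x86_64-gcc.c trigger the error, so the Latin-1
copyright sign below is only a hypothetical stand-in, and read_text() is a
helper made up for this example.

```python
import os
import tempfile

# Hypothetical sample: a C comment containing the Latin-1 byte 0xA9 (the
# copyright sign), which is not valid UTF-8 on its own. The real file's
# offending bytes may differ.
raw = b"/* \xa9 example */\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".c") as tmp:
    tmp.write(raw)
    path = tmp.name


def read_text(**kwargs):
    """Read the sample file in text mode with the given open() options."""
    with open(path, **kwargs) as f:
        return f.read()


try:
    # Strict decoding (the default error handler) rejects the stray byte.
    try:
        read_text(encoding="utf-8")
        strict_failed = False
    except UnicodeDecodeError as exc:
        strict_failed = True
        print("strict: ", exc)

    # errors="ignore" silently drops the undecodable byte.
    ignored = read_text(encoding="utf-8", errors="ignore")
    # errors="replace" substitutes U+FFFD so the damage stays visible.
    replaced = read_text(encoding="utf-8", errors="replace")
    # Naming the encoding the file was actually written in decodes it cleanly.
    latin1 = read_text(encoding="latin-1")

    # Binary mode skips decoding altogether, like Python 2's open().
    with open(path, "rb") as f:
        as_bytes = f.read()
finally:
    os.remove(path)

print("ignore: ", repr(ignored))
print("replace:", repr(replaced))
print("latin-1:", repr(latin1))
print("bytes:  ", repr(as_bytes))
```

Note that errors="ignore" loses data without any warning, so errors="replace"
or the correct encoding is probably the safer choice if the text matters.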

On 1 December 2014 at 01:49, Chris Angelico <rosuav at gmail.com> wrote:
> On Sun, Nov 30, 2014 at 7:07 PM, balaji marisetti
> <balajimarisetti at gmail.com> wrote:
>> Hi,
>
> Hi. This list is for the development *of* Python, not development
> *with* Python, so I'm sending this reply also to
> python-list at python.org where it can be better handled. You'll probably
> want to subscribe here:
>
> https://mail.python.org/mailman/listinfo/python-list
>
> or alternatively, point a news reader at comp.lang.python. Let's
> continue this conversation on python-list rather than python-dev.
>
>> When I try to iterate through the lines of a
>> file("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c"), I get a
>> UnicodeDecodeError (in python 3.4.0 on Ubuntu 14.04). But there is no
>> such error with python 2.7.6. What could be the problem?
>
> The difference between the two Python versions is that 2.7 lets you be
> a bit sloppy about Unicode vs bytes, but 3.4 requires that you keep
> them properly separate.
>
>> In [39]: with open("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
>>                  for line in f:
>>                      print (line)
>>
>> ---------------------------------------------------------------------------
>> UnicodeDecodeError                        Traceback (most recent call last)
>> <ipython-input-39-24a3ae32a691> in <module>()
>>       1 with open("../openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
>> ----> 2     for line in f:
>>       3         print (line)
>>       4
>>
>> /usr/lib/python3.4/codecs.py in decode(self, input, final)
>>     311         # decode input (taking the buffer into account)
>>     312         data = self.buffer + input
>> --> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
>>     314         # keep undecoded input until the next call
>>     315         self.buffer = data[consumed:]
>>
>>
>> --
>> :-)balaji
>
> Most likely, the line of input that you just reached has a non-ASCII
> character, and the default encoding is ASCII. (Though without the
> actual exception message, I can't be sure of that.) The best fix would
> be to know what the file's encoding is, and simply add that as a
> parameter to your open() call - perhaps this:
>
> with open("filename", encoding="utf-8") as f:
>
> If you use the right encoding, and the file is correctly encoded, you
> should have no errors. If you still have errors... welcome to data
> problems, life can be hard. :|
>
> ChrisA

-- 
:-)balaji