[Tutor] UnicodeDecodeError while parsing a .csv file.

eryksun eryksun at gmail.com
Tue Oct 29 03:24:38 CET 2013


On Mon, Oct 28, 2013 at 7:49 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>
> By default Python 3 uses UTF-8 when reading files. As the error below
> shows, your file actually isn't UTF-8.

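Source files (modules) default to UTF-8, but io.TextIOWrapper, which open()
uses in text mode, defaults to the locale's preferred encoding. To handle
terminals, it first tries os.device_encoding (i.e. _Py_device_encoding),
which returns an encoding only when the file descriptor is a terminal.
Otherwise, for regular files, it defaults to
locale.getpreferredencoding(False).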
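A minimal sketch of that fallback order, plus the practical consequence for a CSV file: if you know the file's real encoding, pass it to open() instead of relying on the locale default. The helper default_text_encoding is hypothetical and only mirrors the order described above. The sample data (a Windows-1252 file) is also made up, since the original error isn't quoted here. The exact lookup has changed in later Python releases (e.g. UTF-8 Mode in 3.7), so treat the printed default as illustrative only.

```python
import csv
import locale
import os
import tempfile

def default_text_encoding(fd):
    """Hypothetical helper: mirrors the fallback order for encoding=None."""
    # os.device_encoding() returns an encoding only if fd is a terminal.
    enc = os.device_encoding(fd)
    if enc is None:
        enc = locale.getpreferredencoding(False)
    return enc

# Hypothetical sample: a CSV saved as Windows-1252, where 'é' is byte 0xE9.
raw = "name,city\nRené,Montréal\n".encode("cp1252")
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

with open(path, "rb") as f:
    print("default for this file:", default_text_encoding(f.fileno()))

# Decoding those bytes as UTF-8 raises UnicodeDecodeError.
utf8_failed = False
try:
    with open(path, encoding="utf-8", newline="") as f:
        list(csv.reader(f))
except UnicodeDecodeError:
    utf8_failed = True

# Naming the file's actual encoding makes the result independent of the locale.
with open(path, encoding="cp1252", newline="") as f:
    rows = list(csv.reader(f))
os.remove(path)
print(rows)
```

The csv module documentation recommends opening files with newline="", which is why both open() calls pass it.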

On Windows, getpreferredencoding uses _locale._getdefaultlocale, which
calls the Windows API function GetACP to get the ANSI code page (e.g.
1252, reported as 'cp1252').
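To see the ANSI code page yourself, you can call the same Windows API function through ctypes. This is a sketch: the helper name windows_ansi_encoding is hypothetical, and it returns None on non-Windows platforms.

```python
import sys

def windows_ansi_encoding():
    """Hypothetical helper: ANSI code page as a codec name, e.g. 'cp1252'."""
    if sys.platform != "win32":
        return None  # GetACP only exists on Windows
    import ctypes
    # GetACP() returns the active ANSI code page number, e.g. 1252.
    return "cp%d" % ctypes.windll.kernel32.GetACP()

enc = windows_ansi_encoding()
print(enc)
```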

On POSIX systems, if CODESET is defined, getpreferredencoding calls
_locale.nl_langinfo(CODESET). Otherwise it falls back to
getdefaultlocale, and to 'ascii' if that yields no encoding.
getdefaultlocale checks the environment variables LC_ALL, LC_CTYPE,
LANG, and LANGUAGE, in that order, and parses the first one that is set.
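Here is a simplified sketch of the POSIX logic. encoding_from_env and preferred_encoding_posix are hypothetical names. The environment parsing skips the alias normalization and special cases (such as '@euro') that the real getdefaultlocale performs, so its results can differ for locale names without an explicit codeset.

```python
import locale
import os

# Checked in this order; the first non-empty variable wins.
ENV_VARS = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")

def encoding_from_env(environ):
    """Simplified sketch of the environment-variable fallback."""
    for name in ENV_VARS:
        value = environ.get(name)
        if value:
            if name == "LANGUAGE":
                # LANGUAGE may hold a colon-separated list; use the first entry.
                value = value.split(":")[0]
            break
    else:
        # No variable set: fall back to 'ascii'.
        return "ascii"
    value = value.split("@", 1)[0]      # drop a modifier such as '@euro'
    if "." in value:
        return value.split(".", 1)[1]   # 'en_US.UTF-8' -> 'UTF-8'
    return "ascii"                      # e.g. 'C' names no codeset

def preferred_encoding_posix(environ=os.environ):
    """Prefer nl_langinfo(CODESET) when the platform provides it."""
    if hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET)
    return encoding_from_env(environ)

print(preferred_encoding_posix())
print(encoding_from_env({"LANG": "en_US.UTF-8"}))
```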

