
PEP 597 -- Soft deprecation of default encoding

PEP: 597
Title: Soft deprecation of default encoding
Last-Modified: 23-Jun-2020
Author: Inada Naoki <songofacandy at gmail.com>
Discussions-To: https://discuss.python.org/t/3880
Status: Draft
Type: Standards Track
Created: 05-Jun-2019
Python-Version: 3.10

Abstract

This PEP proposes:

  • TextIOWrapper raises a PendingDeprecationWarning when the encoding option is not specified and dev mode is enabled.
  • Add an encoding="locale" option to TextIOWrapper. It behaves like encoding=None but doesn't raise a warning.
  • Add an io.LOCALE_ENCODING = "locale" constant to avoid a confusing LookupError.

Motivation

Using the default encoding is a common mistake

Developers using macOS or Linux may forget that the default encoding is not always UTF-8.

For example, long_description = open("README.md").read() in setup.py is a common mistake. Many Windows users cannot install the package if the README.md file is encoded in UTF-8 and contains at least one non-ASCII character (e.g. an emoji).

To give a sense of scale, 489 of the 4000 most downloaded packages on PyPI used non-ASCII characters in their README, and 82 of those cannot be installed from the source package when the locale encoding is ASCII. [1] They used the default encoding to read a README or TOML file.
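
The failure described above can be reproduced without changing the system locale, by decoding UTF-8 bytes with the ASCII codec. The following is only a minimal sketch: the file contents are illustrative, and encoding="ascii" stands in for a platform whose locale encoding is ASCII.

```python
import os
import tempfile

# Illustrative README contents with one non-ASCII character (an emoji).
text = "My package \N{ROCKET}\n"

with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with open(readme, "w", encoding="utf-8") as f:
        f.write(text)

    # Stand-in for open("README.md") on a platform whose locale
    # encoding is ASCII: decoding the emoji fails.
    try:
        with open(readme, encoding="ascii") as f:
            f.read()
        ascii_read_failed = False
    except UnicodeDecodeError:
        ascii_read_failed = True

    # The portable form: state the file's encoding explicitly.
    with open(readme, encoding="utf-8") as f:
        long_description = f.read()
```

Passing encoding="utf-8" makes the read independent of the machine the package is installed on.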

Another example is logging.basicConfig(filename="log.txt"). Some users expect UTF-8 to be used by default, but the locale encoding is actually used. [2]
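
One way to avoid depending on the locale encoding for log files is to configure a logging.FileHandler with an explicit encoding. This is a sketch rather than a recommendation from the PEP; the logger name and temporary path are hypothetical.

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "log.txt")

# FileHandler accepts an explicit encoding, so the log file no longer
# depends on the locale encoding of the machine.
handler = logging.FileHandler(log_path, encoding="utf-8")

logger = logging.getLogger("pep597_example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("caf\u00e9")  # a message with one non-ASCII character

logger.removeHandler(handler)
handler.close()

# Read the file back with the same explicit encoding.
with open(log_path, encoding="utf-8") as f:
    logged = f.read()
```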

Even Python experts sometimes assume that the default encoding is UTF-8. This creates bugs that appear only on Windows. See [3] and [4].

Raising a warning when the encoding option is omitted will help to find such mistakes.

Prepare to change the default encoding to UTF-8

We chose to use locale encoding for the default text encoding in Python 3.0. But UTF-8 has been adopted very widely since then.

We might change the default text encoding to UTF-8 in the future. But this change will affect many applications and libraries. If we started raising a DeprecationWarning by default, a very large number of warnings would be emitted, which would be too noisy.

While this PEP doesn't cover the change, this PEP will help to reduce the number of DeprecationWarning in the future.

Specification

Raising a PendingDeprecationWarning

TextIOWrapper raises a PendingDeprecationWarning when the encoding option is omitted and dev mode is enabled.

encoding="locale" option

When encoding="locale" is passed to TextIOWrapper, it behaves the same as encoding=None except that it doesn't raise a warning. In detail, the encoding is chosen in this order:

  1. os.device_encoding(buffer.fileno())
  2. locale.getpreferredencoding(False)
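
The two-step order above can be sketched as a small function. This is an illustration under stated assumptions, not the actual TextIOWrapper code: resolve_locale_encoding is a hypothetical helper name, and the error handling around os.device_encoding() is a defensive guess.

```python
import locale
import os
import tempfile


def resolve_locale_encoding(fd):
    """Hypothetical helper mirroring the two-step order above."""
    encoding = None
    try:
        # Step 1: the encoding of the device, if fd is a terminal.
        # os.device_encoding() returns None for non-terminals.
        encoding = os.device_encoding(fd)
    except (AttributeError, OSError):
        pass
    if encoding is None:
        # Step 2: fall back to the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False)
    return encoding


# A regular file is not a terminal, so step 2 decides the result.
with tempfile.TemporaryFile() as tmp:
    chosen = resolve_locale_encoding(tmp.fileno())
```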

This option makes the use of the locale encoding explicit and suppresses the PendingDeprecationWarning.

io.LOCALE_ENCODING

The io module has an io.LOCALE_ENCODING = "locale" constant. Using this constant instead of the string literal avoids a confusing "LookupError: unknown encoding: locale" error when the code is accidentally run on a Python older than 3.10.

The constant can also be used to test whether the encoding="locale" option is supported.

import io

# Want to suppress the warning in dev mode but still need to support
# old Python versions. On those versions the constant is missing, so
# this falls back to encoding=None (the same behavior, without the option).
locale_encoding = getattr(io, "LOCALE_ENCODING", None)
with open(filename, encoding=locale_encoding) as f:
    ...

io.text_encoding

TextIOWrapper is used indirectly in most cases. For example, open() and pathlib.Path.read_text() use it. Attributing the warning to these functions doesn't make sense. Callers of these functions should be warned instead.

io.text_encoding(encoding, stacklevel=1) is a helper function for this purpose. A pure Python implementation would look like this:

def text_encoding(encoding, stacklevel=1):
    """
    Helper function to choose the text encoding.

    When encoding is not None, just return it.
    Otherwise, return the default text encoding ("locale" for now),
    and raise a PendingDeprecationWarning in dev mode.

    This function can be used in APIs having encoding=None option.
    But please consider encoding="utf-8" for new APIs.
    """
    if encoding is None:
        if sys.flags.dev_mode:
            import warnings
            warnings.warn(
                    "'encoding' option is not specified. The default encoding "
                    "might be changed to 'utf-8' in the future",
                    PendingDeprecationWarning, stacklevel + 2)
        encoding = LOCALE_ENCODING
    return encoding

pathlib.Path.read_text() can use this function like this:

def read_text(self, encoding=None, errors=None):
    """
    Open the file in text mode, read it, and close the file.
    """
    encoding = io.text_encoding(encoding)
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
        return f.read()
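
To show how the helper attributes the warning to the caller of a wrapping API, here is a self-contained sketch. It re-implements the helper locally as text_encoding_sketch so that it runs on any Python version. The dev_mode parameter (replacing the sys.flags.dev_mode check) and the read_config function are hypothetical and exist only for this illustration.

```python
import warnings

LOCALE_ENCODING = "locale"  # stands in for the proposed io.LOCALE_ENCODING


def text_encoding_sketch(encoding, stacklevel=1, dev_mode=True):
    # dev_mode is a hypothetical parameter that replaces the
    # sys.flags.dev_mode check, so the sketch runs without -X dev.
    if encoding is None:
        if dev_mode:
            warnings.warn(
                "'encoding' option is not specified. The default encoding "
                "might be changed to 'utf-8' in the future",
                PendingDeprecationWarning, stacklevel + 2)
        encoding = LOCALE_ENCODING
    return encoding


def read_config(path, encoding=None):
    """Hypothetical library API with an encoding=None option."""
    # A real API would open `path`; here only the encoding is resolved.
    return text_encoding_sketch(encoding)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = read_config("settings.ini")                    # warns
    explicit = read_config("settings.ini", encoding="utf-8")  # silent
```

With stacklevel + 2, the warning skips both the helper and read_config, so it points at the code that called read_config without an encoding.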

subprocess module doesn't warn

While the subprocess module uses TextIOWrapper, it doesn't raise a PendingDeprecationWarning. It uses io.LOCALE_ENCODING by default.
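
Because subprocess stays silent under this proposal, callers who care about the pipe encoding may still want to pass it themselves. A minimal sketch, assuming a child Python process whose output is pure ASCII, so the result does not depend on the child's own stdout encoding:

```python
import subprocess
import sys

# Pass the encoding explicitly instead of relying on the default.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,
    encoding="utf-8",
    check=True,
)
child_output = result.stdout.strip()
```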

Rationale

"locale" is not a codec alias

We don't add "locale" as a codec alias because the locale can be changed at runtime.

Additionally, TextIOWrapper checks os.device_encoding() when encoding=None. This behavior cannot be implemented in a codec.
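
A quick way to see that "locale" is not a codec name is to ask the codec registry directly. The helper name is_codec_name is hypothetical, and the result for "locale" assumes no third-party codec search function has registered that name.

```python
import codecs


def is_codec_name(name):
    """Return True if the codec registry knows `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


utf8_known = is_codec_name("utf-8")
# "locale" is handled by TextIOWrapper itself, not by the codec registry.
locale_known = is_codec_name("locale")
```

This is the same lookup that produces "LookupError: unknown encoding: locale" when the name reaches the codec machinery.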

Use a PendingDeprecationWarning

This PEP doesn't cover changing the default encoding to UTF-8. So we use PendingDeprecationWarning instead of DeprecationWarning for now.

Raise warning only in dev mode

This PEP will produce a huge number of PendingDeprecationWarning warnings. They would be too noisy for most Python developers.

We need to fix all warnings in the standard library. We also need to wait for pip and major dev tools like pytest to fix their warnings before raising this warning by default.

subprocess module doesn't warn

The default encoding for PIPE is more closely related to the encoding of stdio than to the default encoding of TextIOWrapper. So this PEP doesn't propose raising a warning from the subprocess module.

References

[1] "Packages can't be installed when encoding is not UTF-8" (https://github.com/methane/pep597-pypi-ascii)
[2] "Logging - Inconsistent behaviour when handling unicode" (https://bugs.python.org/issue37111)
[3] Packaging tutorial in packaging.python.org didn't specify encoding to read a README.md (https://github.com/pypa/packaging.python.org/pull/682)
[4] json.tool had used locale encoding to read JSON files (https://bugs.python.org/issue33684)
Source: https://github.com/python/peps/blob/master/pep-0597.rst