[Python-ideas] PEP 540: Add a new UTF-8 mode

Victor Stinner victor.stinner at gmail.com
Wed Jan 11 17:54:10 EST 2017


2017-01-06 10:50 GMT+01:00 M.-A. Lemburg <mal at egenix.com>:
> Victor: I think you are taking the UTF-8 idea a bit too far.
> Nick was trying to address the situation where the locale is
> set to "C", or rather not set at all (in which case the lib C
> defaults to the "C" locale). The latter is a fairly standard
> situation when piping data on Unix or when spawning processes
> which don't inherit the current OS environment.

My PEP 540 is different from Nick's PEP 538, even for the POSIX
locale. I propose to always use the surrogateescape error handler,
whereas Nick wants to keep the strict error handler for inputs and
outputs.
https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler

The surrogateescape error handler is useful for writing programs which
work as pipes, like the UNIX programs cat, grep, sed, ...:
https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes
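
A minimal sketch of why this matters for pipe-like programs. It uses
plain codec calls rather than the proposed UTF-8 mode itself, and the
sample bytes are made up for illustration:

```python
# Input as a pipe-style program might receive it: mostly valid UTF-8,
# plus one byte (0xff) that can never appear in well-formed UTF-8.
raw = b"caf\xc3\xa9 \xff\n"

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xff becomes U+DCFF) instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into the
# original byte, so the data passes through the program unchanged.
out = text.encode("utf-8", errors="surrogateescape")
```

Here out is byte-for-byte equal to raw, which is what cat, grep or sed
would give you.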

You can get the behaviour of Nick's PEP 538 using my UTF-8 Strict
mode. Compare the "UTF-8 mode" and "UTF-8 Strict mode" lines in the tables
of my use cases. The UTF-8 mode always works, but can produce mojibake,
whereas UTF-8 Strict doesn't produce mojibake but can fail depending
on data and the locale.
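
The trade-off can be sketched with the two error handlers directly.
This is only an illustration under my own assumptions: the helper
try_decode and the Latin-1 sample are hypothetical, not part of the PEP:

```python
# Data written by a program using Latin-1: 0xe9 is "é" there, but it
# is not a valid byte sequence in UTF-8.
latin1_bytes = b"caf\xe9\n"


def try_decode(data, errors):
    """Decode data as UTF-8; return None if the handler rejects it."""
    try:
        return data.decode("utf-8", errors=errors)
    except UnicodeDecodeError:
        return None


# Strict handler: refuses the data, so the program fails.
strict_result = try_decode(latin1_bytes, "strict")

# surrogateescape: always succeeds, but the "é" is now an escaped byte
# (U+DCE9) rather than a real character, which is where mojibake can
# come from if the text is later mixed with properly decoded strings.
lenient_result = try_decode(latin1_bytes, "surrogateescape")
```

strict_result is None, while lenient_result holds the text with the
escaped byte and can still be written back out unchanged.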

IMHO most users prefer usability ("just works") over correctness
(prevent mojibake).

So Nick and I don't have exactly the same scope and use cases.

Victor

