[Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode)

INADA Naoki songofacandy at gmail.com
Thu Jan 12 22:50:01 EST 2017


On Fri, Jan 13, 2017 at 11:40 AM, Stephen J. Turnbull
<turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> INADA Naoki writes:
>
>  > But it's not a problem, because changing LC_CTYPE from C to C.UTF-8
>  > doesn't break anything.  It's broken at start.
>  > Use UTF-8 everywhere, anytime is best way to avoid mojibake.
>
> Please stop repeating this; it is invalid as an argument.

Sorry, I meant: "If LC_CTYPE is C or C.UTF-8, all other LC_* categories
should use ASCII or UTF-8.
Otherwise, mojibake is unavoidable regardless of PEP 538 or 540."
I didn't mean forcing C.UTF-8 for LC_TIME too.

As you can see, no one is proposing "Drop non-UTF-8 locale support completely".

>
> The problem is that not everybody does this yet, even today (in fact,
> that's the source of the problem on containers, people are using the C
> locale, not C.utf-8!),

No.  The C locale doesn't forbid using UTF-8.  It doesn't determine the
terminal encoding either.  This is just Python's behavior, and it's unlike
many other commonly used languages.
That's why many people are bitten by this problem.

For example, vim uses latin-1 by default in the C locale.
Since the C locale says nothing about terminal/file/stdio encoding, using
the most common byte-transparent encoding seems a reasonable choice.
Of course, vim can be configured to use UTF-8 regardless of LC_CTYPE.
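
To illustrate what "byte transparent" means here, a minimal Python sketch
(this only demonstrates the property of latin-1; it is not how vim is
implemented, and the sample string is an arbitrary choice):

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin-1 maps each byte to the code point with the same number,
# so decoding never fails and encoding restores the original bytes.
as_text = all_bytes.decode("latin-1")
round_trip = as_text.encode("latin-1")

# UTF-8 encoded data survives a latin-1 round trip unchanged,
# even though it looks wrong while it is held as text.
utf8_data = "日本語".encode("utf-8")
survived = utf8_data.decode("latin-1").encode("latin-1")

# Strict ASCII, by contrast, rejects any byte above 0x7F.
try:
    utf8_data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(round_trip == all_bytes, survived == utf8_data, ascii_ok)
```

A tool that treats input as latin-1 can therefore pass UTF-8 data through
untouched, which is why it is a safe default when the locale gives no
encoding information.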


> and some of us have to use or interoperate with
> systems that don't, even if our own systems do.
>
> If your position really is "Screw them, they're stupid -- let them fix
> their broken systems, it's not our problem",

I never said such a thing.


> I can understand that but
> we'll have to agree to disagree.  My position is that we need to
>
> (1) determine if this change actually can cause problems for Python
>     users on such systems or interoperating with such systems

Sure.  That's what I tested.  "If people use a non-UTF-8 LC_TIME with
LC_CTYPE=C, there is mojibake already.  This change doesn't break anything."
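
The point can be sketched without touching the real locale.  The snippet
below is a simulation with byte strings, not an actual setlocale() call;
the sample month name and the choice of EUC-JP are assumptions made for
illustration.  It decodes the same non-UTF-8 bytes as ASCII and as UTF-8,
with the surrogateescape error handler so that undecodable bytes do not
raise:

```python
# Hypothetical month name as a Japanese EUC-JP locale might emit it.
original = "1月"
raw = original.encode("euc_jp")

# Decode the same bytes the way an ASCII or a UTF-8 text layer would.
decoded = {
    enc: raw.decode(enc, errors="surrogateescape")
    for enc in ("ascii", "utf-8")
}

for enc, text in decoded.items():
    # ascii() shows escaped code points instead of printing them raw.
    print(enc, ascii(text), text == original)

# The original bytes are still recoverable, so no data is lost,
# but the text is not readable under either encoding.
recovered = decoded["utf-8"].encode("utf-8", errors="surrogateescape")
```

Neither decoding recovers the original text, so in this sketch switching
from ASCII to UTF-8 does not make an already broken case any worse.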

> (2) determine how serious the problems are with the "force UTF-8 in
>     certain situations" approach vs. the status quo
> (3) compare the damage both ways,
> (4) if there is a conflict, consider whether a modified proposal would
>     work as well or better in more circumstances.
>
> I think that is consistent with past Python practice on encoding
> issues.

Sure.
And if there is an unavoidable conflict, the default behavior should favor
the more common usage.

