[Python-Dev] Python3 "complexity"

Stefan Ring stefanrin at gmail.com
Fri Jan 10 18:34:22 CET 2014


On Fri, Jan 10, 2014 at 4:35 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 10 January 2014 13:32, Lennart Regebro <regebro at gmail.com> wrote:
>> No, because your environment has a default language. And Python has a
>> default encoding. You only get problems when some file doesn't use the
>> default encoding.
>
> The reason Python 3 currently tries to rely on the POSIX locale
> encoding is that during the Python 3 development process it was
> pointed out that ShiftJIS, ISO-2022 and various CJK codecs are in
> widespread use in Asia, since Asian users needed solutions to the
> problem of representing kana, ideographs and other non-Latin
> characters long before the Unicode Consortium existed.
>
> This creates a problem for Python 3, as assuming utf-8 means we have a
> high risk of corrupting users' data at least in Asian locales, as well
> as anywhere else where non-UTF-8 encodings are common (especially when
> encodings that aren't ASCII compatible are involved).

From my experience, the concept of a default locale is deeply flawed.
What if I log into a (Linux) machine using an old Latin-1 PuTTY from
the Windows XP era and have most file names and contents in UTF-8
encoding, except for one directory where people from Eastern Europe
upload files via FTP in whatever encoding they choose? What should the
"default" encoding be now?
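
To make the scenario concrete, here is a minimal sketch of what happens
when the same bytes are read under different assumed encodings. The
sample word, the choice of cp1250 as the "Eastern European" encoding and
the helper name decode_candidates are assumptions for illustration only;
nothing here is prescribed by Python or by the setup described above.

```python
# Illustrative sketch: one byte string, several candidate encodings.
# The sample text and the helper name are hypothetical.

SAMPLE = "\u010cesk\u00fd"  # the Czech word "Cesky" with its diacritics

# The same text as it might arrive from two different uploaders.
raw_utf8 = SAMPLE.encode("utf-8")
raw_cp1250 = SAMPLE.encode("cp1250")


def decode_candidates(raw, encodings=("utf-8", "latin-1", "cp1250")):
    """Try each encoding; map its name to the decoded text, or None on failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


for label, raw in (("utf-8 upload", raw_utf8), ("cp1250 upload", raw_cp1250)):
    print(label, decode_candidates(raw))
```

The cp1250 bytes are rejected by the UTF-8 decoder, while Latin-1 accepts
them without complaint and silently yields the wrong characters. No single
"default" gets both uploads right.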

That's why I make it a principle to always unset all LC_* and LANG
variables, except when working locally, which happens rather rarely.
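
For a child process started from Python, the same effect can be sketched
by filtering the environment before passing it on. The function name
strip_locale_vars is hypothetical, and the filter covers only LANG and the
LC_* variables mentioned above.

```python
# Illustrative sketch: build an environment without LANG and LC_* variables.
# The function name is hypothetical.
import os


def strip_locale_vars(env):
    """Return a copy of env without LANG and any LC_* variables."""
    return {
        key: value
        for key, value in env.items()
        if key != "LANG" and not key.startswith("LC_")
    }


clean_env = strip_locale_vars(os.environ)
# A child process could then be started with, for example:
#   subprocess.run(["some-command"], env=clean_env)
```

Which encoding the child then assumes depends on the platform and the
Python version, so this removes the inherited setting rather than
choosing a better one.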

