[Python-Dev] Python3 "complexity"

Matěj Cepl matej at ceplovi.cz
Sat Jan 11 13:37:32 CET 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2014-01-10, 17:34 GMT, you wrote:
> From my experience, the concept of a default locale is deeply 
> flawed.  What if I log into a (Linux) machine using an old 
> latin-1 putty from the Windows XP era, have most file names 
> and contents in UTF-8 encoding, except for one directory where 
> people from eastern Europe upload files via FTP in whatever 
> encoding they choose. What should the "default" encoding be 
> now?

I know this stuff is really hard, and I know it only because I had 
to fight with it for years (being Czech, so not blessed by Latin-1 
covering my language … actually no living encoding supports 
it completely, but that’s mostly a theoretical issue … Latin-2 
used to work for us, and now everybody with a civilized OS uses 
UTF-8 of course; I am not sure what the current state of MS 
Windows is).

It seems to me that you have some fundamental principles muddled 
together.

a) The locale should always be set per system. I.e., 
in your example above you have only two variables: the locale of 
your Windows XP machine and the locale of the Linux box.
b) I know for a fact that PuTTY itself (even on Windows XP) CAN 
translate from UTF-8 on the server to whatever Windows has to 
offer. So there is no such thing as a “latin-1 putty”.
c) Responsibility for filenames on the system rests with whatever 
actually saves the file. So in this test case it is a matter of 
setting up the FTP server correctly (see for example 
http://rhn.redhat.com/errata/RHBA-2012-0187.html and 
https://bugzilla.redhat.com/show_bug.cgi?id=638873 which seem to 
indicate that vsftpd (and what else would you use?) should 
support UTF-8 in filenames). If the server locale supports 
Eastern European filenames and vsftpd supports translation to 
that encoding (hint, hint: UTF-8 does), then you are all set.
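
For a directory that already contains names written in mixed 
encodings, here is a minimal sketch of how a script could cope with 
the raw bytes. The function name decode_filename and the choice of 
ISO-8859-2 (Latin-2) as the only fallback are my assumptions for 
illustration, not anything vsftpd or Python does for you:

```python
def decode_filename(raw, fallbacks=("iso-8859-2",)):
    """Decode a raw filename: try UTF-8 first, then legacy encodings.

    `fallbacks` is a hypothetical, site-specific list; Latin-2 is only
    a guess that suits Czech and similar Central European names.
    """
    try:
        # Bytes that are valid UTF-8 are very unlikely to be anything else.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: keep undecodable bytes as surrogate escapes, which is
    # also how os.listdir() returns such names on POSIX systems.
    return raw.decode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    for sample in ("Mat\u011bj".encode("utf-8"), "Mat\u011bj".encode("iso-8859-2")):
        print(sample, "->", decode_filename(sample))
```

Guessing like this is only a workaround; a legacy single-byte 
encoding accepts almost any byte sequence, so the fallback cannot 
tell Latin-2 from Latin-1. Getting the server to store UTF-8 in the 
first place is the real fix.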

> That's why I make it a principle to always unset all LC_* and 
> LANG variables, except when working locally, which happens 
> rather rarely.

That’s a bad idea. Those variables ALWAYS have some effective value 
(perhaps a default, which tends to be something like en_US.ASCII, 
which is not what you want; fortunately, on most Unices these 
days it would be en_US.UTF8; the locale(1) command always gives 
some result).
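
To see this from Python, here is a small sketch that reports which 
locale is actually in effect, whether or not the variables are set. 
The helper name effective_locale_report is made up for this 
example; the locale and os calls are from the standard library. 
What it prints depends on the system and the Python version, so I 
give no expected output:

```python
import locale
import os


def effective_locale_report():
    """Return (environment settings, effective LC_CTYPE, preferred encoding)."""
    # POSIX precedence for the character type: LC_ALL, then LC_CTYPE, then LANG.
    env = {name: os.environ.get(name) for name in ("LC_ALL", "LC_CTYPE", "LANG")}
    try:
        # An empty string means "use whatever the environment says".
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale this system lacks; the current one stays.
        pass
    # Calling setlocale() without a second argument only queries the setting.
    current = locale.setlocale(locale.LC_CTYPE)
    return env, current, locale.getpreferredencoding(False)


if __name__ == "__main__":
    env, current, encoding = effective_locale_report()
    print("environment:", env)
    print("effective LC_CTYPE:", current)
    print("preferred encoding:", encoding)
```

Running it once normally and once with LC_ALL, LC_CTYPE and LANG 
unset shows that unsetting them does not remove the locale; the C 
library just falls back to its default (commonly "C" or "POSIX").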

Matěj

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iD8DBQFS0TsM4J/vJdlkhKwRAg9+AJ9wuCEnPqbUr6imA2L9ak17svSP3ACePVRp
5MKkSVUQ9G7A+fZVhDGiEC8=
=MXgT
-----END PGP SIGNATURE-----

