[Python-Dev] Python3 "complexity"
Matěj Cepl
matej at ceplovi.cz
Sat Jan 11 13:37:32 CET 2014
On 2014-01-10, 17:34 GMT, you wrote:
> From my experience, the concept of a default locale is deeply
> flawed. What if I log into a (Linux) machine using an old
> latin-1 putty from the Windows XP era, have most file names
> and contents in UTF-8 encoding, except for one directory where
> people from eastern Europe upload files via FTP in whatever
> encoding they choose. What should the "default" encoding be
> now?
I know this stuff is really hard, if only because I have had to
fight with it for years (being Czech, I am not blessed with
Latin-1 covering my language … actually no living encoding
supports it completely, but that’s mostly a theoretical issue …
Latin-2 used to work for us, and now everybody with a civilized
OS uses UTF-8 of course; I am not sure what the current state of
MS Windows is).
It seems to me that you have some fundamental principles muddled
together.
a) The locale should always be set per system. I.e., in your
example above you have only two variables: the locale of your
Windows XP machine and the locale of the Linux box.
b) I know for a fact that PuTTY itself (even on Windows XP) CAN
translate from UTF-8 on the server to whatever Windows has to
offer. So, there is no such thing as “latin-1 putty”.
c) Responsibility for filenames on the system rests with whatever
actually saves the file. So, in this test case it is a matter of
setting up the FTP server correctly (see for example
http://rhn.redhat.com/errata/RHBA-2012-0187.html and
https://bugzilla.redhat.com/show_bug.cgi?id=638873 which seem to
indicate that vsftpd, and what else would you use?, should
support UTF-8 in filenames). If the server locale supports
Eastern European filenames and vsftpd supports translation to
that encoding (hint, hint: UTF-8 does), then you are all set.
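To check whether the uploads in such a directory really are a
problem, something like the following could be used. It is only a
sketch: the function name and the sample filename are made up for
illustration, and it assumes the filenames are available as raw
bytes (as os.listdir() returns them when given a bytes path).

```python
def split_by_utf8_validity(raw_names):
    """Partition raw byte filenames into (decodable as UTF-8, not decodable)."""
    valid, invalid = [], []
    for raw in raw_names:
        try:
            valid.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Not UTF-8; probably written by a client using a legacy encoding.
            invalid.append(raw)
    return valid, invalid


# Hypothetical sample: the same Czech filename stored as UTF-8 and as Latin-2.
name = "Matěj.txt"
samples = [name.encode("utf-8"), name.encode("iso8859_2")]
valid, invalid = split_by_utf8_validity(samples)
```

Names that land in the second list are the ones some client saved
in a legacy encoding; which encoding that was cannot be detected
reliably and has to be known from context.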
> That's why I make it a principle to always unset all LC_* and
> LANG variables, except when working locally, which happens
> rather rarely.
That’s a bad idea. The locale ALWAYS has some value, even when
those variables are unset (then it is a default, which tends to
be something ASCII-only such as the C/POSIX locale, which is not
what you want; fortunately on most Unices these days the
configured value would be en_US.UTF-8; the locale(1) command
always gives some result).
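A quick way to see this from Python is to start a child process
with those variables removed and ask it which locale it ended up
with. This is a minimal sketch; the function name is made up, and
the exact locale name reported depends on the platform and the
Python version.

```python
import os
import subprocess
import sys


def locale_without_env_vars():
    """Return the LC_CTYPE locale seen by a child started without LANG/LC_*."""
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("LANG", "LANGUAGE") and not key.startswith("LC_")
    }
    # The child adopts the environment's locale, then reports LC_CTYPE.
    code = (
        "import locale; "
        "locale.setlocale(locale.LC_ALL, ''); "
        "print(locale.setlocale(locale.LC_CTYPE))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


effective = locale_without_env_vars()
```

The returned string is never empty: unsetting the variables does
not switch the locale off, it only selects a default.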
Matěj