[Python-Dev] Import and unicode: part two

Victor Stinner victor.stinner at haypocalc.com
Thu Jan 20 12:51:29 CET 2011


On Wednesday 19 January 2011 at 20:39 -0800, Toshio Kuratomi wrote:
> Teaching students to write non-portable code (relying on filesystem encoding
> where your solution is, don't upload to pypi anything that has non-ascii
> filenames) seems like the exact opposite of how you'd want to shape a young
> student's understanding of good programming practices.

That has already been discussed: see PEP 3131.
http://www.python.org/dev/peps/pep-3131/#common-objections

If the teacher chooses to use non-ASCII, (s)he is responsible for
explaining the consequences to his/her students :-)

> > In a school, you can use the same configuration
> > (encoding) on all computers.
> > 
> In a school computer lab perhaps.  But not on all the students' and
> professors' machines.  How many professors will be cursing python when they
> discover that the example code that they wrote on their Linux workstation
> doesn't work when the students try to use it in their windows computer lab?

Because some students use a stupid or misconfigured OS, Python should
only accept ASCII names? So why does Python 3 support non-ASCII
filenames at all: it is very well known that non-ASCII filenames are the
root of many troubles! Should we simply drop Unicode support for all
filenames? And maybe restrict bytes filenames to bytes in [0; 127]? Or
better, restrict them to [32; 126] (U+007F causes some trouble in some
terminals).

I think that in 2011, non-ASCII filenames are well supported on all
(modern) operating systems. Issues with non-ASCII filenames are
OS-specific and should be fixed by the user (the admin of the computer).

> Additionally, those other filesystem operations have
> been growing the ability to take byte values and encoding parameters because
> unicode translation via a single filesystem encoding is a good default but
> not a complete solution.

If you are unable to configure your system to decode/encode filenames
correctly, you should just avoid non-ASCII characters in module names.

You only give theoretical arguments: did you at least try to use
non-ASCII module names on your system with Python 3.2? I suppose that it
will just work and you will never notice that the Unicode module name (on
"import café") is encoded to bytes.
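
To make the "encoded to bytes" step concrete, here is a rough sketch. It
is not the real import machinery: module_filename_bytes is a hypothetical
helper, and the ".py" suffix and the choice of encodings are assumptions
made only for illustration.

```python
import sys


def module_filename_bytes(name, encoding=None):
    """Return the bytes a filesystem lookup for module `name` might use.

    Hypothetical helper: the real import system does more than this.
    """
    if encoding is None:
        # Fall back to whatever Python detected for this system.
        encoding = sys.getfilesystemencoding()
    return (name + ".py").encode(encoding)


name = "caf\u00e9"  # 'café', written with an escape to avoid source-encoding issues
utf8_name = module_filename_bytes(name, "utf-8")
latin1_name = module_filename_bytes(name, "iso-8859-1")
default_name = module_filename_bytes(name, "utf-8") if False else None  # placeholder, unused
```

The same module name gives different byte strings depending on the
filesystem encoding, which is why the encoding has to be chosen by the
system and not hard-coded in the source.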

It fails on OSes using filesystem encodings other than UTF-8 (e.g.
Windows)... because of a Python bug, and I just asked whether I should fix
this bug (or whether we should deny non-ASCII names). If the bug is
fixed, it will work everywhere.

> Your solution creates modules which aren't portable

More and more operating systems use a filesystem encoding able to encode
any Unicode character. ASCII-only always gives you the best portability,
but I think that today you can start to play with (at least) ISO-8859-1
characters (café should work on all operating systems). If you don't
like Unicode issues (I personally love them!), just use ASCII everywhere.
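
A quick way to check which encodings can represent a given name is
sketched below. The helper can_encode and the list of encodings are
assumptions chosen for illustration; they say nothing about what a
particular OS actually uses.

```python
def can_encode(name, encoding):
    """Return True if `name` is representable in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


name = "caf\u00e9"  # 'café'
encodings = ("ascii", "iso-8859-1", "cp1252", "utf-8")
results = {enc: can_encode(name, enc) for enc in encodings}

# A name outside ISO-8859-1 (two CJK characters) for comparison.
cjk_ok_in_latin1 = can_encode("\u65e5\u672c", "iso-8859-1")
```

Only the ASCII codec rejects "café" here; a name using characters outside
ISO-8859-1 is rejected by that codec as well, but still encodes as UTF-8.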

> One of my proposals creates python code which isn't portable.  The other one
> suffers some of the same disadvantages as your solution in portability but
> allows for tools that could automatically correct modules.

__import__('café'.encode('UTF-8')) or
__import__('café'.encode('ISO-8859-1')) is less portable than
__import__('café').

> You think that if a module is named appropriately on one system but is not portable to another
> system, that's fine.

No, I am not saying that.

I say that if your name gets broken when you transfer your project from
one system to another (e.g. decompressing an archive creates filenames
with mojibake), you should fix your transfer procedure (e.g. use another
archive format, use a script to fix the filenames, or anything else),
but don't try to handle invalid filenames.
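
Such a "script to fix filenames" could be built around something like the
sketch below. It assumes one specific kind of damage (UTF-8 bytes that
were decoded as ISO-8859-1); fix_mojibake is a hypothetical helper, and a
real script would still have to rename the files on disk.

```python
def fix_mojibake(name, wrong="iso-8859-1", right="utf-8"):
    """Undo a name whose `right`-encoded bytes were decoded as `wrong`.

    Returns the name unchanged if it does not look damaged that way.
    """
    try:
        return name.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


good = "caf\u00e9.py"  # 'café.py'
# Simulate the damage: UTF-8 bytes read back with the wrong codec.
broken = good.encode("utf-8").decode("iso-8859-1")
fixed = fix_mojibake(broken)
```

Names that are pure ASCII, or that are already correct, pass through
unchanged because the round trip either is a no-op or fails to decode.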

> Setting system locale to ASCII for use in system-wide scripts

This is stupid :-) Yes, on such a system, you cannot open *any* non-ASCII
file with Python 3 (except if you work, as in Python 2, on bytes
filenames).
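
For reference, working on bytes filenames looks roughly like this. It is
a sketch that uses a temporary directory and a UTF-8 encoded name; both
are assumptions for the example, and the behaviour of bytes paths on
Windows differs between Python versions.

```python
import os
import tempfile

raw_name = b"caf\xc3\xa9.txt"  # 'café.txt' as UTF-8 bytes

with tempfile.TemporaryDirectory() as tmp:
    # Build the whole path as bytes so no locale decoding is involved.
    bytes_dir = os.fsencode(tmp)
    path = os.path.join(bytes_dir, raw_name)

    with open(path, "wb") as f:
        f.write(b"hello")

    with open(path, "rb") as f:
        content = f.read()

    # Listing a bytes directory returns bytes names.
    names_are_bytes = all(isinstance(n, bytes) for n in os.listdir(bytes_dir))
```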

Python cannot do anything to improve Unicode support on such a system:
only the administrator can do something about that.

I know that you can give me many examples of systems where Unicode
doesn't work because the system is not correctly configured. But my
opinion is that we should support non-ASCII names because there are
"some" systems somewhere where Unicode is fully functional :-)

Victor


