[Python-Dev] Import and unicode: part two

Wed Jan 26 09:58:36 CET 2011

Toshio Kuratomi writes:

 > Sure ... but with these systems, neither read-modules-as-locale nor
 > read-modules-as-utf-8 is a good solution, correct?

A good solution, no, but I believe that read-modules-as-locale *should*
work to a great extent.  AFAIK Python 3 reads Python programs as str
(i.e., converting to Unicode -- if it doesn't, it *should*<wink>).

 > Especially if the OS does get upgraded but the filesystems with
 > user data (and user created modules) are migrated as-is, you'll run
 > into situations where system installed modules are in utf-8 and
 > user created modules are shift-jis and so something will always be
 > broken.

I don't know what you mean by "system-installed modules".  If you're
talking about Python itself, it's not a problem.  Python doesn't have
any Japanese-named modules in any encoding.
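
To make that concrete, here is a minimal sketch.  The helper
module_filename and the Japanese module name are made up for
illustration; this is not how the import machinery actually builds
file names.  It only shows that an ASCII name is the same bytes on
disk under UTF-8 and Shift-JIS, while a Japanese name is not.

```python
def module_filename(name: str, fs_encoding: str) -> bytes:
    """Bytes a file system using fs_encoding would store for name + '.py'."""
    return (name + ".py").encode(fs_encoding)


ascii_name = "shutil"                    # a real stdlib module, ASCII-only
japanese_name = "\u65e5\u672c\u8a9e"     # hypothetical user module ("nihongo")

for name in (ascii_name, japanese_name):
    as_utf8 = module_filename(name, "utf-8")
    as_sjis = module_filename(name, "shift_jis")
    # Only the non-ASCII name depends on which encoding the file system uses.
    print(ascii(name), "same bytes in both encodings:", as_utf8 == as_sjis)
```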

On the other hand, *everything* that involves scripting (shell
scripts, make, etc) related to those filesystems will be broken
*unless* the system, after upgrade but before going live, is converted
to have an appropriate locale encoding.  So I don't really see a
problem here.

The real problem is portability across systems, and that is a problem
that only the third-party transports can really deal with.  tar and
unzip need to be taught how to convert file names to the locale
encoding, etc.
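
Roughly what such a transport would have to do for each name, as a
sketch only: convert_filename and its parameters are hypothetical,
and it assumes the tool somehow knows the encoding used on the
source system, which an archive by itself may not tell it.

```python
def convert_filename(raw: bytes, source_encoding: str, locale_encoding: str) -> bytes:
    """Re-encode a file name from the source system's encoding to the local one."""
    # Decode to text first; this fails loudly if source_encoding is wrong
    # for these bytes, rather than silently writing a mangled name.
    text = raw.decode(source_encoding)
    return text.encode(locale_encoding)


# A name as stored on a Shift-JIS system, unpacked on a UTF-8 system.
stored = "\u65e5\u672c\u8a9e.py".encode("shift_jis")
local = convert_filename(stored, "shift_jis", "utf-8")
print(local == "\u65e5\u672c\u8a9e.py".encode("utf-8"))
```

The assumption that the source encoding is known is the hard part;
the conversion itself is trivial.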

 > The only way to make sure that modules work is to restrict them to ASCII-only
 > on the filesystem.  But because unicode module names are seen as
 > a necessary feature, the question is which way forward is going to lead to
 > the least brokenness.  Which could be locale... but from the python2
 > locale-related bugs that I get to look at, I doubt it.

AFAICS this is going to be site-specific.  End of story.  Or, if you
prefer, "maru-nage".<wink>

IMHO, Python 2 locale bugs are unlikely to be a good guide to Python 3
locale bugs because in Python 2 most people just ignore locale and use
"native" strings (~= bytes in Python 3), and that typically "just
works".  In Python 3 that just *doesn't* work any more because you get
a UnicodeError on import, etc, etc.
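
A sketch of the difference, assuming a strict decode; the real
import code path may use a different error handler, and try_decode
is just an illustrative helper.  Bytes passed around untouched never
fail, but the moment they have to become text, a wrong guess at the
encoding raises.

```python
# File name bytes as they would sit on a Shift-JIS file system.
raw = "\u65e5\u672c\u8a9e".encode("shift_jis")

# Python 2 style: the name stays an opaque byte string, so nothing is
# decoded and nothing can fail.
passthrough = raw


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or the UnicodeDecodeError that was raised."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# Python 3 style: the name must become str, so the encoding matters.
print(ascii(try_decode(raw, "shift_jis")))     # decodes to the original name
print(type(try_decode(raw, "utf-8")).__name__)  # wrong encoding: an error
```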

IMHO, YMMV, and all that.  I know *of* such systems (there remain
quite a few here used by student and research labs), but the ones I
maintain were easy to convert to UTF-8 because I don't export file
systems (except my private files for my own use); everything is
mediated by Apache and Zope, and browsers are happy to cope if I
change from EUC-JP to UTF-8 and then flip the Apache switch to change
default encodings.