[Python-Dev] please consider changing --enable-unicode default to ucs4

Zooko O'Whielacronx zookog at gmail.com
Tue Sep 29 19:03:25 CEST 2009


Dear MAL and python-dev:

I failed to explain the problem that users are having.  I will try
again, and this time I will omit my ideas about how to improve things
and just focus on describing the problem.

Some users are having trouble using Python packages containing binary
extensions on Linux.  I want to provide such binary Python packages
for Linux for the pycryptopp project
(http://allmydata.org/trac/pycryptopp ) and the zfec project
(http://allmydata.org/trac/zfec ).  I also want to make it possible
for users to install the Tahoe-LAFS project (http://allmydata.org )
without having a compiler or Python header files.  (You'd be surprised
at how often Tahoe-LAFS users try to do this on Linux.  Linux is no
longer only for people who have the knowledge and patience to compile
software themselves.)  Tahoe-LAFS also depends on many packages that
are maintained by other people and are not packaged or distributed by
me -- pyOpenSSL, simplejson, etc.

We have already overcome several hurdles, and no doubt there will be
more, but the current hurdle is that there are two "formats" for
Python extension modules on Linux -- UCS2 and UCS4.  If a user gets a
Python package containing a compiled extension module that was built
for the wrong UCS2/4 setting, he will get mysterious (to him)
"undefined symbol" errors at import time.
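
For readers who want to check which setting their own interpreter was
built with, here is a minimal sketch.  It relies on sys.maxunicode,
which is 0xFFFF on a UCS2 ("narrow") build and 0x10FFFF on a UCS4
("wide") build.  The function name unicode_build is my own invention
for illustration and is not part of any tool.  Note that Python 3.3 and
later always report 0x10FFFF, so the distinction only matters on older
interpreters.

```python
import sys


def unicode_build(maxunicode=None):
    """Classify an interpreter as 'ucs2' or 'ucs4'.

    maxunicode defaults to the running interpreter's sys.maxunicode;
    passing a value explicitly lets the function be exercised for
    either kind of build.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    return "ucs2" if maxunicode == 0xFFFF else "ucs4"


if __name__ == "__main__":
    print("This interpreter is a %s build" % unicode_build())
```

An extension compiled against one kind of build refers to C API names
that carry that build's width (PyUnicodeUCS2_... or PyUnicodeUCS4_...),
which is why loading it into the other kind of interpreter fails with
an undefined symbol.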

On Mon, Sep 28, 2009 at 2:25 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>
> The Python default is UCS2 for a good reason: it's a good trade-off
> between memory consumption, functionality and performance.

I'm sure you are right about this.  At some point I will try to
measure the performance implications in the context of our
application.  I don't think it will be an issue for us, as so far no
users have complained about any performance or functionality problems
that were traceable to the choice of UCS2/4.

> As already mentioned, I also don't understand how the changing
> the Python default on Linux would help your users in any way -
> if you let distutils compile your extensions, it's automatically
> going to use the right Unicode setting for you (as well as your
> users).

My users are using some Python packages built by me and some built by
others.  The binary packages they get from others could have an
incompatible UCS2/4 setting.  Also, some of my users might be using a
Python configured with the opposite setting from the Python interpreter
that I use to build packages.

> Unfortunately, this automatic support doesn't help you when
> shipping e.g. setuptools eggs, but this is a tool problem,
> not one of Python: setuptools completely ignores the fact
> that there are two ways to build Python.

This is the setuptools/distribute issue that I mentioned:
http://bugs.python.org/setuptools/issue78 .  If that issue were solved,
then a user who tried to install a specific package, for example with a
command line like "easy_install
http://allmydata.org/source/tahoe/deps/tahoe-dep-eggs/pyOpenSSL-0.8-py2.5-linux-i686.egg",
would no longer get an undefined symbol error at import time.  Instead
they would get an error message to the effect of "This package is not
compatible with your Python interpreter." at install time.  That would
be good because it would be less confusing to the users.
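
To make the install-time check concrete, here is a rough sketch of the
kind of test an installer could perform.  It is not how setuptools or
distribute work today.  It assumes a binary package records the Unicode
setting it was built with, and the names check_package and
IncompatibleBuildError, as well as the package_build value, are
hypothetical.

```python
import sys


class IncompatibleBuildError(Exception):
    """Raised when a binary package does not match the interpreter."""


def check_package(package_build, maxunicode=None):
    """Refuse a package built for the other UCS2/4 setting.

    package_build is a hypothetical metadata value, 'ucs2' or 'ucs4',
    that a binary package would have to record at build time.
    maxunicode defaults to the running interpreter's sys.maxunicode.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # 0xFFFF means a UCS2 build; anything larger means a UCS4 build.
    interpreter_build = "ucs2" if maxunicode == 0xFFFF else "ucs4"
    if package_build != interpreter_build:
        raise IncompatibleBuildError(
            "This package is not compatible with your Python interpreter "
            "(package built for %s, interpreter is %s)."
            % (package_build, interpreter_build)
        )
```

With a check like this, the mismatch is reported once, by name, when
the package is installed, rather than as a linker error the first time
the module is imported.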

However, if they were using the default setuptools/distribute
dependency-satisfaction feature, e.g. because they are installing a
package and that package is marked as
"install_requires=['pyOpenSSL']", then setuptools/distribute would do
its fallback behavior in which it attempts to compile the package from
source when it can't find a compatible binary package.  This would
probably confuse the users at least as much as the undefined symbol
error currently does.

In any case, improving the tools to handle incompatible packages
nicely would not make more packages compatible.  Let's do both!
Improve tools to handle incompatible packages nicely, and encourage
everyone who compiles Python on Linux to use the same UCS2/4
setting.

Thank you for your attention.

Regards,

Zooko

