|Title:||ABI version tagged .so files|
|Last-Modified:||2010-09-13 14:18:43 +0000 (Mon, 13 Sep 2010)|
|Author:||Barry Warsaw <barry at python.org>|
PEP 3147  described an extension to Python's import machinery that improved the sharing of Python source code, by allowing more than one byte compilation file (.pyc) to be co-located with each source file.
This PEP defines an adjunct feature which allows the co-location of extension module files (.so) in a similar manner. This optional, build-time feature will enable downstream distributions of Python to more easily provide more than one Python major version at a time.
PEP 3147 defined the file system layout for a pure-Python package, where multiple versions of Python are available on the system. For example, where the alpha package containing source modules one.py and two.py exist on a system with Python 3.2 and 3.3, the post-byte compilation file system layout would be:
alpha/ __pycache__/ __init__.cpython-32.pyc __init__.cpython-33.pyc one.cpython-32.pyc one.cpython-33.pyc two.cpython-32.pyc two.cpython-33.pyc __init__.py one.py two.py
For packages with extension modules, a similar differentiation is needed for the module's .so files. Extension modules compiled for different Python major versions are incompatible with each other due to changes in the ABI. Different configuration/compilation options for the same Python version can result in different ABIs (e.g. --with-wide-unicode).
While PEP 384  defines a stable ABI, it will minimize, but not eliminate extension module incompatibilities between Python builds or major versions. Thus a mechanism for discriminating extension module file names is proposed.
Linux distributions such as Ubuntu  and Debian  provide more than one Python version at the same time to their users. For example, Ubuntu 9.10 Karmic Koala users can install Python 2.5, 2.6, and 3.1, with Python 2.6 being the default.
In order to share as much as possible between the available Python versions, these distributions install third party package modules (.pyc and .so files) into /usr/share/pyshared and symlink to them from /usr/lib/pythonX.Y/dist-packages. The symlinks exist because in a pre-PEP 3147 world (i.e < Python 3.2), the .pyc files resulting from byte compilation by the various installed Pythons will name collide with each other. For Python versions >= 3.2, all pure-Python packages can be shared, because the .pyc files will no longer cause file system naming conflicts. Eliminating these symlinks makes for a simpler, more robust Python distribution.
A similar situation arises with shared library extensions. Because extension modules are typically named foo.so for a foo extension module, these would also name collide if foo was provided for more than one Python version.
In addition, because different configuration/compilation options for the same Python version can cause different ABIs to be presented to extension modules. On POSIX systems for example, the configure options --with-pydebug, --with-pymalloc, and --with-wide-unicode all change the ABI. This PEP proposes to encode build-time options in the file name of the .so extension module files.
PyPy  can also benefit from this PEP, allowing it to avoid name collisions in extension modules built for its API, but with a different .so tag.
The configure/compilation options chosen at Python interpreter build-time will be encoded in the shared library file name for extension modules. This "tag" will appear between the module base name and the operation file system extension for shared libraries.
The following information MUST be included in the shared library file name:
- The Python implementation (e.g. cpython, pypy, jython, etc.)
- The interpreter's major and minor version numbers
These two fields are separated by a hyphen and no dots are to appear between the major and minor version numbers. E.g. cpython-32.
Python implementations MAY include additional flags in the file name tag as appropriate. For example, on POSIX systems these flags will also contribute to the file name:
- --with-pydebug (flag: d)
- --with-pymalloc (flag: m)
- --with-wide-unicode (flag: u)
By default in Python 3.2, configure enables --with-pymalloc so shared library file names would appear as foo.cpython-32m.so. When the other two flags are also enabled, the file names would be foo.cpython-32dmu.so.
The shared library file name tag is used unconditionally; it cannot be changed. The tag and extension module suffix are available through the sysconfig modules via the following variables:
>>> sysconfig.get_config_var('SO') '.cpython-32mu.so' >>> sysconfig.get_config_var('SOABI') 'cpython-32mu'
Note that $SOABI contains just the tag, while $SO includes the platform extension for shared library files, and is the exact suffix added to the extension module name.
For an arbitrary package foo, you might see these files when the distribution package was installed:
(These paths are for example purposes only. Distributions are free to use whatever filesystem layout they choose, and nothing in this PEP changes the locations where from-source builds of Python are installed.)
Python's dynamic module loader will recognize and import shared library extension modules with a tag that matches its build-time options. For backward compatibility, Python will also continue to import untagged extension modules, e.g. foo.so.
This shared library tag would be used globally for all distutils-based extension modules, regardless of where on the file system they are built. Extension modules built by means other than distutils would either have to calculate the tag manually, or fallback to the non-tagged .so file name.
The approach described here is already proven, in a sense, on Debian and Ubuntu system where different extensions are used for debug builds of Python and extension modules. Debug builds on Windows also already use a different file extension for dynamic libraries, and in fact encoded (in a different way than proposed in this PEP) the Python major and minor version in the .dll file name.
This PEP only addresses build issues on POSIX systems that use the configure script. While Windows or other platform support is not explicitly disallowed under this PEP, platform expertise is needed in order to evaluate, describe, and implement support on such platforms. It is not currently clear that the facilities in this PEP are even useful for Windows.
PEP 384 defines a stable ABI for extension modules. In theory, universal adoption of PEP 384 would eliminate the need for this PEP because all extension modules could be compatible with any Python version. In practice of course, it will be impossible to achieve universal adoption, and as described above, different built-time flags still affect the ABI. Thus even with a stable ABI, this PEP may still be necessary. While a complete specification is reserved for PEP 384, here is a discussion of the relevant issues.
PEP 384 describes a change to PyModule_Create() where 3 is passed as the API version if the extension was complied with Py_LIMITED_API. This should be formalized into an official macro called PYTHON_ABI_VERSION to mirror PYTHON_API_VERSION. If and when the ABI changes in an incompatible way, this version number would be bumped. To facilitate sharing, Python would be extended to search for extension modules with the PYTHON_ABI_VERSION number in its name. The prefix abi is reserved for Python's use.
Thus, an initial implementation of PEP 384, when Python is configured with the default set of flags, would search for the following file names when extension module foo is imported (in this order):
foo.cpython-XYm.so foo.abi3.so foo.so
The distutils  build_ext command would also have to be extended to compile to shared library files with the abi3 tag, when the module author indicates that their extension supports that version of the ABI. This could be done in a backward compatible way by adding a keyword argument to the Extension class, such as:
Extension('foo', ['foo.c'], abi=3)
- --with-pydebug would not be supported by the stable ABI because this changes the layout of PyObject, which is an exposed structure.
- --with-pymalloc has no bearing on the issue.
- --with-wide-unicode is trickier, though Martin's inclination is to force the stable ABI to use a Py_UNICODE that matches the platform's wchar_t.
In the initial python-dev thread  where this idea was first introduced, several alternatives were suggested. For completeness they are listed here, along with the reasons for not adopting them.
Work on this code is tracked in a Bazaar branch on Launchpad  until it's ready for merge into Python 3.2. The work-in-progress diff can also be viewed  and is updated automatically as new changes are uploaded.
This document has been placed in the public domain.