| PEP: | 376 |
|---|---|
| Title: | Changing the .egg-info structure |
| Version: | 75414 |
| Last-Modified: | 2009-10-14 20:39:17 +0200 (Wed, 14 Oct 2009) |
| Author: | Tarek Ziadé <tarek at ziade.org> |
| Status: | Draft |
| Type: | Standards Track |
| Content-Type: | text/x-rst |
| Created: | 22-Feb-2009 |
| Python-Version: | 2.7, 3.2 |
| Post-History: |
Contents
- Abstract
- Definitions
- Rationale
- .egg-info becomes a directory
- Adding a RECORD file in the .egg-info directory
- Adding an INSTALLER file in the .egg-info directory
- Adding a REQUESTED file in the .egg-info directory
- New APIs in pkgutil
- PEP 262 replacement
- Adding an Uninstall function
- Adding an Uninstall script
- Backward compatibility and roadmap
- References
- Acknowledgements
- Copyright
Abstract
This PEP proposes various enhancements for Distutils:
- A new format for the .egg-info structure.
- Some APIs to read the meta-data of a distribution.
- A replacement PEP 262.
- An uninstall feature.
Definitions
A distribution is a collection of files, which can be Python modules, extensions, or data. A distribution is managed by a special module called setup.py which contains a call to the distutils.core.setup function. The arguments passed to that function describe the distribution, like its name, its version, and so on.
Distutils provides, among other things, commands that can be called through the shell using the setup.py script. An sdist command is provided for instance to create a source distribution archive. An install command is also provided to perform an installation of the distribution in the Python installation the script is invoked with:
$ python setup.py install
See the Distutils [1] documentation for more information.
Once installed, the elements are located in various places in the system, like:
- In Python's site-packages (Python modules, Python modules organized into packages, Extensions, etc.)
- In Python's include directory.
- In Python's bin or Script directory.
- Etc.
Rationale
There are two problems right now in the way distributions are installed in Python:
- There are too many ways to do it.
- There is no API to get the metadata of installed distributions.
How distributions are installed
Right now, when a distribution is installed in Python, the elements it contains are installed in various directories.
The pure Python code, for instance, is installed in the purelib directory which is located in the Python installation at lib/python2.6/site-packages for example under Unix-like systems or Mac OS X, and in Lib\site-packages under Windows. This is done with the Distutils install command, which calls various subcommands.
The install_egg_info subcommand is called during this process in order to create an .egg-info file in the purelib directory.
For example, for the docutils distribution, which contains one package an extra module and executable scripts, three elements are installed in site-packages:
- docutils: The docutils package.
- roman.py: An extra module used by docutils.
- docutils-0.5-py2.6.egg-info: A file containing the distribution metadata as described in PEP 314 [3]. This file corresponds to the file called PKG-INFO, built by the sdist command.
Some executable scripts, such as rst2html.py, are also be added in the bin directory of the Python installation.
Another project called setuptools [4] has two other formats to install distributions, called EggFormats [7]:
- a self-contained .egg directory, that contains all the distribution files and the distribution metadata in a file called PKG-INFO in a subdirectory called EGG-INFO. setuptools creates other fils in that directory that can be considered as complementary metadata.
- a .egg-info directory installed in site-packages, that contains the same files EGG-INFO has in the .egg format.
The first format is automatically used when you install a distribution that uses the setuptools.setup function in its setup.py file, instead of the distutils.core.setup one.
The setuptools project also provides an executable script called easy_install [5] that installs all distributions, including distutils-based ones in self-contained .egg directories.
If you want to have a standalone .egg.info directory distributions, e.g. the second setuptools format, you have to force it when you work with a setuptools-based distribution or with the easy_install script. You can force it by using the -–single-version-externally-managed option or the --root option.
This option is used by :
Uninstall information
Distutils doesn't provide an uninstall command. If you want to uninstall a distribution, you have to be a power user and remove the various elements that were installed, and then look over the .pth file to clean them if necessary.
And the process differs depending on the tools you have used to install the distribution and if the distribution's setup.py uses Distutils or Setuptools.
Under some circumstances, you might not be able to know for sure that you have removed everything, or that you didn't break another distribution by removing a file that is shared among several distributions.
But there's a common behavior: when you install a distribution, files are copied in your system. And it's possible to keep track of these files for later removal.
What this PEP proposes
To address those issues, this PEP proposes a few changes:
- A new .egg-info structure using a directory, based on one format of the EggFormats standard from setuptools.
- New APIs in pkgutil to be able to query the information of installed distributions.
- A de-facto replacement for PEP 262
- An uninstall function and an uninstall script in Distutils.
.egg-info becomes a directory
As explained earlier, the EggFormats standard from setuptools proposes two formats to install the metadata information of a distribution:
- A self-contained directory that can be zipped or left unzipped and contains the distribution files and an .egg-info directory containing the metadata.
- A distinct .egg-info directory located in the site-packages directory, with the metadata inside.
This PEP proposes to keep just one format and make it the standard way to install the metadata of a distribution : a distinct .egg-info directory located in the site-packages directory, containing the metadata.
This .egg-info directory contains a PKG-INFO file built by the write_pkg_file method of the Distribution class in Distutils.
This change does not impact Python itself because the metadata files are not used anywhere yet in the standard library besides Distutils.
It does impact the setuptools and pip projects, but given the fact that they already work with a directory that contains a PKG-INFO file, the change will have no deep consequences.
Let's take an example of the new format with the docutils distribution. The elements installed in site-packages are:
- docutils/
- roman.py
- docutils-0.5-py2.6.egg-info/
PKG-INFO
The syntax of the egg-info directory name is as follows:
name + '-' + version + '.egg-info'
The egg-info directory name is created using a new function called egginfo_dirname(name, version) added to pkgutil. name is converted to a standard distribution name by replacing any runs of non-alphanumeric characters with a single '-'. version is converted to a standard version string. Spaces become dots, and all other non-alphanumeric characters (except dots) become dashes, with runs of multiple dashes condensed to a single dash. Both attributes are then converted into their filename-escaped form, i.e. any '-' characters are replaced with '_' other than the one in 'egg-info' and the one separating the name from the version number.
Examples:
>>> egginfo_dirname('docutils', '0.5')
'docutils-0.5.egg-info'
>>> egginfo_dirname('python-ldap', '2.5')
'python_ldap-2.5.egg-info'
>>> egginfo_dirname('python-ldap', '2.5 a---5')
'python_ldap-2.5.a_5.egg-info'
Adding a RECORD file in the .egg-info directory
A RECORD file is added inside the .egg-info directory at installation time when installing a source distribution using the install command. Notice that when installing a binary distribution created with bdist command or a bdist-based command, the RECORD file will be installed as well since these commands use the install command to create a binary distributions.
The RECORD file holds the list of installed files. These correspond to the files listed by the record option of the install command, and will be generated by default. This allows the implementation of an uninstallation feature, as explained later in this PEP. The install command also provides an option to prevent the RECORD file from being written and this option should be used when creating system packages.
Third-party installation tools also should not overwrite or delete files that are not in a RECORD file without prompting or warning.
This RECORD file is inspired from PEP 262 FILES [2].
The RECORD format
The RECORD file is a CSV file, composed of records, one line per installed file. The csv module is used to read the file, with these options:
- field delimiter : ,
- quoting char : ".
- line terminator : os.linesep (so \r\n or \n)
Each record is composed of three elements.
- the file's full path
- if the installed file is located in the directory where the .egg-info directory of the package is located, it's a '/'-separated relative path, no matter what the target system is. This makes this information cross-compatible and allows simple installations to be relocatable.
- if the installed file is located under sys.prefix or sys.exec_prefix`, it's a it's a '/'-separated relative path prefixed by the $PREFIX or the $EXEC_PREFIX string. The install command decides which prefix to use depending on the files. For instance if it's an executable script defined in the scripts option of the setup script, $EXEC_PREFIX will be used. If install doesn't know which prefix to use, $PREFIX is preferred.
- the MD5 hash of the file, encoded in hex. Notice that pyc and pyo generated files don't have any hash because they are automatically produced from py files. So checking the hash of the corresponding py file is enough to decide if the file and its associated pyc or pyo files have changed.
- the file's size in bytes
The csv module is used to generate this file, so the field separator is ",". Any "," characters found within a field is escaped automatically by csv.
When the file is read, the U option is used so the universal newline support (see PEP 278 [9]) is activated, avoiding any trouble reading a file produced on a platform that uses a different new line terminator.
Example
Back to our docutils example, we now have:
- docutils/
- roman.py
- docutils-0.5-py2.6.egg-info/
PKG-INFO
RECORD
And the RECORD file contains (extract):
docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544 docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188 roman.py,a4b84aff68aa55f2e9bf70481b943D3,234 $EXEC_PREFIX/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234 docutils-0.5-py2.6.egg-info/PKG-INFO,6fe57de576d749536082d8e205b77748,195 docutils-0.5-py2.6.egg-info/RECORD
Notice that:
- the RECORD file can't contain a hash of itself and is just mentioned here
- docutils and docutils-0.5-py2.6.egg-info are located in site-packages so the file paths are relative to it.
Adding an INSTALLER file in the .egg-info directory
The install command has a new option called installer. This option is the name of the tool used to invoke the installation. It's an normalized lower-case string matching [a-z0-9_-.].
$ python setup.py install --installer=pkg-system
It defaults to distutils if not provided.
When a distribution is installed, the INSTALLER file is generated in the .egg-info directory with this value, to keep track of who installed the distribution. The file is a single-line text file.
Adding a REQUESTED file in the .egg-info directory
Some install tools automatically detect unfulfilled dependencies and install them. In these cases, it is useful to track which distributions were installed purely as a dependency, so if their dependent distribution is later uninstalled, the user can be alerted to the orphaned dependency.
If a distribution is installed by direct user request (the usual case), a file REQUESTED is added to the .egg-info directory of the installed distribution. The REQUESTED file may be empty, or may contain a marker comment line beginning with the "#" character.
If an install tool installs a distribution automatically, as a dependency of another distribution, the REQUESTED file should not be created.
The install command of distutils by default creates the REQUESTED file. It accepts --requested and --no-requested options to explicitly specify whether the file is created.
If a package that was already installed on the system as a dependency is later installed by name, the distutils install command will create the REQUESTED file in the .egg-info directory of the existing installation.
New APIs in pkgutil
To use the .egg-info directory content, we need to add in the standard library a set of APIs. The best place to put these APIs is pkgutil.
Query functions
The new functions added in the pkgutil are :
get_distributions() -> iterator of Distribution instances.
Provides an iterator that looks for .egg-info directories in sys.path and returns Distribution instances for each one of them.
get_distribution(name) -> Distribution or None.
Scans all elements in sys.path and looks for all directories ending with .egg-info. Returns a Distribution corresponding to the .egg-info directory that contains a PKG-INFO that matches name for the name metadata.
Notice that there should be at most one result. The first result founded is returned. If the directory is not found, returns None.
get_file_users(path) -> iterator of Distribution instances.
Iterates over all distributions to find out which distributions uses path. path can be a local absolute path or a relative '/'-separated path.
Distribution class
A new class called Distribution is created with the path of the .egg-info directory provided to the constructor. It reads the metadata contained in PKG-INFO when it is instanciated.
Distribution(path) -> instance
Creates a Distribution instance for the given path.
Distribution provides the following attributes:
- name: The name of the distribution.
- metadata: A DistributionMetadata instance loaded with the distribution's PKG-INFO file.
- requested: A boolean that indicates whether the REQUESTED metadata file is present (in other words, whether the package was installed by user request).
And following methods:
get_installed_files(local=False) -> iterator of (path, md5, size)
Iterates over the RECORD entries and return a tuple (path, md5, size) for each line. If local is True, the path is transformed into a local absolute path. Otherwise the raw value from RECORD is returned.
A local absolute path is an absolute path in which occurrences of '/' have been replaced by the system separator given by os.sep.
uses(path) -> Boolean
Returns True if path is listed in RECORD. path can be a local absolute path or a relative '/'-separated path.
get_egginfo_file(path, binary=False) -> file object
Returns a file located under the .egg-info directory.
Returns a file instance for the file pointed by path.
path has to be a '/'-separated path relative to the .egg-info directory or an absolute path.
If path is an absolute path and doesn't start with the .egg-info directory path, a DistutilsError is raised.
If binary is True, opens the file in read-only binary mode (rb), otherwise opens it in read-only mode (r).
get_egginfo_files(local=False) -> iterator of paths
Iterates over the RECORD entries and return paths for each line if the path is pointing a file located in the .egg-info directory or one of its subdirectory.
If local is True, each path is transformed into a local absolute path. Otherwise the raw value from RECORD is returned.
Notice that the API is organized in five classes that work with directories and Zip files (so it works with files included in Zip files, see PEP 273 for more details [8]). These classes are described in the documentation of the prototype implementation for interested readers [12].
Usage example
Let's use some of the new APIs with our docutils example:
>>> from pkgutil import get_distribution, get_file_users
>>> dist = get_distribution('docutils')
>>> dist.name
'docutils'
>>> dist.metadata.version
'0.5'
>>> for path, hash, size in dist.get_installed_files()::
... print '%s %s %d' % (path, hash, size)
...
docutils/__init__.py b690274f621402dda63bf11ba5373bf2 9544
docutils/core.py 9c4b84aff68aa55f2e9bf70481b94333 66188
roman.py a4b84aff68aa55f2e9bf70481b943D3 234
/usr/local/bin/rst2html.py a4b84aff68aa55f2e9bf70481b943D3 234
docutils-0.5-py2.6.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195
docutils-0.5-py2.6.egg-info/RECORD None None
>>> dist.uses('docutils/core.py')
True
>>> dist.uses('/usr/local/bin/rst2html.py')
True
>>> dist.get_egginfo_file('PKG-INFO')
<open file at ...>
>>> dist.requested
True
PEP 262 replacement
In the past an attempt was made to create a installation database (see PEP 262 [2]).
Extract from PEP 262 Requirements:
" We need a way to figure out what distributions, and what versions of those distributions, are installed on a system..."
Since the APIs proposed in the current PEP provide everything needed to meet this requirement, PEP 376 replaces PEP 262 and becomes the official installation database standard.
The new version of PEP 345 (XXX work in progress) extends the Metadata standard and fullfills the requirements described in PEP 262, like the REQUIRES section.
Adding an Uninstall function
Distutils already provides a very basic way to install a distribution, which is running the install command over the setup.py script of the distribution.
Distutils will provide a very basic uninstall function, that is added in distutils.util and takes the name of the distribution to uninstall as its argument. uninstall uses the APIs described earlier and remove all unique files, as long as their hash didn't change. Then it removes empty directories left behind.
uninstall returns a list of uninstalled files:
>>> from distutils.util import uninstall
>>> uninstall('docutils')
['/opt/local/lib/python2.6/site-packages/docutils/core.py',
...
'/opt/local/lib/python2.6/site-packages/docutils/__init__.py']
If the distribution is not found, a DistutilsUninstallError is be raised.
Filtering
To make it a reference API for third-party projects that wish to control how uninstall works, a second callable argument can be used. It's called for each file that is removed. If the callable returns True, the file is removed. If it returns False, it's left alone.
Examples:
>>> def _remove_and_log(path):
... logging.info('Removing %s' % path)
... return True
...
>>> uninstall('docutils', _remove_and_log)
>>> def _dry_run(path):
... logging.info('Removing %s (dry run)' % path)
... return False
...
>>> uninstall('docutils', _dry_run)
Of course, a third-party tool can use pkgutil APIs to implement its own uninstall feature.
Installer marker
As explained earlier in this PEP, the install command adds an INSTALLER file in the .egg-info directory with the name of the installer.
To avoid removing distributions that where installed by another packaging system, the uninstall function takes an extra argument installer which default to distutils.
When called, uninstall controls that the INSTALLER file matches this argument. If not, it raises a DistutilsUninstallError:
>>> uninstall('docutils')
Traceback (most recent call last):
...
DistutilsUninstallError: docutils was installed by 'cool-pkg-manager'
>>> uninstall('docutils', installer='cool-pkg-manager')
This allows a third-party application to use the uninstall function and strongly suggest that no other program remove a distribution it has previously installed. This is useful when a third-party program that relies on Distutils APIs does extra steps on the system at installation time, it has to undo at uninstallation time.
Adding an Uninstall script
An uninstall script is added in Distutils. and is used like this:
$ python -m distutils.uninstall packagename
Notice that script doesn't control if the removal of a distribution breaks another distribution. Although it makes sure that all the files it removes are not used by any other distribution, by using the uninstall function.
Also note that this uninstall script pays no attention to the REQUESTED metadata; that is provided only for use by external tools to provide more advanced dependency management.
Backward compatibility and roadmap
These changes don't introduce any compatibility problems with the previous version of Distutils, and will also work with existing third-party tools.
The plan is to include the functionality outlined in this PEP in distutils for Python 2.7 and Python 3.2. A backport of the new distutils for 2.5, 2.6, 3.0 and 3.1 is provided so people can benefit from these new features.
Distributions installed using existing, pre-standardization formats do not have the necessary metadata available for the new API, and thus will be ignored. Third-party tools may of course to continue to support previous formats in addition to the new format, in order to ease the transition.
References
| [1] | http://docs.python.org/distutils |
| [2] | (1, 2) http://www.python.org/dev/peps/pep-0262 |
| [3] | http://www.python.org/dev/peps/pep-0314 |
| [4] | http://peak.telecommunity.com/DevCenter/setuptools |
| [5] | http://peak.telecommunity.com/DevCenter/EasyInstall |
| [6] | http://pypi.python.org/pypi/pip |
| [7] | http://peak.telecommunity.com/DevCenter/EggFormats |
| [8] | http://www.python.org/dev/peps/pep-0273 |
| [9] | http://www.python.org/dev/peps/pep-0278 |
| [10] | http://fedoraproject.org/wiki/Packaging/Python/Eggs#Providing_Eggs_using_Setuptools |
| [11] | http://wiki.debian.org/DebianPython/NewPolicy |
| [12] | http://bitbucket.org/tarek/pep376/ |
Acknowledgements
Jim Fulton, Ian Bicking, Phillip Eby, and many people at Pycon and Distutils-SIG.
Copyright
This document has been placed in the public domain.
