[Distutils] A Modest Proposal for "A Database of Installed Packages"

Alexander Michael lxander.m at gmail.com
Fri Mar 28 16:02:19 CET 2008


I'll continue my foolhardy effort [1] to build a concrete proposal
for "a database of installed packages" by offering up a sketch of a
possible straw-man "solution". I realize that this is likely
oversimplified to a fault, but I hope it will help us move forward.
Apologies if the equivalent of this has been proposed and rejected
before. My proposal is basically to make PKG-INFO functional and
usable by:

* Fixing the technical issues with requirements (i.e. dependencies)
and naming as specified by PEP 314/345.

* Modifying distutils to install PKG-INFO alongside each module file
or package directory as a side-car file of the same name but with a
special extension (.pyi or whatever). These files would be the place
to include the optional list of installed files as well as the
optional md5sums, if desired by the installer. Files in the package
will be listed using relative paths, while far-flung files (bin,
shared, etc.) will get full paths, so that simple modules and packages
(nothing in bin or shared) can be relocated freely. Although the file
list is optional, "python setup.py install" will include it by
default.
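
To make the second bullet concrete, here is a minimal sketch of what
writing such a side-car file might look like. The "Installed-File"
field name, the "md5=" suffix, and the function names are all
hypothetical; neither PEP 314/345 nor distutils defines them, and the
.pyi extension is just the placeholder from above.

```python
import hashlib
import os


def md5_of(path):
    """Return the hex md5 digest of a file's contents."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installed_file_lines(package_dir, installed_paths, with_md5=True):
    """Build hypothetical 'Installed-File' lines for a side-car file.

    Paths inside package_dir are recorded relative to it; anything else
    (scripts in bin, shared data, ...) is recorded as an absolute path.
    """
    package_dir = os.path.abspath(package_dir)
    lines = []
    for path in sorted(os.path.abspath(p) for p in installed_paths):
        if path.startswith(package_dir + os.sep):
            recorded = os.path.relpath(path, package_dir)
        else:
            recorded = path
        entry = "Installed-File: " + recorded
        if with_md5:
            entry += " md5=" + md5_of(path)
        lines.append(entry)
    return lines


def write_sidecar(pkg_info_text, package_dir, installed_paths, extension=".pyi"):
    """Write PKG-INFO plus the file list next to package_dir; return its path."""
    sidecar_path = os.path.abspath(package_dir).rstrip(os.sep) + extension
    body = pkg_info_text.rstrip("\n") + "\n"
    body += "\n".join(installed_file_lines(package_dir, installed_paths)) + "\n"
    with open(sidecar_path, "w") as handle:
        handle.write(body)
    return sidecar_path
```

Because files inside the package are stored relative to it, moving the
package directory together with its side-car keeps those entries
valid; only the absolute entries pin a package to one location.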

That's it. The intent is to provide just enough information to allow
the development of tools to use it, for those that are interested,
while being minimally invasive to developers that are not interested
in such tools. To determine the current state of your Python
environment, walk sys.path looking for modules and packages,
collecting PKG-INFO when available. There is no standard centralized
database. Some of us will choose to opt in to a particular
installation management tool that might maintain a cache (centralized
or per-directory) for efficiency, but that would be considered a
performance optimization for that particular tool.
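
The sys.path walk described above could be sketched roughly as
follows. This assumes the placeholder .pyi extension and side-car files
that hold plain PKG-INFO headers; find_installed is a made-up name, and
the sketch only looks at .py files and directories containing
__init__.py, so extension modules and zipped entries are ignored.

```python
import os
import sys
from email.parser import Parser

SIDECAR_EXT = ".pyi"  # placeholder extension from the proposal


def find_installed(search_path=None):
    """Map each module/package name found on the path to its metadata.

    The value is a parsed PKG-INFO message, or None when no side-car
    exists. Earlier path entries win, mirroring import order.
    """
    if search_path is None:
        search_path = sys.path
    found = {}
    for directory in search_path:
        if not os.path.isdir(directory):
            continue  # skips zip files, missing entries, and ""
        for entry in sorted(os.listdir(directory)):
            full = os.path.join(directory, entry)
            if os.path.isdir(full) and os.path.isfile(
                os.path.join(full, "__init__.py")
            ):
                name = entry
            elif os.path.isfile(full) and entry.endswith(".py"):
                name = entry[:-3]
            else:
                continue
            if name in found:
                continue
            sidecar = os.path.join(directory, name + SIDECAR_EXT)
            if os.path.isfile(sidecar):
                with open(sidecar) as handle:
                    found[name] = Parser().parse(handle, headersonly=True)
            else:
                found[name] = None
    return found
```

A tool that wanted a cache would simply persist the result of such a
walk; nothing here requires one.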

We can also bootstrap older Python installations by creating an online
database (which can of course be downloaded by security-conscious
individuals for offline querying) that maps
(module-file/package-directory name, md5sum) pairs to their respective
PKG-INFO contents (no list of installed files). An automated sys.path
walker can query it to fill in missing side-car files. Thus, I can opt
in to this scheme for Python 2.5 by installing a distutils patch and a
metadata side-car bootstrapper that does its best to identify what's
on my sys.path. It would be quite tractable to maintain this for the
Python standard library and perhaps the official installations of a
few major OS versions. Such a database could even be used by the
community to provide metadata for packages whose developers didn't
(again, furthering an opt-in mentality). Of course, even though it
worked for CDDB, it would likely be too much to expect this level of
coverage through user-submitted entries.
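
As a rough illustration of the bootstrapper, the sketch below uses an
in-memory dict as a stand-in for the downloaded database. The function
names are invented for this example, and fingerprinting a package by
its __init__.py is purely my assumption; how a package directory's
md5sum would be computed is left open above.

```python
import hashlib
import os


def fingerprint(path):
    """Return (name, md5) for a module file or package directory.

    Assumption: a package is fingerprinted by its __init__.py; the
    proposal does not say how a directory's md5sum would be computed.
    """
    if os.path.isdir(path):
        name = os.path.basename(path.rstrip(os.sep))
        target = os.path.join(path, "__init__.py")
    else:
        name = os.path.basename(path)
        target = path
    with open(target, "rb") as handle:
        return name, hashlib.md5(handle.read()).hexdigest()


def bootstrap_sidecar(path, database, extension=".pyi"):
    """Write a side-car for path if the database knows it and none exists.

    database maps (name, md5) pairs to PKG-INFO text. Returns the
    side-car path when one was written, otherwise None.
    """
    if os.path.isdir(path):
        base = path.rstrip(os.sep)
    else:
        base = os.path.splitext(path)[0]
    sidecar = base + extension
    if os.path.exists(sidecar):
        return None  # never overwrite metadata that is already there
    pkg_info = database.get(fingerprint(path))
    if pkg_info is None:
        return None  # unknown file: leave it without metadata
    with open(sidecar, "w") as handle:
        handle.write(pkg_info)
    return sidecar
```

Keying on the md5sum as well as the name means a locally modified
module simply fails to match, rather than being given metadata for a
release it no longer corresponds to.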

[1] http://mail.python.org/pipermail/distutils-sig/2008-March/009108.html

