[Distutils] A Modest Proposal for "A Database of Installed Packages"

Floris Bruynooghe floris.bruynooghe at gmail.com
Sat Apr 5 23:07:49 CEST 2008


Hello

On Fri, Mar 28, 2008 at 11:02:19AM -0400, Alexander Michael wrote:
> I'll continue my foolhardy effort [1] to build a concrete proposal
> for "a database of installed packages" by offering up a sketch of a
> possible straw-man "solution". I realize that this is likely
> oversimplified to a fault, but I hope it will help us move forward.
> Apologies if the equivalent of this has been proposed and rejected
> before. My proposal is basically to make PKG-INFO functional and
> usable by:
> 
> * Fixing the technical issues with requirements (i.e. dependencies)
> and naming as specified by PEP 314/345.
> 
> * Modifying distutils to install PKG-INFO alongside each module file
> or package directory as a side-car file of the same name but with a
> special extension (.pyi or whatever). These files would be the place
> to include the optional list of installed files as well as the
> optional md5sums, if desired by the installer. Files in the package
> will be listed using relative paths, while far flung files (bin,
> shared, etc) will get full paths so that there is full allowance for
> relocating simple (nothing in bin or shared) modules and packages.
> Although optional, "python setup.py install" will include the
> installed file list by default.
> 
> That's it.

This proposal has been here about a week now, with no comments on it.
I take that as positive as no one has had major objections.  :-)
Personally I think it is a good proposal: it does basically what an
installation database would have to do while being minimally invasive.

The important question is however: Is this enough for setuptools to
work without doing all its path magic?  Would this be a workable
solution for setuptools?


Now my own thoughts about the technicalities (sorry this got long)...

Distutils does create a ${pkgname}-${version}.egg-info file right now
containing the PKG-INFO data.  From earlier discussions it seems the
.egg-info extension is not very loved, so a change to .pyi could be
done (also, it has little to do with eggs).  Secondly I'm not sure how
useful it is for the version number to be encoded in the filename.

It seems the .egg-info file does get installed in the site-packages
root currently.  This will likely give conflicts when we're starting
to use namespace packages.  We can't put the .pyi *in* the package
since then we lose support for simple modules, so we have to place it
*next* to the package.  So if "bar" is a namespace package inside
"foo" then we would have:

site-packages/foo/bar.pyi
site-packages/foo/bar/__init__.py

This means any package tool will need to recursively scan the
site-packages directory to find the files, but that doesn't seem like
too much of a penalty?  The alternative is to have a separate directory
for the installdb files:

site-packages/foo/bar/__init__.py
site-packages/install.db/foo/bar.pyi

This could significantly reduce the scanning time since there are far
fewer files to walk.  I chose a name with a "." for install.db so
we're not stealing a possible module or package name.  Other than that
the name of the directory can be anything we manage to agree on.  :-)
Using this approach might create confusion about relative paths
mentioned in .pyi files though (is the root the current directory or
do we pretend the .pyi was actually next to the package/module?).
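
To make the scanning cost concrete, here is a rough sketch of what a
tool would have to do in the first layout.  The function names are made
up for illustration and nothing here is existing distutils API; it only
assumes the .pyi extension and the side-car placement described above.

```python
import os


def find_pyi_files(root):
    """Yield the path of every .pyi side-car file below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".pyi"):
                yield os.path.join(dirpath, name)


def dotted_name(root, pyi_path):
    """Map <root>/foo/bar.pyi to the dotted name 'foo.bar'."""
    rel = os.path.relpath(pyi_path, root)
    stem, _ext = os.path.splitext(rel)
    return stem.replace(os.sep, ".")
```

For the second layout the same walk would simply start at
site-packages/install.db instead of site-packages, which is where the
saving in scanning time would come from.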

Distributions not providing a package/module, or with a different
distribution name than the package(s)/module(s) provided, would end up
in the top-level of the database (in both scenarios), effectively
stealing package/module names, but that seems to be the current
behaviour of distutils already anyway.  Namespace sub-distributions
(bar in the example above) with a distribution name different from
their package/module name would steal names from their namespace.


Namespace packages are not fully handled yet; there is still the issue
of who owns site-packages/foo/__init__.py.  That would logically be
defined by site-packages/foo.pyi, but we don't want the user to have
to install yet another package for this.  So for a namespace package
the .pyi could look like this:

Metadata-Version: 1.0
Name: foo
...
Owner: setuptools
Namespace: True
Directory: foo/
File: foo/__init__.py

It might be possible that a namespace package doesn't need an owner so
that a different tool is allowed to clean it up, but I'm not sure
about that.

When "bar" gets uninstalled now it should know if it can clean up its
namespace "foo" too (if it is empty).  So bar.pyi should have:

Metadata-Version: 1.0
Name: bar
...
Owner: setuptools
Requires-Namespaces: foo
Directory: foo/bar/
File: foo/bar/__init__.py

Here "foo" could also have been a dotted name: "foo.sub.package".  So
the "foo.sub" package would have both the Namespace: and
Requires-Namespaces: fields in its .pyi.

AFAIK this should cover namespace packages.
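
As a sanity check, here is a minimal sketch of how an uninstaller
might use these two fields.  It reads the .pyi text with the standard
library email parser, since PKG-INFO is RFC822-style.  The helper names
are hypothetical, the elided "..." fields are left out of the samples,
and treating the Requires-Namespaces: value as a comma- or
whitespace-separated list is my assumption; the exact list syntax is
not settled above.

```python
from email.parser import Parser

FOO_PYI = """\
Metadata-Version: 1.0
Name: foo
Owner: setuptools
Namespace: True
Directory: foo/
File: foo/__init__.py
"""

BAR_PYI = """\
Metadata-Version: 1.0
Name: bar
Owner: setuptools
Requires-Namespaces: foo
Directory: foo/bar/
File: foo/bar/__init__.py
"""


def parse_pyi(text):
    """Parse the RFC822-style headers of a .pyi file."""
    return Parser().parsestr(text, headersonly=True)


def is_namespace(meta):
    """Only the presence of Namespace: matters, not its value."""
    return "Namespace" in meta


def required_namespaces(meta):
    """Return the dotted namespace names this distribution needs."""
    names = []
    for value in meta.get_all("Requires-Namespaces", []):
        # Assumed list syntax: commas and/or whitespace between names.
        names.extend(value.replace(",", " ").split())
    return names


def namespace_can_be_removed(namespace, remaining):
    """True if none of the remaining distributions needs the namespace."""
    return not any(namespace in required_namespaces(m) for m in remaining)
```

After uninstalling "bar", the tool would pass the metadata of every
distribution still installed as `remaining`; if nothing else lists
"foo", its __init__.py and directory can go too.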


So the new headers to turn the PKG-INFO into a .pyi would be:

Owner: The owner of this distribution.  This would be any string
    representing the package tool, e.g.: distutils, setuptools,
    zc.buildout, rpm, dpkg, etc.

Provides: Copied from PEP262.  I don't like this in its original form
    since it's ambiguous.  So this lists the *distributions* provided
    by this package on top of its native name in the Name: field.
    Optional (and very rare).

Modules: List of packages/modules provided.  If no packages or modules
    are installed this doesn't need to be present.  You could argue
    that this should be called Packages: or so.  Derived from PEP262.

Namespace: The value of this doesn't matter; when present it
    indicates this .pyi file describes a namespace package.

Requires: Copied from PEP262 (also ambiguous).  Optional.
    Distributions that must be installed for this distribution to
    work.

Requires-Modules: Optional list of packages/modules required.  No need
    to list modules in the standard library.  (Figuring out if this
    site-packages tree is of the correct python version is of no use
    for the installdb).  Derived from PEP262.

Requires-Namespaces: This package requires a namespace.  The value is
    a list of dotted names of the namespace packages, as they would
    appear in an import statement.

Directory: A directory from this package.  Relative to this .pyi or
    absolute.  For directories inside site-packages they *should* be
    relative, for outside site-packages they *should* be absolute.

File: The value of this is first an optional MD5 hash (or SHA1?) of
    the file, followed by the path of the installed file (absolute or
    relative, same rules as for Directory: above).  The only
    restriction this makes on a filename is that you can't have a file
    in the current directory that is also a valid hash and does not
    have a hash itself.  You can work around this by prepending the
    filename with ./ however - but why would you want such a file?
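
A minimal sketch of how a reader could split a File: value, assuming
the hash is a 32-digit hexadecimal MD5 separated from the path by
whitespace (neither detail is fixed above, and the function name is
made up).  With this particular rule a lone token is always taken as a
path, and the "./" prefix keeps a hash-like filename from being
mistaken for a hash.

```python
import re

# Assumption: an MD5 digest written as 32 hexadecimal digits.
_MD5_RE = re.compile(r"^[0-9a-fA-F]{32}$")


def parse_file_value(value):
    """Split a File: value into (md5_or_None, path)."""
    value = value.strip()
    parts = value.split(None, 1)
    if len(parts) == 2 and _MD5_RE.match(parts[0]):
        # First token looks like a hash, the rest is the path.
        return parts[0].lower(), parts[1]
    # No hash given: the whole value is the path.
    return None, value
```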

The only issue I can think of right now is with File:.  It is not very
extensible if a tool wants to keep track of extra info like file
permissions.  AFAIK RFC822 requires you to keep the order of the
fields, if so we could make this:

File: foo.py
File-MD5: XXXXXXXXXXXX
X-MyTool-File-perms: -rw-rw-r-
File: bar.py
...
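
To illustrate, here is a sketch of how a reader could regroup such
ordered fields, relying on the standard library email parser returning
headers in the order they appear in the file.  The function name is
hypothetical, and attaching every following "File-" or "X-" field to
the most recent File: is my reading of the idea, not an agreed rule.

```python
from email.parser import Parser

SAMPLE = """\
File: foo.py
File-MD5: XXXXXXXXXXXX
X-MyTool-File-perms: -rw-rw-r-
File: bar.py
"""


def group_file_entries(text):
    """Group each File: field with the per-file fields that follow it."""
    msg = Parser().parsestr(text, headersonly=True)
    entries = []
    for key, value in msg.items():  # items() preserves file order
        if key == "File":
            entries.append({"File": value})
        elif entries and (key.startswith("File-") or key.startswith("X-")):
            # Belongs to the most recently seen File: field.
            entries[-1][key] = value
    return entries
```

A tool that does not know X-MyTool-File-perms can ignore that key and
still get the path and hash of each file.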



Lastly --and I'm not sure how happy I am about this, I should have
thought of this earlier-- the python packaging tools need to support
giving away ownership at install time!  Since Debian, Red Hat, etc.
just call setup.py, that would mean each package they install would be
owned by distutils/setuptools/...  That's bad.

I propose that setup.py needs to honour an environment variable:
PYI_OWNER so that distros can set this to their custom name (dpkg,
rpm, ...).  Although I can imagine in Debian's case that it's better to
change the dh_py* tools to go and modify the .pyi files.  So if all
distros are happy with having to modify installed files this might not
be necessary.
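
For the environment variable route, the lookup could be as small as
the sketch below.  PYI_OWNER is the name proposed above; the function
name and the "distutils" default are only assumptions for illustration.

```python
import os

DEFAULT_OWNER = "distutils"  # assumed default when nothing is set


def resolve_owner(environ=None, default=DEFAULT_OWNER):
    """Return the value to write into the Owner: field at install time."""
    env = os.environ if environ is None else environ
    owner = env.get("PYI_OWNER", "").strip()
    # An unset or empty variable falls back to the installing tool.
    return owner or default
```

A distro build would then run something like
"PYI_OWNER=dpkg python setup.py install" and get Owner: dpkg in every
.pyi file written.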

Another nice/required feature for distros would be to ask the tool
to only install the namespace package or omit the namespace package.
This could just be a command line switch to setup.py I think.  Again
this is not a hard requirement, I can imagine Debian's dh_py* tools
scanning the .pyi files, detecting namespace packages and (re)moving
them as required.  But once more I don't know enough about other
distros.


Phew, thanks for reading this far!  I hope this is useful, if it is we
should probably start writing the text for the new PEP262 on a wiki
somewhere while we discuss details.


Regards
Floris


-- 
Debian GNU/Linux -- The Power of Freedom
www.debian.org | www.gnu.org | www.kernel.org

