[Distutils] A Modest Proposal for "A Database of Installed Packages"

Phillip J. Eby pje at telecommunity.com
Sun Apr 6 01:50:19 CEST 2008


At 10:07 PM 4/5/2008 +0100, Floris Bruynooghe wrote:
>This proposal has been here about a week now, with no comments on it.
>I take that as positive as no one has had major objections.  :-)

It's more that there are some holes and handwaving; I haven't really 
had the mental bandwidth to offer comments on the original proposal as yet.

(One comment, though: I really don't like the idea of extending 
PKG-INFO to include installation data; it's only incidentally related 
and there are other contexts in which we use PKG-INFO where having 
that data included would make no sense.  Plus, it's really not an 
ideal file format for including data about a potentially rather large 
number of files.)


>Secondly I'm not sure how
>useful it is for the version number to be encoded in the filename.

It's very useful for setuptools, as it avoids the need to open and 
parse the file when searching for a suitable version of a desired package.
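
To illustrate the point, here is a minimal sketch (not setuptools' actual code) of picking out versions from directory listings alone. The helper names are hypothetical, and it assumes filenames of the form Name-Version.egg-info where the project name itself contains no "-":

```python
def parse_egg_info_name(filename):
    """Split 'Name-Version.egg-info' into (name, version) without opening it.

    Assumption: the project name contains no '-' characters.
    Returns None for anything that does not look like an .egg-info entry.
    """
    suffix = ".egg-info"
    if not filename.endswith(suffix):
        return None
    parts = filename[:-len(suffix)].split("-")
    if len(parts) < 2:
        return None
    return parts[0], parts[1]


def find_versions(filenames, project):
    """Return the versions of `project` found in a directory listing."""
    found = []
    for fn in filenames:
        parsed = parse_egg_info_name(fn)
        if parsed is not None and parsed[0].lower() == project.lower():
            found.append(parsed[1])
    return found
```

Given the output of os.listdir() on the install directory, find_versions() never has to read a metadata file to decide which entries are candidates.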


>It seems the .egg-info file does get installed in the site-packages
>root currently.  This will likely give conflicts when we're starting
>to use namespace packages.

This doesn't make sense.  Namespace packages and project names are 
not in the same namespace and have nothing to do with each 
other.  For example, I have a project called DecoratorTools that 
installs a module in the peak.util namespace package.  Its egg-info 
would be something like DecoratorTools-1.6.egg-info.  So I think you 
are confused about something here.
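
A small sketch of the distinction, under loudly stated assumptions: "OtherProject" and both module names below are placeholders invented for illustration, not the real contents of any project. The egg-info filename is derived from the project name and version only, so two projects sharing the peak.util namespace do not collide:

```python
# Hypothetical install records: project name -> (version, modules installed).
# "OtherProject" and the module names are placeholders for illustration only.
projects = {
    "DecoratorTools": ("1.6", ["peak.util.module_a"]),
    "OtherProject": ("0.1", ["peak.util.module_b"]),
}


def egg_info_filename(project, version):
    """Build the metadata filename from the project name, not the package path."""
    return "%s-%s.egg-info" % (project, version)


# Both projects install into the same namespace package, yet their
# egg-info filenames differ because they are keyed on project names.
names = [egg_info_filename(p, v) for p, (v, _mods) in sorted(projects.items())]
```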


>  We can't put the .pyi *in* the package
>since then we lose support for simple modules, so we have to place it
>*next* to the package.

No, it just goes to the --install-lib directory, which in the default 
case is site-packages.  (But could be a PYTHONPATH or other directory.)

>   So if "bar" is a namespace package inside
>"foo" then we would have:
>
>site-packages/foo/bar.pyi
>site-packages/foo/bar/__init__.py

Ah, I see... you are definitely confusing package names and project names.


>This means any package tool will need to recursively scan the
>site-packages directory to find the files, but that doesn't seem like
>too much of a penalty?  The alternative is to have a separate directory
>for the installdb files:
>
>site-packages/foo/bar/__init__.py
>site-packages/install.db/foo/bar.pyi
>
>This could significantly reduce the scanning time since there are far
>fewer files to walk.  I chose a name with a "." for install.db so
>we're not stealing a possible module or package name.  Other than that
>the name of the directory can be anything we manage to agree on.  :-)
>Using this approach might create confusion about relative paths
>mentioned in .pyi files though (is the root the current directory or
>do we pretend the .pyi was actually next to the package/module?).
>
>Distributions not providing a package/module or with a different
>distribution name than the package(s)/module(s) provided would end up
>in the top-level of the database (in both scenarios), effectively
>stealing package/module names but that seems to be the current
>behaviour of distutils already anyway.  Namespace sub-distributions
>(bar in the example above) with a different distribution name than
>package/module name would steal names from its namespace.

All of this is moot, since project/distribution names are unrelated 
to package names.


>Namespace packages are not fully handled yet, ...
>
>AFAIK this should cover namespace packages.

Unfortunately, this doesn't fix the problem, since either *some* 
package has to own the __init__.py, or there has to be a way for 
Python to treat the directory as a package without one.  And for 
system package managers (esp. on Linux), some *one* system package 
must own the file - it can't be owned by multiple system packages.

My guess is that this is true, *even if* the file is automatically 
generated.  Some system packaging folks will need to chime in here.


>Lastly --and I'm not sure how happy I am about this, should have
>thought of this earlier-- the python packaging tools need to support
>giving away ownership at install time!  Since Debian and Redhat etc
>just call setup.py that would mean each package they install would be
>owned by distutils/setuptools/...  That's bad.
>
>I propose that setup.py needs to honour an environment variable:
>PYI_OWNER so that distros can set this to their custom name (dpkg,
>rpm, ...).

A command-line option to 'install' that's inherited by 
'install_egg_info' would handle this; I don't think an environment 
variable is a good idea for this -- too implicit.  Note that 
bdist_rpm, for example, would need to encode this as a command-line 
option in the .spec file, anyway.



>Phew, thanks for reading this far!  I hope this is useful, if it is we
>should probably start writing the text for the new PEP262 on a wiki
>somewhere while we discuss details.

The major issues at the moment are that 1) your spec is confused 
about packages vs. projects or distributions (and thus needs to be 
revamped with that in mind), and 2) PKG-INFO is a really lousy place 
to put this, from a formatting perspective.  It's one thing to 
include the PKG-INFO in the install DB, and another thing entirely to 
include the install DB in the PKG-INFO!  I think PEP 262 had the 
right idea, even though I'm not overjoyed by its proposed format, either.


