[Distutils] A Modest Proposal for "A Database of Installed Packages"

Floris Bruynooghe floris.bruynooghe at gmail.com
Sun Apr 6 03:18:42 CEST 2008


On Sat, Apr 05, 2008 at 07:50:19PM -0400, Phillip J. Eby wrote:
> At 10:07 PM 4/5/2008 +0100, Floris Bruynooghe wrote:
> (One comment, though: I really don't like the idea of extending PKG-INFO 
> to include installation data; it's only incidentally related and there 
> are other contexts in which we use PKG-INFO where having that data 
> included would make no sense.  Plus, it's really not an ideal file format 
> for including data about a potentially rather large number of files.)

That's fair.  Bloating the files that carry the PKG-INFO information
could hurt performance; AFAIK rfc822 in the stdlib reads everything
into memory.

>> Secondly I'm not sure how
>> useful it is for the version number to be encoded in the filename.
>
> It's very useful for setuptools, as it avoids the need to open and parse 
> the file when searching for a suitable version of a desired package.

Hmm, it's not that much work to read the contents of a .egg-info.
Just seems odd to me to have this info in two places so close to each
other.
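
To make the trade-off concrete, here is a minimal sketch of what
reading the version from the filename amounts to.  It assumes the
"<project>_<version>.egg-info" naming used in the examples in this
thread (real setuptools names look different), and the function name
is made up for illustration:

```python
import os


def split_egg_info_name(filename):
    """Return (project, version) from a name like 'foo_1.0.egg-info'.

    Assumes the hypothetical '<project>_<version>.egg-info' layout used
    in this thread.  The file itself is never opened.
    """
    stem, ext = os.path.splitext(filename)
    if ext != ".egg-info":
        raise ValueError("not an .egg-info name: %r" % (filename,))
    # Split on the last underscore so project names may contain one.
    project, sep, version = stem.rpartition("_")
    if not sep:
        # No underscore at all: the name carries no version.
        return stem, None
    return project, version
```

A tool looking for a particular version of "foo" would only need a
directory listing plus this function; without the version in the name
it would have to open and parse each candidate file instead.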

[...]
> All of this is moot, since project/distribution names are unrelated to 
> package names.

So this means there is a flat namespace for all project names and a
nested namespace for modules.  When I said that project names
"steal" names from modules, that is because they end up in the same
directory.  I.e. project "foo" with foo_1.0.egg-info provides module
"bar", while project "bar" with bar_1.0.egg-info provides module
"bar2".  Not ideal.

What I was trying to get at was to prefix the names of projects that
provide a sub-module for a namespace with the namespace module name.
Inside the hypothetical installdb, that is.  But maybe that just makes
the whole project namespace vs module namespace distinction more
confusing (thinking about it, it's definitely a bad idea if the
project of the sub-package also installs a script or so).

The second part was introducing a "virtual project" for pure namespace
packages, where the project name would have to be the same as the
package name in order to find it.

>> AFAIK this should cover namespace packages.
>
> Unfortunately, this doesn't fix the problem, since either *some* package 
> has to own the __init__.py, or there has to be a way for Python to treat 
> the directory as a package without one.  And for system package managers 
> (esp. on Linux), some *one* system package must own the file - it can't 
> be owned by multiple system packages.

With the format I suggested a package tool could detect on install
whether a required pure namespace package was already installed or
still needed to be installed/created.  Similarly, on removal of a
sub-package it is possible to detect whether the pure namespace
package is still required (by checking if its directory contains any
files other than those provided by the namespace package).
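
As a rough sketch of that removal check (the function name and the
idea of passing in the namespace package's own file list are my
assumptions here, not part of any existing tool):

```python
import os


def namespace_still_needed(namespace_dir, namespace_files):
    """Return True if namespace_dir holds anything the namespace
    package itself did not install.

    namespace_files is the list of entry names recorded for the pure
    namespace package in the (hypothetical) installdb, for example
    ['__init__.py', '__init__.pyc'].
    """
    owned = set(namespace_files)
    for entry in os.listdir(namespace_dir):
        if entry not in owned:
            # Something else (a sub-package, most likely) lives here.
            return True
    return False
```

A tool would call this after removing a sub-package and only remove
the pure namespace package when it returns False.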

> My guess is that this is true, *even if* the file is automatically  
> generated.  Some system packaging folks will need to chime in here.

System packagers would create two packages out of a package requiring
a namespace package: one with the pure namespace, the other with the
sub-package.  Other sub-packages then just need to depend on the pure
namespace one.

>> Lastly --and I'm not sure how happy I'm about this, should have
>> thought of this earlier-- the python packaging tools need to support
>> giving away ownership at install time!  Since Debian and Redhat etc
>> just call setup.py that would mean each package they install would be
>> owned by distutils/setuptools/...  That's bad.
>>
>> I propose that setup.py needs to honour an environment variable:
>> PYI_OWNER so that distros can set this to their custom name (dpkg,
>> rpm, ...).
>
> A command-line option to 'install' that's inherited by  
> 'install_egg_info' would handle this; I don't think an environment  
> variable is a good idea for this -- too implicit.  Note that bdist_rpm, 
> for example, would need to encode this as a command-line option in the 
> .spec file, anyway.

I picked an environment variable here because then it would be
possible to call setup.py identically whether or not it provides this
new installdb.  Passing a command-line option that doesn't exist tends
to cause more problems.
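
Roughly what I have in mind on the setup.py side, as a sketch only:
PYI_OWNER is the variable proposed above, while the default owner
name and the function are assumptions made up for this example.

```python
import os

# Hypothetical default: the owner recorded when nobody overrides it.
DEFAULT_OWNER = "distutils"


def install_owner(environ=None):
    """Return the owner name to record in the installdb.

    Honours the proposed PYI_OWNER environment variable and falls back
    to DEFAULT_OWNER when it is unset or empty.
    """
    if environ is None:
        environ = os.environ
    return environ.get("PYI_OWNER") or DEFAULT_OWNER
```

A distro would then run something like "PYI_OWNER=dpkg python setup.py
install", and an older setup.py that knows nothing about the installdb
would simply ignore the variable instead of failing on an unknown
option.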


>> Phew, thanks for reading this far!  I hope this is useful, if it is we
>> should probably start writing the text for the new PEP262 on a wiki
>> somewhere while we discuss details.
>
> The major issues at the moment are that 1) your spec is confused about 
> packages vs. projects or distributions (and thus needs to be revamped 
> with that in mind),

See clarification above.

> and 2) PKG-INFO is a really lousy place to put this, 
> from a formatting perspective.  It's one thing to include the PKG-INFO in 
> the install DB, and another thing entirely to include the install db into 
> the PKG-INFO!  I think PEP 262 had the right idea, even though I'm not 
> overjoyed by its proposed format, either.

Not wanting to blow up PKG-INFO is laudable, but OTOH separating out
the data is dubious, as is replicating data (PKG-INFO data in
.egg-info AND the installdb).  PKG-INFO was simply the easy choice,
since it's already there and tools can already use it.

Maybe we're making it too hard by wanting to cover *every* file
installed by python projects?  The main reason for this installdb, as
I understand it, is so that a package tool can install a sub-project
in a namespace package installed by someone else.  And similarly that
someone else doesn't wipe away the sub-package when it thinks it can
remove the namespace package.  Ah, this makes me think of the people
who complain on comp.lang.python that Python namespaces are too
tightly bound to files and directories...  It all makes sense now, we
wouldn't even be having this discussion if a package could declare
its namespace in the code!  ;-)


Regards
Floris

-- 
Debian GNU/Linux -- The Power of Freedom
www.debian.org | www.gnu.org | www.kernel.org

