[Distutils] python version information in .egg-info directory name

Phillip J. Eby pje at telecommunity.com
Sat Jul 22 02:15:28 CEST 2006


At 12:08 AM 7/22/2006 +0200, Matthias Klose wrote:
>Phillip J. Eby writes:
> > I read the entire policy you linked to, and I don't actually see many
> > problems.
> >
> > It seems to me that the single largest problem in that policy is that it
> > clearly predates the existence of the distutils.  It has no conception of a
> > Python *project* or *distribution*, only modules and packages.  It's
> > therefore not surprising that it also doesn't encompass such issues as
> > distribution metadata, package data, namespace packages, and the like.  It
> > also explains why the policy is so out-of-sync with e.g. PyPI.  (I hesitate
> > to see what would happen if somebody tries to package any of my Python
> > projects such as SymbolType or ProxyTypes for Debian: they all are modules
> > in the 'peak' package, but each is distributed as a separate project!)
>
>The Python policy is just a sub-part of the Debian Policy [1]; the
>Debian Policy predates PyPI.  You are missing the existing bits about
>e.g. distribution metadata, distributions, etc.
>[1] http://www.us.debian.org/doc/debian-policy/

I'm referring above to Python distribution metadata, not Debian's.  That 
is, the distribution of a Python project, not a Linux distribution.


>I cannot find the term "project" in the distutils documentation. Any pointers?

I use the term "project" to refer to the logical thing of which a distutils 
"distribution" is a physical manifestation.  The distutils documentation 
confusingly uses the word "package" to refer both to what I'm calling a 
"project", and the notion of an individual Python package.

You can tell this by close inspection of the distutils documentation, if 
you notice that there are many places where the configuration of a 
"package" (meaning #1) in fact can list multiple "packages" (meaning #2) 
for inclusion.  (Many Python developers have previously commented on this 
naming ambiguity in the distutils.)

I thus prefer to use (and promote the use of) the word "project" for 
meaning 1, in order to have better communication about what is actually 
going on.  It is intuitive and does not confuse two different notions of 
"package".


>So yes, if peak is a rather complex setup, it might be worthful to
>have it as an example for a Debian package and to identify omissions
>in Debian packaging practices and distutils/setuptools.

This very statement helps to illustrate the impedance mismatch and 
communication difficulty.  You appear to be interpreting my statement that 
e.g. SymbolType and ProxyTypes contain modules in the "peak" package 
(meaning #2) to imply that peak is or should be a Debian package (meaning 
#1, or perhaps a new meaning #3!).

But this would be the same as concluding that the various Java projects 
whose packages are contained in the org.apache.* namespace are a single 
"package" in this way, and should thus be combined into a Debian 
"org-apache" package -- split out of their respective jars and forcibly 
recombined!

But as with the org.apache.* prefix, the peak.* prefix is merely a 
namespace that indicates an affiliation and prevents unintentional name 
conflicts.  As the number of distributed Python projects increases (1400+ 
on PyPI of late), this kind of namespace management will become 
increasingly important.  This is where the term "namespace package" comes 
from; it was coined several years ago by Zope to distinguish 
package-as-unit from package-as-namespace.

The latter kind of package is still relatively uncommon, since there are 
relatively few large organizations distributing large projects as split 
distributions.  Unbundling these large projects into smaller pieces is an 
increasing trend, however, as it allows smaller units to be spun off as 
sub-projects, each with its own release management and version 
lifecycle.  PEAK and Zope as monolithic projects might contain some 
elements in alpha or beta releases that, considered in their own right, 
might be worthy of 1.x or 2.x stable version labelling.  Lumping these in 
together with other components doesn't help anybody, but spinning them off 
as separate projects allows them to be reused.

So far, I've spun off seven such projects from the monolithic PEAK 
distribution, and there will be more over time.  These other projects live 
in the peak.* namespace, and the monolithic distribution depends on them, 
but it would not make sense to aggregate them all as one Debian package, 
since other projects may depend directly on them.  SymbolType, ProxyTypes, 
and DecoratorTools are all likely to get used in other projects that would 
depend on them directly, but not necessarily require any other part of the 
peak.* namespace.

And this, you see, is why I say that the Debian Python policy is based on a 
limited conceptual framework that doesn't mesh well with a distutils- or 
setuptools-based world.  Mapping "1 Debian package = 1 Python package = 1 
project" is inaccurate, because one project may contain multiple Python 
packages, and a single Python package can be spread across multiple 
projects.  And it's not just PEAK and Zope doing it -- I discovered last 
year that there's an ll.* namespace package out there that uses an 
interesting quirk of the distutils installation system to implement a 
namespace package.  It might actually predate Zope's coining of the term 
"namespace package" for all I know.
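
To make the many-to-many relationship concrete, here is a minimal sketch that indexes Python packages by the projects that ship them.  The table is illustrative only: the assumption that each spun-off project carries the "peak" and "peak.util" namespace packages is mine, and "ExampleApp" is a made-up project added for contrast.

```python
from collections import defaultdict

# Illustrative data only: which Python packages each project ships.
# The peak.* layout is an assumption; "ExampleApp" is hypothetical.
PROJECT_PACKAGES = {
    "SymbolType": ["peak", "peak.util"],
    "ProxyTypes": ["peak", "peak.util"],
    "DecoratorTools": ["peak", "peak.util"],
    "ExampleApp": ["exampleapp", "exampleapp.plugins"],
}


def projects_by_package(project_packages):
    """Invert the mapping: Python package name -> set of projects shipping it."""
    index = defaultdict(set)
    for project, packages in project_packages.items():
        for package in packages:
            index[package].add(project)
    return dict(index)


index = projects_by_package(PROJECT_PACKAGES)

# Packages spread across more than one project (namespace packages).
shared = sorted(pkg for pkg, projects in index.items() if len(projects) > 1)
print("shared across projects:", shared)

# Projects that contain more than one Python package.
multi = sorted(p for p, pkgs in PROJECT_PACKAGES.items() if len(pkgs) > 1)
print("projects with several packages:", multi)
```

Neither direction of the mapping is one-to-one, which is why a policy that assumes "1 system package = 1 Python package = 1 project" has nowhere to put these cases.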


>I'm not sure what you mean by the generation of distribution metadata
>and different dependencies.

The PKG-INFO format changed in some Python versions.  The entry points that 
setuptools offers as commands depend on what Python version it's installed 
with.  The dependencies of some projects depend on whether a needed thing 
is now bundled in Python.  For example, there is a standalone "ctypes" 
project, that is bundled in Python 2.5.  A project being installed for 2.5 
would have no reason to declare a dependency on ctypes, and it would be 
entirely reasonable for a setup.py to contain something like:

     import sys

     deps = []
     # Compare version tuples; comparing version strings breaks for e.g. "2.10"
     if sys.version_info < (2, 5):
         deps.append("ctypes>=0.9.6")

     ...
     setup(
         ...
         install_requires = deps,
         ...
     )

Similar code might decide to build alternate extensions, etc.  The 
monolithic PEAK distribution for example includes its own version of the 
Python expat module, and builds it to replace Python's if it's being 
installed for Python 2.3 (whose expat interface doesn't include access to 
the current line number during regular parsing callbacks).  (Note: it 
builds this backported extension as peak.util.pyexpat; it doesn't override 
the stdlib-supplied pyexpat!)

Anyway, now that setuptools exists, the right way for me to have handled 
that would be to create a separate project, let's say "pyexpat-backport" 
that provides the 2.4 expat interface for Python 2.3, and then declare that 
as a dependency if I'm installing under Python 2.3 -- a 
Python-version-conditional dependency.
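
A minimal sketch of such a Python-version-conditional dependency list follows.  The helper name conditional_requirements is made up for illustration, and "pyexpat-backport" is only the hypothetical project name used above, not a real project that is known to exist.

```python
import sys


def conditional_requirements(version_info=None):
    """Build an install_requires list for the given Python version tuple.

    Hypothetical helper: "pyexpat-backport" is a placeholder project name.
    """
    if version_info is None:
        version_info = sys.version_info
    deps = []
    if version_info < (2, 4):
        # Python 2.3's expat interface lacks what the project needs.
        deps.append("pyexpat-backport")
    if version_info < (2, 5):
        # ctypes is bundled with Python from 2.5 onward.
        deps.append("ctypes>=0.9.6")
    return deps


for version in [(2, 3), (2, 4), (2, 5)]:
    print(version, conditional_requirements(version))
```

The result would be passed to setup() as install_requires.  Tuples are compared rather than version strings because string comparison is lexicographic: "2.10" sorts before "2.5".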


> > These concepts can't be well-understood from the perspective that only
> > modules and packages exist, so until the policy's conceptual underpinning
> > is expanded, it's going to continue to be difficult to squeeze square pegs
> > into the policy's round holes.
>
>agreed, but it cannot be as open as the possibilities of
>distutils/setuptools are.  Python packages (in the Debian sense) still
>have to follow [1] and general decisions made by release management.

You'll have to clue me in as to which meaning of "package" you're using 
here.  I personally try to use the following terms to be unambiguous:

1. "Project" - a thing that somebody distributes
2. "Python package" - something you can actually import!
3. "System package" - something that is installed with a system packaging 
tool, like a .rpm
4. "Distribution" - an embodiment of a particular release of a project

As far as I can tell, Debian terminology conflates some of these 
terms.  And so long as its vocabulary is thus restricted, there will be an 
impedance mismatch at the interface where people try to create tools to 
support mapping #1 and #4 on to Debian's #3.


>Many problems that PyPI and setuptools try to solve are well addressed
>by existing packaging tools for Linux and *BSD distributions.

A similarity in solutions is not the same as similarity in problems.  The 
goals of a system packaging tool and the goals of setuptools are quite 
different, and in some cases may actually be opposed.  :)

Setuptools' fundamental goal is to encourage reuse by lowering the 
transaction cost of depending on another developer's software.  Not merely 
in the sense of lowering *distribution* or *installation* cost, but also 
enhancing the extensibility and interoperability of the projects 
themselves.  Metadata and entry points facilitate creating *platforms* in 
Python, such as the joint TurboGears-CherryPy template plugin API.  That 
API couldn't exist without something like setuptools; system packaging 
tools simply don't play in that space.
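
As a rough illustration of what entry points make possible, the sketch below asks the installed distribution metadata for every plugin registered under a group name.  It uses today's standard-library importlib.metadata rather than the setuptools API of the time, and the group name "python.templating.engines" is an assumption about what the template plugin API uses; on a system with no such plugins installed the result is simply empty.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Return {name: entry point} for every installed plugin in `group`."""
    all_eps = entry_points()
    if hasattr(all_eps, "select"):
        # Python 3.10 and later: selectable collection of entry points.
        selected = all_eps.select(group=group)
    else:
        # Python 3.8 and 3.9: a plain dict keyed by group name.
        selected = all_eps.get(group, [])
    return {ep.name: ep for ep in selected}


# Assumed group name; any project can advertise a plugin under it.
engines = discover_plugins("python.templating.engines")
for name, ep in sorted(engines.items()):
    print(name, "->", ep.value)
```

The host application never names the plugin projects; it only names the group, and whatever is installed shows up.  That lookup depends on per-project metadata, which is the part a system packaging tool has no equivalent for.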

Now, furthering setuptools' goals *does* require distribution and 
dependency management... but its "low transaction cost" goal means that it 
requires a *common* nomenclature for referencing projects.  A nomenclature 
that varied from one packaging system to another would not lower 
transaction cost, since it would force a developer to learn the 
ever-changing and mutually incompatible naming conventions of every Linux 
and BSD variant.

The only universal nomenclature available, therefore, was project 
names.  The distutils built distributions using project names, and PyPI 
displayed project names.  Hence, it was and is the right choice for Python 
to identify projects by those names.

Linux distributions, however, that insist on deconstructing Python projects 
and creating nomenclature with no mapping to PyPI project names, simply 
create a policy barrier between those upstream projects and ready access by 
their users.  This increases the transaction cost for providing software to 
Debian users -- and Debian of course ends up bearing those costs.

The efforts of people like Andrew and Vincenzo to create tools that map 
PyPI projects into Debian packages are therefore in vain; Debian doesn't 
want to decrease transaction cost, which then leaves the tool developers 
confused, since their goal is to further reduce transaction costs.

I myself was initially baffled by this resistance from Debian 
representatives, but now I simply accept it as a fact that Debian's goals 
differ from mine.  I do think it's unfortunate, though, because other 
people seem to keep thinking that they will be able to write a conversion 
tool and solve a sociopolitical/conceptual problem with a technical 
solution.  It just ain't gonna happen.  :)  (I don't mean it's unfortunate 
that Debian has different goals, I just mean it's unfortunate that this 
fact isn't immediately obvious to the people who keep beating their heads 
on this particular wall.  You can't work to lower the impedance between 
PyPI and Debian, and still please Debian policy, because the policy itself 
is the source of the impedance.)


>It would be nice to see setuptools to use this infrastructure where available.

The --single-version-externally-managed option exists so that setuptools 
can get out of system packaging tools' way.  There's also extensive work 
that I did to make namespace packages play well with system packaging tools 
that don't allow more than one system package to provide the same file, 
although this required what some Pythoneers would consider a horrific abuse 
of Python's .pth file system.  These things were done because people doing 
work for Debian asked for them, and if anybody asks nicely for other things 
that I can provide, I'll certainly do so.

However, some things just aren't doable.  I can't, for example, turn back 
the clock seven years and make the distutils go away, or even four years to 
make namespace packages go away, just because Debian policy doesn't grok 
those concepts yet, or refuses to acknowledge their validity.  Even if I 
agreed with Debian on these points (and I don't), the Python community 
voted with its feet years ago, and Guido blessed all of them.  The 
distutils were blessed for the stdlib in what, Python 1.6?  Namespace 
packages were blessed for 2.3 (see the "pkgutil" module docs, although they 
use the term "logical package").  (Guido himself wrote that module, if I 
recall correctly.)  Support for package data (data files found inside 
package directories) was added in 2.4, and .egg-info distribution metadata 
was blessed for 2.5.  From the Python POV, most of this stuff is ancient 
history by now.
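
For reference, the pkgutil mechanism mentioned above can be sketched as follows.  Two temporary directories stand in for two separately installed projects; the package name demo_ns and the module names are invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# The __init__.py that every project sharing the namespace ships.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_distribution(root, module_name, body):
    """Create root/demo_ns/<module_name>.py plus the namespace __init__.py."""
    pkg_dir = os.path.join(root, "demo_ns")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(pkg_dir, module_name + ".py"), "w") as f:
        f.write(body)


# Two separately "installed" projects, each in its own directory.
dist_a = tempfile.mkdtemp()
dist_b = tempfile.mkdtemp()
make_distribution(dist_a, "symbols", "NAME = 'from project A'\n")
make_distribution(dist_b, "proxies", "NAME = 'from project B'\n")

sys.path[:0] = [dist_a, dist_b]
importlib.invalidate_caches()

# Both modules import from the one logical package demo_ns.
symbols = importlib.import_module("demo_ns.symbols")
proxies = importlib.import_module("demo_ns.proxies")
print(symbols.NAME, "|", proxies.NAME)
```

Note that both directories contain demo_ns/__init__.py.  That duplicated file is exactly what collides with system packaging tools that don't allow two system packages to provide the same file.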


