[Distutils] thoughts on distutils 1 & 2

Mark W. Alexander slash at dotnetslash.net
Fri May 14 12:20:41 EDT 2004


On Fri, May 14, 2004 at 03:16:31PM +0100, has wrote:
> Recommend:
> 
> - Before adding new features/complexity, refactor current _design_ to 
> simplify it as much as possible. Philosophy here is much more 
> hands-off than DU1; less is more; power and flexibility through 
> simplicity: make others (filesystem, generic tools, etc.) do as much 
> of the work as possible; don't create dependencies.

I'd translate this to "an implementation that effectively manages
package metadata". If the metadata is consistently represented,
translating it to different build, install and packaging modules becomes
simpler and more straightforward. The idea of different setup.cfg
sections for different bdist commands needs to die. Every bdist command
should be able to process effectively off the same set of metadata.
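
To make that concrete, here's a rough sketch (names and values are made
up) of the single metadata declaration I have in mind. Every bdist_*
command should be able to work from exactly this and nothing more:

    # Illustrative only -- standard Distutils metadata keywords, declared
    # once. Any bdist command (bdist_rpm, bdist_wininst, ...) should map
    # these fields onto its native packager without needing its own
    # setup.cfg section.
    from distutils.core import setup

    setup(
        name="example",                  # hypothetical project
        version="1.0",
        description="Example module",
        long_description="Longer text for the native packager to use.",
        author="Jane Developer",
        author_email="jane@example.com",
        url="http://example.com/example/",
        license="PSF",
        platforms=["any"],
    )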

> -- e.g. c.f. Typical OS X application installation procedure (mount 
> disk image and copy single application package to Applications 
> folder; no special tools/actions required) versus typical Windows 
> installation procedure (run InstallShield to put lots of bits into 
> various locations, update Registry, etc.) or typical Unix 
> installation procedure (build everything from source, then move into 
> location). Avoiding overreliance on rigid semi-complex procedures 
> will allow DU2 to scale down very well and provide more flexibility 
> in how it scales up.

Distutils cannot dictate to any platform how packages are to be
installed. It needs to go the other way. Distutils needs to meet the
platforms' requirements for packaging.

I know perl-mongers (and some pythoneers as well) hold up CPAN as a
shining example of how things should work, but I shoot any of my admins
that use plain CPAN for installation. It doesn't register with the
platform's native software manager and, therefore, completely destroys
any attempt to manage system inventory information. No big deal on a
system or two, maybe, but at 3:00 in the morning, with an hour outage to
move an application from one system to another, suddenly discovering
that you didn't prepare the destination host with 37 required CPAN (or
Distutils) modules because there was no record of them ever having been
installed really, really, sucks. IMHO, if Distutils leverages native
packager bdist commands for as many platforms as possible, it will kick
CPAN's behind in the enterprise environment.

I, for one, do not want to visit hundreds of machines to "python
setup.py install", even if it downloads it for me. I don't need or want
a full development environment on production machines. I _need_ to
build a package once (per platform), install it on hundreds of boxes,
AND I need those hundreds of boxes to be able to tell me exactly what
is installed on them for disaster recovery and application portability.

> - Eliminate DU1's "Swiss Army" tendencies. Separate the build, 
> install and register procedures for higher cohesion and lower 
> coupling. This will make it much easier to refactor design of each in 
> turn.

I think DU1 should be viewed in a "lessons learned" context. What needs
to be done is to document the internal APIs, extrapolate what those APIs
should really be now that there's some real-life experience available,
then refactor the APIs to be more representative of actual use.

> - Every Python module should be distributed, managed and used as a 
> single folder containing ALL resources relating to that module: 
> sub-modules, extensions, documentation (bundled, generated, etc.), 
> tests, examples, etc. (Note: this can be done without affecting 
> backwards-compatibility, which is important.) Similar idea to OS X's 
> package scheme, where all resources for [e.g.] an application are 
> bundled in a single folder, but less formal (no need to hide package 
> contents from user).

Maybe for modules, but this is patently false for applications.
Applications have configuration files and data that may be different for
different instances or users of the application on the same box. Dumping
all that into site-packages is impractical for both usability principles
(separation of programs and data) and space considerations (how do I
size /usr if /usr/lib/pythonx.x/site-packages/package is going to
contain variable user data?). This raises the questions "Should Distutils
handle modules and applications the same?" and "Does it make more sense
to package modules one way and applications another?" I don't know.

> - Question: is there any reason why modules should not be installable 
> via simple drag-n-drop (GUI) or mv (CLI)? A standard policy of "the 
> package IS the module" (see above) would allow a good chunk of both 
> existing and proposed DU "features" to be gotten rid of completely 
> without any loss of "functionality", greatly simplifying both build 
> and install procedures.

Yes, because that's not how all platforms manage their software
inventory. Servers may not (most, in fact, _should_ not) have GUI
capable resources installed on them. "mv" (or "cp" or any other command)
is not a software management tool. I use Distutils because it helps me
manage my software inventory. If all I wanted was to be able to make
something run on a machine, I'd just build a tarball and drop it in
where I needed it. The problem is that once I've done that, I don't have
any record of the fact that I needed it on any particular machine and I
have lost the ability to easily replicate a machine's environment on
another machine, be it for failover or upgrades or for a test/QA
environment.

> --Replace current system where user must explicitly state what they 
> want included with one where user need only state what they want 
> excluded. Simpler and less error-prone; fits better with user 
> expectations (meeting the most common requirement should require 
> least amount of work, ideally none). Manifest system would no longer 
> be needed (good riddance). Most distributions could be created simply 
> by zipping/tar.gzipping the module folder and all its contents, minus 
> any .pyc and [for source-only extension distributions] .so files.

This I agree with. I recently did a quick setup.py on something (can't
remember what) and key .xml files were ignored. If it's in the source
tree, it's probably needed for something. If it's not, then I should be
able to exclude it, but I'd rather be sure everything necessary "gets
there" by default. There are still a lot of python modules available
that don't have setup.py included. It would be far easier for a 3rd
party to submit Distutils-enabling setup.py "patches" if they didn't
have to do a complete code analysis to determine what's necessary and
what's not.
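
A hypothetical include-by-default helper could be as simple as walking
the source tree and filtering out a short exclusion list. This is a
sketch of the policy, not existing Distutils code:

    # Sketch only: gather everything under the package directory except a
    # small, explicit list of build by-products. The point is that .xml
    # files, docs, etc. ride along by default and the packager opts *out*,
    # not in.
    import os

    EXCLUDED_SUFFIXES = (".pyc", ".pyo", ".so")

    def default_manifest(root):
        """Return all files under root except known build by-products."""
        manifest = []
        for dirpath, dirnames, filenames in os.walk(root):
            if "CVS" in dirnames:        # illustrative exclusion
                dirnames.remove("CVS")
            for name in filenames:
                if os.path.splitext(name)[1] not in EXCLUDED_SUFFIXES:
                    manifest.append(os.path.join(dirpath, name))
        return manifest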

> -- In particular, removing most DU involvement from build procedures 
> would allow developers to use their own development/build systems 
> much more easily.

I may be on the fence on this one. Since mastering simple setup.py
formats, I'm not sure I remember how to manually build a C extension
anymore ;) From my perspective, DU1 has got what it takes to support a
newbie's foray into building extensions and I'd hate to see that get
lost. A cleaner API and documentation, however, would support better
integration of Distutils _into_ alternative build systems.
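
For reference, the sort of newbie-friendly thing I mean: a complete
setup.py for a C extension is only this much (module and file names
made up):

    # A complete, minimal setup.py for building a C extension with DU1.
    # "spam" and "spammodule.c" are placeholder names.
    from distutils.core import setup, Extension

    setup(
        name="spam",
        version="0.1",
        ext_modules=[Extension("spam", ["spammodule.c"])],
    )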

> - Installation and compilation should be separate procedures. Python 
> already compiles .py files to .pyc on demand; is there any reason why 
> .c/.so files couldn't be treated the same? Have a standard 'src' 
> folder containing source files, and have Python's module mechanism 
> look in/for that as part of its search operation when looking for a 
> missing module; c.f. Python's automatic rebuilding of .pyc files from 
> .py files when former isn't found. (Q. How would this folder's 
> contents need to be represented to Python?)

They already are. "python setup.py build" and "python setup.py install"
can be done on separate machines (of the same architecture). My argument
continues to be that "python setup.py install" is not a software
management tool. "python setup.py bdist_whatever" produces the
installation packages I need to support the "whatever" architecture:
build once, install anywhere, and -- most importantly -- effectively
manage the software configuration of N hosts of that architecture.

bdist_* is the true value of Distutils. Without the bdist commands,
Distutils does nothing more effectively than `./configure && make &&
make install`.

> - What else may setup.py scripts do apart from install modules (2) 
> and build extensions (3)?

Install and optionally upgrade configuration files. Build native
packages including preinstall, preremove, postinstall, postremove and
reconfiguration scripts. Dynamically relocate packages to the target
host's python installation directory. These are all features of the
bdist command capability that I think you're missing.
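
As one concrete (RPM-flavored) example, the stock bdist_rpm command
already accepts pre/post (un)install scripts. A sketch, with made-up
script paths:

    # Minimal sketch; the script paths are hypothetical. The options
    # mapping is standard Distutils, and bdist_rpm really does accept
    # these pre/post (un)install hooks -- exactly the kind of native
    # packager feature I'm talking about.
    from distutils.core import setup

    setup(
        name="example",
        version="1.0",
        packages=["example"],
        options={
            "bdist_rpm": {
                "pre_install": "scripts/preinstall.sh",
                "post_install": "scripts/postinstall.sh",
                "post_uninstall": "scripts/postremove.sh",
            }
        },
    )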

> -- Most packages should not require a setup.py script to install. 
> Users can, of course, employ their own generic shell 
> script/executable to [e.g.] unzip downloaded packages and mv them to 
> their site-packages folder.

They can do that today. Where's the "value added?" Maybe setup.py could
be replaced by something that scans a directory for __init__.py files
and makes some deductions, and maybe that's a good thing for simple
packages. Personally, I'd like to see DU2 go further in supporting the
simple production of multiple binary packages from a single source tree.
Marc's egenix packages come to mind: it would be really nice to NOT
have to sub-class a bunch of Distutils classes in a mega-setup.py
script in order to produce a set of related, but not necessarily
interdependent, packages.
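
For anyone who hasn't suffered one of those mega-setup.py scripts, the
pattern looks roughly like this (the subclass is a made-up placeholder,
not Marc's actual code); DU2 should make this unnecessary:

    # Illustrative only: the shape of a "mega-setup.py", where stock
    # commands get subclassed just to carve one source tree into several
    # related binary packages.
    from distutils.core import setup
    from distutils.command.bdist_rpm import bdist_rpm

    class bdist_rpm_subpackages(bdist_rpm):
        """Hypothetical: build one RPM per sub-package."""
        def run(self):
            # A real version would loop over per-subpackage metadata
            # here, tweaking self.distribution before each build.
            bdist_rpm.run(self)

    setup(
        name="example-suite",
        version="1.0",
        packages=["example.core", "example.extras"],
        cmdclass={"bdist_rpm": bdist_rpm_subpackages},
    )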

> -- Extensions distributed as source will presumably require some kind 
> of setup script in 'src' folder. Would this need to be a dedicated 
> Python script or would something like a standard makefile be 
> sufficient?

My belief is that Distutils' role is to reduce the need to distribute
extensions as source. One good Distutils configuration by a Python for
Windows developer should repackage (absent win32 dependencies) for any
supported platform simply by changing "python setup.py bdist_wininst"
to "python setup.py bdist_myplatform". This comes back to my focus on
getting the meta-data right and in a consistent format regardless of
the original development platform or initially perceived target.

> -- Build operations should be handled by separate dedicated scripts 
> when necessary. Most packages should only require a generic shell 
> script/executable to zip up package folder and its entire contents 
> (minus .pyc and, optionally, .so files).

Again, this provides no software configuration management. What, then,
do I gain by using Distutils at all?

> - Remove metadata from setup.py and modules. All metadata should 
> appear in a single location: meta.txt file included in every package 
> folder. Use a single metadata scheme in simple structured nested 
> machine-readable plaintext format (modified Trove); example:

Isn't that what setup.cfg is? I agree that there shouldn't be redundancy
between what can go in setup.py and what can go in setup.cfg.

> ------------------------------------------------------------------
> Name
> 	roundup
[snip]

There's a whole slew of meta-data that native packagers require for
software configuration management. Most (iirc) are addressed in various
PEPs, although it would be good to revisit the fields and establish a
matrix between Distutils meta-data fields and various native package
manager fields to make sure all possibilities are covered.
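
The sort of matrix I mean, sketched for two packagers (the native field
names are from memory and only illustrative; the real exercise is
making sure every native requirement has a Distutils source):

    # Rough Distutils-field -> native-packager-field matrix (sketch).
    FIELD_MATRIX = {
        #  Distutils        RPM spec      dpkg control
        "name":          ("Name",        "Package"),
        "version":       ("Version",     "Version"),
        "description":   ("Summary",     "Description"),
        "license":       ("License",     None),   # dpkg: copyright file
        "url":           ("URL",         None),
        "maintainer":    ("Packager",    "Maintainer"),
    }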

> - Improve version control. Junk current "operators" scheme (=, 
> <, >, >=, <=) as both unnecessarily complex and inadequate (i.e. 
> stating module X requires module Y (>= 1.0) is useless in practice as 
> it's impossible to predict _future_ compatibility). Metadata should 
> support 'Backwards Compatibility' (optional) value indicating 
> earliest version of the module that current version is 
> backwards-compatible with. Dependencies list should declare name and 
> version of each required package (specifically, the version used as 
> package was developed and released). Version control system can then 
> use both values to determine compatibility. Example: if module X is 
> at v1.0 and is backwards-compatible to v0.5, then if module Y lists 
> module X v0.8 as a dependency then X 1.0 will be deemed acceptable, 
> whereas if module Z lists X 0.4.5 as a dependency then X 1.0 will be 
> deemed unacceptable and system should start looking for an older 
> version of X.
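
(Restating the proposed rule as a sketch just to be sure I read it
right; the function and field names below are assumed, and version
comparison is hand-waved:)

    # "backcompat" stands in for the proposed 'Backwards Compatibility'
    # field; real code would need a proper version-parsing scheme.
    def satisfies(installed, backcompat, required):
        """True if the installed version can stand in for the required one."""
        return backcompat <= required <= installed

    # The example from above: X is at 1.0, backwards-compatible to 0.5.
    satisfies((1, 0), (0, 5), (0, 8))     # True  -- Y's need for 0.8 is met
    satisfies((1, 0), (0, 5), (0, 4, 5))  # False -- Z's 0.4.5 is too old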

This is more appropriately addressed in the context of what/how native
package managers support version control. A point implicit in this
discussion, however, is that a package name registry is required. Unless
package names are registered with some authority, you can have multiple
packages of the same name, which shoves a huge bone down the throat of
dependency resolution.

> - Make it easier to have multiple installed versions of a module. 
> Ideally this would require including both name and version in each 
> module name so that multiple modules may coexist in same 
> site-packages folder. Note that this naming scheme would require 
> alterations to Python's module import mechanism and would not be 
> directly compatible with older Python versions (users could still use 
> modules with older Pythons, but would need to strip version from 
> module name when installing).

This is conditional on support of the platform's native package manager.
Some support multiple installations and some do not. Most can be dealt
with in any case with package name fudging and intelligent install
scripts.

I don't see a need for multiple instances in the same site-packages,
however. Futzing with the import mechanism would be fixing something
that ain't broke. Installing to an alternate path and optionally having
postinstall scripts update site.py or requiring the user modify
PYTHONPATH is adequate. I do this on HP-UX, which supports multiple
installs of the same binary package in different locations, allowing
users to install into their own target python library. When an
alternate path is selected, the installer spits out all the necessary
steps required to make use of the alternate path.
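
The "necessary steps" are nothing exotic. Roughly this, with a made-up
path:

    # Sketch of what using an alternate install path involves; the path
    # is illustrative. Either set PYTHONPATH before starting Python, or
    # add the directory in site.py (or a sitecustomize module):
    import sys
    sys.path.append("/opt/myapp/lib/python")   # hypothetical location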

> - Reject PEP 262 (installed packages database). Complex, fragile, 
> duplication of information, single point of failure reminiscent of 
> Windows Registry. Exploit the filesystem instead - any info a 
> separate db system would provide should already be available from 
> each module's metadata.

I agree with rejecting 262 as well, but in favor of the native platform
tools via bdist support rather than the filesystem. Solaris people use
pkgtools for everything. RH and friends use RPM. HP people
use SD-UX. Debianites use dpkg. etc. etc. etc.... God help those of us
supporting multiple platforms.

In each case, absent [expletive deleted] commercial package installs,
all software and configuration management is consistent and, more
importantly, effective. Add anything on top of that -- CPAN, Distutils,
PEP 262, rogue admins with tarballs, _anything_ at all -- and people who
have to deal with more than a handful of machines WILL eventually get
caught with their pants down.

If Distutils does not support simple, native package manager
integration, then it ceases to be a solution and becomes just one more
problem. A successful implementation that creates native packages gets
immediate support from apt, yum, yast, urpmi, pkg-get, swinstall and
anything else, now and in the future.

/me steps off the soapbox

mwa
-- 
Mark W. Alexander
slash at dotnetslash.net

The contents of this message authored by Mark W. Alexander are
released under the Creative Commons Attribution-NonCommercial license.
Copyright of quoted materials are retained by the original author(s).

http://creativecommons.org/licenses/by-nc/1.0/


