[Distutils] thoughts on distutils 1 & 2

has hengist.podd at virgin.net
Fri May 14 10:16:31 EDT 2004


Hello List,

Very quiet here, so thought I would toss in some notes I've been 
making regarding Python's module system, the current Distutils 1.x 
and some of the proposals I've seen for Distutils 2. These notes are 
very rough so I dunno how much sense they'll make to anyone else in 
their current state, but I figure it's better to pitch them in to 
find out if there's any interest in discussing them further than 
spend time polishing them if there isn't.

Let us know what you think, and we can take it from there if folk are 
interested.

Regards,

has

-------

Issues:

http://www.python.org/cgi-bin/moinmoin/DistUtils20 states:

	"The ultimate goal: Must be backwards-compatible with 
existing setup.py scripts."

This is both a red herring and a likely recipe for DU2 becoming a big 
ball of mud before it's even out the door...

- Compatibility for existing setup.py scripts can easily be ensured 
by retaining DU1. DU1 should be declared at end of its development 
life. DU1 API may eventually be re-implemented on top of DU2, 
allowing DU1 core to be ditched to reduce maintenance cost. Deprecate 
DU1 API.

- DU1 doesn't scale down as well as it could/should. Doesn't scale up 
as well as it could/should. Current DU2 proposals don't seem to 
address these points, seeking only to add new material on top rather 
than reexamine/reevaluate existing architecture. Some current DU2 
proposals smack of rampant architecture astronomy, lacking sufficient 
evaluation of their potential cost or whether the same goals could be 
achieved through other, simpler means.

- DU2 provides an opportunity to review everything learnt over course 
of DU1 development and do it better. DU1 development has stagnated 
under its own weight. DU1 architecture is a rat's nest. Not a good 
base to build DU2 on. Better to design afresh: assemble a list of a 
representative range of use cases and their relative frequencies in 
real-world use, determine "ideal" solution, determine "practical" 
solution. "Practical" solution = "ideal" solution minus anything that 
would prove too disruptive to Python, or too expensive for the 
benefits it'd provide, or where existing material from DU1 could be 
reused at less cost than reimplementing from scratch.

-------

Recommend:

- Before adding new features/complexity, refactor current _design_ to 
simplify it as much as possible. Philosophy here is much more 
hands-off than DU1; less is more; power and flexibility through 
simplicity: make others (filesystem, generic tools, etc.) do as much 
of the work as possible; don't create dependencies.

-- e.g. compare typical OS X application installation procedure (mount 
disk image and copy single application package to Applications 
folder; no special tools/actions required) versus typical Windows 
installation procedure (run InstallShield to put lots of bits into 
various locations, update Registry, etc.) or typical Unix 
installation procedure (build everything from source, then move into 
location). Avoiding overreliance on rigid semi-complex procedures 
will allow DU2 to scale down very well and provide more flexibility 
in how it scales up.


- Eliminate DU1's "Swiss Army" tendencies. Separate the build, 
install and register procedures for higher cohesion and lower 
coupling. This will make it much easier to refactor design of each in 
turn.

- Every Python module should be distributed, managed and used as a 
single folder containing ALL resources relating to that module: 
sub-modules, extensions, documentation (bundled, generated, etc.), 
tests, examples, etc. (Note: this can be done without affecting 
backwards-compatibility, which is important.) Similar idea to OS X's 
package scheme, where all resources for [e.g.] an application are 
bundled in a single folder, but less formal (no need to hide package 
contents from user).


- Question: is there any reason why modules should not be installable 
via simple drag-n-drop (GUI) or mv (CLI)? A standard policy of "the 
package IS the module" (see above) would allow a good chunk of both 
existing and proposed DU "features" to be gotten rid of completely 
without any loss of "functionality", greatly simplifying both build 
and install procedures.

-- Replace current system where user must explicitly state what they 
want included with one where user need only state what they want 
excluded. Simpler and less error-prone; fits better with user 
expectations (meeting the most common requirement should require 
least amount of work, ideally none). Manifest system would no longer 
be needed (good riddance). Most distributions could be created simply 
by zipping/tar.gzipping the module folder and all its contents, minus 
any .pyc and [for source-only extension distributions] .so files.

-- In particular, removing most DU involvement from build procedures 
would allow developers to use their own development/build systems 
much more easily.
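
A rough sketch of the exclude-only packaging idea above, to show how 
little a generic tool would need to do. The names zip_package and 
EXCLUDED_SUFFIXES are made up for illustration; nothing like this 
exists in DU1, and the exclusion list is only the one these notes 
mention.

```python
import os
import zipfile

# Assumption: only these suffixes are left out, as suggested in the notes.
EXCLUDED_SUFFIXES = (".pyc", ".so")


def zip_package(package_dir, archive_path, excluded=EXCLUDED_SUFFIXES):
    """Zip every file under package_dir except those with an excluded suffix.

    Returns the archive member names that were written.
    """
    package_dir = os.path.abspath(package_dir)
    # Store paths relative to the parent so the archive unpacks to one folder.
    parent = os.path.dirname(package_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(package_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                if name.endswith(tuple(excluded)):
                    continue
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, parent)
                archive.write(full, arcname)
                added.append(arcname.replace(os.sep, "/"))
    return added
```

The archive should be written somewhere outside the package folder, 
otherwise it would end up trying to include itself.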


- Installation and compilation should be separate procedures. Python 
already compiles .py files to .pyc on demand; is there any reason why 
.c/.so files couldn't be treated the same? Have a standard 'src' 
folder containing source files, and have Python's module mechanism 
look in/for that as part of its search operation when looking for a 
missing module; cf. Python's automatic rebuilding of .pyc files from 
.py files when former isn't found. (Q. How would this folder's 
contents need to be represented to Python?)


- What else may setup.py scripts do apart from install modules (2) 
and build extensions (3)?

-- Most packages should not require a setup.py script to install. 
Users can, of course, employ their own generic shell 
script/executable to [e.g.] unzip downloaded packages and mv them to 
their site-packages folder.

-- Extensions distributed as source will presumably require some kind 
of setup script in 'src' folder. Would this need to be a dedicated 
Python script or would something like a standard makefile be 
sufficient?

-- Build operations should be handled by separate dedicated scripts 
when necessary. Most packages should only require a generic shell 
script/executable to zip up package folder and its entire contents 
(minus .pyc and, optionally, .so files).


- Remove metadata from setup.py and modules. All metadata should 
appear in a single location: meta.txt file included in every package 
folder. Use a single metadata scheme in simple structured nested 
machine-readable plaintext format (modified Trove); example:

------------------------------------------------------------------
Name
	roundup

Version
	0.1.0

Intended Audience
	End Users/Desktop
	Developers
	System Administrators

License
	OSI Approved
		Python Software Foundation License


Topic
	Communications
		Email
	Office/Business
	Software Development
		Bug Tracking

Dependencies
	etc...
------------------------------------------------------------------
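
To give a feel for how cheap such a format would be to read, here is a 
minimal parser sketch. It assumes one entry per line, nesting shown by 
leading tabs only, and blank lines ignored; parse_meta is a 
hypothetical name, and the nested-dict result is just one possible 
representation.

```python
def parse_meta(text):
    """Parse tab-indented metadata into nested dicts keyed by line text."""
    root = {}
    # stack[i] is the dict that collects entries at indentation depth i.
    stack = [root]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines only separate entries
        stripped = raw.lstrip("\t")
        depth = len(raw) - len(stripped)
        if depth > len(stack) - 1:
            raise ValueError("indentation jumps more than one level: %r" % raw)
        # Drop any deeper levels left over from the previous entry.
        del stack[depth + 1:]
        node = {}
        stack[depth][stripped.strip()] = node
        stack.append(node)
    return root
```

With the example above, parse_meta(text)["License"] would come back as 
{"OSI Approved": {"Python Software Foundation License": {}}}.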

- Improve version control. Junk current "operators" scheme (=, 
<, >, >=, <=) as both unnecessarily complex and inadequate (i.e. 
stating module X requires module Y (>= 1.0) is useless in practice as 
it's impossible to predict _future_ compatibility). Metadata should 
support 'Backwards Compatibility' (optional) value indicating 
earliest version of the module that current version is 
backwards-compatible with. Dependencies list should declare name and 
version of each required package (specifically, the version used as 
package was developed and released). Version control system can then 
use both values to determine compatibility. Example: say module X is 
at v1.0 and is backwards-compatible to v0.5. If module Y lists 
module X v0.8 as a dependency, X 1.0 will be deemed acceptable; 
if module Z lists X 0.4.5 as a dependency, X 1.0 will be deemed 
unacceptable and the system should start looking for an older 
version of X.
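
The check described above is small enough to sketch. Assumptions that 
are mine rather than part of the proposal: versions are purely numeric 
and dotted, a missing 'Backwards Compatibility' value means the module 
is only compatible with itself, and an installed version older than 
the one the dependent was developed against is rejected. The function 
names are hypothetical.

```python
def parse_version(text):
    """Turn '0.4.5' into (0, 4, 5); assumes numeric dotted versions only."""
    parts = [int(part) for part in text.split(".")]
    # Strip trailing zeros so that '1.0' and '1.0.0' compare equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def is_acceptable(installed, compatible_back_to, required):
    """Decide whether an installed version satisfies a declared dependency.

    installed          -- version of the module that is actually present
    compatible_back_to -- its 'Backwards Compatibility' value, or None
    required           -- version the dependent package was developed against
    """
    installed_v = parse_version(installed)
    required_v = parse_version(required)
    # Assumption: no declared value means compatible with itself only.
    floor = parse_version(compatible_back_to) if compatible_back_to else installed_v
    # Assumption: an installed version older than the required one is rejected.
    return floor <= required_v <= installed_v
```

Running the example from the notes: is_acceptable("1.0", "0.5", "0.8") 
is True for module Y, and is_acceptable("1.0", "0.5", "0.4.5") is 
False for module Z.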


- Make it easier to have multiple installed versions of a module. 
Ideally this would require including both name and version in each 
module name so that multiple modules may coexist in same 
site-packages folder. Note that this naming scheme would require 
alterations to Python's module import mechanism and would not be 
directly compatible with older Python versions (users could still use 
modules with older Pythons, but would need to strip version from 
module name when installing).
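
As a sketch of what side-by-side versions might look like on disk, 
suppose each module folder were named "name-version" (e.g. 
roundup-0.1.0). That naming convention is purely my assumption for 
illustration, as are the function names; the notes above don't fix a 
format.

```python
import os


def version_key(text):
    """Sort key for numeric dotted versions; raises ValueError otherwise."""
    return tuple(int(part) for part in text.split("."))


def installed_versions(site_packages, module_name):
    """List versions of module_name found as 'name-version' folders."""
    prefix = module_name + "-"
    found = []
    for entry in os.listdir(site_packages):
        if not entry.startswith(prefix):
            continue
        if not os.path.isdir(os.path.join(site_packages, entry)):
            continue
        candidate = entry[len(prefix):]
        try:
            version_key(candidate)
        except ValueError:
            continue  # e.g. 'roundup-extras-2.0' is a different module
        found.append(candidate)
    return sorted(found, key=version_key)
```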

- Reject PEP 262 (installed packages database). Complex, fragile, 
duplication of information, single point of failure reminiscent of 
Windows Registry. Exploit the filesystem instead - any info a 
separate db system would provide should already be available from 
each module's metadata.
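
For instance, listing what is installed could be a plain directory 
scan. This sketch assumes every package folder carries a meta.txt in 
the tab-indented layout shown earlier; read_field and list_installed 
are hypothetical names, not an existing API.

```python
import os


def read_field(meta_path, field):
    """Return the first tab-indented value under a top-level heading, or None."""
    with open(meta_path) as handle:
        lines = handle.read().splitlines()
    for index, line in enumerate(lines):
        if line.strip() == field and not line.startswith("\t"):
            for value in lines[index + 1:]:
                if value.startswith("\t"):
                    return value.strip()
                if value.strip():
                    break  # next heading reached without finding a value
    return None


def list_installed(site_packages):
    """Return (name, version) for each folder that contains a meta.txt."""
    installed = []
    for entry in sorted(os.listdir(site_packages)):
        meta_path = os.path.join(site_packages, entry, "meta.txt")
        if os.path.isfile(meta_path):
            installed.append(
                (read_field(meta_path, "Name"), read_field(meta_path, "Version"))
            )
    return installed
```

Deleting a package folder removes it from the listing automatically, 
so there is no separate record to get out of step.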


-- 
http://freespace.virgin.net/hamish.sanderson/


