[Distutils] The $0.02 of a packager ;-)

Greg Ward gward@cnri.reston.va.us
Thu, 4 Feb 1999 10:37:29 -0500


Hi Oliver --

glad you could join us -- you raise a lot of good points, and I'll see
if I can address (most of) them.  Your post will certainly serve as a
good "to do" list!

Quoth Oliver Andrich, on 02 February 1999:

> - Tasks and division of labour
> 
>   I think that the packager and installer roles are a little bit mixed up.
>   In my eyes the packager's job is to build and test the package on a
>   specific platform and also build the binary distribution. This is also
>   what you wrote on the webpage.
> 
>   But the installer's role should only be to install the prebuilt package,
>   because his normal job is to provide up-to-date software for the users of
>   the system he manages. And he has enough to do with that.

Yes, the packager and installer are a little bit mixed up, because in
the most general case -- installing from a source distribution -- there
is no packager, and the installer has to do what that non-existent
packager might have done.  That is, starting with the same source
distribution, both a packager (creating a built distribution) and an
installer (of the hardcore Unix sysadmin type, who is not afraid of
source) would incant:

   # packager:                         # installer:
   tar -xzf foo-1.23.tar.gz            tar -xzf foo-1.23.tar.gz
   cd foo-1.23                         cd foo-1.23
   ./setup.py build                    ./setup.py build
   ./setup.py test                     ./setup.py test
   ./setup.py bdist --rpm              ./setup.py install

Yes, there is a lot of overlap there.  So why is the packager wasting
his time building this RPM (or some other kind of built distribution)?
Because *this* installer is an oddball running a six-year-old version of
Freaknix on his ancient Frobnabulator-2000, and Distutils doesn't
support Freaknix' weird packaging system.  (Sorry.)  So, like every good
Unix weenie, he starts with the source code and installs that.

More mainstream users, eg. somebody running a stock Red Hat 5.2 on their
home PC (maybe even with -- gasp! -- Red Hat's Python RPM, instead of a
replacement cooked up by some disgruntled Python hacker >grin<), will
just download the foo-1.23.i386.rpm that results from the packager
running "./setup.py bdist --rpm", and incant

   rpm --install foo-1.23.i386.rpm

Easier and faster than building from source, but a) it only works on
Intel machines running an RPM-based Linux distribution, and b) it
requires that some kind soul out there has built an RPM for this
particular Python module distribution.  That's why building from source
must be supported, and is considered the "general case" (even if not
many people will have to do it).

Also, building from source builds character (as you will quickly find
out if you ask "Where can I get pre-built binaries for Perl?" on
comp.lang.perl.misc ;-).  It's good for your soul, increases karma,
reduces risk of cancer (but not ulcers!), etc.

> - The proposed Interface 
> 
>   Basically I can say that I like the idea that I can write in my RPM
>   spec file
> 
>   setup.py build
>   setup.py test
>   setup.py install
> 
>   and afterwards I have installed the package somewhere on my machine. And
>   I am absolutely sure that it works as intended. I think that this is the
>   way it works for most Perl modules.

That's the plan: a simple standard procedure so that anyone with a clue
can do this, and so that it can be automated for those without a clue.
And yes, the basic mode of operation was stolen shamelessly from the
Perl world, with the need for Makefiles removed because a) they're not
really needed, and b) they hurt portability.

>   But I have problems with the bdist option of setup.py, because I think
>   that it is hard to implement. If I got this right, I as an RPM and Debian
>   package maintainer should be able to say
> 
>   setup.py bdist rpm
>   setup.py bdist debian
> 
>   and afterwards I have a Debian and an RPM package of the Python package.

That's the basic idea, except it would probably be "bdist --rpm" --
'bdist' being the command, '--rpm' being an option to it.  If it turns
out that all the "smart packagers" are sufficiently different and
difficult to wrap, it might make sense to make separate commands for
them, eg. "bdist_rpm", "bdist_debian", "bdist_wise", etc.  Or something
like that.
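
Just to make the dispatch idea concrete, here's a rough sketch -- every
name in it is a guess, nothing here is settled Distutils API:

   # Hypothetical sketch: 'bdist' as a thin dispatcher over per-format
   # helpers; all names are guesses, not settled Distutils API.
   def bdist_rpm():    print("would write a spec file and run rpm")
   def bdist_debian(): print("would drive the Debian packaging tools")

   FORMAT_COMMANDS = {'rpm': bdist_rpm, 'debian': bdist_debian}

   def bdist(format):
       if format not in FORMAT_COMMANDS:
           raise SystemExit("unknown built-distribution format: " + format)
       FORMAT_COMMANDS[format]()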

>   Nice in theory, but this would require that setup.py or the distutils
>   package knows how to create these packages; that means we have to
>   implement a meta packaging system on top of existing packaging systems,
>   which are powerful themselves. So what would it look like when I call
>   these commands above?
> 
>   Would the distutils stuff create a spec file (the input file to create
>   an rpm) and then call rpm -ba <specfile>? And inside the rpm build
>   process setup.py is called again to compile and install the package's
>   contents? Finally rpm creates the two normal output files: one is the
>   actual binary package and the other is the source rpm, from which you
>   can recompile the binary package on your machine.

I haven't yet thought through how this should go, but your plan sounds
pretty good.  It's a bit awkward having setup.py call rpm, which then
calls setup.py to build and install the modules -- but consider that
setup.py is really just a portal to the various Distutils classes.  In
reality, we're using the Distutils "bdist" command to call rpm, which
then calls the Distutils "build", "test", and "install" commands.  It's
not so clunky if you think about it that way.
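
A back-of-the-envelope sketch of that round trip (the spec contents and
the --root option are assumptions for illustration, not settled
behaviour):

   import os

   SPEC_TEMPLATE = """\
   Name: %(name)s
   Version: %(version)s
   Summary: %(summary)s

   %%build
   ./setup.py build
   %%install
   ./setup.py install --root=$RPM_BUILD_ROOT
   """

   def bdist_rpm(meta):
       # Write a minimal spec file from the distribution's metadata,
       # then let rpm drive the build -- which re-enters setup.py for
       # the actual "build" and "install" steps.
       specfile = meta['name'] + ".spec"
       open(specfile, 'w').write(SPEC_TEMPLATE % meta)
       os.system("rpm -ba " + specfile)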

Also, I don't see why this constitutes "building a meta packaging
system" -- about the only RPM-ish terrain that Distutils would intrude
upon is knowing which files to install.  And it's got to know that
anyways, else how could it install them?  Heck, all we're doing here is
writing a glorified Makefile in Python because Python has better control 
constructs and is more portable than make's language.  Even the lowliest 
Makefile with an "install" target has to know what files to install.

>   This is the same for Debian Linux, Slackware Linux, rpm-based Linux
>   versions, Solaris packages and BeOS software packages. The last is only a
>   vague guess because I have only looked into the Be system very briefly.

The open question here is, How much duplication is there across various
packaging systems?  Definitely we should concentrate on the
build/test/dist/install stuff first; giving the world a standard way to
build module distributions from source would be a major first step, and
we can worry about built distributions afterwards.

> - What I would suggest what setup.py should do 
> 
>   The options that I have no problem with are 
> 
>   build_py    - copy/compile .py files (pure Python modules)
>   build_ext   - compile .c files, link to .so in blib
>   build_doc   - process documentation (targets: html, info, man, ...?)
>   build       - build_py, build_ext, build_doc
>   dist        - create source distribution
>   test        - run test suite
>   install     - install on local machine
>   
>   What should make_blib do?

"make_blib" just creates a bunch of empty directories that mimic
something under the Python lib directory, eg.

   ./blib
   ./blib/site-packages
   ./blib/site-packages/plat-sunos5
   ./blib/doc
   ./blib/doc/html

etc.  (The plat directory under site-packages is, I think, something not
in Python 1.5 -- but as Michel Sanner pointed out, it appears to be
needed.)

The reason for this: it provides a mockup installation tree in which to
run test suites, it makes installation near-trivial, and it makes
determining which files get installed where near-trivial.

The reason for making it a separate command: because build_py,
build_ext, build_doc, and build all depend on it having already been
done, so it's easier if they can just "call" this command themselves
(which will of course silently do nothing if it doesn't need to do
anything).
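
In other words, something like this minimal sketch (the directory list
mirrors the example above; everything else is a guess at the eventual
implementation):

   import os

   BLIB_DIRS = ['blib',
                'blib/site-packages',
                'blib/site-packages/plat-sunos5',   # platform-dependent
                'blib/doc',
                'blib/doc/html']

   def make_blib():
       # Parents come first in the list, so plain mkdir suffices; if a
       # directory is already there, silently do nothing.
       for d in BLIB_DIRS:
           if not os.path.isdir(d):
               os.mkdir(d)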

>   But what I require is that I can tell build_ext which compiler switches
>   to use, because I may need different switches on my system than the
>   original developer used.

Actually, the preferred compiler/flags will come not from the module
developer but from the Python installation which is being used.  That's
crucial; otherwise the shared library files might be incompatible with
the Python binary.  If you as packager or installer wish to tweak some
of these ("I know this extension module is time-intensive, so I'll
compile with -O2 instead of -O"), that's fine.  Of course, that opens up
some unpleasant possibilities: "My sysadmin compiled Python with cc, but
I prefer gcc, so I'll use it for this extension."  Danger Will Robinson!
Danger!  Not much we can do about that except warn in the documentation,
I suppose.
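
For illustration, those preferred values can be dug out of Python's own
build configuration -- this uses the present-day sysconfig module, which
is the descendant of exactly this idea:

   import sysconfig

   cc       = sysconfig.get_config_var('CC')        # compiler Python was built with
   cflags   = sysconfig.get_config_var('CFLAGS')    # and its preferred flags
   ldshared = sysconfig.get_config_var('LDSHARED')  # how to link a .so for it
   print(cc, cflags, ldshared)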

>   I would also like the install option to take an argument telling it
>   where the files should be installed, because I can tell rpm, for
>   example, that it should compile the extension package as if it would
>   be installed in /usr/lib/python1.5, but tell it in the install stage
>   to install it in /tmp/py-root/usr/lib/python1.5. So I can build and
>   install the package without overwriting an existing installation of
>   an older version, and I also have a clean way to determine what files
>   actually got installed.

Yes, good idea.  That should be an option to the "install" command;
again, the default would come from the current Python installation, but
could be overridden by the packager or installer.
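
A sketch of the idea, with "--root" as a hypothetical spelling for that
option: compute the run-time path as usual, then graft it onto the
staging root at copy time:

   import os

   def install_target(filename, prefix, root=None):
       # 'prefix' is where the file lives at run time, eg.
       # /usr/lib/python1.5/site-packages.  'root' (eg. /tmp/py-root)
       # is prepended only when staging, so nothing real is overwritten.
       target = os.path.join(prefix, filename)
       if root:
           target = root + target   # NOT os.path.join: target is absolute
       return target

   # install_target('foo.py', '/usr/lib/python1.5/site-packages',
   #                root='/tmp/py-root')
   # -> '/tmp/py-root/usr/lib/python1.5/site-packages/foo.py'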

>   install should also be split up into install and install_doc, and
>   install_doc should also be able to take an argument telling it where
>   to install the files to.

Another good idea.  Actually, I think the split should be into "install
python library stuff" and "install doc"; "install" would do both.
I *don't* think that "install" should be split, like "build", into
"install_py" and "install_ext".  But I could be wrong... opinions?

>   I would remove the bdist option because it would introduce a lot of
>   work: you not only have to tackle various systems but also various
>   packaging systems. I would add an option "files" instead, which
>   returns a list of the files this package consists of. And consequently
>   an option "doc_files" is also required, because I like to stick to the
>   way rpm manages doc files: I simply tell it which files are doc files
>   and it installs them the right way.

Punting on bdist: ok.  Removing it? no way.  It should definitely be
handled, although it's not as high priority as being able to build from
source.  (Because obviously, if you can't build from source, you can't
make a built distribution!)

The option(s) to get out the list(s) of files installed is a good idea.
Where does it belong, though?  I would think something like "install
--listonly" would do the trick.
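
A sketch of that, assuming the install command already computes its
(source, destination) pairs anyway:

   import shutil

   def install(files, listonly=0):
       # 'files' is a list of (source, destination) pairs; with
       # --listonly we print the destinations instead of copying.
       for src, dest in files:
           if listonly:
               print(dest)
           else:
               shutil.copy(src, dest)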

>   Another thing that would be fine is if I could extract the package
>   information with setup.py. Something like "setup description" returning
>   the full description, and so on.

I like that -- probably best to just add one command, say "meta".  Then
you could say "./setup.py meta --description" or "./setup.py meta
--name --version".  Or whatever.
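
A sketch of such a "meta" command; the storage and option names are all
hypothetical:

   META = {'name':        'foo',
           'version':     '1.23',
           'description': 'A module that frobnicates things.'}

   def meta(options):
       # "./setup.py meta --description" prints just that field;
       # "./setup.py meta --name --version" prints one field per line.
       for opt in options:
           print(META[opt.lstrip('-')])

   # meta(['--name', '--version'])  ->  foo / 1.23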

>   And I would also add a "system" option to the command line options,
>   because I would like to give the setup.py script an option from which it
>   can determine which system it is running on. Why this is required will
>   follow.

Already in the plan.  Go back and check the archives for mid-January --
I posted a bunch of stuff about design proposals, with how-to-handle-
command-line-options being one of my fuzzier areas.  (Eg. see
http://www.python.org/pipermail/distutils-sig/1999-January/000124.html
and followups.)

> - ARCH dependent sections should be added
> 
>   What is not clear in my eyes -- maybe I have missed something -- is how
>   you deal with different architectures. What I would suggest here is that
>   we use a dictionary instead of plain definitions of cc, ccshared, cflags
>   and ldflags. Such a dictionary may look like that

Generally, that's left up to Python itself.  Distutils shouldn't have a
catalogue of compilers and compiler flags, because those are chosen when
Python is configured and built.  That's the autoconf philosophy -- no
feature catalogues, just make sure that what you try makes sense on the
current platform, and let the builder (of Python in this case, not
necessarily of a module distribution) override if he needs to.  Module
packagers and installers can tweak compiler stuff a little bit, but it's
dangerous -- the more you tweak, the more likely you are to generate
shared libraries that won't load with your Python binary.

> - Subpackages are also required
> 
>   Well, this is something that I like very much and have really got
>   accustomed to. Say you build PIL and also a Tkinter version that supports
>   PIL; then you'd like to create both packages and also state that
>   PIL-Tkinter requires PIL.

Something like that has been tried in the Perl world, except they were
talking about "super-packages" and they called them "bundles".  I think
it was a hack because module dependencies were not completely handled
for a long time (which, I gather, has now been fixed).  I never liked
the idea, and I hope it will now go away.

The plan for Distutils is to handle module dependencies from the start,
because that lack caused many different Perl module developers to have
to write Makefile.PL's that all check for their dependencies.  That
should be handled by MakeMaker (in the Perl world) and by Distutils (in
the Python world).
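
So in the PIL case, the dependency might be declared right in setup.py --
the "requires" keyword is a guess at the eventual spelling:

   from distutils.core import setup

   setup(name     = "PIL-Tkinter",
         version  = "1.0",
         requires = ["PIL"])   # hypothetical dependency declaration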

Thanks for your comments!

        Greg
-- 
Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives    
1895 Preston White Drive                      voice: +1-703-620-8990 x287
Reston, Virginia, USA  20191-5434               fax: +1-703-620-0913