[Distutils] Pondering multi-package packages

Greg Ward gward@python.net
Sun, 28 May 2000 15:23:17 -0400


On 27 May 2000, M.-A. Lemburg said:
> I was referring to installing a (pre)built binary -- just before
> copying the compiled files to their final install location and
> right after that step is done.

Yes: this is the one place where the Distutils' extension mechanism
*won't* work, because the Distutils aren't present (or at least, not in
control) when installing a pre-built binary.  Here, some other mechanism
will be needed: pass a function, or a module, or a chunk of code to be
eval'd, or something.  Still not sure what's best; we have to balance
the needs of the developer writing the setup script with the facilities
available at the time the hook is run, and how the hook will be run
("python hookscript.py"?).

> "install-from-source" would execute these hooks too: right after
> having built the binaries.

Yes, *except* in the case where the installation is being done solely
for the purpose of creating a built distribution.  I'm pretty sure this
can be handled by adding a "fake install" flag to the "install" command:
if true, don't run the {pre,post}-install hooks.

> Wouldn't a method interface be more reliable and provide
> better means of extension using subclassing ?
> 
> I usually wrap these attributes in .get_foobar(), .set_foobar()
> methods -- this also makes it clear which attributes are
> read-only, read-write or "better don't touch" :-)

"Yes, but..."

I have spent the weekend thinking hard about this problem, and I think I
can explain the situation a little better now.  Distutils commands are
rather odd beasts, and the usual rules and conventions of OO programming
don't work very well with them.  Not only are they singletons (enforced
by the Distribution method 'get_command_obj()'), but they have a
prescribed life-cycle which is also enforced by the Distribution class.
Until today, this life-cycle was strictly linear:

        non-existent
        ---> preinitialized ---> initialized
        ---> finalized
        ---> running
        ---> run

"Preinitialized" and "initialized" are on the same line because, to
outsiders, they are indistinguishable: the transition happens entirely
inside the Command constructor.  It works like this:

  * before we create any command objects, we find and parse all config
    files, and parse the command line; the results are stored in a
    dictionary 'command_options' belonging to the Distribution instance
  * somebody somewhere calls Distribution.get_command_obj("foo"), which
    notices that it hasn't yet instantiated the "foo" command (typically
    implemented by the class 'foo' in the module distutils.command.foo)
  * 'get_command_obj()' instantiates a 'foo' object; command classes do
    not define constructors, so we go straight into Command.__init__
  * Command.__init__ calls self.initialize_options(), which must
    be provided by each individual command class
  * 'initialize_options()' is typically a series of
      self.this = None
      self.that = None
    assignments: ie. it "declares" the available "options" for this
    command.  (The 'user_options' class attribute also "declares"
    the command's options.  The two are redundant; every "foo-bar"
    option in 'user_options' must be matched by a "self.foo_bar = None"
    in 'initialize_options()', or it will all end in tears.)
  * some time later (usually immediately), the command's
    'finalize_options()' method is called.  The job of
    'finalize_options()' is to make up the command's mind about
    everything that will happen when the command runs.  Typical code
    in 'finalize_options()' is:
      if self.foo is None:
         self.foo = default value
      if self.bar is None:
         self.bar = f(self.foo)
    
    Thus, we respect the user's value for 'foo', and have a sensible
    default if the user didn't provide one.  And we respect the user's
    value for 'bar', and have a sensible -- possibly complicated --
    default to fall back on.

    The idea is to reduce the responsibilities of the 'run()' method,
    and to ensure that "full disclosure" about the command's intentions
    can be made before it is ever run.

To play along with this complicated dance, Distutils command classes
have to provide 1) the 'user_options' class attribute, 2) the
'initialize_options()' method, and 3) the 'finalize_options()' method.
(They also have to provide a 'run()' method, of course, but that has
nothing to do with setting/getting option values.)
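
The dance above can be sketched as a toy model.  This is not the real
Distutils code: the constructor argument 'command_options', the default
paths, and the way user values are applied inside the constructor are
all assumptions made to keep the example self-contained.

```python
# Toy model of the option life-cycle; NOT the real distutils classes.

class Command:
    user_options = []   # (long-name, short-name, help) tuples

    def __init__(self, command_options=None):
        self.finalized = False
        self.initialize_options()       # "declare" every option as None
        # Assumption: user-supplied values (config files, command line)
        # are applied right after initialization.
        for name, value in (command_options or {}).items():
            setattr(self, name.replace("-", "_"), value)

    def ensure_finalized(self):
        if not self.finalized:
            self.finalize_options()
            self.finalized = True


class configure(Command):
    user_options = [("foo-inc=", None, "directory with foo's headers"),
                    ("foo-lib=", None, "directory with foo's libraries")]

    def initialize_options(self):
        self.foo_inc = None
        self.foo_lib = None

    def finalize_options(self):
        if self.foo_inc is None:
            self.foo_inc = "/usr/include"       # hypothetical default
        if self.foo_lib is None:
            # default derived from another option
            self.foo_lib = self.foo_inc.replace("include", "lib")

    def run(self):
        return (self.foo_inc, self.foo_lib)


cmd = configure({"foo-inc": "/opt/foo/include"})
cmd.ensure_finalized()
print(cmd.run())    # ('/opt/foo/include', '/opt/foo/lib')
```

The user's "foo-inc" survives finalization untouched, while "foo-lib",
which the user did not give, is derived from it.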

The payoff is that new command classes get all the Distutils user
interface -- command-line parsing and config files, for now -- for free.
The example "configure" command that I showed in a previous post, simply
by virtue of having "foo-inc" and "foo-lib" in 'user_options' (and
corresponding "self.xxx = None" statements in 'initialize_options()'),
will automatically use the Distutils' config file and command-line
parsing mechanism to set values for those options.  Only if the user
doesn't supply the information do we have to poke around the target
system to figure out where "foo" is installed.

Anyways, the point of this long-winded discussion is this: certain
attributes of command objects are public and fair game for anyone to set
or modify.  However, there are well-defined points in the object's
life-cycle *before* which it is meaningless to *get* option values, and
*after* which it is pointless to *set* option values.  In particular,
there's no point in getting an option value *before* finalization,
because -- duh -- the options aren't finalized yet.  More subtly,
attempting to set some option *after* finalization time might have no
effect at all (if eg. that option is only used to derive other options
from, like the 'build_base' option in the "build" command); or it might
have complicated, undesirable effects.  I can see this happening in
particular with the "install" command, which (necessarily) has a
frighteningly complex finalization routine.

If we go by the simple, linear state-transition diagram above, it turns
out that setting option values for a particular command object is a
dicey proposition: you simply don't know what state the command object
is in, so you don't know what effect setting values on that command will
have.  If you try to force them to take effect by calling
'finalize_options()' again, it won't work: the way that method is
typically written ("if self.foo is None: self.foo = default value", for
as many values of "foo" as are needed), a second call finds every option
already set and changes nothing.
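
A minimal sketch of the problem, using a stand-in class rather than the
real "build" command (the option names follow the real ones, but the
default values here are made up for illustration):

```python
# Stand-in for a command object; NOT the real distutils "build" command.

class build:
    def __init__(self):
        self.build_base = None
        self.build_lib = None

    def finalize_options(self):
        if self.build_base is None:
            self.build_base = "build"
        if self.build_lib is None:
            # derived from build_base, but only while still None
            self.build_lib = self.build_base + "/lib"


b = build()
b.finalize_options()        # build_lib becomes "build/lib"

b.build_base = "/tmp/out"   # set *after* finalization
b.finalize_options()        # every "is None" test is now false

print(b.build_lib)          # still "build/lib", not "/tmp/out/lib"
```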

So today, I added a couple of new transitions to that state-transition
diagram.  Now, you can go from any state to the "initialized" state
using the 'reinitialize_command()' method provided by Distribution.  So
it's now safe to do something like this, eg. in a "configure" command:

    build = self.reinitialize_command("build")
    build.include_dirs.append(foo_inc)
    build.library_dirs.append(foo_lib)
    build.ensure_finalized()

...and you know that any user-specified options to the "build" command
will be preserved, and that all dependent-but-unspecified options will
be recomputed.  (You don't need to call 'ensure_finalized()' here unless
you will subsequently be getting some option values from the "build"
object.)
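
Here is a rough sketch of what such a reinitialization could look like.
It is a toy model under stated assumptions, not the actual Distribution
code: the helper '_set_user_options()', the 'command_classes' table, and
the default directory names are hypothetical, and user options are
assumed to be re-applied right after the options are reset.

```python
# Toy model; NOT the real distutils Distribution or "build" command.

class Build:
    def __init__(self):
        self.initialize_options()

    def initialize_options(self):
        self.build_base = None
        self.build_lib = None
        self.build_temp = None
        self.finalized = False

    def finalize_options(self):
        if self.build_base is None:
            self.build_base = "build"
        if self.build_lib is None:
            self.build_lib = self.build_base + "/lib"
        if self.build_temp is None:
            self.build_temp = self.build_base + "/temp"

    def ensure_finalized(self):
        if not self.finalized:
            self.finalize_options()
            self.finalized = True


class Distribution:
    command_classes = {"build": Build}      # hypothetical lookup table

    def __init__(self, command_options):
        self.command_options = command_options  # from config files/argv
        self.command_obj = {}                   # one object per command

    def _set_user_options(self, name, cmd):
        for option, value in self.command_options.get(name, {}).items():
            setattr(cmd, option, value)

    def get_command_obj(self, name):
        if name not in self.command_obj:
            cmd = self.command_classes[name]()
            self._set_user_options(name, cmd)
            self.command_obj[name] = cmd
        return self.command_obj[name]

    def reinitialize_command(self, name):
        cmd = self.get_command_obj(name)
        cmd.initialize_options()            # back to "initialized"
        self._set_user_options(name, cmd)   # keep what the user asked for
        return cmd


dist = Distribution({"build": {"build_lib": "mylib"}})
build = dist.get_command_obj("build")
build.ensure_finalized()
first = (build.build_base, build.build_lib, build.build_temp)

build = dist.reinitialize_command("build")
build.build_base = "/tmp/out"
build.ensure_finalized()
second = (build.build_base, build.build_lib, build.build_temp)

print(first)    # ('build', 'mylib', 'build/temp')
print(second)   # ('/tmp/out', 'mylib', '/tmp/out/temp')
```

The user's 'build_lib' is preserved across the reinitialization, and
'build_temp', which nobody specified, is recomputed from the new
'build_base'.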

Thus, it should now be possible to write a "configure" command that
respects the bureaucracy of the Distutils *and* forces the "build"
command to do The Right Thing.  This is a small change to the code, but
a major change to the philosophy of option-passing in the Distutils,
which until now was (theoretically) "pull only": it was not considered
proper or safe to assign another command's option attributes; now it is,
as long as you play by the above rules.  Cool!

BTW, I'm not opposed to the idea of 'get_foo()' and 'set_foo()' methods:
they could add some value, but only if they are provided by the Command
class, rather than each command having to implement a long list of
near-identical accessor and modifier methods.  Probably 'get_foo()'
should die if the object hasn't been finalized, and 'set_foo()' should
die if it has been finalized (or hasn't been initialized).
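
One way a base class could provide such accessors without every command
writing them out by hand is sketched below.  Nothing like this exists in
the Distutils today; the '__getattr__' trick, the helper names '_get()'
and '_set()', and the choice of exceptions are assumptions for
illustration only.  In this toy, construction always initializes the
object, so the "hasn't been initialized" case is not modelled.

```python
# Hypothetical design sketch; NOT part of the real distutils Command class.

class Command:
    def __init__(self):
        self.finalized = False
        self.initialize_options()

    def ensure_finalized(self):
        if not self.finalized:
            self.finalize_options()
            self.finalized = True

    def __getattr__(self, name):
        # Only reached when normal lookup fails, ie. for get_foo/set_foo.
        if name.startswith("get_"):
            return lambda: self._get(name[4:])
        if name.startswith("set_"):
            return lambda value: self._set(name[4:], value)
        raise AttributeError(name)

    def _get(self, option):
        if option not in vars(self):
            raise AttributeError("no such option: " + option)
        if not self.finalized:
            raise RuntimeError("options are not finalized yet")
        return vars(self)[option]

    def _set(self, option, value):
        if option not in vars(self):
            raise AttributeError("no such option: " + option)
        if self.finalized:
            raise RuntimeError("options are already finalized")
        setattr(self, option, value)


class build(Command):
    def initialize_options(self):
        self.build_base = None
        self.build_lib = None

    def finalize_options(self):
        if self.build_base is None:
            self.build_base = "build"
        if self.build_lib is None:
            self.build_lib = self.build_base + "/lib"


cmd = build()
cmd.set_build_base("out")       # fine: not finalized yet
cmd.ensure_finalized()
print(cmd.get_build_lib())      # out/lib
```

Calling 'get_build_lib()' before 'ensure_finalized()', or
'set_build_base()' after it, raises RuntimeError in this sketch.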

Hope this makes some sense...

        Greg
-- 
Greg Ward - Unix nerd                                   gward@python.net
http://starship.python.net/~gward/
I haven't lost my mind; I know exactly where I left it.