[Distutils] how do I add some functionality to my setup.py which is testable and modular?

Brian Warner warner at lothar.com
Mon Aug 17 07:55:28 CEST 2009


On Sun, 16 Aug 2009 14:16:29 -0600
Zooko Wilcox-O'Hearn <zooko at zooko.com> wrote:

> Folks:
> 
> I have a recurring problem with distutils/setuptools/Distribute,  
> which is that I don't know how to extend the functionality of  
> setup.py and make the new functionality be testable and modular.
> ...
> Here's one specific example that I currently have a lot of
> experience with: I'd like to generate a version number from revision
> control history.
> ...
> 
> The third is what I do in Tahoe-LAFS [6].  I moved the functionality  
> in question into a separate Python package, in this case named  
> "darcsver" [7], and used setuptools's plugin system to add a command  
> named "darcsver" which initializes the distribution.metadata.version  
> attribute correctly.  Then I had to add a bunch of aliases to my  
> setup.cfg [8] saying "If you're going to build, first darcsver, and  
> if you're going to install, first darcsver, and ...".  This sort of  
> works, except that yesterday my programming partner Brian Warner  
> informed me [9] that he expected the "python ./setup.py --version"  
> command-line to output the version number.  Argh!  There is no way
> to configure in my setup.cfg "If you're going to --version, first  
> darcsver.".

I think I have a plan for addressing this in the specific case of Tahoe.
The following may be irrelevant to the extending-setuptools discussion,
but it also might be informative.

The job of "setup.py darcsver" (as used by Tahoe) is to perform a
relatively disk-intensive operation to compute the current source tree's
version string and then record it in two places:

 src/allmydata/_version.py : for use by tahoe itself, so it knows its own
                             version at runtime, i.e. so that running a
                             "tahoe --version" command works
 distribution.metadata.version : for use by setuptools, so commands like
                                 "setup.py sdist" can name the tarball
                                 correctly

"distribution.metadata.version" is usually set by passing a version=
argument to the main setup() call inside setup.py . Packages can use
whatever mechanism they like to decide what to pass to version= . Tahoe
currently never passes a version= to setup().

There are two basic scenarios that Tahoe's what-version-should-I-use
code must contend with:

 1: running in a darcs checkout
 2: running from a source tarball, without _darcs/ metadata

We always use darcsver to populate src/allmydata/_version.py before
generating a tarball, so the #2 scenario should always find a
_version.py file with the pre-computed version string.

darcsver has code to handle the situation where it cannot use darcs to
compute a version string (either "darcs" is unrunnable or _darcs/ is
missing), and *also* has an existing _version.py file. In this case, it
reads _version.py and greps the version string out of it, then saves the
result in distribution.metadata.version . 

This is the bit that's relevant to Zooko's main question: it doesn't
attempt to "import allmydata._version", but instead does an out-of-band
open/read/grep. In setup.py and other build-time code, I'm generally
opposed to using "import XYZ; XYZ.__version__" to determine somebody's
version string, specifically because you can't unimport anything:
whatever got loaded stays in the setup.py process. Instead, I prefer to
grep a version out of a file with a well-known format (we generate
_version.py so we control its format, making it safe to grep), or to run
a subprocess which does "import XYZ; print XYZ.__version__" (which of
course still assumes the __version__ convention).
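
A minimal sketch of both approaches, written for Python 3. The line
format matched by the regular expression (a plain __version__ = "..."
assignment) is an assumption for illustration and may not match what
darcsver actually writes; grep_version and version_via_subprocess are
hypothetical helper names, not part of darcsver or setuptools.

```python
import re
import subprocess
import sys

# Assumed line format: __version__ = "1.2.3"
# (illustrative only; the real generated _version.py may look different)
VERSION_RE = re.compile(
    r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE
)


def grep_version(path):
    """Read the file as text and pull the version out without importing it."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError:
        # Missing or unreadable file: report "no version known".
        return None
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def version_via_subprocess(module_name):
    """Import the module in a child interpreter so this process stays clean."""
    code = "import %s as m; print(m.__version__)" % module_name
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    if result.returncode != 0:
        # Import failed or the module has no __version__ attribute.
        return None
    return result.stdout.strip()
```

Either way, nothing from the package ends up in the sys.modules of the
process that is running setup.py.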

Anyways, the reason "setup.py --version" doesn't work is that this
read-_version.py-and-grep code only gets run when you invoke the
"darcsver" command, and "setup.py --version" doesn't run any commands.

So my proposal for Tahoe is:

 1: setup.py should always start by attempting to read
    src/allmydata/_version.py and grep the version out of it. If this
    works, pass a version= argument into the setup() call, which will
    populate distribution.metadata.version and make everything work
    (including "setup.py --version", "setup.py --fullname", and
    "setup.py sdist")
 2: if the "darcsver" command is run, that will possibly regenerate
    _version.py and reset distribution.metadata.version to a new value

In general, you only run "darcsver" when you want to rebuild
_version.py . At all other times we should instead just use the most
recently cached value. Zooko modified the tahoe setup.cfg file to
forcibly invoke "darcsver" before most major commands for two reasons:

 1: to get a version string at all for commands like sdist (since
    without the always-read-_version.py approach proposed above, this
    was the only way to set distribution.metadata.version)

 2: to protect the user against confusion if they run darcsver, then
    pull a new patch or two (invalidating the version string), then run
    sdist: the tarball would be generated with the wrong version string.
    Basically setup.py has no way to magically tell that the source tree
    has been changed, so re-running darcsver all the time is the only
    way to make sure the version string is up-to-date.

My personal preference would be to leave setup.cfg empty and instruct
people to run "darcsver" before doing anything else, or to make setup.py
test for the existence of _version.py and invoke darcsver if it is
missing, because darcsver is rather disk-intensive and takes up to 10
seconds to run on my slow (FileVault encrypted) OS-X partition. But I
appreciate Zooko's concern #2, and have been personally bitten by the
out-of-date-version-string problem in the past. So tolerating slower
setup.py commands is a reasonable tradeoff to make.
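
For comparison, the cheaper "only invoke darcsver if _version.py is
missing" alternative can be sketched as below. The function name
ensure_version_file and the regenerate callable are hypothetical;
regenerate stands in for whatever actually runs darcsver.

```python
import os


def ensure_version_file(path, regenerate):
    """Call regenerate() only if path is missing; return True if it ran."""
    if os.path.exists(path):
        # A cached file exists, so skip the slow recomputation.
        return False
    regenerate()
    return True
```

Note that this checks only for existence. It cannot tell that new
patches have been pulled since the file was written, which is exactly
the stale-version case from concern #2.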

It may still be an interesting-to-setuptools issue of how to best
modularize this proposed "read _version.py and run darcsver if
necessary" functionality. It seems to me that the setuptools plugin
mechanism is a good way to provide new commands (like "darcsver"), but
not a good way to persistently modify existing behavior, and that the
latter will always require not-very-modular customizations to a
package's setup.py .


cheers,
 -Brian

