[SciPy-dev] what the heck is an egg ;-)

Damian Eads eads at soe.ucsc.edu
Sat Dec 1 16:20:36 EST 2007


Joe Harrington wrote:
>> setup.py and using distutils is not novel: it has been the standard way 
>> to distribute Python packages for years (distutils was included in 
>> Python in 2000 or 2001?).
> 
> Well, it's a matter of perspective, and the perspective I take is that
> of a new user to Python, or even a non-Python-using system manager who
> is installing numpy and friends for others to use.  To them, *anything*
> that is not a tar.gz, RPM, or Deb is novel, and most would not dare to
> use even an un-novel tar.gz in their OS directories.  Then we say, here,
> execute this setup.py as root, it's code for an interpreted language and
> you have no idea what it will do.  Well, that's pretty terrifying,
> especially to the security-conscious.

It takes a lot of time to write a package, write the build scripts, 
document everything nicely, polish the documentation, write tests that 
cover an adequate number of cases, test the code, and maintain the code 
without breaking the tests. Meanwhile, many of us need to get some 
science done in the middle of all of it. The fact that some people 
haven't gotten around to debbing or RPM'ing their packages is 
understandable. It's all a matter of whether the author has the time or 
interest to do packaging. If someone is reluctant to use a package 
because the author has yet to package it up, how is that the author's or 
the community's problem? This does not mean we don't want to have 
everything nicely packaged up in RPM or dpkg format.

> I know almost nothing about eggs.  I see them being used for all the
> Enthought code, which provides the de facto standard 3D environment,
> mayavi2.  What's a numerical package without a 3D environment?  While
> that's not on scipy.org, it's darn close, and necessary for an
> environment that competes with IDL or Matlab.

I've been using various numerical environments for years and have never 
needed a 3D environment. Many people do need one, but certainly not 
everyone. I'm not sure what you mean by competition, because it's hard 
to compete with something that's free and easy to use, especially when 
it does what you want. Granted, there are algorithms and tools in IDL 
and MATLAB that aren't in Scipy (and the same could be said the other 
way around). The difference is that you don't pay for Scipy. You could 
certainly offer to pay someone in the community to write the thing you 
need, or you could do it yourself.

I made the choice to avoid proprietary software in the development of my 
scientific code base so that I could reduce the costs of the science I 
do and make it more enjoyable. I'm no longer locked into expensive 
software upgrades and I can now spread my simulation across many 
machines without having to deal with license issues.

I have written on the order of 15,000 lines of MATLAB code in my 
lifetime. It took a big investment of time to rewrite the most useful 
parts in Python and throw the rest in the trash. For me, MATLAB made a 
lot of things harder: collaborating with other people (some outright 
refused to write MATLAB code to interface with mine), running code at 
home due to license issues, distributing code to others in a sensible 
way, and running code on many machines. MATLAB also lacks decent object 
orientation, which Python handles very nicely. The ability to pass by 
reference in Python has made it much easier to write memory-intensive 
code that manipulates large data structures in place. I realized from 
the start that I might need something unavailable in the Scipy 
framework, but I chose to take that risk in exchange for freedom from 
so many of the problems that make software development, simulation 
runs, and science unenjoyable.
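
To give a concrete sketch of what I mean by in-place manipulation (this 
assumes numpy is installed; the function name is made up):

    import numpy as np

    def normalize_rows(a):
        # Python passes the array object by reference, so the
        # (possibly huge) array is never copied at the call boundary,
        # unlike MATLAB's copy-on-write value semantics.
        a /= a.sum(axis=1)[:, np.newaxis]  # in-place row scaling

    x = np.random.rand(10000, 500)  # roughly 40 MB of doubles
    normalize_rows(x)               # mutates x directly; nothing is copied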

> I agree that the correct path is to push everything into binary
> installs, even the experimental stuff.  I love the OS installers, and I
> thank the packagers from the bottom of my heart!  If only there were
> more of them, and if only they could handle more of these packages.  The
> OS installers may not deal with multiple package versions on Linux, but I
> have never wanted more than one version.  Someone who does is probably a
> developer and can handle the tar installs, eggs, or whatever, and direct
> Python to find the results.  I believe that we would double our
> community's size if all our major packages were available in binary
> installs for all major platforms.
> 
> But, a plethora of packagers is not our situation.  It would help the
> inexperienced (including the aforementioned system manager, who will
> never be experienced in Python) to have some plain talk about what these
> Python-specific installers do, and how to use them to install,
> uninstall, and automatically update.  It can probably be knocked off in
> about a page per installer, but it has to be done by someone who knows
> them well.

Even if we did package RPMs for all the add-ons, that should not leave a 
sysadmin complacent. Just because a file is an RPM does not make it safe 
to install as root. The real issues are whether the file came from a 
trusted source, whether it was generated by trusted people, and whether 
this can be verified. However, these are separate issues from the one 
you've raised: that because we use what is perceived by some, at least, 
as an "ad hoc" build process, sysadmins will question the security of 
the packages. I see that as a problem of ignorance on the part of the 
sysadmin. Would the sysadmin be less suspicious if the more universal 
autoconf and automake were used? Would it be worth the effort to use 
something more standard, even if it took much more time to set up and 
maintain, just to assuage the sysadmin?

Most languages have their own build tool, and some people choose it over 
a generic build tool because getting the generic one working right can 
be frustrating, cumbersome, and time-consuming. When I was first exposed 
to Python and saw this setup.py, I thought to myself: what the heck is 
this non-standard thing, and why should I learn how it works? Then one 
day, I was at a Borders cafe and read about distutils in David Beazley's 
book. He made it seem so easy that I just had to try it when I got home, 
and within 30 minutes I had several of my internal packages building 
with Python's distutils. I was sold, because it handles Python's 
idiosyncrasies far more reliably than I could ever achieve with my 
makefiles. I could build source distributions, RPM spec files, Windows 
installers, etc. with a few keystrokes. Mind you, it is much more 
involved to ensure, with some confidence, that an RPM will work on any 
machine of the same platform as the one it was generated on, but 
distutils at least simplifies the build process.
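
For the curious, a minimal setup.py looks something like this (the 
package name here is made up):

    # setup.py -- a minimal distutils build script
    from distutils.core import setup

    setup(name='mypackage',
          version='0.1',
          description='An internal scientific package',
          packages=['mypackage'],
          )

With that in place, "python setup.py sdist" builds a source 
distribution, "python setup.py bdist_rpm" a binary RPM, and "python 
setup.py bdist_wininst" a Windows installer.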

Your point is well taken that there should be an easy-to-find one-liner 
stating that Python's distutils and setuptools are used for building the 
various Scipy packages. But distutils is so standard that it comes built 
into Python, which is installed by default in most Linux distributions. 
As for the paranoia of some sysadmins out in internet land when 
non-standard build tools are used, I don't know what to do about that. 
But I hope many of them aren't thinking that just because a file is an 
RPM, it is safe, or safer than building the same package from source 
with either a standard or non-standard build tool.

Damian


