[SciPy-dev] what the heck is an egg ;-)
Damian Eads
eads at soe.ucsc.edu
Sat Dec 1 16:20:36 EST 2007
Joe Harrington wrote:
>> setup.py and using distutils is not novel: this is the standard way to
>> distribute python packages to be built for years (distutils was included
>> in python in 2000, 2001 ?).
>
> Well, it's a matter of perspective, and the perspective I take is that
> of a new user to Python, or even a non-Python-using system manager who
> is installing numpy and friends for others to use. To them, *anything*
> that is not a tar.gz, RPM, or Deb is novel, and most would not dare to
> use even an un-novel tar.gz in their OS directories. Then we say, here,
> execute this setup.py as root, it's code for an interpreted language and
> you have no idea what it will do. Well, that's pretty terrifying,
> especially to the security-conscious.
It takes enough time to write a package, write the build scripts,
document everything nicely, polish the documentation, write tests to
cover an adequate number of cases, test the code, and maintain the code
while not breaking the tests. Meanwhile, many of us need to get some
science done in the middle of all of it. The fact that some people
haven't gotten around to debbing or RPM'ing their packages is
understandable. It's all a matter of whether the author has time or
interest in doing packaging. If someone is reluctant to use a package
because the author has yet to get it packaged up, then how is it the
problem of the author or community? This does not mean we don't want to
have everything nicely packaged up in RPM or dpkg format.
> I know almost nothing about eggs. I see them being used for all the
> Enthought code, which provides the de facto standard 3D environment,
> mayavi2. What's a numerical package without a 3D environment? While
> that's not on scipy.org, it's darn close, and necessary for an
> environment that competes with IDL or Matlab.
I've been using various numerical environments for years and have never
had a need for a 3D environment. There are many people who need one but
certainly not everyone. I'm not sure what you mean by competition
because it's hard to compete with something that's free and easy-to-use,
especially when it does what you want. Granted there are algorithms or
tools in IDL and MATLAB that aren't in Scipy (the same could be said the
other way around). The difference is that you don't pay for Scipy. You
could certainly offer to pay someone in the community to write the thing
you need, or you could do it yourself.
I made the choice to avoid proprietary software in the development of my
scientific code base so that I could reduce the costs of the science I
do and make it more enjoyable. I'm no longer locked into expensive
software upgrades and I can now spread my simulation across many
machines without having to deal with license issues.
I have written on the order of 15,000 lines of MATLAB code in my
lifetime. It took a big investment of time to rewrite a lot of the most
useful code in Python, and throw the rest in the trash can. For me,
MATLAB made a lot of things harder. It made it harder to: collaborate
with other people (some people outright refused to write code in MATLAB
so they could interface with my code), run code at home due to license
issues, distribute code to others in some sensible way, and run code on
many machines. MATLAB also lacks decent object-orientation, which Python
handles very nicely. The ability to pass objects by reference in Python has
made it much easier to write memory-intensive code that manipulates large
data structures in place. I
realized from the start that I might need something that was unavailable
in the Scipy framework, but I chose to take the risk so that I had the
freedom to not deal with so many problems that make software
development, simulation runs, and science not enjoyable.
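The pass-by-reference point above can be made concrete with a small sketch (my own illustration, not from the original post): in Python, a function receives a reference to the caller's array and can modify it without allocating a copy, whereas MATLAB at the time copied arrays passed into functions.

```python
# Illustrative sketch: Python passes object references, so a function can
# modify a large array in place rather than working on a copy.
# The function name is hypothetical.
import numpy as np

def normalize_in_place(a):
    """Scale the array to unit maximum without allocating a new array."""
    a /= a.max()   # in-place division; the caller's array is modified

x = np.arange(1.0, 5.0)   # array([1., 2., 3., 4.])
normalize_in_place(x)
# the caller's array is now [0.25, 0.5, 0.75, 1.0] -- no copy was made
```

For memory-intensive simulation code, this means a single large buffer can be threaded through many processing steps without the hidden copies that copy-on-pass semantics would incur.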
> I agree that the correct path is to push everything into binary
> installs, even the experimental stuff. I love the OS installers, and I
> thank the packagers from the bottom of my heart! If only there were
> more of them, and if only they could handle more of these packages. The
> OS installers may not deal with multiple package versions on Linux, but I
> have never wanted more than one version. Someone who does is probably a
> developer and can handle the tar installs, eggs, or whatever, and direct
> Python to find the results. I believe that we would double our
> community's size if all our major packages were available in binary
> installs for all major platforms.
>
> But, a plethora of packagers is not our situation. It would help the
> inexperienced (including the aforementioned system manager, who will
> never be experienced in Python) to have some plain talk about what these
> Python-specific installers do, and how to use them to install,
> uninstall, and automatically update. It can probably be knocked off in
> about a page per installer, but it has to be done by someone who knows
> them well.
Even if we did package RPMs for all the add-ons, that should not leave a
sysadmin complacent. Just because a file is an RPM does not make it safe
to install as root. The real issues are whether the file came from a
trusted source, whether it was generated by trusted people, and whether
this can be verified. However, these are separate issues from the one
you've raised--that because we use what is perceived by some, at least,
as an "ad hoc" build process, sysadmins will question the security of
the packages. I see that as a problem of ignorance on the part of the
sysadmin. Would the sysadmin be less suspicious if the more universal
autoconf and automake were used? Would it be worth the effort to use
something more standard, even if it took much more time to set up and
maintain, just to assuage the sysadmin?
Most languages have their own build tool, and some people choose it over
a generic build tool because getting the generic one working right is
often frustrating and time-consuming.
When I was first exposed to Python and saw this setup.py, I thought to
myself: what the heck is this non-standard thing, and why should I learn
how it works? Then one day I was at a Borders cafe and read about
distutils in David Beazley's book. He made it seem so easy that I just
had to try it when I got home, and within 30 minutes, I had several
internal packages of mine building with Python's distutils. I was sold
because it handles Python's idiosyncrasies in a much more foolproof way
than I could ever achieve with my makefiles. I could build source
distributions, RPM spec files, windows installers, etc. with a few
keystrokes. Mind you, it is much more involved to ensure, with some
confidence, that an RPM will work on any machine of the same platform as
the one on which it was generated, but distutils at least simplifies the
build process.
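To give a flavor of why distutils felt so easy, here is a minimal setup.py sketch of the kind Beazley's book describes (the package name and metadata are hypothetical, invented for illustration); distutils ships with Python itself, so nothing extra is needed.

```python
# Hypothetical minimal setup.py; "mypackage" and its metadata are made up
# for illustration. distutils is part of the Python standard library
# (in Python 2.x and up to 3.11).
from distutils.core import setup

setup(
    name="mypackage",                       # hypothetical package name
    version="0.1",
    description="An example scientific package",
    packages=["mypackage"],                 # directory containing __init__.py
)
```

With a script like this, `python setup.py sdist` builds a source tarball, `python setup.py bdist_rpm` produces an RPM spec and binary RPM, and `python setup.py bdist_wininst` makes a Windows installer, which is roughly what "a few keystrokes" amounts to.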
Your point is well taken that there should be a one-liner, somewhere
easy to find, stating that Python's distutils and setuptools are used to
build the various Scipy packages. But distutils is so standard that it
comes built into Python, which is installed by default in most Linux
distributions. As for the paranoia of some sysadmins out in internet
land when non-standard build tools are used, I don't know what to do
about that. But I hope many of them aren't thinking that just because a
file is an RPM, it is safe, or safer than building the same package from
source with either a standard or non-standard build tool.
Damian