Jeremy Hylton : weblog : 2003-08-10

Administering servers

Sunday, August 10, 2003

Noticed a link to this paper on Neil Schemenauer's web log:

Steve Traugott and Joel Huddleston. Bootstrapping an Infrastructure. Proceedings of the 12th Systems Administration Conference (LISA '98).


When deploying and administering systems infrastructures it is still common to think in terms of individual machines rather than view an entire infrastructure as a combined whole. This standard practice creates many problems, including labor-intensive administration, high cost of ownership, and limited generally available knowledge or code usable for administering large infrastructures.

The model we describe treats an infrastructure as a single large distributed virtual machine. We found that this model allowed us to approach the problems of large infrastructures more effectively. This model was developed during the course of four years of mission-critical rollouts and administration of global financial trading floors. The typical infrastructure size was 300-1000 machines, but the principles apply equally as well to much smaller environments. Added together these infrastructures totaled about 15,000 hosts. Further refinements have been added since then, based on experiences at NASA Ames.

The methodologies described here use UNIX and its variants as the example operating system. We have found that the principles apply equally well, and are as sorely needed, in managing infrastructures based on other operating systems.

This paper is a living document: Revisions and additions are expected and are available on the paper's web site. We also maintain a mailing list for discussion of infrastructure design and implementation issues -- details are available on the web site.

Running a big server farm is a fairly interesting problem. I've seen how a few organizations have done it, and it has always seemed like there should be a better way. This paper describes a better way.