Jeremy Hylton :
Last modified Thu Mar 17 01:11:16 2005
Noticed a link to this paper on Neil Schemenauer's web log:
Steve Traugott and Joel Huddleston. Bootstrapping an Infrastructure. Proceedings of the 12th Systems Administration Conference (LISA '98).
When deploying and administering systems infrastructures it is still common to think in terms of individual machines rather than view an entire infrastructure as a combined whole. This standard practice creates many problems, including labor-intensive administration, high cost of ownership, and limited generally available knowledge or code usable for administering large infrastructures.
The model we describe treats an infrastructure as a single large distributed virtual machine. We found that this model allowed us to approach the problems of large infrastructures more effectively. This model was developed during the course of four years of mission-critical rollouts and administration of global financial trading floors. The typical infrastructure size was 300-1000 machines, but the principles apply equally as well to much smaller environments. Added together these infrastructures totaled about 15,000 hosts. Further refinements have been added since then, based on experiences at NASA Ames.
The methodologies described here use UNIX and its variants as the example operating system. We have found that the principles apply equally well, and are as sorely needed, in managing infrastructures based on other operating systems.
This paper is a living document: Revisions and additions are expected and are available at www.infrastructures.org. We also maintain a mailing list for discussion of infrastructure design and implementation issues -- details are available on the web site.
Running a big server farm is a fairly interesting problem. I've seen how a few organizations have done it, and it has always seemed like there should be a better way. This paper describes a better way.
Blocking SOBIG.F with dynamic firewall updates
Martijn Pieters came up with an excellent solution to get the mail flowing through mail.python.org (aka mail.zope.org): When a host tries to send too many copies of the SOBIG virus, block it from making any more connections with a Linux firewall rule.
mail.python.org has been getting crushed by the SOBIG.F worm. So many machines are attempting to deliver the virus to python.org addresses that nothing is getting through. The bounces and virus notification messages are almost as bad; they just clog the server with useless mail.
We struggled to come up with a good solution to the problem. All sorts of second-order failures occurred; for example, the /var partition that holds the exim spools filled up. We eventually stopped new virus deliveries by filtering on the subject, but even then the number of attempted deliveries choked out everything else.
Martijn's solution is to scan the exim log and count the number of times an IP address attempts to deliver a virus. If an address tries more than N times in M minutes, it is blocked from connecting to port 25 with an ipchains firewall rule. (For now, N is 15 attempts and M is 5 minutes.)
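The counting step can be sketched roughly as follows. This is not Martijn's script, just a minimal illustration of the "more than N times in M minutes" rule. It assumes the exim log has already been parsed into (timestamp, IP) pairs, one per rejected virus delivery; that input format and the names find_offenders and block_command are made up for the example. The ipchains arguments follow the usual rule layout from its documentation and should be checked against the local man page before anything like this is run for real.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60   # M = 5 minutes
MAX_ATTEMPTS = 15         # N = 15 attempts


def find_offenders(events, window=WINDOW_SECONDS, limit=MAX_ATTEMPTS):
    """Return IPs that made more than `limit` attempts within `window` seconds.

    `events` is an iterable of (timestamp_in_seconds, ip) pairs in time
    order. This input format is an assumption; extracting it from the
    exim log is left out of the sketch.
    """
    recent = defaultdict(deque)   # ip -> timestamps still inside the window
    offenders = []                # in the order they crossed the limit
    already_blocked = set()
    for ts, ip in events:
        if ip in already_blocked:
            continue
        stamps = recent[ip]
        stamps.append(ts)
        # Drop attempts that have fallen out of the sliding window.
        while ts - stamps[0] > window:
            stamps.popleft()
        if len(stamps) > limit:
            already_blocked.add(ip)
            offenders.append(ip)
            del recent[ip]
    return offenders


def block_command(ip):
    """Build (but do not run) an ipchains rule denying `ip` access to port 25."""
    return ["ipchains", "-A", "input", "-p", "tcp",
            "-s", ip, "-d", "0.0.0.0/0", "25", "-j", "DENY"]
```

A caller would feed the parsed log entries to find_offenders and hand each resulting command to something like subprocess.run, as root. Keeping a deque of timestamps per address means an old burst of attempts ages out on its own, so a host is only blocked when the attempts are actually clustered inside the window.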
Later, someone on one of the lists mentioned that this technique was described in a LISA paper last year:
Deeann M.M. Mikula, Chris Tracy, and Mike Holling. Spam Blocking with a Dynamically Updated Firewall Ruleset. Proceedings of LISA '02: 16th Systems Administration Conference, 13--20.
Abstract. In this paper, we detail our methods for controlling spam at a small ISP, reducing both resource usage and customer complaints. We will discuss our initial unsuccessful tactics, and the resulting development of our unique spam blocking system. Deny-Spammers classifies hosts as probable spammers and inserts those hosts into a dynamically updated firewall ruleset on our mail server, thereby effectively blocking the host from making an SMTP connection to our mail server. Our analysis demonstrates that this has been effective in reducing the amount of spam that our customers receive, and the burden on our limited resources.