[Mailman-Users] CGI alternative?

Dan Wilder dan at ssc.com
Tue Mar 26 00:31:45 CET 2002


There are gobs of abusive searchbots out there. I won't name
names.  Look at your logs and compare each bot's hits against the
referrals it sends you.  Not to mention 
email harvesters and college kids sitting behind an OC3, having nothing 
better to do with their dorm room computers than run a recursive wget 
on your site every ten minutes.  Never mind that most of the material 
never changes once it is up.

On the other hand, watch out for large ISPs who reach out to the
world through a small pool of IP addresses.  Block or throttle one
of those addresses and you shut out a lot of legitimate users.

Apache can be configured to block on browser match, which will get
rid of a lot of those you don't want.  It can also block specific
IP numbers.   Read your logs, look at your web page stats, figure
out who you don't need, and go to it.
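
A rough sketch of that log-reading step, in Python.  It assumes your
access log is in Apache's "combined" format; the function names
(tally, top_clients) are made up for illustration, and the regular
expression simply skips any line it can't parse.

```python
import re
from collections import Counter

# Apache "combined" format (assumed):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def tally(lines):
    """Count hits per client IP and per User-Agent string."""
    by_ip, by_agent = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not in combined format; ignore it
        by_ip[m.group("ip")] += 1
        by_agent[m.group("agent")] += 1
    return by_ip, by_agent

def top_clients(path, n=10):
    """Print the n busiest IPs and User-Agents found in the log at `path`."""
    with open(path, errors="replace") as fh:
        by_ip, by_agent = tally(fh)
    for label, counts in (("IP", by_ip), ("User-Agent", by_agent)):
        print(f"Top {n} by {label}:")
        for key, hits in counts.most_common(n):
            print(f"{hits:8d}  {key}")
```

Call top_clients() with the path to your own access log.  The busiest
User-Agent strings are candidates for a browser-match block, but check
the busiest IPs against the large-ISP caveat above before denying them.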

For the irresponsible individuals you'll need some sort of rate
throttling.
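
To make "rate throttling" concrete, here is a minimal sketch in Python
of a per-IP sliding-window limiter with a temporary ban.  It is not an
Apache module or any particular firewall script, just the idea; the
class name and the default numbers (20 hits per 60 seconds, a
300-second ban) are assumptions for illustration.

```python
import time
from collections import defaultdict, deque

class Throttle:
    """Allow at most `limit` hits per `window` seconds from each client IP.

    A client that goes over the limit is refused for `ban` seconds.
    All defaults are illustrative, not recommendations.
    """

    def __init__(self, limit=20, window=60.0, ban=300.0, clock=time.monotonic):
        self.limit, self.window, self.ban, self.clock = limit, window, ban, clock
        self.hits = defaultdict(deque)  # client IP -> timestamps of recent hits
        self.banned_until = {}          # client IP -> time the ban expires

    def allow(self, ip):
        now = self.clock()
        if self.banned_until.get(ip, 0.0) > now:
            return False                # still serving a ban
        recent = self.hits[ip]
        while recent and now - recent[0] >= self.window:
            recent.popleft()            # forget hits older than the window
        if len(recent) >= self.limit:
            self.banned_until[ip] = now + self.ban
            return False
        recent.append(now)
        return True
```

Whatever sits in front of the CGIs would call allow(client_ip) once per
request and refuse the request when it returns False.  Because the key
is the client IP, a big ISP's proxy pool looks like one very busy
client, so set the limit with that in mind.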

Re Google: in my experience they are VERY responsible about their
web page crawling, seldom exceeding one hit per couple of minutes on 
our sites, even considering that they crawl simultaneously from several 
IP numbers.  Generally they're among our top ten or fifteen referrers,
while accounting on any time horizon for less than 0.2% of our total 
kbytes.  They do a great job.

On Mon, Mar 25, 2002 at 05:39:31PM -0500, Jon Carnes wrote:
> Apache has some nice tools for limiting the number of connections on a
> specific resource.  I believe that you can throttle on both requested
> resource and on destination IP.  You probably want to cap the number of
> connections from any single IP address to something like 20.
> 
> There are also some nice firewall scripts that do the same thing, but more
> drastically.  Once the number of connections from a single IP reaches a
> defined number, the offending IP is denied access for 5 minutes (or whatever
> time period you want).
> 
> Good Luck.  Let us know what you come up with.
> ----- Original Message -----
> From: "kellan" <kellan at protest.net>
> To: <mailman-users at python.org>
> Sent: Monday, March 25, 2002 5:05 PM
> Subject: [Mailman-Users] CGI alternative?
> 
> 
> > Hi, I'm part of a team that works on maintaining lists.indymedia.org.  We
> > have a very large number of lists, a lot of traffic, and seemingly a lot
> > of interest from robots, particularly badly behaved ones.
> >
> > Several times lately the server has started choking and dying, with
> > loads of 70+, in response to some bot hitting all the listinfo pages and
> > firing up dozens and dozens of CGI processes.
> >
> > I was wondering if anyone else has struggled with this issue.  We don't
> > want to simply block all robots; Google can be essential to finding old
> > posts on some of the high traffic lists.  So I'm looking for less
> > resource intensive solutions than CGI.
> >
> > Has anyone set up the Mailman CGIs under FastCGI, or mod_snake or
> > something?  How did that go?
> >
> > We're also considering simply sticking Squid in front of it; any tips
> > on a Mailman friendly Squid config would be great.
> >
> > Am I missing an obvious solution?
> >
> > Thanks
> > Kellan
> >
> 
> 

-- 
-----------------------------------------------------------------
 Dan Wilder <dan at ssc.com>   Technical Manager & Editor
 SSC, Inc. P.O. Box 55549   Phone:  206-782-8808
 Seattle, WA  98155-0549    URL http://embedded.linuxjournal.com/
-----------------------------------------------------------------



