[Mailman-Users] Load-balancing Mailman in LVS cluster

Wed Jun 30 02:41:46 CEST 2004

Brad Knowles wrote:

> At 9:32 AM +1000 2004-06-30, Guy Waugh wrote:
>
>>  The system I'm building already has apache on each of the two application
>>  servers in the cluster, and the web docroot is NFS-shared between the two
>>  from the third server I mentioned above, so there shouldn't be any dramas
>>  with the web archives that Mailman generates (unless there are file locking
>>  issues with these...?).
>
>
>     I wouldn't expect the web archives to have problems with 
> locking, no.
>
>     But keep in mind that this is a very small part of what Mailman does.
>
>>                           Similarly, sendmail is a standalone app on
>>  both servers, so actually sending mail shouldn't be a problem. Mailman
>>  will be sending mail to other servers outside the cluster (i.e. no user
>>  accounts exist within the cluster). So, my only problem (I think) with
>>  this is going to be with Mailman...
>
>
>     That seems likely.
>
>     The other pieces of this puzzle are pretty well understood on the 
> scalability side of the picture, and you can pull out a whole host of 
> known workable solutions, depending on your particular needs.
>
>>  I wasn't aware of those, so thanks for letting me know. We do run RHEL3,
>>  so GFS would be an option, but for US$2,200, I think I'd have an uphill
>>  battle justifying it. I see that NFS has an option of 'noac' (no attribute
>>  caching) which sounds potentially useful for me - I don't know whether
>>  that directly relates to file locking, though.
>
>
>     You need noac when sharing filesystems like this for other 
> reasons, but it has nothing to do with file locking.
>
>     The problem is that locking is handled outside of the NFS protocol 
> per se.  You have lock manager daemons running on both the server and 
> the client, and while NFS is supposedly stateless, they are not.  And 
> it is not uncommon for the server and client lock manager daemons to 
> get out-of-sync in a busy environment.  In addition to handling 
> locking, these daemons also handle mount requests.
>
>     NFSv4 is being re-written to become more stateful and to bring the 
> management of locks inside the base protocol, so that you don't have 
> to worry about lock managers that lose their minds or simply roll over 
> and die.  Today you can end up with filesystems that can still be read 
> from and written to because the NFS side of the server is working 
> fine, but which cannot handle locking or be mounted or unmounted 
> because the lock manager daemon has died.
>
>
>     Scaling NFS servers in a write-intensive environment is a very 
> hard task.  You end up doing all sorts of crazy things to avoid any 
> kind of lock creation (much less contention).
>
>     Proper cluster-aware filesystems avoid these kinds of issues, and 
> make it much easier to scale the systems involved.  However, as you 
> noted, they are expensive.  The question becomes how much is your time 
> worth, and how much do you lose when everything goes Tango-Uniform 
> (T**ts-Up)?  Here, you've got to look not only at your direct loss of 
> revenue, but also the cost of lost opportunities.
>
>
>     There's a reason why cluster filesystems are so expensive -- this 
> is hard to get right.  Moreover, people will pay big money for those 
> applications which *do* get it right.  They've done the cost/benefit 
> analysis and figured out that if it takes one of their engineers an 
> extra month to build a home-grown system, its MTBF is 1/100th what it 
> would be with the commercial product, and its time/cost to repair is 
> higher due to the custom nature of the solution, then the commercial 
> stuff pays for itself in the first outage -- or the first outage that 
> they avoid.
>
>     I *still* haven't seen anything in this field to compare with the 
> VAXcluster solutions that were created something like fifteen or 
> twenty years ago.  Some of those things are still running, for that 
> reason.
>
>
>     Don't get me wrong, NFS is great.  But if you're trying to build a 
> scalable network solution, it can be a very poor choice, depending on 
> the application.

Thanks for the great info, Brad - saved me a lot of running around... 
luckily, I have a few weeks up my sleeve before I need to get this 
sorted out, so I might play around with NFS and see what occurs. If NFS 
does fail, I'll investigate running Mailman out of one of the two app 
servers, and keeping an rsynced backup somewhere else for redundancy. 
GFS or Veritas sound like just what I need, but I can visualise my 
manager's lips screaming the word 'NO...' when I request the money 8-)
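
For the NFS experiment, something like the following might do as a first 
probe. It is only a minimal sketch in modern Python, under some loud 
assumptions: the function names and the lock-file path are made up for 
illustration, and it exercises plain fcntl-style locks, which on an NFSv3 
mount should go through the lock manager daemons Brad describes. I haven't 
checked whether Mailman's own locking takes that path at all, so treat it 
as a test of the lock managers rather than of Mailman.

```python
import errno
import fcntl
import os
import time


def try_exclusive_lock(path, timeout=5.0, poll=0.1):
    """Poll for an exclusive fcntl lock on `path`.

    Returns (fd, seconds_waited) on success, or (None, seconds_waited)
    if the lock could not be taken within `timeout` seconds.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    start = time.monotonic()
    while True:
        try:
            # Non-blocking request, so a dead or confused lock manager
            # shows up as a timeout instead of a hung process.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd, time.monotonic() - start
        except OSError as exc:
            # EACCES/EAGAIN mean "held by someone else"; anything else
            # (e.g. ENOLCK) is a real error worth seeing.
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                os.close(fd)
                raise
        if time.monotonic() - start >= timeout:
            os.close(fd)
            return None, time.monotonic() - start
        time.sleep(poll)


def release(fd):
    """Drop the lock taken by try_exclusive_lock and close the file."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

The idea would be to run a loop of try_exclusive_lock() / release() against 
the same file (say, a hypothetical /mnt/shared/probe.lock) from both app 
servers at once, and watch for timeouts or unexpected errors while the 
mount is busy. Note that these locks are per-process, so two attempts from 
the same process will not contend with each other; the test only means 
something when the two sides are separate processes on separate clients.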

Cheers,
Guy.