[Mailman-Developers] Problem with MM after power outage

Wed Aug 20 01:45:23 EDT 2003

On Wednesday 20 August 2003 00:06, you wrote:

> > ok, but how do you make sure the file is really on disk instead of, e.g.,
> > half on disk and half on cache?
>
> We close the file before we rename it.

This would be ok if the underlying operating system flushed the disk cache
upon close(), but I'm afraid this is not the case (at least on Linux).
This is from man 2 close:

  A  successful  close does not guarantee that the data has been success-
  fully saved to disk, as the kernel defers writes. It is not common  for
  a  filesystem  to  flush  the buffers when the stream is closed. If you
  need to be sure that the data is physically stored use  fsync(2).   (It
  will depend on the disk hardware at this point.)

This behaviour is declared conforming to SVr4, SVID, POSIX, X/OPEN, BSD 4.3.

Therefore I believe the problem reported here happens in this way:

 1. mailman writes the tmp file, closes it and then atomically renames it.
    this is atomic from the userland point of view (i.e. applications will
    see the file instantly changed)
 2. under the hood, the operating system is running a disk cache to speed
    up file operations, therefore what really happened is that the file has
    been written to some RAM pages but not yet to disk.
 3. at some later time, the disk cache is copied from RAM to disk, effectively
    making the changes permanent.  This copy is not atomic, e.g. files bigger
    than 4k will be written in chunks of 4k pages.

A power interruption (or OS crash, or any other unclean shutdown) in phase 2
could lead to a lost transaction (i.e. the file will appear as if it had
never been overwritten, as though phase 1 never happened).

A power interruption (or OS crash, or any other unclean shutdown) happening in
phase 3 could lead to a corrupted file (i.e. some pages written to disk, some
pages not).

MTAs usually provide a configuration setting to enable a cache flush for each
transaction (by use of fsync()), but this is disabled by default because of
the severe impact on performance.
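
To make the fsync() option concrete, here is a minimal sketch of the
write-tmp-then-rename pattern with an explicit flush before the rename.  This
is not Mailman's actual code: durable_replace() is a hypothetical helper name,
and the directory fsync at the end assumes a POSIX system where a directory
can be opened read-only and passed to fsync().

```python
import os
import tempfile


def durable_replace(path, data):
    """Write data (bytes) to path via a temp file, fsync()ing before the rename.

    Hypothetical helper for illustration only, not part of Mailman.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so rename() stays on one
    # filesystem and remains atomic from the userland point of view.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(data)
            fp.flush()             # push Python's buffer to the kernel
            os.fsync(fp.fileno())  # ask the kernel to push it to the disk
        os.rename(tmp, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # Assumption: POSIX.  Flush the directory entry so the rename itself
    # is also recorded on disk.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

As the close(2) man page quoted above notes, what happens after fsync()
still depends on the disk hardware, so this narrows the window rather than
closing it completely.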

Use of BerkeleyDB (or similar transactional db libraries) could eliminate the
problem of corrupted files without the need to fsync, but to solve the
problem in phase 2 we need to guarantee at application level that losing a
file won't leave dangling references or bad states in the related data we
stored elsewhere.  Worst case, when restarting after a power outage we should
check for transactions to be cancelled because the related file is not on
disk.

An example could be: we put a message on hold for moderation, therefore we
 - save the message in a file (or rename it from the previous location)
 - update the moderation queue index in MailList
 - Save() the list config pickle

If the system goes down now because of a power outage, when restarting we
could have (even fsync()ing everything):
 - the index has been regularly updated
 - the message is not on disk, or it's in a different filename/path

This can happen because actual writes on disk can be reordered by the OS, for
performance reasons.
Accessing the admindb panel now could potentially lead to exceptions.
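
The restart-time check mentioned above could look roughly like the sketch
below.  Everything in it is an assumption made for illustration: held_index
is a hypothetical {request_id: filename} dict standing in for the moderation
queue index, and find_dangling() / cancel_dangling() are made-up names, not
Mailman functions.

```python
import os


def find_dangling(held_index, data_dir):
    """Return the request ids whose message file is not on disk.

    held_index is a hypothetical {request_id: filename} mapping; it is
    not Mailman's real data structure.
    """
    missing = []
    for request_id, filename in sorted(held_index.items()):
        if not os.path.isfile(os.path.join(data_dir, filename)):
            missing.append(request_id)
    return missing


def cancel_dangling(held_index, data_dir):
    """Drop index entries pointing at missing files; return the dropped ids."""
    dropped = find_dangling(held_index, data_dir)
    for request_id in dropped:
        del held_index[request_id]
    return dropped
```

Running something like cancel_dangling() once at startup, before the index
is used, would turn a missing file into a cancelled transaction instead of
an exception in the admindb panel.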

Now, everyone who is serious about administering a server has a big and
dependable UPS, automatically triggering clean shutdowns and so on, therefore
everything I've described is not as much of a problem.

-- 
[pioppo at abulafia pioppo]$ man women
No manual entry for women