[Mailman-Developers] Problem with MM after power outage
Simone Piunno
pioppo at ferrara.linux.it
Wed Aug 20 01:45:23 EDT 2003
On Wednesday 20 August 2003 00:06, you wrote:
> > ok, but how do you make sure the file is really on disk instead of, e.g.,
> > half on disk and half on cache?
>
> We close the file before we rename it.
This would be ok if the underlying operating system flushed the disk cache
upon close(), but I'm afraid this is not the case (at least on linux).
This is from man 2 close:
    A successful close does not guarantee that the data has been
    successfully saved to disk, as the kernel defers writes. It is not
    common for a filesystem to flush the buffers when the stream is
    closed. If you need to be sure that the data is physically stored
    use fsync(2). (It will depend on the disk hardware at this point.)
This behaviour is declared conforming to SVr4, SVID, POSIX, X/OPEN, BSD 4.3.
Therefore I believe the problem reported here happens in this way:
1. mailman writes the tmp file, closes it and then atomically renames it.
this is atomic from the userland point of view (i.e. applications will
see the file instantly changed)
2. under the hood, the operating system is running a disk cache to speed
up file operations, therefore what really happened is that the file has
been written to some RAM pages but not yet to disk.
3. at some later time, the disk cache is copied from RAM to disk, effectively
making changes permanent. This copy is not atomic, e.g. files bigger than
4k will be written in chunks of 4k pages.
A power interruption (or OS crash, or any other unclean shutdown) in phase 2
could lead to a lost transaction (i.e. the file will appear as never
overwritten, as if phase 1 never happened).
A power interruption (or OS crash, or any other unclean shutdown) happening in
phase 3 could lead to a corrupted file (i.e. some pages written to disk, some
pages not).
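To make the three phases concrete, here is a minimal sketch of what
"write, fsync, then rename" could look like. It is not Mailman's actual code:
the function name durable_write is made up for illustration, and it assumes a
POSIX system and a modern Python 3 (os.replace did not exist in 2003). As the
man page says, what the disk hardware does after fsync() is out of our hands.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to `path` via a temp file, forcing the data to disk
    before the rename. Hypothetical helper, for illustration only."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory, so the rename stays
    # within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'wb') as fp:
            fp.write(data)
            fp.flush()              # Python buffer -> kernel cache
            os.fsync(fp.fileno())   # kernel cache -> disk (phases 2 and 3)
        os.replace(tmp, path)       # atomic from the userland point of view
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # Also fsync the directory, so the rename itself is recorded on disk.
    # Opening a directory like this is an assumption that holds on POSIX.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

The extra fsync() calls are exactly where the performance cost comes from,
since each one blocks until the data has been handed to the disk.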
MTAs usually provide a configuration setting to enable cache flush for each
transaction (by use of fsync()), but this is disabled by default because of
the severe impact on performance.
Use of BerkeleyDB (or similar transactional db libraries) could eliminate the
problem of corrupted files without the need to fsync, but to solve the
problem in phase 2 we need to guarantee at application level that losing a
file won't leave dangling references or bad states in the related data we
stored elsewhere. Worst case, when restarting after a power outage we should
check for transactions to be cancelled because the related file is not on
disk.
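A rough sketch of such a restart check, under stated assumptions: the index
is taken to be a plain dict mapping an id to a filename, and prune_dangling
and the filenames are hypothetical names, not anything that exists in
Mailman.

```python
import os


def prune_dangling(index, data_dir):
    """Split `index` ({msg_id: filename}) into the entries whose file is
    really present in `data_dir` and the ids whose file is missing.
    Hypothetical helper, for illustration only."""
    kept = {}
    dropped = []
    for msg_id, filename in index.items():
        if os.path.isfile(os.path.join(data_dir, filename)):
            kept[msg_id] = filename
        else:
            # The reference survived the crash but the file did not:
            # this transaction has to be cancelled.
            dropped.append(msg_id)
    return kept, dropped
```

The caller would then store the kept entries back and log the dropped ids,
so the lost transactions are at least visible to the administrator.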
An example could be: we put a message on hold for moderation, therefore we
- save the message in a file (or rename from the previous location)
- update the moderation queue index in MailList
- Save() the list config pickle
If the system goes down now because of a power outage, when restarting we
could have (even fsync()ing everything):
- the index has been regularly updated
- the message is not on disk, or it's in a different filename/path
this can happen because actual writes on disk can be reordered by the OS, for
performance reasons.
Accessing the admindb panel now could potentially lead to exceptions.
Now, everyone who is serious about administering a server has a big and
dependable UPS, automatically triggering clean shutdowns and so on, therefore
everything I've described is not as much of a problem.
--
[pioppo at abulafia pioppo]$ man women
No manual entry for women