[Mailman-Developers] Is the config.pck.last logic correct?

Les Niles les at 2pi.org
Wed Feb 4 14:09:39 EST 2004


Our list server had another crash the other day, and this time it
really toasted a couple of lists. :(  (No, we hadn't yet done any
of the mitigation steps that we should've, at least none that
worked....)

What happens is that some of the config.pck files get trashed by
having the last part of the file overwritten with NUL bytes.  I'm
assuming that it's a filesystem corruption causing this, perhaps
involving disk hardware errors.  By the time the problem is
apparent, the config.pck and config.pck.last files are both trashed
-- they're identical, with identical timestamps.  I've looked at
the logic for loading and saving config.pck, and don't see how this
can happen.  It seems like config.pck.last gets replaced only when
the list data is saved, which should only happen at some point
after the list data is successfully loaded.  So there should be
good data to generate the config.pck, otherwise config.pck.last
should be left alone.  But there seems to be some flaw in the
logic that I can't see, because both files are ending up trashed.
Then again, I've clearly demonstrated an overwhelming stupidity by
letting these crashes happen many times until finally something
really nasty occurred, so maybe I'm just too stupid to look at the
code.
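
For reference, a minimal sketch of how the damage described above
could be detected before a file is trusted.  This is not Mailman
code: the helper names trailing_nul_count and check_pickle_file are
hypothetical, and the sketch assumes a current Python 3 rather than
the Python that Mailman ran on at the time.

```python
import pickle


def trailing_nul_count(data):
    """Return how many NUL bytes end the byte string `data`."""
    return len(data) - len(data.rstrip(b"\x00"))


def check_pickle_file(path):
    """Return (ok, detail) for the pickle file at `path`.

    Hypothetical helper: the file counts as good only if the whole
    pickle can be loaded.  Only run this on files you wrote yourself,
    since unpickling untrusted data is unsafe.
    """
    with open(path, "rb") as fp:
        data = fp.read()
    nuls = trailing_nul_count(data)
    try:
        pickle.loads(data)
    except Exception as exc:
        # An overwritten tail destroys the pickle's final STOP opcode,
        # so the load fails instead of returning partial data.
        detail = "unloadable (%s), %d trailing NUL bytes" % (
            type(exc).__name__, nuls)
        return False, detail
    return True, "loads cleanly"
```

Running check_pickle_file on both config.pck and config.pck.last
would at least show whether either copy is still loadable before
one of them is overwritten.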

Or maybe something entirely different is happening.  If the
pickle-save itself is corrupted in a way that isn't being caught,
then I suppose the bad config.pck will happily be turned into an
equally bad config.pck.last.
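
One way to guard against that case, sketched under assumptions: write
the new pickle to a temporary file, read it back, and rotate the old
file to .last only if the read-back matches.  The function name
save_with_verified_backup and the ".tmp" suffix are made up for
illustration; this is not what Mailman's save code does, and it
assumes Python 3.

```python
import os
import pickle


def save_with_verified_backup(obj, path):
    """Write `obj` to `path`, keeping the previous file as path.last.

    Hypothetical sketch: the existing files are touched only after
    the freshly written pickle has been read back successfully.
    """
    tmp = path + ".tmp"
    last = path + ".last"
    with open(tmp, "wb") as fp:
        pickle.dump(obj, fp, protocol=pickle.HIGHEST_PROTOCOL)
        fp.flush()
        os.fsync(fp.fileno())  # ask the OS to push the data to disk
    # Read the temporary file back.  If this raises or the data does
    # not match, `path` and `last` are left exactly as they were.
    with open(tmp, "rb") as fp:
        if pickle.load(fp) != obj:
            raise IOError("read-back of %s does not match" % tmp)
    # Rotate.  Between the two renames `path` briefly does not exist,
    # so a loader would need to fall back to `last` in that window.
    if os.path.exists(path):
        os.replace(path, last)
    os.replace(tmp, path)
```

A read-back done right after the write is probably served from the
OS cache, so this would catch a bad pickle-save but not damage that
happens later on the disk itself.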

Obviously I don't have a reproducible test case for this, but maybe
someone has some idea of what's going on, and how to improve the
robustness. 

  -les


