[Mailman-Developers] Requirements for a new archiver

J C Lawrence claw at kanga.nu
Wed Oct 29 23:01:01 EST 2003


On Thu, 30 Oct 2003 04:45:37 +0100 
Brad Knowles <brad.knowles at skynet.be> wrote:
> At 10:27 PM -0500 2003/10/29, J C Lawrence wrote:

>> Actually the two cases are considerably different.  In the delete
>> case I have to do pool management, with some eye toward fragmentation
>> control and optimisations of average latency for free heap searches,
>> as well as heap integrity audits.  In the write-only case I just
>> build on the end and need pay no mind to prior data once it is
>> allocated.

> Not really.  You still have to maintain all the indexes, make sure
> that if things get moved around that all the links get updated,
> etc....  

With a write-once system you don't ever need to move anything.
At its core it is: open one file, append to the end repeatedly until the
file size exceeds N, create a new file, repeat.  You can do object size
clustering across files or use other optimisation techniques, but the
basic pattern remains the same.  For the few cases where you must support
delete, you either just NULL the byte stream for the pointed-to object or
invalidate the key.  As the frequency and number of such deletes are
infinitesimal, they require no special management complexity.  You can
afford to just swallow the lost free space, as the cost of attempting to
manage it is simply never rewarded.
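A minimal sketch of that write-once pattern, assuming Python and an index kept in memory for brevity. SegmentStore, its methods and the segment file naming are hypothetical names for illustration, not taken from Mailman or any existing archiver. Records are only ever appended, a full segment is never touched again except for the rare delete, and a delete both invalidates the key and NULs the bytes in place without reclaiming them.

```python
import os
import tempfile


class SegmentStore:
    """Hypothetical write-once store: append to one segment file until it
    reaches max_segment_bytes, then start a new one. Nothing is ever moved."""

    def __init__(self, directory, max_segment_bytes=1 << 20):
        self.directory = directory
        self.max_segment_bytes = max_segment_bytes
        self.segment_no = 0
        # key -> (segment number, offset, length); in memory for this sketch only
        self.index = {}

    def _path(self, segment_no):
        return os.path.join(self.directory, f"segment-{segment_no:06d}.dat")

    def append(self, key, data):
        path = self._path(self.segment_no)
        # Roll over to a fresh segment once the current one has hit the limit.
        if os.path.exists(path) and os.path.getsize(path) >= self.max_segment_bytes:
            self.segment_no += 1
            path = self._path(self.segment_no)
        with open(path, "ab") as f:
            f.seek(0, os.SEEK_END)  # make the offset explicit before writing
            offset = f.tell()
            f.write(data)
        self.index[key] = (self.segment_no, offset, len(data))

    def read(self, key):
        entry = self.index.get(key)
        if entry is None:  # unknown or invalidated key
            return None
        segment_no, offset, length = entry
        with open(self._path(segment_no), "rb") as f:
            f.seek(offset)
            return f.read(length)

    def delete(self, key):
        """Rare delete: invalidate the key and NUL out the bytes in place.
        The freed space is never reused or reclaimed."""
        entry = self.index.pop(key, None)
        if entry is None:
            return
        segment_no, offset, length = entry
        with open(self._path(segment_no), "r+b") as f:
            f.seek(offset)
            f.write(b"\x00" * length)


if __name__ == "__main__":
    store = SegmentStore(tempfile.mkdtemp(), max_segment_bytes=10)
    store.append("msg1", b"hello")
    store.append("msg2", b"world!")
    store.append("msg3", b"xyz")  # the first segment is full, so this starts a second file
    store.delete("msg1")
    print(store.read("msg1"), store.read("msg2"), store.read("msg3"))
```

There is no free list, no heap audit and no relocation: the only bookkeeping is the key index, which is the whole point of the write-once case.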

> True, you don't have to worry about fragmentation control or other
> more complex aspects of heap management, but that's a further cost
> savings over other techniques and not a "drawback" to using this
> technique for this purpose.

True.  I'm not labeling it a drawback, just a boon of dubious advantage.

> Now, if you want to consider what would happen to you if the
> Scientologists ever came after you, or if you had court orders to
> remove postings that linked to bomb-making instructions, you'd
> probably want to keep all those other tools related to heap management
> around anyway.  

Not really.  The fraction of such deleted posts over the lifetime of
the store can generally be assumed to be less than 1 in 10^5, and is
probably considerably lower, perhaps as low as 1 in 10^8.  Add a simple
invalid key semantic and you're done.

  Caveat: Continual addition and deletion of SPAM from an archive would
  change this balance.
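A back-of-envelope sketch of what those ratios mean in bytes. The archive size (10 million posts) and average post size (5,000 bytes) are hypothetical figures chosen only for illustration, and the model assumes deleted posts are of average size, which the spam caveat above may not satisfy.

```python
def expected_dead_bytes(num_posts, avg_post_bytes, delete_rate):
    """Bytes left behind by deleted posts that are NULed out or merely
    have their keys invalidated, with the space never reclaimed."""
    return num_posts * avg_post_bytes * delete_rate


# Hypothetical archive: 10 million posts averaging 5,000 bytes (about 50 GB).
NUM_POSTS = 10_000_000
AVG_POST_BYTES = 5_000

for rate in (1e-5, 1e-8):
    dead = expected_dead_bytes(NUM_POSTS, AVG_POST_BYTES, rate)
    print(f"delete rate {rate:g}: about {dead:,.0f} bytes never reclaimed")
```

At 1 in 10^5 that is roughly 500,000 bytes of dead space in a 50 GB store; at 1 in 10^8 it is about 500 bytes.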

> They'd be less likely to be used, but at least you wouldn't have to
> take the entire site down while you went and wrote the tools from
> scratch to handle a situation that you had not foreseen.

You're only going to need tools when the proportion of such deleted
postings is high enough that the cost of the lost free space and its
overhead exceeds the cost of managing that free space.  That point is not
reached quickly.
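One way to express that break-even point, as a sketch only: the two per-byte cost weights are assumptions standing in for whatever "cost" means at a given site, and dead_cost_per_byte should be read as the cost of carrying a dead byte over whatever horizon you care about. SegmentStats, should_compact and segments_to_compact are hypothetical names.

```python
from typing import List, NamedTuple


class SegmentStats(NamedTuple):
    live_bytes: int  # bytes still referenced by valid keys
    dead_bytes: int  # bytes NULed out or orphaned by invalidated keys


def should_compact(stats: SegmentStats, dead_cost_per_byte: float,
                   copy_cost_per_byte: float) -> bool:
    """Break-even rule: rewrite a segment only when carrying its dead space
    costs more than copying its live data into a fresh segment."""
    carrying_cost = stats.dead_bytes * dead_cost_per_byte
    compaction_cost = stats.live_bytes * copy_cost_per_byte
    return carrying_cost > compaction_cost


def segments_to_compact(segments: List[SegmentStats], dead_cost_per_byte: float,
                        copy_cost_per_byte: float) -> List[int]:
    """Indexes of the segments where compaction would pay for itself."""
    return [i for i, s in enumerate(segments)
            if should_compact(s, dead_cost_per_byte, copy_cost_per_byte)]


if __name__ == "__main__":
    segments = [
        SegmentStats(live_bytes=1_000_000, dead_bytes=5_000),  # a few deletions
        SegmentStats(live_bytes=200_000, dead_bytes=900_000),  # spam-heavy segment
    ]
    print(segments_to_compact(segments, dead_cost_per_byte=1.0, copy_cost_per_byte=1.0))
```

With equal weights only the spam-heavy segment qualifies, which matches the point above: ordinary deletion rates never trip the rule, while continual spam removal can.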

-- 
J C Lawrence                
---------(*)                Satan, oscillate my metallic sonatas. 
claw at kanga.nu               He lived as a devil, eh?		  
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.
