[Mailman-Developers] Improving the archives

Tue Oct 30 23:26:28 CET 2007

   Or Re: [Mailman-Developers 10417] Improving the archives

   I would like to interject and highlight some use cases for stable  
and predictable IDs. For us, "message IDs" are directly used both by  
people and ignorant programs. Our mailing lists serve as a permanent  
and concise record of our discussions, decisions, and operations, and  
we find it invaluable to be able to refer to individual messages in a  
simple and memorable way: "message 1210 in the calibration list", say.  
Other people can then easily jot that info down or directly find the  
message. Some message IDs even become shorthand for a particular
topic or decision. We have also added trac InterWiki templates
pointing into our mail archives (as listname:number), which encourages  
desirable cross-referencing (PRs, wiki pages, and SVN change logs can  
refer to mail messages, just as wiki pages could always refer to  
changesets and PRs, etc.). But trac InterWiki templates can only
interpolate $1,$2,... arguments into strings, and could not possibly  
calculate anything based on the _content_ of the messages.
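
   To make that constraint concrete, here is a rough sketch of the kind of positional substitution an InterWiki-style template performs. It is a simplified model, not trac's actual code; the function name expand_interwiki and the lists.example.org URL are made up for illustration.

```python
import re


def expand_interwiki(template, target):
    """Substitute $1, $2, ... in template with the colon-separated parts of target.

    Nothing here can look inside a message; only the literal
    arguments in the link text are available.
    """
    args = target.split(":")

    def repl(match):
        index = int(match.group(1)) - 1
        # Out-of-range placeholders expand to an empty string.
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$(\d+)", repl, template)


# Hypothetical archive layout: <list>/all/<number>.html
TEMPLATE = "https://lists.example.org/pipermail/$1/all/$2.html"
url = expand_interwiki(TEMPLATE, "calibration:1210")
```

   A link written as listname:number can be resolved this way, whereas a hash of the message body could not be, since the template never sees the message.
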
   Globally unique IDs, hashed IDs, etc., are very appealing from  
various CS-y and techie points of view, but are simply not memorable  
to humans or knowable by dumb external programs. I think as much, or  
more, effort should be put into delivering a straightforwardly usable
naming scheme as goes into making an arbitrary message recoverable  
from anywhere.  Basically, "friendly URLs" should be a primary  
requirement, not an optional afterthought for careless geeks like me  
to get wrong later....

   We long ago added an extremely simple ID handoff between MM 2.1.8  
and pipermail, and though imperfect it has served us well. Basically,
we hijacked the .post_id member in mailman (otherwise largely
unused, and mysteriously a floating point number); CookHeaders stuffed
it into an X-Mailman-Sequence-ID header line, and AfterDelivery
incremented it. In turn, pipermail uses the header to feed a sequence  
ID into make_article, and the message is squirreled away as  
$mailinglist/all/%d.html. There are a few other minor matters (e.g.  
post_id was added to Decorators, a couple of templates were changed,  
we lost having 'ls' sort chronologically [did we have to add .last  
and .prev to the HyperDatabase classes?]), but it really was a minor  
bit of work. And for stability, as long as the archive files aren't  
lost, pipermail rebuilds should yield the same URLs even if junk  
messages have been deleted. [Oh, we did also add a "never rotate"
policy to our archives, but that can be finessed.]
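
   For anyone wanting the shape of that handoff, here is a minimal sketch. It is not our actual patch and does not use Mailman's real classes: FakeList and the three function names are stand-ins invented for illustration, and only the header name, the .post_id member, and the all/%d.html layout come from the description above.

```python
from email.message import Message

SEQUENCE_HEADER = "X-Mailman-Sequence-ID"


class FakeList:
    """Stand-in for a mailing list object; only the fields used here."""

    def __init__(self, name, post_id=1.0):
        self.name = name
        self.post_id = post_id  # kept as a float, as in MM 2.1


def stamp_sequence_header(mlist, msg):
    """CookHeaders-like step: record the list's current post_id in the message."""
    del msg[SEQUENCE_HEADER]  # discard any value supplied by the sender
    msg[SEQUENCE_HEADER] = str(int(mlist.post_id))


def advance_sequence(mlist):
    """AfterDelivery-like step: move on to the next ID."""
    mlist.post_id += 1


def archive_filename(mlist, msg):
    """Archiver-like step: derive the file name from the header alone."""
    seq = int(msg[SEQUENCE_HEADER])
    return "%s/all/%d.html" % (mlist.name, seq)
```

   Because the file name depends only on the header stored with the message, a rebuild from the saved messages gives the same names, and a deleted message just leaves a gap in the numbering.
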
   As an aside on other discussions, can you get away without using  
Message-ID or Date? I.e., aren't those just more of those tokens which  
were standardized back before the Internet got tricky enough to  
invalidate the standards? Mailing lists serialize incoming messages,  
and so can generate their own unique and trustworthy IDs. "UUIDs"  
would work, but if you can trust yourself to generate them,  
consecutive integers provide minimal, order-preserving, perfect  
hashing, too!
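
   A toy illustration of that last point, assuming a single process that sees messages one at a time (the Serializer class is hypothetical, not anything in Mailman):

```python
import itertools


class Serializer:
    """Assign consecutive integer IDs to messages in arrival order."""

    def __init__(self, start=1):
        self._next = itertools.count(start)
        self.by_id = {}

    def accept(self, message):
        seq = next(self._next)
        self.by_id[seq] = message  # unique key, no collisions possible
        return seq


s = Serializer()
ids = [s.accept(m) for m in ["first", "second", "third"]]
# ids == [1, 2, 3]: dense, unique, and sorted in arrival order
```

   The IDs do not depend on Message-ID or Date at all, only on the order in which the list accepted the messages.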

   Anyhow, we have found that people will enthusiastically refer by  
name to individual messages within mail archives if they can.

  - craig