[Mailman-Developers] Improving the archives

Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Wed Jul 25 05:04:23 CEST 2007


Jeff Breidenbach writes:

 > >So we just specify a header to put it in, and subscribers will be able
 > >to use it, per definition of a canonical URL.
 > 
 > It is the archive server's job to decide what is the "canonical" URL
 > for a message. There's a good chance these archival URLs will be
 > served by an HTTP redirect. So let's not use the word canonical. :)

If it's not going to be "canonical" (I forget if there's a standard
for that word :), what is the point in writing an RFC?

 > >What complexity?  Mailman just does
 > >
 > >  msg['X-List-Archive-Received-ID'] = Email.msgid()
 > 
 > Easy to introduce, harder to deal with. The archival server would now
 > keep track of both the message-id and the x-list-archive-received-id.
 > That's two namespaces that almost do the same thing.

The implementations are similar, and there is "nearly" a one-to-one
correspondence.  But the semantics are very different.  Message-ID is
supplied by the sender and so is untrustworthy; the internal ID is
assigned by the list server and so is trustworthy.
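
To make the distinction concrete, here is a minimal sketch of stamping a
server-generated ID while leaving the sender's Message-ID untouched.  The
header name is the one proposed in this thread; the function name
stamp_received_id and the domain lists.example.org are made up for
illustration, and this is not how Mailman actually does it.

```python
from email.message import Message
from email.utils import make_msgid

RECEIVED_ID_HEADER = "X-List-Archive-Received-ID"  # name proposed in the thread


def stamp_received_id(msg, domain="lists.example.org"):
    """Attach a list-server-generated ID; the sender's Message-ID is not touched.

    `domain` is a placeholder, not a real list host.
    """
    # Discard any value the sender supplied: only the server's own value is trusted.
    del msg[RECEIVED_ID_HEADER]  # no error if the header is absent
    msg[RECEIVED_ID_HEADER] = make_msgid(domain=domain)
    return msg


msg = Message()
msg["Message-ID"] = "<forged-or-duplicated@sender.example>"
msg[RECEIVED_ID_HEADER] = "<spoofed@sender.example>"  # sender tries to pre-set it
stamp_received_id(msg)
```

After the call the message carries exactly one X-List-Archive-Received-ID,
generated locally, and the original Message-ID is still there for anyone
who wants to use it.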

 > So for these reasons, I'd rather stick with message-id and risk
 > some real world collisions, instead of introducing another identifier.

Go ahead and stick with message-id if *you* like, but please don't
tell *me* what risks I have to accept.

There needs to be a way to *enforce* uniqueness, and it *must* be
specified by the RFC in order for archive implementations to be
interoperable.  Note the word "specify"; I do not insist that this
level of robustness be *required*.  But if we don't specify it now,
people who want such robustness will have to do all this work again,
and possibly will end up with something that some servers conforming
to "your" RFC will not conform to.

It is possible that most archivers will simply use the message ID, and
do something brutal in the rare case of a collision.  That's fine.
But an archiver that wants to provide a canonical URL which is
guaranteed to uniquely and losslessly identify a post in its archive
should have a standard way to do that.
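
As a rough sketch of what such an archiver might do (assuming it keys
storage on the server-assigned ID and builds the URL from that header
value), two posts with colliding Message-IDs still get distinct URLs.
The class name Archive, its methods, and the base URL are hypothetical;
nothing here is specified by any RFC or implemented by any existing
archiver.

```python
from urllib.parse import quote


class Archive:
    """Toy in-memory archive keyed on the server-assigned ID (illustrative only)."""

    def __init__(self, base_url="https://archive.example.org/msg/"):
        self.base_url = base_url      # placeholder URL, not a real service
        self.by_received_id = {}      # trusted ID -> message body
        self.by_message_id = {}       # untrusted Message-ID -> list of trusted IDs

    def add(self, received_id, message_id, body):
        # Uniqueness is enforced on the trusted ID, never on Message-ID.
        if received_id in self.by_received_id:
            raise ValueError("received ID already archived: " + received_id)
        self.by_received_id[received_id] = body
        # Message-ID collisions are recorded, not treated as errors.
        self.by_message_id.setdefault(message_id, []).append(received_id)
        return self.url_for(received_id)

    def url_for(self, received_id):
        # Construct the URL from the header value alone.
        return self.base_url + quote(received_id.strip("<>"), safe="")


archive = Archive()
url_a = archive.add("<a1@lists.example.org>", "<same@sender.example>", "first post")
url_b = archive.add("<b2@lists.example.org>", "<same@sender.example>", "second post")
```

An archiver that prefers to key on Message-ID alone can ignore all of
this; the point is only that the robust option has a defined shape.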

 > The main thing that bugs me is message-ids are long, which makes
 > them awkward to embed in a URL in the footer of a message.

The footer URL is of no concern in this discussion.  There is not
going to be a requirement that footer URLs be "canonical", not if I
have any say in the matter.  The "canonical" URL will be in (or be
constructed from) the message header.


