[Mailman-Developers] Re: 2006 archives already online!

Barry A. Warsaw barry@digicool.com
Tue, 1 May 2001 00:57:45 -0400


>>>>> "OT" == Owen Taylor <otaylor@redhat.com> writes:

    OT> What I did for the gnome.org archives (using mhonarc plus
    OT> custom perl) is to use the Received: header for the date.

Ah, but which one? :)  There's going to be a Received: header for
each hop that message takes.  By the time your message got to me, it
had 7 Received: headers, and 3 (I think) by the time it reached
Mailman.

    OT> Which is, almost always, quite close to the time the person
    OT> actually sent it, and assuming that your local server's time
    OT> isn't screwed up (which is a much bigger problem...) does
    OT> not have the 2004 problem.

    OT> And it has the advantage over clobber_date of:

    |  - Not munging the mail

True, with the disadvantage that if you use an external archiver,
it'll have to handle checking for outrageous dates.  clobber_date
munges the message before it hits either archiver (Pipermail or
external).  If I was smart, I'd also count as a major disadvantage the
fact that I'll have to track down all the places where the Date:
header is used in Pipermail, and I /hate/ diving in that code. ;(
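For concreteness, here is a minimal sketch of the kind of check a clobber_date-style step might make before the message reaches any archiver. It is not Mailman's actual code. The function name `clobber_outrageous_date` and the 15-day `MAX_SKEW` tolerance are hypothetical, and it assumes "outrageous" means missing, unparseable, or too far from the time the message arrived.

```python
import email
from datetime import datetime, timedelta, timezone
from email.message import Message
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical tolerance: how far Date: may drift from the arrival time.
MAX_SKEW = timedelta(days=15)


def clobber_outrageous_date(msg: Message, arrival: datetime,
                            max_skew: timedelta = MAX_SKEW) -> bool:
    """Hypothetical helper: overwrite Date: with the (timezone-aware)
    arrival time if it is missing, unparseable, or off by more than
    max_skew. Returns True if the header was rewritten."""
    raw = msg.get("Date")
    try:
        sent = parsedate_to_datetime(raw) if raw else None
    except (TypeError, ValueError):
        sent = None  # unparseable Date: counts as outrageous
    if sent is not None and sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    if sent is not None and abs(sent - arrival) <= max_skew:
        return False  # looks sane; leave the message untouched
    # Munge the message: drop any existing Date: (no error if absent)
    # and substitute the arrival time.
    del msg["Date"]
    msg["Date"] = format_datetime(arrival)
    return True
```

Usage: a message dated 2004 that arrives on 1 May 2001 gets its Date: replaced with the arrival time, which is exactly the munging that both Pipermail and an external archiver would then see.

```python
msg = email.message_from_string("Date: Sat, 1 May 2004 00:57:45 -0400\n\nbody\n")
arrival = datetime(2001, 5, 1, 4, 57, 45, tzinfo=timezone.utc)
clobber_outrageous_date(msg, arrival)  # True
msg["Date"]                            # 'Tue, 01 May 2001 04:57:45 +0000'
```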

    |  - Not being skewed by moderation delays

Dang, yep, but fixable.

    |  - Being independent of the archiving process, so if you
    |    import a bunch of old mail with incorrect Date: lines
    |    into the archiving process you still get the 2004 
    |    protection.

True, with the caveat above.

This would be a reasonable option; however, if you use the most recent
Received: header, won't you still be subject to local server clock
skew?  And if you use the earliest Received: you'll be subject to the
same bogosity in the Date: header.  Or do you just start parsing the
Received:'s back from the most recent and take the first sane one you
find?
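The "walk back from the most recent Received: and take the first sane one" idea could look roughly like the sketch below. The function name `received_date`, the 30-day `max_age` window, and the definition of "sane" (not in the future, not older than `max_age`, both relative to the local clock) are assumptions for illustration. It relies on each MTA prepending its Received: header, so `get_all` yields the most recent hop first, and on the RFC 2822 convention that the timestamp follows the last semicolon.

```python
import email
from datetime import datetime, timedelta, timezone
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import Optional


def received_date(msg: Message, now: datetime,
                  max_age: timedelta = timedelta(days=30)) -> Optional[datetime]:
    """Hypothetical helper: return the timestamp of the most recent
    Received: header whose date is sane, or None if none qualifies."""
    # Headers come back in message order; since each hop prepends its
    # Received:, the first one is the most recent.
    for header in msg.get_all("Received", []):
        # The date-time is whatever follows the last semicolon.
        _, sep, stamp = header.rpartition(";")
        if not sep:
            continue  # no timestamp on this hop
        try:
            when = parsedate_to_datetime(stamp.strip())
        except (TypeError, ValueError):
            continue  # unparseable; try the next (older) hop
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        # Assumed sanity rule: not in the future, not implausibly old.
        if now - max_age <= when <= now:
            return when
    return None
```

Note that "sane" is still judged against the local clock here, so this only narrows, and does not remove, the local clock skew problem raised above. In the example below the most recent hop claims 2004 and is skipped, and the next hop's time is used.

```python
raw = ("Received: from a by b; Sat, 1 May 2004 04:57:50 +0000\n"
       "Received: from c by d; Tue, 1 May 2001 04:57:46 +0000\n"
       "Received: from e by f; Tue, 1 May 2001 04:57:45 +0000\n"
       "\nbody\n")
msg = email.message_from_string(raw)
now = datetime(2001, 5, 1, 5, 0, 0, tzinfo=timezone.utc)
received_date(msg, now)  # 2001-05-01 04:57:46+00:00
```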

-Barry