[Mailman-Developers] Patch for HyperArch

Mark Sapiro mark at msapiro.net
Thu Mar 10 13:10:08 EST 2016


On 03/10/2016 03:19 AM, Sebastian Hagedorn wrote:
> 
> Unless you're really interested in the other differences you referred to
> in your other message, I won't bother to analyze them further. It seems
> clear to me that you have identified the main issue.


I understand the issue, and I know how to "fix" it.

I'm a bit uncertain about what to change a bad date to. Normally,
messages in the cumulative .mbox have at least three sources of date.
There is a Date: header, the mbox From_ separator line, and at least if
the message originally came via Mailman, an X-List-Received-Date: header
that was added by Mailman's ArchRunner when the message was archived.
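
To make the three sources concrete, here is a minimal sketch in Python 3
using only the standard library. It is not Mailman code; the function
names (parse_header_date, from_line_date, date_sources) are made up for
illustration, and it assumes the From_ line has the usual
"From address ctime-date" shape.

```python
from datetime import datetime
from email import message_from_string
from email.utils import parsedate_to_datetime


def parse_header_date(value):
    """Parse an RFC 2822 date header; return None if missing or unparseable."""
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad dates.
        return None


def from_line_date(from_line):
    """Parse the ctime-style date at the end of an mbox From_ line."""
    parts = from_line.split(None, 2)
    if len(parts) < 3 or parts[0] != "From":
        return None
    # Collapse the double space ctime uses for single-digit days.
    stamp = " ".join(parts[2].split())
    try:
        return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None


def date_sources(from_line, raw_message):
    """Collect the three candidate dates for one message in an mbox."""
    msg = message_from_string(raw_message)
    return {
        "date_header": parse_header_date(msg.get("Date")),
        "from_line": from_line_date(from_line),
        "list_received": parse_header_date(msg.get("X-List-Received-Date")),
    }
```

Any of the three values may come back as None, which is exactly the
case where one has to decide which of the remaining ones to trust.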

Also, depending to an extent on site configuration, if the message was
originally archived by Mailman, its archived Date: header will normally
be "close" to the time it was received by Mailman. See the code in the
_dispose() method in Mailman/Queue/ArchRunner.py.

So what this says is if a message in the mbox has a bad Date:, it is
probably from an imported mbox, and it's not clear that the From_ date
will be any better.

In the messages and excerpts you posted earlier, the From_ dates were
all within a few minutes of "Mon Nov  7 14:08:46 2005" which is probably
the time that portion of the mbox was built from a majordomo archive.

I have made a script at <https://www.msapiro.net/scripts/cleanarch2>
(mirrored at <http://fog.ccsf.edu/~msapiro/scripts/cleanarch2>) which
augments the standard bin/cleanarch script to also replace Date: headers
with the date from From_ if they differ by more than
mm_cfg.ARCHIVER_ALLOWABLE_SANE_DATE_SKEW (default = 15 days).
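
The comparison amounts to something like the following sketch. This is
not the code from cleanarch2; pick_date and ALLOWABLE_SKEW are
hypothetical names, and it assumes both dates have already been parsed
into naive datetime objects in the same time zone.

```python
from datetime import datetime, timedelta

# Stand-in for the 15-day default mentioned above (an assumption of this sketch).
ALLOWABLE_SKEW = timedelta(days=15)


def pick_date(header_date, from_date, skew=ALLOWABLE_SKEW):
    """Return (date_to_use, changed).

    Keep the Date: header value unless it is missing or differs from the
    From_ line date by more than the allowed skew.
    """
    if header_date is None:
        return from_date, True
    if abs(header_date - from_date) > skew:
        return from_date, True
    return header_date, False
```

For example, a Date: of 1 Jan 1970 against a From_ date in November
2005 would be replaced, while a Date: six days before the From_ date
would be kept.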

This may be sufficient. If you run it with the -n option against your
mbox, it will report the line #s of the bad dates, what they are and
what they would be changed to.

For the actual "fix", my inclination is to modify the _set_date method
in pipermail.py (this is called from HyperArch.py as
self.__super_set_date(message) just before it does self.fromdate =
time.ctime(int(self.date))).

I would have this check the date and if it's not within say 50 years of
now, replace the date with something reasonable. My question at this
point is what's that something reasonable. I think it comes down to a
choice between the From_ date if that's reasonable or the current date,
but I don't know which is better.
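
As a sketch of one of those two options (prefer the From_ date when it
is itself reasonable, otherwise fall back to the current time), working
in seconds since the epoch. This is not a patch to pipermail.py;
sane_date and FIFTY_YEARS are hypothetical names for illustration.

```python
import time

# Roughly 50 years in seconds; the exact window is a judgment call.
FIFTY_YEARS = 50 * 365.25 * 24 * 3600


def sane_date(date, from_date=None, now=None, window=FIFTY_YEARS):
    """Return date if it is within window of now, else a fallback.

    The fallback is from_date when that is within the window,
    otherwise the current time.
    """
    if now is None:
        now = time.time()

    def reasonable(ts):
        return ts is not None and abs(now - ts) <= window

    if reasonable(date):
        return date
    if reasonable(from_date):
        return from_date
    return now
```

Swapping the order of the last two checks, or dropping the from_date
check entirely, gives the "always use the current date" alternative.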

Does anyone have an idea?

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
