[Mailman-Developers] Updated dupe removal patch

Ben Gertzfield che@debian.org
Tue, 05 Mar 2002 10:49:13 +0900


>>>>> "Marc" == Marc MERLIN <marc_news@vasoftware.com> writes:

    Barry> Let's ignore the duplicate or missing Message-ID: issue for
    Barry> now.  The biggest problem I see is that 1) you lose all the
    Barry> mappings if you restart your IncomingRunner ...

    Marc> That's probably not a problem because 1) it would only
    Marc> affect a message being processed at the time you kill and
    Marc> restart IncomingRunner, not very likely, and worst case, you
    Marc> do get a second copy.  2) You don't restart IncomingRunner
    Marc> often if at all 3) When you do restart qrunner, there can be
    Marc> other quirks, like a message being delivered twice (I've
    Marc> seen this with VERP enabled, I probably killed it while it
    Marc> was delivering a batch to exim, so it didn't complete and
    Marc> did it all over again after the restart)

Marc has summed up all my comments here -- the worst thing that can
happen if IncomingRunner is restarted is that a duplicate is sent,
which is what we do now without the patch.

    Barry> 2) your process will grow without bounds until you do
    Barry> restart your IncomingRunner.

    Marc> I think you're right. You'd have to have a lot of traffic
    Marc> before it catches up with you, but it will eventually if you
    Marc> never restart qrunner.

Yeah, I knew about this, but I think my setup had a cron job to
restart the runner daily.  Not at all an optimal solution, just
a hack to re-implement the /etc/aliases style list functionality
where a user belonging to multiple umbrella lists only receives
one copy of any given mail.

    Barry> I'm not sure about the best thing to do.  Sticking this
    Barry> data structure in the list, or otherwise making it
    Barry> persistent, could take too many resources for not much
    Barry> gain.  The second issue is more important, especially given
    Barry> that all our runners are now long running processes, and I
    Barry> think most of the unbounded memory growth issues are taken
    Barry> care of.  Probably the best thing to do is to evict any
    Barry> entry in the dictionary that's older than a day or two.

    Marc> That sounds like a reasonable plan.

So, is there functionality in the *Runners to run something on
a regular schedule?  Say, if we clean out the structure once an
hour or so, it should work pretty well.
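
If the runners turn out to have no such hook, one option is to
piggyback on the main loop.  A rough sketch, not taken from the
Mailman source: PeriodicTask, maybe_run and the injectable clock are
all names made up here for illustration.

```python
import time


class PeriodicTask:
    """Run a callback at most once per interval, driven by the caller's loop.

    Hypothetical helper: nothing here is an existing Mailman API.
    """

    def __init__(self, callback, interval=3600.0, clock=time.time):
        self._callback = callback
        self._interval = interval      # seconds between runs (one hour here)
        self._clock = clock            # injectable so it can be tested
        self._last_run = clock()

    def maybe_run(self):
        """Call the callback if the interval has elapsed; report whether it ran."""
        now = self._clock()
        if now - self._last_run < self._interval:
            return False
        self._last_run = now
        self._callback()
        return True
```

The runner would call maybe_run() once per pass through its loop, so
the cleanup costs one clock read per pass until the hour is up.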

    Barry> Then again, this whole data structure seems intended to
    Barry> avoid duplicates when lists are crossposted.  It shouldn't
    Barry> be necessary if we just want to filter out duplicates to
    Barry> explicitly named recipients.  Maybe we don't need both
    Barry> features, as the former seems to be much less requested
    Barry> than the latter?

    Marc> That's true. The latter is nice for instance when you have
    Marc> threads Cced across mailman-devel and mailman-users, but
    Marc> having the former by itself would be good already.

I agree, and from what I understand on IRC, this is what ended up
happening.  I will work on making a separate, proper patch for the
in-memory Message-ID cache that has a time to live associated with
each entry.
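
To make the idea concrete, here is a minimal sketch of what such a
cache could look like.  It is not the patch itself.  The class and
method names are invented, and keying on the (Message-ID, recipient)
pair is my assumption about what needs to be remembered.  The one-day
default follows the "day or two" suggested above.

```python
import time


class MessageIDCache:
    """In-memory record of (Message-ID, recipient) pairs with a time to live.

    Hypothetical sketch: names and key layout are assumptions.
    """

    def __init__(self, ttl=86400.0, clock=time.time):
        self._ttl = ttl          # seconds an entry stays valid (one day)
        self._clock = clock      # injectable so it can be tested
        self._seen = {}          # (message_id, recipient) -> time first seen

    def seen_before(self, message_id, recipient):
        """Return True if the pair is already cached and fresh, else record it."""
        now = self._clock()
        key = (message_id, recipient)
        first_seen = self._seen.get(key)
        if first_seen is not None and now - first_seen < self._ttl:
            return True
        self._seen[key] = now
        return False

    def evict_expired(self):
        """Drop entries older than the TTL; return how many were removed."""
        now = self._clock()
        expired = [k for k, t in self._seen.items() if now - t >= self._ttl]
        for key in expired:
            del self._seen[key]
        return len(expired)

    def __len__(self):
        return len(self._seen)
```

Calling evict_expired() on a regular schedule is what keeps the
dictionary from growing without bounds.  Losing the whole thing on a
restart still only costs a duplicate, same as today.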

Ben

-- 
Brought to you by the letters H and X and the number 19.
"It is sad. *Campers* cannot *dance*. Not even a *party*."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/