[Mailman-Developers] Updated dupe removal patch
Ben Gertzfield
che@debian.org
Tue, 05 Mar 2002 10:49:13 +0900
>>>>> "Marc" == Marc MERLIN <marc_news@vasoftware.com> writes:
Barry> Let's ignore the duplicate or missing Message-ID: issue for
Barry> now. The biggest problem I see is that 1) you lose all the
Barry> mappings if you restart your IncomingRunner ...
Marc> That's probably not a problem because 1) it would only
Marc> affect a message being processed at the time you kill and
Marc> restart IncomingRunner, not very likely, and worst case, you
Marc> do get a second copy. 2) You don't restart IncomingRunner
Marc> often if at all 3) When you do restart qrunner, there can be
Marc> other quirks, like a message being delivered twice (I've
Marc> seen this with VERP enabled, I probably killed it while it
Marc> was delivering a batch to exim, so it didn't complete and
Marc> did it all over again after the restart)
Marc has summed up all my comments here -- the worst thing that can
happen if IncomingRunner is restarted is that a duplicate is sent,
which is what we do now without the patch.
Barry> 2) your process will grow without bounds until you do
Barry> restart your IncomingRunner.
Marc> I think you're right. You'd have to have a lot of traffic
Marc> before it catches up with you, but it will eventually if you
Marc> never restart qrunner.
Yeah, I knew about this, but I think my setup had a cron job to
restart the runner daily. That is not at all an optimal solution,
just a hack to re-implement the /etc/aliases style list functionality
where a user belonging to multiple umbrella lists only receives
one copy of any given mail.
Barry> I'm not sure about the best thing to do. Sticking this
Barry> data structure in the list, or otherwise making it
Barry> persistent, could take too many resources for not much
Barry> gain. The second issue is more important, especially given
Barry> that all our runners are now long running processes, and I
Barry> think most of the unbounded memory growth issues are taken
Barry> care of. Probably the best thing to do is to evict any
Barry> entry in the dictionary that's older than a day or two.
Marc> That sounds like a reasonable plan.
So, is there functionality in the *Runners to run something on
a regular schedule? Say, if we clean out the structure once an
hour or so, it should work pretty well.
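To make the question concrete, here is a minimal sketch of an hourly
cleanup hook, assuming the runner has a main loop that can call a
helper on every pass. PeriodicTask, maybe_run and the clock parameter
are hypothetical names for illustration, not existing Mailman code:

```python
import time


class PeriodicTask:
    """Run a callback at most once per `interval` seconds.

    Hypothetical helper: a runner's main loop would call maybe_run()
    on every pass, and the callback fires only when enough time has
    gone by since the last run.
    """

    def __init__(self, callback, interval=3600, clock=time.time):
        self._callback = callback
        self._interval = interval
        self._clock = clock          # injectable so it can be tested
        self._last_run = clock()

    def maybe_run(self):
        """Call the callback if the interval has elapsed; return True if it ran."""
        now = self._clock()
        if now - self._last_run >= self._interval:
            self._last_run = now
            self._callback()
            return True
        return False
```

The loop would construct one PeriodicTask up front and call
maybe_run() after handling each batch of messages, so no extra
thread or cron job is needed.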
Barry> Then again, this whole data structure seems intended to
Barry> avoid duplicates when lists are crossposted. It shouldn't
Barry> be necessary if we just want to filter out duplicates to
Barry> explicitly named recipients. Maybe we don't need both
Barry> features, as the former seems to be much less requested
Barry> than the latter?
Marc> That's true. The latter is nice for instance when you have
Marc> threads Cced across mailman-devel and mailman-users, but
Marc> having the former by itself would be good already.
I agree, and from what I understand on IRC, this is what ended up
happening. I will work on making a separate, proper patch for the
in-memory Message-ID cache that has a time to live associated with
each entry.
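Roughly what I have in mind, as a sketch only and not the actual
patch: a dictionary keyed by Message-ID that remembers when the ID
was first seen and which recipients already got a copy. The class
name, method names and the two-day default are assumptions for
illustration:

```python
import time


class MessageIDCache:
    """In-memory Message-ID cache with a time to live per entry.

    Hypothetical sketch: maps Message-ID -> (first_seen, recipients).
    """

    def __init__(self, ttl=2 * 24 * 60 * 60, clock=time.time):
        self._ttl = ttl              # seconds an entry stays valid
        self._clock = clock          # injectable so it can be tested
        self._entries = {}

    def already_sent(self, msgid, recipient):
        """Record a delivery; return True if recipient already got msgid."""
        now = self._clock()
        entry = self._entries.get(msgid)
        if entry is None or now - entry[0] > self._ttl:
            # Unknown or expired Message-ID: start a fresh entry.
            self._entries[msgid] = (now, {recipient})
            return False
        recipients = entry[1]
        if recipient in recipients:
            return True
        recipients.add(recipient)
        return False

    def evict_expired(self):
        """Drop entries older than the TTL; return how many were removed."""
        now = self._clock()
        expired = [msgid for msgid, (first_seen, _) in self._entries.items()
                   if now - first_seen > self._ttl]
        for msgid in expired:
            del self._entries[msgid]
        return len(expired)

    def __len__(self):
        return len(self._entries)
```

Calling evict_expired() on a regular schedule keeps the dictionary
from growing without bounds, and losing the cache on a restart only
costs a possible duplicate, as discussed above.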
Ben
--
Brought to you by the letters H and X and the number 19.
"It is sad. *Campers* cannot *dance*. Not even a *party*."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/