From barry at list.org Fri Dec 5 00:39:13 2008
From: barry at list.org (Barry Warsaw)
Date: Thu, 4 Dec 2008 18:39:13 -0500
Subject: [Mailman-Developers] [Mailman-checkins] [Branch ~mailman-administrivia/mailman-administrivia/admin] Rev 38: Replaced an obsolete SF link to NEWS with the current LP link.
In-Reply-To: <20081204224444.1168.84158.launchpad@forster.canonical.com>
References: <20081204224444.1168.84158.launchpad@forster.canonical.com>
Message-ID:

On Dec 4, 2008, at 5:44 PM, noreply at launchpad.net wrote:

> ------------------------------------------------------------
> revno: 38
> committer: Mark Sapiro
> branch nick: admin
> timestamp: Thu 2008-12-04 14:41:29 -0800
> message:
>   Replaced an obsolete SF link to NEWS with the current LP link.
> modified:
>   www/features.ht
>   www/features.html
>   www/newsite/features/_index.html

Pushed.

-Barry

From mark at msapiro.net Sat Dec 6 22:10:24 2008
From: mark at msapiro.net (Mark Sapiro)
Date: Sat, 6 Dec 2008 13:10:24 -0800
Subject: [Mailman-Developers] Duplicate Prevention
In-Reply-To:
Message-ID:

Dan Mahoney, System Admin wrote:
>
>We've recently taken on mailman to handle many large, popular lists at my
>day job, and one strangely-missing feature is the inability to avoid
>duplicates when someone cc's a list's old alias and new alias (we also
>moved the lists to a subdomain, out from under our primary).

This was reposted to mailman-users and answered there. Thread at .

>Another great feature would be to have mailman "strip" multiple cc
>recipients.  I.e. if a message is sent:
>
>to: list
>cc: list-alias(*), another-list-alias(*)
>
>-or-
>
>to: person
>cc: list, list-alias(*)
>
>To have these (*) stripped (and prevent the need for this).
>But that's more work, and right now the duplicates are a major
>regression from what we had before.

I assume by list-alias you mean something in the list's
acceptable_aliases. If so, it's tricky since the contents of
acceptable_aliases are really regexp patterns, not simple strings.
However, if, in your case, all the strings in acceptable_aliases are
full addresses, all you need to do is insert

    for alias in mlist.acceptable_aliases.splitlines():
        # Compare each line as a literal, lowercased address (no regexp).
        alias = alias.strip().lower()
        if alias in ccaddrs:
            del ccaddrs[alias]

in Mailman/Handlers/AvoidDuplicates.py just ahead of

    # RFC 2822 specifies zero or one CC header
    del msg['cc']
    if ccaddrs:
        msg['Cc'] = COMMASPACE.join([formataddr(i) for i in ccaddrs.values()])

at the end of the module.

I wouldn't suggest doing this in general, even with a regexp match,
because there are bound to be unintended consequences in cases where
there is not an explicit reply to list and/or acceptable_aliases has one
or more actual regexps.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

From jcea at jcea.es Fri Dec 12 07:01:45 2008
From: jcea at jcea.es (Jesus Cea)
Date: Fri, 12 Dec 2008 07:01:45 +0100
Subject: [Mailman-Developers] Memory pinned in ram, with huge lists
Message-ID: <4941FE49.3030003@jcea.es>

I experienced huge Mailman queue processes, with RAM usage of 700MB and
more for each one of the six queue workers.

I have several big mailing lists. One has about 180,000 subscribers, and
another about 70,000.

Debugging the issue, I found:

1. Each queue worker touching a message will load the entire mailing list
database into RAM. So, the RAM used by each worker is the sum of all
mailing lists in the system (if all of them have traffic). This is a big
issue if you have huge mailing lists.

2. The list data is kept in memory using a cache managed via weak
references. But entries are never evicted from the cache, so there must
be a hard reference out there, somewhere.

3. I found a memory reference cycle between a mailing list and its
OldStyleMemberships component, linked via "self._memberadaptor". This
cycle keeps the mailing list alive, so the cache never evicts the
data.

I changed the OldStyleMemberships constructor to:

"""
class OldStyleMemberships(MemberAdaptor.MemberAdaptor):
    def __init__(self, mlist):
        import weakref
        self.__mlist = weakref.proxy(mlist)
"""

to keep only a weak reference to the mailing list, breaking the cycle.

Now, when a worker is done with a mailing list, the cache is correctly
evicted. Since Python doesn't give memory back to the system, the
consequences of this change are:

1. Now, memory used by each worker is proportional to the size of the
biggest mailing list, instead of the sum of all mailing list sizes. Not
perfect, but a huge improvement if you have some big lists.

2. Now, since the cache is evicted frequently, mailing list data must be
reloaded every time. This is a performance hit, but my mailing lists are
huge but have little traffic (maybe a couple of mails per week), so this
is a non-issue for me.

I would suggest separating the subscriber info from the rest of the
mailing list metadata, since most workers don't need the subscriber data
in RAM to do their work. So, instead of 6 processes eating RAM, only one
of them (the outgoing worker) would use significant memory.

In fact, mailing list subscribers could be split into several files, to
avoid loading the entire membership at once. Let's say, use 256 files and
put each subscriber in a file according to the least significant byte of
its MD5 hash, for instance.

Studying the code, it seems easy to migrate membership to a separate
persistence system (let's say, ZODB or Durus) or use a backend like
sqlite.
Any plans for that? Any interest in patches?

--
Jesus Cea Avion
jcea at jcea.es - http://www.jcea.es/
jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz

From barry at list.org Fri Dec 12 17:06:27 2008
From: barry at list.org (Barry Warsaw)
Date: Fri, 12 Dec 2008 11:06:27 -0500
Subject: [Mailman-Developers] Memory pinned in ram, with huge lists
In-Reply-To: <4941FE49.3030003@jcea.es>
References: <4941FE49.3030003@jcea.es>
Message-ID: <7354445D-F7DD-4B4A-9513-0D61DB6C5CDC@list.org>

On Dec 12, 2008, at 1:01 AM, Jesus Cea wrote:

> Studying the code, it seems easy to migrate membership to a separate
> persistence system (let's say, ZODB or Durus) or use a backend like
> sqlite.
> Any plans for that? Any interest in patches?

Yes, but not in Mailman 2. It's in Mailman 3 by default and any code
that helps that branch get further along will be greatly appreciated.

FWIW, I am planning on another alpha release before the end of the
year. My intent is to have the system working and usable without a web
UI, but possibly with the administrative REST interface we'd been
talking about.

Your analysis is essentially correct. For Mailman 2.2, I think adding
the weakref would be fine in principle, but the more invasive data store
changes would not be. Much better to get Mailman 3 out the door with its
real database backend.
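Jesus's sharding suggestion is to use 256 member files and choose each subscriber's file from the least significant byte of the MD5 hash of its address. A minimal Python 3 sketch of that idea follows. It is not Mailman code: `bucket_for`, `bucket_path`, `group_by_bucket` and the `members-XX.pck` file naming are hypothetical. Lowercasing each address before hashing is also an assumption; it keeps case variants of one address in the same bucket.

```python
import hashlib
import os


def bucket_for(address):
    """Return the bucket number (0-255) for a subscriber address.

    Assumption: the address is lowercased first, so case variants of the
    same address always map to the same bucket.
    """
    digest = hashlib.md5(address.lower().encode("utf-8")).digest()
    # Reading the 16-byte digest as a big-endian number, its least
    # significant byte is the last one; indexing bytes yields an int.
    return digest[-1]


def bucket_path(list_dir, address):
    """Hypothetical on-disk layout: one members-XX.pck file per bucket."""
    return os.path.join(list_dir, "members-%02x.pck" % bucket_for(address))


def group_by_bucket(addresses):
    """Split a roster into {bucket number: [addresses in that bucket]}."""
    buckets = {}
    for addr in addresses:
        buckets.setdefault(bucket_for(addr), []).append(addr)
    return buckets


if __name__ == "__main__":
    roster = ["alice@example.com", "Bob@example.org", "carol@example.net"]
    for bucket, members in sorted(group_by_bucket(roster).items()):
        print("%02x" % bucket, members)
```

With this layout, looking up or updating one member touches only that member's bucket, which holds roughly 1/256 of the roster. A full delivery still has to walk all 256 files.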
-Barry

From mark at msapiro.net Fri Dec 12 21:39:55 2008
From: mark at msapiro.net (Mark Sapiro)
Date: Fri, 12 Dec 2008 12:39:55 -0800
Subject: [Mailman-Developers] Memory pinned in ram, with huge lists
In-Reply-To: <4941FE49.3030003@jcea.es>
References: <4941FE49.3030003@jcea.es>
Message-ID: <4942CC1B.9080606@msapiro.net>

Jesus Cea wrote:
>
> 3. I found a memory reference cycle between a mailing list and its
> OldStyleMemberships component, linked via "self._memberadaptor". This
> cycle keeps the mailing list alive, so the cache never evicts the
> data.
>
> I changed the OldStyleMemberships constructor to:
>
> """
> class OldStyleMemberships(MemberAdaptor.MemberAdaptor):
>     def __init__(self, mlist):
>         import weakref
>         self.__mlist = weakref.proxy(mlist)
> """
>
> to keep only a weak reference to the mailing list, breaking the cycle.

Thanks very much for your efforts in debugging this.

> 2. Now, since the cache is evicted frequently, mailing list data must be
> reloaded every time. This is a performance hit, but my mailing lists are
> huge but have little traffic (maybe a couple of mails per week), so this
> is a non-issue for me.

The use of the cache has been changed for 2.2. See the full thread at
for more information. In 2.2, the cache will be less effective anyway,
and the impact doesn't seem too severe.

I am going to implement your change to OldStyleMemberships for 2.2. I'm
almost inclined to drop the cache altogether as I think with the 2.2
logic, hits may be rare. In theory, the logic can avoid a second read of
the pickle if the runner first instantiates the list unlocked and
subsequently locks it, but I suspect this normally happens in the same
clock second so the second read wouldn't be avoided anyway.
-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

From mark at msapiro.net Sun Dec 14 00:38:56 2008
From: mark at msapiro.net (Mark Sapiro)
Date: Sat, 13 Dec 2008 15:38:56 -0800
Subject: [Mailman-Developers] Memory pinned in ram, with huge lists
In-Reply-To: <4942CC1B.9080606@msapiro.net>
References: <4941FE49.3030003@jcea.es> <4942CC1B.9080606@msapiro.net>
Message-ID: <49444790.1020306@msapiro.net>

Mark Sapiro wrote:
> Jesus Cea wrote:
>
>> I changed the OldStyleMemberships constructor to:
>
>> """
>> class OldStyleMemberships(MemberAdaptor.MemberAdaptor):
>>     def __init__(self, mlist):
>>         import weakref
>>         self.__mlist = weakref.proxy(mlist)
>> """
>
>> to keep only a weak reference to the mailing list, breaking the cycle.
>
>
> Thanks very much for your efforts in debugging this.
>
>
>> 2. Now, since the cache is evicted frequently, mailing list data must be
>> reloaded every time. This is a performance hit, but my mailing lists are
>> huge but have little traffic (maybe a couple of mails per week), so this
>> is a non-issue for me.
>
>
> The use of the cache has been changed for 2.2. See the full thread at
> for more information. In 2.2, the cache will be less effective anyway,
> and the impact doesn't seem too severe.
>
> I am going to implement your change to OldStyleMemberships for 2.2. I'm
> almost inclined to drop the cache altogether as I think with the 2.2
> logic, hits may be rare. In theory, the logic can avoid a second read of
> the pickle if the runner first instantiates the list unlocked and
> subsequently locks it, but I suspect this normally happens in the same
> clock second so the second read wouldn't be avoided anyway.
There is a problem with the suggested change. If we are running under
Python 2.6, the creation of the proxy

    self.__mlist = weakref.proxy(mlist)

produces two of the following messages:

    Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in ignored

These do not occur with Python 2.5.1. Presumably they are due to
something within the structure of the list object itself, and they
possibly render some parts of the list object unavailable to
OldStyleMemberships via the proxy object. So far, I haven't identified
an operational problem due to this, but because of this and the other
considerations mentioned above, I'm now inclined to just abandon the
list cache in Runner.py.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
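A small standalone sketch (Python 3 on CPython) reproduces the reference cycle discussed in this thread and shows why the `weakref.proxy` change lets a weak-value cache drop the list. `MailList`, `MemberAdaptor` and `survives_release` are hypothetical stand-ins, not Mailman's classes. The cache is assumed to be a `weakref.WeakValueDictionary`. The cyclic garbage collector is switched off during the check, so only reference counting can free the list.

```python
import gc
import weakref


class MemberAdaptor:
    """Hypothetical stand-in for OldStyleMemberships."""

    def __init__(self, mlist, use_proxy):
        # A weakref.proxy does not keep mlist alive; a plain reference does.
        self._mlist = weakref.proxy(mlist) if use_proxy else mlist

    def list_name(self):
        # Attribute access is forwarded through the proxy transparently.
        return self._mlist.internal_name


class MailList:
    """Hypothetical stand-in for a mailing list object."""

    def __init__(self, name, use_proxy=True):
        self.internal_name = name
        # list -> adaptor here and adaptor -> list in MemberAdaptor form a
        # cycle, unless the back-reference is a weak proxy.
        self._memberadaptor = MemberAdaptor(self, use_proxy)


def survives_release(use_proxy):
    """True if a weak-value cache entry outlives the last outside reference."""
    cache = weakref.WeakValueDictionary()
    was_enabled = gc.isenabled()
    gc.disable()  # leave only reference counting to free objects
    try:
        mlist = MailList("biglist", use_proxy)
        cache["biglist"] = mlist
        del mlist  # the worker is done with the list
        return "biglist" in cache
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(survives_release(use_proxy=False))  # True: the cycle pins the list
    print(survives_release(use_proxy=True))   # False: entry evicted at once
```

With the strong back-reference, reference counting alone never frees the list, so its cache entry lingers until the cyclic collector reclaims the cycle. With the proxy, the entry goes away as soon as the last outside reference does. The trade-off is that the adaptor no longer keeps its list alive. If the adaptor outlives the list, any attribute access through the proxy raises `ReferenceError`.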