[Mailman-Developers] Memory pinned in ram, with huge lists

Jesus Cea jcea at jcea.es
Fri Dec 12 07:01:45 CET 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have been seeing huge Mailman queue processes, with RAM usage of 700MB
or more for each of the six queue workers. I have several big mailing
lists: one with about 180,000 subscribers and another with about 70,000.

Debugging the issue, I found:

1. Each queue worker touching a message loads the entire database of
the corresponding mailing list into RAM. So the RAM used by each worker
is the sum of the sizes of all mailing lists in the system (if all of
them have traffic). This is a big issue if you have huge mailing lists.

2. The list data is kept in memory using a cache managed via weak
references. But cache entries are never evicted, so there must be a
hard reference out there, somewhere.

3. I found a reference cycle between a mailing list and its
OldStyleMemberships component, linked via "self._memberadaptor". This
cycle keeps the mailing list alive, so the cache never evicts the
data.
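
For anyone who wants to hunt for this kind of hidden reference, here is
a minimal sketch of one possible approach using the standard "gc"
module. It is not necessarily how the cycle above was found. The classes
"MailingList" and "Memberships" and the helper "attribute_holders" are
hypothetical stand-ins, not Mailman code.

```python
import gc


class Memberships:
    """Hypothetical stand-in for a membership component."""

    def __init__(self, mlist):
        self.mlist = mlist  # strong back-reference to the owning list


class MailingList:
    """Hypothetical stand-in for a mailing list object."""

    def __init__(self, name):
        self.name = name
        self._memberadaptor = Memberships(self)  # list -> component


def attribute_holders(obj):
    """Return type names of instances holding obj in one of their attributes."""
    holders = set()
    for ref in gc.get_referrers(obj):
        # Depending on the CPython version, the referrer reported is either
        # the instance itself or its __dict__; handle both cases.
        candidates = gc.get_referrers(ref) if isinstance(ref, dict) else [ref]
        for cand in candidates:
            attrs = getattr(cand, "__dict__", None)
            if isinstance(attrs, dict) and any(v is obj for v in attrs.values()):
                holders.add(type(cand).__name__)
    return holders


mlist = MailingList("big-list")
print(attribute_holders(mlist))  # includes 'Memberships' among the holders
```

The result may also name other holders (for instance the module that
owns the "mlist" variable), so it has to be read with some care.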

I changed the OldStyleMemberships constructor to:

"""
class OldStyleMemberships(MemberAdaptor.MemberAdaptor):
    def __init__(self, mlist):
        import weakref
        self.__mlist = weakref.proxy(mlist)
"""

to keep only a weak reference to the mailing list, breaking the cycle.

Now, when a worker is done with a mailing list, its cache entry is
correctly evicted.
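
The effect can be reproduced outside Mailman. The following is a minimal
sketch with stand-in classes (all names except "weakref" and "gc" are
hypothetical), assuming a cache built on "weakref.WeakValueDictionary".
The cycle collector is switched off so that only reference counting is
observed; with it enabled, current CPython would normally reclaim such a
cycle eventually, but not at the moment the worker drops the list.

```python
import gc
import weakref


class StrongMemberships:
    def __init__(self, mlist):
        self._mlist = mlist  # strong back-reference: creates a cycle


class WeakMemberships:
    def __init__(self, mlist):
        self._mlist = weakref.proxy(mlist)  # weak back-reference: no cycle


class MailingList:
    """Hypothetical stand-in for a mailing list object."""

    def __init__(self, name, adaptor_cls):
        self.name = name
        self._memberadaptor = adaptor_cls(self)


def still_cached_after_release(adaptor_cls):
    """Cache a list weakly, drop the only outside reference, check the cache."""
    cache = weakref.WeakValueDictionary()
    mlist = MailingList("big-list", adaptor_cls)
    cache["big-list"] = mlist
    del mlist  # the worker is done with the list
    return "big-list" in cache


gc.disable()  # observe reference counting alone
try:
    pinned_with_cycle = still_cached_after_release(StrongMemberships)
    pinned_with_proxy = still_cached_after_release(WeakMemberships)
finally:
    gc.enable()
    gc.collect()  # clean up the cycle left behind by the strong variant

print(pinned_with_cycle, pinned_with_proxy)  # True False
```

One consequence of the proxy: using it after the mailing list is gone
raises "ReferenceError", so the component must not outlive its list.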

Since Python usually does not give freed memory back to the operating
system, the consequences of this change are:

1. Now, the memory used by each worker is proportional to the size of
the biggest mailing list, instead of the sum of all mailing list sizes.
Not perfect, but a huge improvement if you have some big lists.

2. Now, since the cache is evicted frequently, mailing list data must be
reloaded every time. This is a performance hit, but my mailing lists are
huge and have little traffic (maybe a couple of mails per week), so this
is a non-issue for me.

I would suggest separating the subscriber info from the rest of the
mailing list metadata, since most workers don't need the subscriber data
in RAM to do their work. So, instead of six processes eating RAM, only
one of them (the outgoing worker) would use significant memory. In fact,
the subscribers of a mailing list could be split across several files, to
avoid loading the entire membership at once. Say, use 256 files and put
each subscriber in a file chosen by the least significant byte of its MD5
hash, for instance.
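
A minimal sketch of that bucketing idea follows. The function names and
the "members-XX.db" naming scheme are hypothetical, and normalizing the
address (strip and lowercase) before hashing is an assumption, not
something Mailman is known to do here.

```python
import hashlib

NUM_BUCKETS = 256  # one bucket per possible value of a single byte


def bucket_for(address):
    """Map a subscriber address to a bucket index in the range 0..255."""
    # Assumption: addresses are normalized before hashing.
    normalized = address.strip().lower().encode("utf-8")
    digest = hashlib.md5(normalized).digest()
    return digest[-1]  # last byte of the 16-byte MD5 digest


def bucket_filename(address):
    """Hypothetical file naming scheme: one file per bucket."""
    return "members-%02x.db" % bucket_for(address)


def partition(addresses):
    """Group addresses by bucket, so each bucket can be stored separately."""
    buckets = {}
    for address in addresses:
        buckets.setdefault(bucket_for(address), []).append(address)
    return buckets
```

Looking up one subscriber then needs only a single file, while a
delivery to the whole list can walk the 256 files one at a time.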

Studying the code, it seems easy to migrate the membership data to a
separate persistence system (say, ZODB or Durus) or to use a backend
like SQLite. Are there any plans for that? Any interest in patches?
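
To make the SQLite option concrete, here is a rough sketch using the
standard "sqlite3" module. The table layout and all function names are
hypothetical; this is not an existing Mailman interface, and a real
backend would have to store far more than the address.

```python
import sqlite3


def open_membership_db(path=":memory:"):
    """Open (or create) a membership database with a minimal schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS members ("
        " listname TEXT NOT NULL,"
        " address TEXT NOT NULL,"
        " PRIMARY KEY (listname, address))"
    )
    return conn


def add_member(conn, listname, address):
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO members (listname, address) VALUES (?, ?)",
            (listname, address.lower()),
        )


def is_member(conn, listname, address):
    """Check one subscriber without loading the whole membership."""
    row = conn.execute(
        "SELECT 1 FROM members WHERE listname = ? AND address = ?",
        (listname, address.lower()),
    ).fetchone()
    return row is not None


def iter_members(conn, listname):
    """Yield addresses one at a time instead of building a full list."""
    cursor = conn.execute(
        "SELECT address FROM members WHERE listname = ? ORDER BY address",
        (listname,),
    )
    for (address,) in cursor:
        yield address


conn = open_membership_db()
add_member(conn, "big-list", "Alice@example.com")
add_member(conn, "big-list", "bob@example.com")
print(is_member(conn, "big-list", "alice@example.com"))
```

With something like this, most workers would only ever run point
queries, and only the outgoing worker would iterate over all members.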

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSUH+RJlgi5GaxT1NAQJq7AQAm5tbsJQL2zqLFJlHLvha9RUnguzEYKRW
tS2LkHkZbmcFFXrYLswfl9Qn20x9FPA9iWN/j9hwh8YK3j7o0sdwS2Yll/44A8NX
4OtfYeOto4aIbYd8VWYa5RPe7ebSYwypkEvbH/FJRt8nDIEvLkr0t9iB7tQ42MsN
z+ssg6D6DF4=
=yOKL
-----END PGP SIGNATURE-----

