[Mailman-Developers] Re: Mailman - list corruption - problem found

David Birnbaum davidb at pins.net
Wed Aug 13 20:02:53 EDT 2003


Folks,

I found the problem after grabbing a truss of the web process.  The
resident process size had grown to 30M, which turned out to be the default
RLimitMEM on our web server.  The failure was sporadic because the list
was *just* under the limit, and it would only fail if it couldn't lock the
list quickly enough (minor memory leak there, perhaps?).
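
For anyone chasing something similar, here is a minimal sketch of how a
CGI script could report the memory limits it inherited from the web
server.  It is not part of Mailman; describe_limits and format_limit are
hypothetical names, and which rlimit RLimitMEM actually maps to depends on
the platform and Apache build, so the sketch just reports the ones Python
exposes.

```python
import resource


def describe_limits():
    """Return the (soft, hard) memory limits this process inherited."""
    limits = {}
    for name in ("RLIMIT_AS", "RLIMIT_DATA"):
        const = getattr(resource, name, None)
        if const is None:
            continue  # this limit is not defined on this platform
        limits[name] = resource.getrlimit(const)
    return limits


def format_limit(value):
    """Render a limit value as text, treating RLIM_INFINITY as unlimited."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % value


if __name__ == "__main__":
    for name, (soft, hard) in describe_limits().items():
        print("%s: soft=%s hard=%s"
              % (name, format_limit(soft), format_limit(hard)))
```

Logging this once from the CGI driver would have shown the 30M ceiling
without needing truss.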

Since Python sees the Err#12 ENOMEM, it would be nice to propagate that
error back if possible!
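
To illustrate what I mean, here is a rough sketch of a loader that keeps
"out of memory" separate from "file is corrupt".  This is not Mailman's
actual Load() code; load_config, CorruptListError and ResourceLimitError
are made-up names, and the sketch assumes the config is a plain pickle.

```python
import errno
import pickle


class CorruptListError(Exception):
    """The file was read but its contents could not be decoded."""


class ResourceLimitError(Exception):
    """The process ran out of memory while loading the file."""


def load_config(path, loader=pickle.load):
    """Load a pickled config, reporting memory exhaustion as its own error."""
    try:
        with open(path, "rb") as fp:
            return loader(fp)
    except MemoryError as e:
        # Python raises MemoryError when an allocation fails.
        raise ResourceLimitError("out of memory loading %s" % path) from e
    except OSError as e:
        # A system call can also fail with ENOMEM directly.
        if e.errno == errno.ENOMEM:
            raise ResourceLimitError("ENOMEM loading %s" % path) from e
        raise
    except (pickle.UnpicklingError, EOFError) as e:
        # Only now is it fair to call the file itself corrupt.
        raise CorruptListError("cannot decode %s" % path) from e
```

With something along these lines the log would have said "out of memory"
rather than "all fallbacks were corrupt".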

Sorry for the false alarm!

Side question, though - why *ARE* the locks made "in the future"?
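
One guess, and it is only a guess that I have not checked against the
Mailman lock code: a lock file's mtime could be doubling as its expiry
time, so a live lock would always look like it was modified "in the
future".  A small sketch of that idea, with hypothetical helper names:

```python
import os
import time


def touch_with_expiry(path, lifetime):
    """Create the lock file if needed and set its mtime to the expiry time."""
    expires = time.time() + lifetime
    with open(path, "a"):
        pass  # make sure the file exists
    os.utime(path, (expires, expires))
    return expires


def is_expired(path, now=None):
    """A lock is stale once the current time passes the stored mtime."""
    if now is None:
        now = time.time()
    return os.stat(path).st_mtime < now
```

Under that scheme, another process can decide whether to break a lock with
a single stat() call and no extra bookkeeping file.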

Thanks,

David.

-----

On Wed, 13 Aug 2003, David Birnbaum wrote:

> Howdy,
>
> I just had a list corruption today - twice - with no smoking gun in the
> log files that indicated the cause of the problem, just that the list
> database became corrupted (see error messages below).  It's a list of
> about 22,000 people at the moment.
>
> I was able to dump the list members and list configuration, delete the
> list, and recreate it.  However, it became corrupted again a few hours
> later, although not as badly.  In the first corruption, the list was
> unusable by the web interface.  The second time, it "fixed" itself, in
> that the web interface started working again.  I investigated further, and
> I discovered a lot of broken locks (which seems odd, given a lifetime of
> five hours by default):
>
> Aug 13 08:44:14 2003 (911) thebody.lock lifetime has expired, breaking
> Aug 13 11:14:07 2003 (911) thebody.lock lifetime has expired, breaking
> Aug 13 12:40:41 2003 (911) thebody.lock lifetime has expired, breaking
> Aug 13 15:11:50 2003 (911) thebody.lock lifetime has expired, breaking
> Aug 13 15:58:56 2003 (911) thebody.lock lifetime has expired, breaking
> Aug 13 16:26:56 2003 (911) thebody.lock lifetime has expired, breaking
>
> Not sure if this has anything to do with it, but it's the only unusual
> thing.  I also noted that the dates on the locks are often in the future;
> perhaps this is by design, it's certainly very strange.  The locks are
> appearing/disappearing appropriately, as far as I can tell, and nothing
> strange is showing up otherwise.
>
> I've stopped and restarted the master mailman process, and dumped/restored
> the list again, and all appears well.  Any suggestions on where to start?
> I'm running the latest Mailman 2.1.2, on Solaris 2.8, with Python 2.2.2.
> I haven't had any problems with the machine or memory that have shown up
> in the logs.
>
> Thanks in advance,
>
> David.
>
> -----
>
> Aug 13 17:01:40 2003 (22680) couldn't load config file /home/mailman/lists/thelist/config.pck
> Aug 13 17:01:40 2003 (22680) couldn't load config file /home/mailman/lists/thelist/config.pck.last
> Aug 13 17:01:40 2003 (22680) couldn't load config file /home/mailman/lists/thelist/config.db
> [Errno 2] No such file or directory: '/home/mailman/lists/thelist/config.db'
> Aug 13 17:01:40 2003 (22680) couldn't load config file /home/mailman/lists/thelist/config.db.last
> [Errno 2] No such file or directory: '/home/mailman/lists/thelist/config.db.last'
> Aug 13 17:01:40 2003 (22680) All thelist fallbacks were corrupt, giving up
> Aug 13 17:01:40 2003 admin(22680): @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> admin(22680): [----- Mailman Version: 2.1.2 -----]
> admin(22680): [----- Traceback ------]
> admin(22680): Traceback (most recent call last):
> admin(22680):   File "/home/mailman/scripts/driver", line 87, in run_main
> admin(22680):     main()
> admin(22680):   File "/home/mailman/Mailman/Cgi/admin.py", line 162, in main
> admin(22680):     mlist.Lock()
> admin(22680):   File "/home/mailman/Mailman/MailList.py", line 159, in Lock
> admin(22680):     self.Load()
> admin(22680):   File "/home/mailman/Mailman/MailList.py", line 598, in Load
> admin(22680):     raise Errors.MMCorruptListDatabaseError, e
> admin(22680): MMCorruptListDatabaseError: [Errno 2] No such file or directory: '/home/mailman/lists/thelist/config.db.last'
> admin(22680): [----- Python Information -----]
> admin(22680): sys.version     =   2.2.2 (#1, Feb  4 2003, 14:15:12)
> [GCC 3.2.1]
> admin(22680): sys.executable  =   /usr/local/bin/python
> admin(22680): sys.prefix      =   /opt/python/2.2.2
> admin(22680): sys.exec_prefix =   /opt/python/2.2.2
> admin(22680): sys.path        =   /opt/python/2.2.2
> admin(22680): sys.platform    =   sunos5
> admin(22680): [----- Environment Variables -----]
> admin(22680):   HTTP_COOKIE: thelist+admin=28020000006988523a3f7328000000633165626261613432313637316
> 66631633863613437326636386435353365623233326666623938
> admin(22680):   SERVER_SOFTWARE: Apache/1.3.26 (Unix) mod_ssl/2.8.10 OpenSSL/0.9.6c
> admin(22680):   PYTHONPATH: /home/mailman
> admin(22680):   SCRIPT_FILENAME: /home/mailman/cgi-bin/admin.cgi
> admin(22680):   SERVER_ADMIN: webmaster at chelsea.net
> admin(22680):   SCRIPT_NAME: /cgi-bin/admin.cgi
> admin(22680):   REQUEST_METHOD: GET
> admin(22680):   HTTP_HOST: mailman.chelsea.net
> admin(22680):   PATH_INFO: /thelist
> admin(22680):   SERVER_PROTOCOL: HTTP/1.1
> admin(22680):   QUERY_STRING:
> admin(22680):   TZ: US/Eastern
> admin(22680):   REQUEST_URI: /cgi-bin/admin.cgi/thelist
> admin(22680):   HTTP_ACCEPT: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/x-shockwave-flash, */*
> admin(22680):   HTTP_USER_AGENT: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
> admin(22680):   HTTP_CONNECTION: Keep-Alive
> admin(22680):   SERVER_NAME: mailman.chelsea.net
> admin(22680):   REMOTE_ADDR: 209.212.72.130
> admin(22680):   REMOTE_PORT: 5086
> admin(22680):   HTTP_ACCEPT_LANGUAGE: en-us
> admin(22680):   PATH_TRANSLATED: /home/mailman/htdocs/thelist
> admin(22680):   SERVER_PORT: 80
> admin(22680):   GATEWAY_INTERFACE: CGI/1.1
> admin(22680):   HTTP_ACCEPT_ENCODING: gzip, deflate
> admin(22680):   SERVER_ADDR: 209.212.66.37
>


