[Mailman-Developers] FYI -- problems with my new install...

J C Lawrence claw@kanga.nu
Sun, 29 Oct 2000 16:47:11 -0800


On Sun, 29 Oct 2000 16:22:57 -0800 
Chuq Von Rospach <chuqui@plaidworks.com> wrote:

> Be just, and fear not.

Sheesh, you're having more fun than I am.  (Very nice job on the
site there BTW)  

Latest status from me on the security stuff -- which happily points
away from Mailman and Python:

I started out by installing a firmware-based watchdog that would
auto-reboot the system when it locked.  (These are the references to
IPMI and EMP, which are Intel-specific things: basically the
mainboard firmware, running out of ASICs and not off the CPU, will
hit the power reset line if it doesn't get touched every 30 seconds.)
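
For illustration only, the "touch it every 30 seconds or it fires"
behaviour can be modelled in a few lines of Python.  This is a toy
sketch of the timing logic, not the IPMI/EMP firmware interface; the
Watchdog class, its method names and the injectable clock are all
hypothetical.

```python
import time


class Watchdog:
    """Toy model of a watchdog timer (hypothetical, not an IPMI API).

    The timer counts as expired if touch() has not been called
    within `timeout` seconds.
    """

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock            # injectable so it can be tested
        self.last_touch = clock()

    def touch(self):
        """Reset the countdown; a healthy system calls this regularly."""
        self.last_touch = self.clock()

    def expired(self):
        """True once the timeout has elapsed with no touch().

        Real firmware would pull the reset line at this point.
        """
        return self.clock() - self.last_touch >= self.timeout
```

A healthy box would call touch() well inside the timeout; a hung one
stops calling it, and expired() turns true 30 seconds later.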

I have been driving myself silly trying to figure out what is
causing the instability with Kanga.Nu, and getting nowhere.  There's
36Gig of disk on that box and I've near bit-walked thru every byte
in there.  A few minutes ago I opened a dozen plus terms on the box,
all tailing various logfiles or spinning on displaying various
system stats (mostly /proc stuff), plus an extra term on the
localhost pinging the target to see (nearly) exactly when it went down
(if it did).  I then approved two posts for posting on a ~1K member
list.  The posts went thru Mailman, hit the MTA, and were fully
received by the MTA.  Everything was looking normal: the queue
runners were busy delivering copies, and every stat and logfile on
the system looked happy.

Then the ping stopped.  Every system stat I was reporting on
(memory, everything) looked good -- but the machine was down.
Reading thru the dozen or so terms and what they were reporting as
of the instant the box died revealed, well, nothing.  Everything
looked very very good -- except for the fact that it was now dead.
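
The ping term described above could be replaced by a small script that
timestamps each up/down transition, which makes it easier to line the
moment of death up against the logfiles.  A rough sketch follows; the
function names are hypothetical, and the ping flags ("-c 1 -W 1")
assume the Linux iputils ping.

```python
import subprocess
import time


def ping_once(host):
    """Return True if one ICMP echo gets a reply.

    Assumes Linux iputils ping: -c 1 sends a single packet,
    -W 1 waits at most one second for the reply.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch_host(probe, clock=time.time, interval=1.0,
               sleep=time.sleep, max_probes=None):
    """Call probe() repeatedly and record each up/down transition.

    Returns a list of (timestamp, "up" | "down") tuples.  With
    max_probes=None the loop never ends, so nothing is returned;
    pass a count to get the list back.
    """
    transitions = []
    last = None
    count = 0
    while max_probes is None or count < max_probes:
        up = bool(probe())
        if up != last:
            # Timestamp is only taken when the state changes.
            transitions.append((clock(), "up" if up else "down"))
            last = up
        count += 1
        sleep(interval)
    return transitions
```

Something like watch_host(lambda: ping_once("target-host"), max_probes=600)
would cover roughly ten minutes at one probe per second ("target-host"
is a placeholder); the resolution is only as good as the probe interval
plus the ping timeout.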

A few minutes later, after some time for the EMP watchdog to kick in
(30 seconds), and the longest POST in recorded history (Intel
Nightshade MB has a multi-minute-long POST that cannot be
shortened), back up it popped.

I'm now thinking that I may have bad memory in the box.  Certainly
physical RAM failure on one of the upper sticks could account for
such seemingly spontaneous locks.  

Oh joy.  I just love hardware problems.

-- 
J C Lawrence                                 Home: claw@kanga.nu
---------(*)                               Other: coder@kanga.nu
http://www.kanga.nu/~claw/        Keys etc: finger claw@kanga.nu
--=| A man is as sane as he is dangerous to his environment |=--