[spambayes-dev] Strange performance dip and DBRunRecoveryError retreat

Sun Jan 4 19:38:09 EST 2004

[Tim]
> I'll note one thing:  somewhere along the line the classifier 
> grew a funky "_post_training()" method.

This is explained here, for anyone interested:

<http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498103&aid=797890>

Basically, Richie added it to help prevent the ham/spam count going to 0
when training was interrupted.

> After adding the line:
> 
>         self.db.sync()
> 
> to the end of _write_state_key(), your hammer.py (as checked 
> in, with the reopen-without-closing business) has run here 
> w/o complaint for a lot longer than it ran before adding the 
> sync() (I typically got a DBRunRecoveryError shortly after 
> the first occurrence of "Re-opening." output before; I've had 
> a few dozen of those go by so far after the change).
> 
> So maybe that's relevant.

I've more-or-less done this too, running hammer.py and saving after every
message.  (The difference is that I called store(), which does the
words_changed cache magic as well as the sync().)  As expected, it failed to
trigger the reopen-without-closing bug.

I'm not certain about saving after every message, though.  I have a feeling
that Mark won't like it either (after all, he recently changed the plug-in
code so that it *didn't* save after every message train).  If we don't sync
after every message, though, we probably shouldn't write the new state key.
I think we could remove the _post_training() call without harm: the bug
report had the guy using dumbdbm, which isn't possible anymore, and if we
call store() often enough then the state key will be written anyway (along
with a sync).
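
To make the trade-off concrete, here is a minimal sketch of "store() after
every message" versus "store() only when asked".  It is not the SpamBayes
code: it uses the stdlib shelve module instead of bsddb, and SketchStore,
STATE_KEY, sync_every_message and demo() are all made-up names for
illustration.  The only point is that store() writes the cached counts and
the state key together and then syncs, so the state key never gets ahead of
what is on disk.

```python
import os
import shelve
import tempfile

STATE_KEY = "saved state"  # hypothetical key name, for illustration only


class SketchStore:
    """Toy stand-in for a DB-backed classifier; NOT the SpamBayes class."""

    def __init__(self, path, sync_every_message=False):
        self.db = shelve.open(path)
        self.sync_every_message = sync_every_message
        # The state key holds the (ham, spam) message counts.
        self.nham, self.nspam = self.db.get(STATE_KEY, (0, 0))
        self.changed = {}  # word counts changed in memory, not yet written

    def _write_state_key(self):
        self.db[STATE_KEY] = (self.nham, self.nspam)

    def store(self):
        """Write cached word counts and the state key, then sync to disk."""
        for word, count in self.changed.items():
            self.db[word] = count
        self.changed.clear()
        self._write_state_key()
        self.db.sync()

    def learn(self, words, is_spam):
        for word in words:
            base = self.changed.get(word, self.db.get(word, 0))
            self.changed[word] = base + 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        if self.sync_every_message:
            self.store()  # slower, but nothing is left unsaved

    def close(self):
        self.db.close()


def demo():
    """Train two messages with per-message saving, reopen, return counts."""
    path = os.path.join(tempfile.mkdtemp(), "sketch_db")
    first = SketchStore(path, sync_every_message=True)
    first.learn(["cheap", "pills"], is_spam=True)
    first.learn(["meeting", "cheap"], is_spam=False)
    first.close()
    second = SketchStore(path)
    result = (second.nham, second.nspam, second.db["cheap"])
    second.close()
    return result
```

With sync_every_message left off, the caller has to call store() often
enough itself, which is the "call store() often enough" case above.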

[Richie]
> If the browser's gone away, _doSave() will probably *not*
> get called.  And I recall a problem with the UI hanging after
> the browser goes away, so chances are the user will kill
> sb_server and restart it, with the database
> unsaved.  (Hunt, hunt, hunt) yes - I've fixed that problem, but only
> recently, after 1.0a7 went out.

This could certainly be a cause of the problem, then.  It seems unlikely
that all that many people would close the browser before the training was
finished, but then we don't really get all that many reports of the error
these days.  (It would also explain why we
developers-who-know-better-than-to-do-that don't see it).

> It's late and I have to get up early in the morning
> (it's my wife's first day back at work after her maternity
> leave, and little Jenny's first proper day at nursery, so 
> I need plenty of sleep before trying to face all that!).

Good luck with all that! :)

> But as soon as I can I'll see whether I can reproduce the bug
> using urllib/urllib2/timeoutsocket/whatever to simulate an impatient user
> who sets off a train and then disconnects his browser.

Sounds good.  May the crashes be with you.
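
For what it's worth, the "impatient user" could be roughly simulated with a
plain socket: send the training request and close the connection without
ever reading the reply.  This is only a sketch; train_and_vanish() is a
made-up helper, and the host, port, path and form body you pass in would
have to match whatever the real UI expects (the values in the comment below
are placeholders, not the actual sb_server ones).

```python
import socket


def train_and_vanish(host, port, path, body):
    """Send an HTTP POST, then drop the connection without reading a reply.

    `body` must be bytes.  Returns the number of bytes sent.
    """
    header = (
        "POST %s HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % (path, len(body))
    ).encode("ascii")
    request = header + body
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(request)
    finally:
        sock.close()  # the impatient part: never wait for the response
    return len(request)


# Hypothetical usage; host, port, path and body are placeholders:
# train_and_vanish("localhost", 8000, "/train", b"which=spam")
```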

> Good thinking - I hope you've hit the nail on the head with this...

It was your script that found it, really :)  I'm not convinced that this is
the only way that the RunRecovery error can be triggered, but I am hopeful
that it is one way.  If we remove enough ways to trigger it, then the
current db backend should be usable, and maybe that'll be less work than
switching to a new one (it sounds like almost anything would be <wink>).

=Tony Meyer