[spambayes-dev] Strange performance dip and DBRunRecoveryError retreat

Tony Meyer tameyer at ihug.co.nz
Thu Jan 1 21:58:42 EST 2004


[Richie Hindle]
> It manages about 40 train-and-classify loops 
> per second on my 2.4GHz P4, *except* between about 100 and 400 
> messages, when the performance drops to about a tenth of that and then 
> recovers.
>
> I've done enough investigation to know that the time is being spent in 
> the core SpamBayes code and not my script,

[Tim Peters]
> Is that a true dichotomy?  That is, do you know, for example, 
> that the time is being spent in the core spambayes code as 
> distinct from the Berkeley database library, or distinct from 
> random network traffic other programs are engaging in?  Or is 
> it that you just know it's not in your script, and you divide 
> the universe into "my script" and "the core SpamBayes code" here?

I see it too, roughly in the same place.  I also see it if I get hammer.py
to use a pickle (although the drop isn't as big), which presumably means
it's not Berkeley related.  Tim's guess that it's something Python is doing
is probably the most likely explanation.  It doesn't seem significant, though.

[Richie]
> I've committed the script as testtools/hammer.py,

Actually utilities/hammer.py :)

[Tim]
> Thanks for the effort!  Maybe somebody else can complicate it 
> now in a way that does provoke DBRunRecoveryErrors.  It's 
> never what you expect <wink>.

Richie - roughly how many messages could you do before it crashed?  It
crashes early (~1700) for me with the 1.0a6 reopen-before-closing bug
(sometimes a DB_RUNRECOVERY, sometimes a ham/spam count error) but if I get
it to always close or save the db before reopening, it just goes and goes - I
got to 57900 before Python crashed *.

If I get it to close before reopening, then interrupt (ctrl-c) it at some
point after it's done its first save, and restart it (without it deleting the
existing db file) it will chug along for a while, and then at some later
point die with a RUNRECOVERY.  Restarting it then will provoke an immediate
RUNRECOVERY.  This suggests (I think) that reopening the db without closing
it can cause a RUNRECOVERY error at some later point - even several
reopenings later - rather than immediately like I expected.
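
To make the "always close before reopening" pattern concrete, here is a
minimal sketch.  It uses the standard library's shelve module as a stand-in
for the Berkeley db, and ReopenableStore is a hypothetical wrapper, not
anything in hammer.py or the SpamBayes storage code.  It only shows the
ordering (close the old handle, then open a new one); it doesn't reproduce
the RUNRECOVERY error.

```python
import os
import shelve
import tempfile


class ReopenableStore:
    """Hypothetical wrapper that never reopens without closing first."""

    def __init__(self, path):
        self.path = path
        self.db = shelve.open(path)

    def reopen(self):
        # Close (and so flush) the current handle before opening a new one,
        # so two live handles never point at the same file.
        self.db.close()
        self.db = shelve.open(self.path)

    def close(self):
        self.db.close()


def demo():
    """Write, reopen, and repeat; return the last value read back."""
    with tempfile.TemporaryDirectory() as tmp:
        store = ReopenableStore(os.path.join(tmp, "hammer_demo"))
        for i in range(5):
            store.db["count"] = i
            store.reopen()
        result = store.db["count"]
        store.close()
        return result
```

Calling demo() writes a counter, reopens the store five times, and reads the
last value back through the final handle.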

Maybe this is one of the causes for the RUNRECOVERY errors - the user
doesn't close sb_server properly, so the db isn't properly closed/saved.
Some time later the error occurs.  It would explain why they are less
frequent with the plug-in, because the plug-in saves the db much more often
(after every "delete as spam"/"recover from spam" event, and IIRC after any
incremental training).

Does anyone see how it could hurt to have sb_server save the db after doing
a page of training?  (This would just be a one line addition to onReview in
ProxyUI.py).
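
Roughly what I have in mind, as a sketch only.  FakeClassifier and
train_page are made up for illustration, and the learn/store method names
are an assumption about what the real classifier offers; I haven't checked
this against onReview itself.

```python
class FakeClassifier:
    """Stand-in for the real classifier; it only records what was called."""

    def __init__(self):
        self.trained = []
        self.saves = 0

    def learn(self, tokens, is_spam):
        # Assumed name for the training call.
        self.trained.append((tuple(tokens), is_spam))

    def store(self):
        # Assumed name for the "write the db to disk" call.
        self.saves += 1


def train_page(classifier, page):
    """Train on every (tokens, is_spam) pair in one review page, then save."""
    for tokens, is_spam in page:
        classifier.learn(tokens, is_spam)
    # The proposed one-line addition: save once per page of training.
    classifier.store()
```

Saving once per page rather than once per message keeps the extra disk
writes down, while still meaning an unclean shutdown loses at most one page
of training.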

=Tony Meyer

* Everything crashes sooner or later on this machine - python.exe, gcc.exe,
IE, ... I'm sure that it's unrelated to spambayes or python.



