[spambayes-dev] Strange performance dip and DBRunRecoveryError retreat
Tony Meyer
tameyer at ihug.co.nz
Thu Jan 1 21:58:42 EST 2004
[Richie Hindle]
> It manages about 40 train-and-classify loops
> per second on my 2.4GHz P4, *except* between about 100 and 400
> messages, when the performance drops to about a tenth of that and then
> recovers.
>
> I've done enough investigation to know that the time is being spent in
> the core SpamBayes code and not my script,
[Tim Peters]
> Is that a true dichotomy? That is, do you know, for example,
> that the time is being spent in the core spambayes code as
> distinct from the Berkeley database library, or distinct from
> random network traffic other programs are engaging in? Or is
> it that you just know it's not in your script, and you divide
> the universe into "my script" and "the core SpamBayes code" here?
I see it too, roughly in the same place. I also see it if I get hammer.py
to use a pickle (although the drop isn't as big), which presumably means
it's not Berkeley related. Tim's guess that it's something Python is doing
seems the most likely explanation. It doesn't seem significant, though.
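
For anyone who wants to see where the dip falls on their own machine, here
is a minimal sketch of per-batch timing. It is not what hammer.py does;
time_batches and fake_train_and_classify are hypothetical names, and the
fake step only stands in for one real train-and-classify loop.

```python
import time


def time_batches(step, total, batch=100):
    """Call step(i) for i in range(total).

    Returns a list of (first_index, loops_per_second), one entry per batch.
    """
    rates = []
    for start in range(0, total, batch):
        stop = min(start + batch, total)
        t0 = time.perf_counter()
        for i in range(start, stop):
            step(i)
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - t0, 1e-9)
        rates.append((start, (stop - start) / elapsed))
    return rates


def fake_train_and_classify(i):
    # Hypothetical stand-in for one train-and-classify loop.
    sum(range(200))


if __name__ == "__main__":
    for start, rate in time_batches(fake_train_and_classify, 1000):
        print(f"{start:5d}  {rate:10.0f} loops/s")
```

Swapping the fake step for the real loop, once with the Berkeley store and
once with the pickle, would show whether the slow batches line up.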
[Richie]
> I've committed the script as testtools/hammer.py,
Actually utilities/hammer.py :)
[Tim]
> Thanks for the effort! Maybe somebody else can complicate it
> now in a way that does provoke DBRunRecoveryErrors. It's
> never what you expect <wink>.
Richie - roughly how many messages could you get through before it crashed? It
crashes early (~1700) for me with the 1.0a6 reopen-before-closing bug
(sometimes a DB_RUNRECOVERY, sometimes a ham/spam count) but if I get it to
always close or save the db before reopening, it just goes and goes - I got
to 57900 before Python crashed *.
If I get it to close before reopening, then interrupt (ctrl-c) it at some
point after it's done its first save, and restart it (without it deleting the
existing db file) it will chug along for a while, and then at some later
point die with a RUNRECOVERY. Restarting it then will provoke an immediate
RUNRECOVERY. This suggests (I think) that reopening the db without closing
it can cause a RUNRECOVERY error at some later point - even several
reopenings later - rather than immediately, as I expected.
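
The "always close before reopening" rule can be sketched roughly like this.
It uses the standard library's shelve rather than the bsddb store that
spambayes uses, and ReopenableStore, run and count are hypothetical names,
so treat it as an illustration of the pattern and not as the hammer.py code.

```python
import os
import shelve
import tempfile


class ReopenableStore:
    """Hypothetical wrapper that never holds two handles on the same file."""

    def __init__(self, path):
        self.path = path
        self.db = None
        self.reopen()

    def reopen(self):
        # Close (and so flush) the current handle before opening a new one.
        self.close()
        self.db = shelve.open(self.path)

    def close(self):
        if self.db is not None:
            self.db.close()
            self.db = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Also runs when the with-body is interrupted with ctrl-c, so the
        # file is closed cleanly before the exception propagates.
        self.close()
        return False


def run(path, loops, reopen_every=10):
    """Write one record per loop, reopening the store periodically."""
    with ReopenableStore(path) as store:
        for i in range(loops):
            store.db[str(i)] = i
            if (i + 1) % reopen_every == 0:
                store.reopen()


def count(path):
    """Return the number of records currently in the store."""
    with ReopenableStore(path) as store:
        return len(store.db)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "hammer_db")
    run(path, 25)
    print(count(path), "records after first run")
```

The point of the with block is that an interrupt still goes through
close(), which is the step that gets skipped when a process is killed
without shutting down properly.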
Maybe this is one of the causes for the RUNRECOVERY errors - the user
doesn't close sb_server properly, so the db isn't properly closed/saved.
Some time later the error occurs. It would explain why they are less
frequent with the plug-in, because the plug-in saves the db much more often
(after every "delete as spam"/"recover from spam" event, and IIRC after any
incremental training).
Does anyone see how it could hurt to have sb_server save the db after doing
a page of training? (This would just be a one-line addition to onReview in
ProxyUI.py).
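
As a rough sketch of the idea (not the actual onReview code, which isn't
reproduced here): TinyClassifier, its learn/store methods and on_review are
hypothetical stand-ins, and the pickle-backed store is an assumption made
only to keep the example self-contained.

```python
import os
import pickle
import tempfile


class TinyClassifier:
    """Hypothetical stand-in: ham/spam counts persisted with pickle."""

    def __init__(self, path):
        self.path = path
        self.nham = 0
        self.nspam = 0
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.nham, self.nspam = pickle.load(f)

    def learn(self, is_spam):
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def store(self):
        # Write to a temporary file and rename it into place, so an
        # interrupt mid-write leaves the previous copy intact.
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump((self.nham, self.nspam), f)
        os.replace(tmp, self.path)


def on_review(classifier, page):
    """Train on one page of (message, is_spam) pairs, then save."""
    for _message, is_spam in page:
        classifier.learn(is_spam)
    classifier.store()  # the proposed one-line addition


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "counts.pck")
    on_review(TinyClassifier(path), [("a", True), ("b", False)])
    reloaded = TinyClassifier(path)
    print(reloaded.nham, reloaded.nspam)
```

The cost is one extra write per page of training, which is the trade-off
the question above is asking about.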
=Tony Meyer
* Everything crashes sooner or later on this machine - python.exe, gcc.exe,
IE, ... I'm sure that it's unrelated to spambayes or python.