[spambayes-dev] Strange performance dip and DBRunRecoveryError retreat
Richie Hindle
richie at entrian.com
Fri Jan 2 03:48:46 EST 2004
[Richie]
> I've done enough investigation to know that the time is being spent
> in the core SpamBayes code and not my script,
[Tim]
> Is that a true dichotomy? [...] Or is it that you just know it's not
> in your script, and you divide the universe into "my script" and "the
> core SpamBayes code" here?
Sorry, yes, I mean "it's not my script". Adding print statements before
and after calls to train() and classify() occasionally shows a delay
within those functions, but only between messages 100 and 400 (or
thereabouts). Whether the time is being spent in our code, the BerkeleyDB
code or the OS, I don't know.
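Here is a minimal Python sketch of the kind of instrumentation described above. Instead of printing around every call, it records only the train() and classify() calls that take longer than a threshold, together with the message index, so that a dip between messages 100 and 400 stands out on its own. The SlowCallLog class, the threshold value, and the train()/classify() stubs are hypothetical stand-ins for illustration. They are not SpamBayes's API.

```python
import time

class SlowCallLog:
    """Times wrapped calls and records any that meet or exceed a threshold (seconds)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.slow = []  # list of (label, message_index, elapsed_seconds)

    def call(self, label, index, func, *args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        if elapsed >= self.threshold:
            self.slow.append((label, index, elapsed))
        return result

# Hypothetical stand-ins for the script's train() and classify() calls;
# classify() is made artificially slow for one message to simulate the dip.
def train(msg, is_spam):
    pass

def classify(msg):
    if msg == "slow":
        time.sleep(0.05)
    return 0.5

log = SlowCallLog(threshold=0.01)
for i, msg in enumerate(["fast", "slow", "fast"]):
    log.call("train", i, train, msg, False)
    log.call("classify", i, classify, msg)

for label, i, elapsed in log.slow:
    print(f"message {i}: {label}() took {elapsed:.3f}s")
```

Logging only the outliers keeps the output readable over thousands of messages. It still cannot say whether the time goes to SpamBayes, BerkeleyDB or the OS, so a profiler would be the next step.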
> We could write our own database specialized to our project's specific needs,
> and probably get that working faster and better than any general-purpose
> beast.
That was my conclusion too, and I'm not about to write it either. 8-)
> Before you get too sick of it,
> you might also want to investigate Neil Schemenauer's adaptation of
> spambayes for cdb. cdb is an efficient and essentially worry-free
> disk-based database. It buys this at the cost of *not* being incrementally
> updatable: you can replace the whole thing atomically, in one giant gulp,
> but that's it.
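The sketch below illustrates the update model Tim describes, assuming a plain JSON snapshot stands in for the database. It does not reproduce cdb's file format or Neil Schemenauer's adaptation. Every change writes a complete new file next to the old one and swaps it in with a single os.replace(), which is atomic on POSIX when both paths are on the same filesystem. The rebuild_atomically() and load() names are hypothetical.

```python
import json
import os
import tempfile

def rebuild_atomically(path, table):
    """Write a complete new snapshot of `table`, then swap it in with one rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(table, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk before the swap
        os.replace(tmp_path, path)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp_path)
        raise

def load(path):
    with open(path) as f:
        return json.load(f)

# There is no incremental update: read everything, change it, write it all back.
with tempfile.TemporaryDirectory() as d:
    db = os.path.join(d, "tokens.json")
    rebuild_atomically(db, {"spam": [1, 0]})
    table = load(db)
    table["ham"] = [0, 1]
    rebuild_atomically(db, table)
    final = load(db)

print(sorted(final))
```

The trade-off matches Tim's description. A crash can never leave a half-written database, but each update costs a full rewrite, which suits periodic batch retraining better than per-message training.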
I'll have a look - thanks for the heads-up.
> It's a pit, isn't it?
Is it ever.
--
Richie Hindle
richie at entrian.com