[spambayes-dev] Strange performance dip and DBRunRecoveryError retreat

Richie Hindle richie at entrian.com
Fri Jan 2 03:48:46 EST 2004


[Richie]
> I've done enough investigation to know that the time is being spent
> in the core SpamBayes code and not my script,

[Tim]
> Is that a true dichotomy?  [...]  Or is it that you just know it's not
> in your script, and you divide the universe into "my script" and "the
> core SpamBayes code" here?

Sorry, yes, I mean "it's not my script".  Adding print statements before
and after calls to train() and classify() occasionally shows a delay
within those functions, but only between messages 100 and 400 (or
thereabouts).  Whether the time is being spent in our code, the BerkeleyDB
code or the OS, I don't know.
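
The print-statement timing described above can be sketched roughly as
follows. This is only an illustration, not the script used in the thread:
timed_call, its threshold parameter and its log parameter are hypothetical
names, and the train()/classify() functions being wrapped are assumed to be
ordinary Python callables.

```python
import time


def timed_call(label, func, *args, threshold=0.5, log=print, **kwargs):
    """Call func(*args, **kwargs) and report the call if it was slow.

    Returns a (result, elapsed_seconds) tuple. `threshold` and `log`
    are hypothetical knobs for this sketch, not part of SpamBayes.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    # Only report calls that took at least `threshold` seconds, so the
    # occasional delay stands out from the normal fast calls.
    if elapsed >= threshold:
        log(f"{label}: {elapsed:.3f}s")
    return result, elapsed
```

Wrapping each call, for example timed_call("train #%d" % i, train, msg),
shows which message numbers hit the delay, but as noted above it cannot say
whether the time went to Python code, the database library or the OS.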

> We could write our own database specialized to our project's specific needs,
> and probably get that working faster and better than any general-purpose
> beast.

That was my conclusion too, and I'm not about to write it either.  8-)

> Before you get too sick of it,
> you might also want to investigate Neil Schemenauer's adaptation of
> spambayes for cdb.  cdb is an efficient and essentially worry-free
> disk-based database.  It buys this at the cost of *not* being incrementally
> updatable:  you can replace the whole thing atomically, in one giant gulp,
> but that's it.

I'll have a look - thanks for the heads-up.
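
For reference, the "replace the whole thing atomically" pattern Tim
describes can be sketched in a few lines. This is not cdb's file format or
Neil's adaptation: pickle stands in for the real database writer, and
replace_database and load_database are hypothetical names used only for
illustration.

```python
import os
import pickle
import tempfile


def replace_database(path, records):
    """Write a complete new database file, then swap it in atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Build the new file next to the old one so the final rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(records, tmp)  # stand-in for a real cdb writer
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the complete old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_database(path):
    """Read the whole database back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The trade-off is the one described above: there is no incremental update,
so training on a single new message means rewriting the entire file.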

> It's a pit, isn't it?

Is it ever.

-- 
Richie Hindle
richie at entrian.com



