speed of spambayes?

Emile van Sebille emile at fenx.com
Sun Nov 30 21:02:15 EST 2003


Paul Rubin:
> Can someone using spambayes tell me about how fast it runs?

IIRC, Tim Peters did some specific measurements during spambayes
development.

... aah - here it is: (from message id
LNBBLJKPBEHFEDALKOLCMEJFAOAB.tim.one at comcast.net)
in http://mail.python.org/pipermail/python-dev/2002-August.txt.gz

[Eric S. Raymond]
> I'm in the process of speed-tuning this now.  I intend for it to be
> blazingly fast, usable for sites that process 100K mails a day, and I
> think I know how to do that.  This is not a natural application for
> Python :-).

[Tim Peters]
> I'm not sure about that.  The all-Python version I checked in added 20,000
> Python-Dev messages to the database in 2 wall-clock minutes.  The time for
> computing the statistics, and for scoring, is simply trivial (this wouldn't
> be true of a "normal" Bayesian classifier (NBC), but Graham skips most of
> the work an NBC does, in particular favoring fast classification time over
> fast model-update time).

This was 15 months ago, and I'm not sure how that relates to GBs per
howlongs, but it's something to start with.
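
For a rough sense of scale, the two figures quoted above can be turned
into messages per second. This is only back-of-the-envelope arithmetic:
it assumes the 2 minutes is exactly 120 seconds, that the 100K mails
arrive evenly over a day, and it compares the quoted training rate
against the target load (Tim describes scoring as cheaper still, but
gives no number for it). The names below are made up for illustration.

```python
# Figures quoted above: 20,000 messages added to the database
# in about 2 wall-clock minutes (training, not scoring).
TRAINED_MESSAGES = 20_000
TRAINING_SECONDS = 2 * 60

# Target load mentioned in the quoted text: 100K mails a day,
# assumed here to be spread evenly over 24 hours.
TARGET_PER_DAY = 100_000
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(count, seconds):
    """Average rate in items per second."""
    return count / seconds


training_rate = per_second(TRAINED_MESSAGES, TRAINING_SECONDS)
target_rate = per_second(TARGET_PER_DAY, SECONDS_PER_DAY)
headroom = training_rate / target_rate

print(f"training rate: {training_rate:.0f} msgs/sec")
print(f"target load:   {target_rate:.2f} msgs/sec")
print(f"headroom:      {headroom:.0f}x")
```

Since message sizes aren't given, this still says nothing about GBs per
anything.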

--

Emile van Sebille
emile at fenx.com





