[Spambayes] Client/server model

Alexander G. M. Smith agmsmith@rogers.com
Thu Oct 17 19:58:32 2002


Guido van Rossum wrote:
> What would make more sense from the POV of minimizing traffic and
> minimizing work done in the server:
> 
>   cli parses the message
>   cli sends the list of tokens to svr

I'd want the server to do tokenization for consistency reasons,
particularly if you are also spam filtering news articles and not
just e-mail messages.  Also, the server can hold all the mail
parsing code (discarding attachments, decoding BASE64, etc.), making
the client simpler.
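
A minimal sketch of that server-side step, assuming Python's standard
email package does the parsing.  The function names and the token
pattern are hypothetical and only for illustration; this is not the
Spambayes tokenizer.

```python
import email
import re


def extract_text(raw_message):
    """Return the decoded text parts of a raw message, skipping attachments."""
    msg = email.message_from_string(raw_message)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # multipart containers, images, binaries, ...
        if part.get_filename() is not None:
            continue  # a named part is treated as an attachment here
        payload = part.get_payload(decode=True)  # undoes BASE64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # charset name unknown to Python
            chunks.append(payload.decode("latin-1", errors="replace"))
    return "\n".join(chunks)


def tokenize(raw_message):
    """Hypothetical tokenizer: lowercase word-like runs from the text parts."""
    return re.findall(r"[a-z0-9$'-]+", extract_text(raw_message).lower())
```

Because every client hands over the raw message and gets its tokens
from the same function, e-mail and news articles end up tokenized
identically.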

>   svr scores the list of tokens
>   svr returns the text to be inserted in the X-Hammie-Disposition header

My server returns the spam ratio.  (It uses BeOS inter-program
communication; I suppose I could use the package which extends
the BMessage system to the Internet, but the spam database is really
a per-user thing, so that isn't useful.)  I let the client decide
whether the ratio is over its own threshold (OK, that may be a bad
design choice).  I'm also returning the list of words and their
individual scores, but that's mostly for debugging, and it wastes a lot
of space: 150 words at a time!  The client (a plug-in filter for the
BeMail package) also does the sound effects, saying "Spam" or "Genuine"
as each message comes in.
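
The split described above could be sketched like this.  The reply
structure, the field names, and the 0.9 default threshold are all
assumptions for illustration; the real server replies with a BeOS
BMessage, not a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class ScoreReply:
    """What the server sends back (hypothetical layout)."""
    spam_ratio: float
    # (word, score) pairs, returned mostly for debugging
    word_scores: list = field(default_factory=list)


def classify(reply, threshold=0.9):
    """Client side: compare the returned ratio to the client's own threshold."""
    return "Spam" if reply.spam_ratio > threshold else "Genuine"


reply = ScoreReply(0.97, [("viagra", 0.99), ("meeting", 0.02)])
verdict = classify(reply)                         # default threshold
strict_verdict = classify(reply, threshold=0.99)  # a pickier client
```

Keeping the threshold on the client side means two clients can treat
the same server reply differently without the server knowing about
either setting.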

>   cli inserts the X-Hammie-Disposition in the message
>   cli prints the message to stdout
> 
> (I like to minimize traffic as well as the work done by the server;
> minimizing traffic is always a good idea, while minimizing server work
> means less load on a shared server -- if the clients run on separate
> machines, the combined CPU power of the clients is much more than that
> of the server.)

Actually, it turns out that my server approach really isn't needed for
speed reasons.  It takes just a fraction of a second to load and parse
the spam database, a 0.5MB text file of words and numbers (that size is
after initial training on 1500 messages / 21000 words, with unique
strings stripped).  Still, it's nice to have it separate from other
programs so that it is more modular.
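
For a sense of why loading is cheap, here is a minimal sketch of
parsing such a file.  The layout is an assumption (one word per line,
followed by one or more whitespace-separated numbers); the actual
database format is not described here.

```python
def load_database(lines):
    """Parse 'word number [number ...]' lines into a dict of tuples.

    Blank or malformed lines are skipped rather than treated as errors.
    """
    database = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a word with no numbers
        word, numbers = fields[0], fields[1:]
        try:
            database[word] = tuple(float(n) for n in numbers)
        except ValueError:
            continue  # a non-numeric field; ignore the line
    return database


sample = ["viagra 120 1\n", "meeting 2 85\n", "\n"]
db = load_database(sample)
```

An open file object can be passed in place of the list, since the
function only iterates over lines.  A single pass like this over half
a megabyte of text is a small amount of work.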

- Alex