[Spambayes] Results of playing with CDB

Neale Pickett neale@woozle.org
15 Sep 2002 23:31:52 -0700


So then, Tim Peters <tim.one@comcast.net> is all like:

> [Neale Pickett]
> > Unfortunately, that's the only situation I can afford :) I have 80
> > users, one box,
> 
> I don't understand.  You have 80 users attached to dumb terminals with
> no CPU of their own ... or what?

Sort of.  About half of them use pine on my server, and the other half
are just figuring out how to browse the web, and asking them to install
stuff is too tall an order.  On the plus side, their inboxes are likely
to be very jargon-free.

My end goal is a centralized classifier for an entire organization on an
embedded device.  At $FIRM, our current-generation devices have between
4MB and 16MB of flash and no hard drive.  The next ones we make will
have more, but I don't know what that'll be yet.  This is why I'm so
concerned about storage space, though :)

> > Do you think they should all contribute to one big vat-o-meat
> > database, or should individual words be tagged per-user?

> I *expect* to get good results from pooling python.org mailing lists
> because of high topic and poster commonality.  This has yet to be
> demonstrated, though.  I *don't* expect it would also work well to
> feed in personal email that happens to be handled by python.org;
[snip]

Plagued by conscience, I've just run my 1000 test hams against your
SpamHam1.pik classifier.  It came back saying that 51 were spam, and 949
were ham.  Not bad, eh?  I've put the output of a "hammie.py -u" run at

  http://woozle.org/~neale/tmp/results.txt
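
For anyone who wants that as a rate: since every message in the test set was known ham, each "spam" verdict is a false positive. A quick sketch of the arithmetic (the variable names are mine for illustration, not anything from hammie.py):

```python
# Counts from the run described above; every input message was known ham.
flagged_as_spam = 51   # ham the classifier called spam (false positives)
passed_as_ham = 949    # ham the classifier called ham

total = flagged_as_spam + passed_as_ham
false_positive_rate = flagged_as_spam / total

print(f"{total} hams tested, false positive rate {false_positive_rate:.1%}")
```

That comes to 5.1% of my ham misfiled by a classifier trained on somebody else's mail.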

All default values in my .ini file.  That is to say, I have no .ini file
:)  I'll go over unread messages and see if you need any other tests run.
Sorry I've been slackin'.

Neale