[Spambayes] Need help getting started

Neale Pickett neale@woozle.org
18 Sep 2002 22:36:00 -0700


So then, Guido van Rossum <guido@python.org> is all like:

> I'd like to run experiments for Tim.  My ham corpus is over 80,000
> messages, spread over hundreds of MH folders, one message per file
> with numeric names.

I'm not really up on MH format, but my mail reader (Gnus) stores
messages one per file with numeric names too.  In fact, all the changes
you made to hammie to support MH folders worked great for me.  So I
suspect the format is pretty close, if not identical.

> But how to turn this into Tim's standard data setup?

Super easy.  Create the following directories:

  Data/Ham/Set[1-5]
  Data/Ham/reservoir
  Data/Spam/Set[1-5]
  Data/Spam/reservoir
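
If you'd rather script that than type a dozen mkdirs, here is a minimal
sketch in Python.  The function name make_layout and its parameters are
made up for illustration; only the directory names come from the list
above.

```python
import os

def make_layout(root="Data", nsets=5):
    """Create Ham and Spam trees, each with Set1..SetN plus a reservoir."""
    created = []
    for kind in ("Ham", "Spam"):
        names = ["Set%d" % i for i in range(1, nsets + 1)] + ["reservoir"]
        for name in names:
            path = os.path.join(root, kind, name)
            # exist_ok lets the function be re-run on a partial layout
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

Calling make_layout() from the directory that should hold Data creates
all twelve directories.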

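Then go into the reservoirs, and link (I used hard links to save inodes)
all your ham/spam into them.  With hundreds of folders you'd repeat the ln
per folder and give the links unique names, since the numeric names will
collide.  For a single folder of each: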

  $ cd Data/Ham/reservoir
  $ ln ~/Mail/inbox/* .
  $ cd ../../../Data/Spam/reservoir
  $ ln ~/Mail/spam/* .
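
The ln commands above cover one folder each.  For mail spread over many
MH folders, where message 1 in one folder would clash with message 1 in
the next, something like the following might do.  It is only a sketch:
link_folders and the six-digit counter naming are my own assumptions,
not anything the test scripts are known to require, and hard links only
work when the folders and the reservoir share a filesystem.

```python
import os

def link_folders(folders, reservoir):
    """Hard-link each numerically named file in folders into reservoir.

    Links are named with a running counter so that equal message
    numbers in different folders do not collide.
    """
    count = 0
    for folder in folders:
        # MH messages have purely numeric names; this skips files
        # such as .mh_sequences.
        names = [n for n in os.listdir(folder) if n.isdecimal()]
        for name in sorted(names, key=int):
            src = os.path.join(folder, name)
            if not os.path.isfile(src):
                continue
            count += 1
            # os.link makes a hard link; source and target must be on
            # the same filesystem.
            os.link(src, os.path.join(reservoir, "%06d" % count))
    return count
```

Pass it the list of ham folders and Data/Ham/reservoir, then do the same
for spam.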

At that point, you have a standard setup, and can use rebal.py to
populate your Set directories.  If you have a Bourne shell, you can
become the second-ever user of runtest.sh: call it with the -r option
and it'll run rebal.py for you.
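
For a feel of what populating the Set directories amounts to, here is a
rough sketch that shuffles a reservoir and deals the messages out
round-robin.  This is not rebal.py's code and may not match its exact
behavior; spread and its parameters are made-up names for illustration.

```python
import os
import random

def spread(reservoir, set_dirs, seed=None):
    """Shuffle the files in reservoir and move them round-robin into set_dirs."""
    names = sorted(os.listdir(reservoir))
    # A seeded generator makes the split repeatable when seed is given
    random.Random(seed).shuffle(names)
    for i, name in enumerate(names):
        dest = set_dirs[i % len(set_dirs)]
        os.rename(os.path.join(reservoir, name), os.path.join(dest, name))
    return len(names)
```

Dealing round-robin keeps the set sizes within one message of each
other, whatever the reservoir size.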

Neale