[Spambayes] There Can Be Only One

Anthony Baxter anthony@interlink.com.au
Fri, 27 Sep 2002 00:05:56 +1000


>>> Greg Ward wrote
> Anyways, that little exploration has me wondering just how valid my data
> is.  I should probably rerun everything without looking at "Received"
> headers at all (except to count them -- for the most part, they stop at
> either mail.python.org or starship.python.net, which are the front-line
> servers for these two collections).

The data's fine; you'll just need to be careful about which headers you
look at. It's distressing _how_ good this stuff is at spotting patterns.

Delivery-date is another header to watch out for if the ham and spam come
from different places or times. It's not clear to me what puts that
header in; it might be an MH thing.
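
One rough way to check whether headers like these are skewing results is
to strip them from each message before it is scored. The sketch below is
only an illustration using the standard library email module, not how the
spambayes tokenizer handles headers; SUSPECT_HEADERS and
strip_suspect_headers are hypothetical names made up for this example. It
keeps the count of Received lines, as Greg suggests, and drops the rest.

```python
import email

# Hypothetical list of headers that may reveal which collection a
# message came from rather than whether it is ham or spam.
SUSPECT_HEADERS = ("Received", "Delivery-Date")


def strip_suspect_headers(raw_text, suspect=SUSPECT_HEADERS):
    """Return (cleaned message text, number of Received headers seen)."""
    msg = email.message_from_string(raw_text)
    # Count the Received lines before removing them.
    received_count = len(msg.get_all("Received", []))
    for name in suspect:
        # Header names match case-insensitively; del removes every
        # occurrence and does nothing if the header is absent.
        del msg[name]
    return msg.as_string(), received_count
```

Comparing a run on the stripped messages against a run on the originals
would show how much of the score was coming from those headers.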

I tend to do multiple runs by having multiple ini files. Say, a 
common.ini with the options that are constant, then test1.ini, test2.ini
or whatever, that have the options that vary. I can then put 

BAYESCUSTOMIZE="common.ini test1.ini" python2.3 timcv.py ...... > test1.txt
BAYESCUSTOMIZE="common.ini test2.ini" python2.3 timcv.py ...... > test2.txt
BAYESCUSTOMIZE="common.ini test3.ini" python2.3 timcv.py ...... > test3.txt

in a shell script, run it, and go get lunch (or coffee, or sleep, or
whatever).

(Aside: I tend to give the changing ini files names more like x01.ini,
x02.ini, mindisc15.ini; it makes life more sane...)
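
The same batch of runs could also be driven from Python instead of a shell
script. This is a minimal sketch under a few assumptions: plan_runs and
run_all are hypothetical helper names, the output file is named after the
varying ini file, and the caller supplies the full command line for the
test driver (its arguments are not filled in here).

```python
import os
import subprocess


def plan_runs(common_ini, variant_inis):
    """Pair each varying ini file with its BAYESCUSTOMIZE value and output file."""
    plan = []
    for ini in variant_inis:
        # test1.ini -> test1.txt, x01.ini -> x01.txt, and so on.
        stem = os.path.splitext(os.path.basename(ini))[0]
        plan.append(("%s %s" % (common_ini, ini), stem + ".txt"))
    return plan


def run_all(plan, command):
    """Run `command` once per plan entry, writing its stdout to the entry's file.

    `command` is the complete argument list for the test driver, supplied
    by the caller (the interpreter, the script, and its arguments).
    """
    for customize, out_name in plan:
        # Copy the current environment and set BAYESCUSTOMIZE for this run only.
        env = dict(os.environ, BAYESCUSTOMIZE=customize)
        with open(out_name, "w") as out:
            subprocess.run(command, env=env, stdout=out, check=True)
```

Calling plan_runs("common.ini", ["test1.ini", "test2.ini"]) gives one entry
per varying file, and run_all then executes them one after another, which
is the unattended part.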

Anthony