[spambayes-dev] More stupid beats smart timcv.py results

Tony Meyer tameyer at ihug.co.nz
Wed Jan 26 03:28:40 CET 2005

[Tony Meyer, last week]
> The latter was prompted by a comment in JGC's latest 
> newsletter (though I'm sure I've seen this somewhere before, 
> too).  To avoid deliberate misspellings and the so-called 
> 'cambridge effect' you replace each (or generate a new) token 
> that is made up of the letters in the original token sorted 
> into a constant order (e.g. alphabetical).  So "god" becomes 
> "dgo", but so does "dog".

At the MIT Spam Conference, John mentioned (offhand, regarding something
else) that POPFile does this only for words longer than 6 characters.
Since I already had the stuff at hand, I gave this a go, in case the
poor results came just from those short words.
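
For anyone who wants to try the same thing, here is a minimal sketch of
the transformation. It is not the code used for the test above and is
not taken from the SpamBayes or POPFile sources; the names sort_letters,
transform_tokens and min_length are made up for illustration, and the
default of 7 is my reading of "longer than 6 characters".

```python
def sort_letters(token, min_length=7):
    """Return the token with its characters sorted into a constant order.

    Tokens shorter than min_length are returned unchanged. The default
    of 7 is an assumption that mirrors "longer than 6 characters".
    """
    if len(token) < min_length:
        return token
    return "".join(sorted(token))


def transform_tokens(tokens, min_length=7):
    """Apply sort_letters to each token in an iterable of tokens."""
    for token in tokens:
        yield sort_letters(token, min_length)


if __name__ == "__main__":
    # With the length threshold, short words such as "god" and "dog"
    # stay distinct, while a scrambled long word collapses to the same
    # token as its correct spelling.
    print(list(transform_tokens(["god", "dog", "mortgage", "mortagge"])))
```

Setting min_length=1 gives the variant from last week, where every
token is sorted and "god" and "dog" both become "dgo".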

Compared to all-defaults, fp and fn were unchanged and the unsure rate
rose by 0.03%.  So the verdict is unchanged.

(I can post cmp.py or table.py results if anyone is interested, but there's
nothing really noteworthy in them.)

