[spambayes-dev] More stupid beats smart timcv.py results
Tony Meyer
tameyer at ihug.co.nz
Wed Jan 26 03:28:40 CET 2005
[Tony Meyer, last week]
> The latter was prompted by a comment in JGC's latest
> newsletter (though I'm sure I've seen this somewhere before,
> too). To avoid deliberate misspellings and the so-called
> 'cambridge effect' you replace each (or generate a new) token
> that is made up of the letters in the original token sorted
> into a constant order (e.g. alphabetical). So "god" becomes
> "dgo", but so does "dog".
At the MIT Spam Conference John mentioned (offhand, regarding something
else) that POPFile does this just for words that are longer than 6
characters. Since I already had the stuff at hand, I gave this a go, in
case the poor results were just from those short words.
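
For anyone who wants to try the same thing, here is a minimal sketch of the idea. It is not the actual change I tested, nor POPFile's code. The names `sort_letters`, `canonicalize` and `threshold` are made up for illustration, and the cutoff of 6 only reflects the offhand comment above.

```python
def sort_letters(token, threshold=6):
    """Return the token with its characters in sorted order.

    Tokens of `threshold` characters or fewer are returned unchanged.
    A threshold of 0 sorts every non-empty token (the original experiment).
    """
    if len(token) <= threshold:
        return token
    return "".join(sorted(token))


def canonicalize(tokens, threshold=6):
    """Yield each token from an iterable after applying sort_letters."""
    for token in tokens:
        yield sort_letters(token, threshold)


if __name__ == "__main__":
    # With no length cutoff, "god" and "dog" collapse to the same token.
    print(sort_letters("god", threshold=0), sort_letters("dog", threshold=0))
    # With the cutoff, short words pass through and long ones are sorted.
    print(list(canonicalize(["god", "mortgage", "mortgaeg"])))
```

With the cutoff in place, a misspelling such as "mortgaeg" still maps to the same token as "mortgage", while short words like "god" and "dog" stay distinct.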
Compared to all-defaults, fp and fn were unchanged and unsure rose 0.03%.
So the verdict is unchanged.
(I can post cmp.py or table.py results if anyone is interested, but there's
nothing really interesting here.)
=Tony.Meyer