[Spambayes] Web interface statistics

David Abrahams dave at boost-consulting.com
Thu May 10 15:55:31 CEST 2007


on Thu May 10 2007, skip-AT-pobox.com wrote:

>     Dave> There really is something very fishy going on.  I actually added
>     Dave> instrumentation code to watch my training script train particular
>     Dave> words multiple times as ham or spam, but when I query those words
>     Dave> using the sb_imapfilter web interface, they always are shown as
>     Dave> having been trained 0 or 1 times, with one of two corresponding
>     Dave> probabilities.
>
>     Dave> I do a wildcard query with a single letter, returning 1000
>     Dave> results, and there's not a single number over 1 in the #spam or
>     Dave> #ham columns.
>
>     Dave> What could be going on?
>
> I've no idea.  It seems to be working for me.  I have lots of singletons(*),
> which is to be expected, but also lots of multiples:

OK, a couple of questions:

1. What kind of database are you using?  Maybe this is something in
   the DBM handling?

2. Have you tried my patchset yet?  I'd like to know if it's somehow a
   bug I introduced.

> (*) Linguists call such singletons "hapax legomena".  I guess they were
> trying to be snooty when they came up with that term.

Oh, they weren't just _trying_ ;-)

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

Don't Miss BoostCon 2007! ==> http://www.boostcon.com

