[spambayes-dev] chi2 technique

Sam Nage seionage at lycos.com
Thu May 19 05:55:21 CEST 2005


Hi Tony,
Thanks a bunch for the reply. I'll come up with more specific questions about the classifier. If they are too generic, don't worry about hurting my feelings with smart remarks. I know what it is like when you have a noob at the other end of the list. ;-)

btw, I've got 2 suggestions that might be worth trying out, to see how they work (hopefully I'm not overstepping my bounds here):

1) Has anyone tried classifying, or keeping track of, unknown HTML tag names? Say, for instance, some spam has the following HTML: V<asdfsdf><br>I<asdfsdf><br>G<asdfsdf><br>R<asdfsdf><br>A
What if a token named "Unknown:Html" or something like that was added (because of the <asdfsdf> tags)?
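
To make the idea concrete, here is a rough sketch in Python of what I have in mind. It is only an illustration: KNOWN_TAGS (a deliberately short, incomplete list), unknown_tag_tokens() and strip_tags() are names I made up, not anything that exists in spambayes, and a real tokenizer would need a proper HTML parser rather than a regex.

```python
import re

# Hypothetical, deliberately incomplete list of tags treated as "known".
KNOWN_TAGS = frozenset({
    "a", "b", "body", "br", "center", "div", "font", "head", "html",
    "i", "img", "li", "p", "span", "table", "td", "tr", "ul",
})

# Captures the tag name of any opening or closing tag.
TAG_RE = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)")


def unknown_tag_tokens(html):
    """Yield one "Unknown:Html" token for every tag not in KNOWN_TAGS."""
    for match in TAG_RE.finditer(html):
        if match.group(1).lower() not in KNOWN_TAGS:
            yield "Unknown:Html"


def strip_tags(html):
    """Remove all tags so the visible text can be tokenized as usual."""
    return re.sub(r"<[^>]*>", "", html)


sample = "V<asdfsdf><br>I<asdfsdf><br>G<asdfsdf><br>R<asdfsdf><br>A"
tokens = list(unknown_tag_tokens(sample))
visible = strip_tags(sample)
```

The classifier would then see the extra "Unknown:Html" tokens alongside the normal word tokens, and training would decide whether they carry any weight.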

also, I ran across this link today:

2) http://mmmservices.web.cern.ch/mmmservices/AntiSpam/
Basically, they filter out the text in really small fonts that is placed between individual characters (look at evolution 3).
Would a technique like this be beneficial?
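
As a rough sketch of how I read that idea (I have not checked how the CERN filter actually does it), something like the following Python could drop tiny-font elements before tokenizing. The regex, the size thresholds, and the function names drop_tiny_font_text() and visible_text() are all my own assumptions.

```python
import re

# Assumed definition of "really small": <font size=0|1> or a CSS
# font-size of 0-2 px/pt on a <font> or <span> element.
TINY_FONT_RE = re.compile(
    r"<(font|span)\b[^>]*?"
    r"(?:size\s*=\s*[\"']?[01]\b|font-size\s*:\s*[0-2](?:\.\d+)?\s*(?:px|pt))"
    r"[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def drop_tiny_font_text(html):
    """Remove whole elements whose font size looks too small to read."""
    return TINY_FONT_RE.sub("", html)


def visible_text(html):
    """Drop tiny-font elements, then strip any remaining tags."""
    return re.sub(r"<[^>]*>", "", drop_tiny_font_text(html))


sample = 'V<font size="1">xq</font>I<span style="font-size:1px">zz</span>A'
cleaned = visible_text(sample)
```

Nested elements and other hiding tricks (colors, zero-width tables and so on) are not handled here; this only covers the simplest case.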

Also, if I want to test some type of technique, what spam-filtering results (false positive and false negative rates) are people getting? What percentages should I shoot for?

TIA!


----- Original Message -----
From: "Tony Meyer" <tameyer at ihug.co.nz>
To: "'Sam Nage'" <seionage at lycos.com>, spambayes-dev at python.org
Subject: RE: [spambayes-dev] chi2 technique
Date: Thu, 19 May 2005 10:51:04 +1200

> 
> > I'm trying to understand how you implemented the Chi2 technique.
> 
> Do you mean the whole classifier, or just the inverse-chi-squared function
> (chi2.chi2Q())?
> 
> > Can someone tell me how chi2 method is implemented in spambayes?
> 
> What exactly do you mean by how it's implemented?  Do you want an
> explanation of how the math works (Gary Robinson's Linux Journal paper is
> probably best for that), or how the math was turned into Python?
> 
> If the latter, then reading classifier.py is probably the best thing to do.
> Concentrate on the Classifier class, particularly the chi2_spamprob() and
> probability() methods.  If you understand how they work (and the comments do
> a pretty good job of explaining how they relate to the math), then that's
> mostly it.
> 
> If there's something you don't understand, then asking about the specific
> part of the code would probably make it easier to answer.
> 
> > Has this been discussed before? I've tried searching the archives.
> 
> Well, I'm sure people discussed it at the time it was written (when it
> replaced the earlier combining methods).  That's long before this
> spambayes-dev list started, though.  It might be at the start of the
> spambayes at python.org list, but I suspect you'll have to go back to when
> spambayes was discussed on python-dev.  I wouldn't really recommend the
> archives as a place to find these answers.
> 
> =Tony.Meyer
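
For reference, the chi-squared combining step discussed in the quoted reply can be sketched roughly as follows. This is a simplified illustration of the general technique, not a copy of classifier.py: chi2_combine() is a hypothetical name, the inputs are assumed to be per-token spam probabilities strictly between 0 and 1, and the real chi2_spamprob() may handle underflow and other details differently.

```python
import math


def chi2Q(x2, v):
    """Probability that a chi-squared variable with v degrees of freedom
    (v must be even) is at least x2."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Guard against rounding pushing the sum slightly above 1.
    return min(total, 1.0)


def chi2_combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_spam = sum(math.log(1.0 - p) for p in probs)
    ln_ham = sum(math.log(p) for p in probs)
    S = 1.0 - chi2Q(-2.0 * ln_spam, 2 * n)  # evidence for spam
    H = 1.0 - chi2Q(-2.0 * ln_ham, 2 * n)   # evidence for ham
    return (S - H + 1.0) / 2.0


spammy = chi2_combine([0.99] * 5)
hammy = chi2_combine([0.01] * 5)
neutral = chi2_combine([0.5, 0.5])
```

Scores near 1 indicate spam, scores near 0 indicate ham, and scores near 0.5 mean the evidence is either weak or conflicting.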




