[spambayes-bugs] [ spambayes-Support Requests-912781 ] Other uses?

SourceForge.net noreply at sourceforge.net
Wed Mar 10 07:54:45 EST 2004


Support Requests item #912781, was opened at 2004-03-09 15:03
Message generated for change (Comment added) made by rprosser
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498104&aid=912781&group_id=61702

Category: None
Group: None
Status: Closed
Priority: 5
Submitted By: Richard Prosser (rprosser)
Assigned to: Nobody/Anonymous (nobody)
Summary: Other uses?

Initial Comment:
Can the code be readily used in non-email applications? I 
need a filter that can be trained across different 
categories (rather like POPFile), but for simple text input.


Thanks ...

Richard Prosser


----------------------------------------------------------------------

>Comment By: Richard Prosser (rprosser)
Date: 2004-03-10 12:54

Message:
Logged In: YES 
user_id=599403

Thanks for the replies. I really need multiple classification, so 
I'll have a look at the nway.py script.

BTW, I also read recently about a researcher who had 
discovered a new way to optimise the filtering, but now I 
can't find the article :-(


----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-03-10 00:07

Message:
Logged In: YES 
user_id=552329

Richie and I have both used the spambayes code for non-email 
classification.  He was classifying a music database, IIRC, 
and I'm working with scripted dialogue.  The tokeniser has to 
be pretty much rewritten from scratch, of course, but if you 
want the same sort of chi-squared scoring, then the rest 
should work fine.
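
For readers unfamiliar with the chi-squared approach mentioned above, here is a rough, self-contained sketch of the general technique (Fisher-style combining of per-token probabilities). It is written for illustration and is not copied from the spambayes source. The names chi2q and chi2_combine are made up here, and each per-token probability is assumed to lie strictly between 0 and 1.

```python
import math


def chi2q(x2, v):
    """Probability that a chi-squared variable with v degrees of
    freedom exceeds x2.  Only valid for even, positive v."""
    assert v > 0 and v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Rounding can push the sum slightly above 1.
    return min(total, 1.0)


def chi2_combine(probs):
    """Combine per-token probabilities into one score in [0, 1].

    Each value in probs must be strictly between 0 and 1.
    """
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_s = sum(math.log(1.0 - p) for p in probs)
    ln_h = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_s, 2 * n)
    h = 1.0 - chi2q(-2.0 * ln_h, 2 * n)
    return (s - h + 1.0) / 2.0
```

Scores near 1 mean the tokens point strongly at the first category, scores near 0 at the second, and scores near 0.5 mean the evidence is weak or conflicting.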

You can use spambayes with multiple categories, although, as 
Kenny said, it's not designed for that.  Check out the 
nway.py script in the contrib directory, which Skip wrote 
(I don't think he actually uses it himself).  I've used it 
too; the results aren't as good as they could be, though that 
could be for other reasons.  Looking at the POPFile code and 
how they do it could very well be a better option (I keep 
meaning to do this, but haven't got around to it).
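
One generic way to get several categories out of a two-class classifier is "one versus the rest": keep one binary scorer per category and pick the category whose scorer is most confident. The sketch below shows that idea only. I have not checked it against nway.py or POPFile, and BinaryScorer is a toy stand-in invented for this example, not the spambayes classifier.

```python
import math
from collections import Counter


class BinaryScorer:
    """Toy two-class token scorer (hypothetical stand-in)."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.totals = {True: 0, False: 0}

    def learn(self, tokens, is_member):
        # Count each token once per training item.
        for tok in set(tokens):
            self.counts[is_member][tok] += 1
        self.totals[is_member] += 1

    def score(self, tokens):
        """Return a value in (0, 1); higher means 'member'."""
        log_odds = 0.0
        for tok in set(tokens):
            p_in = (self.counts[True][tok] + 1.0) / (self.totals[True] + 2.0)
            p_out = (self.counts[False][tok] + 1.0) / (self.totals[False] + 2.0)
            log_odds += math.log(p_in / p_out)
        # Clamp so math.exp cannot overflow on long inputs.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


class OneVsRest:
    """One binary scorer per category; the highest score wins."""

    def __init__(self, categories):
        self.scorers = {c: BinaryScorer() for c in categories}

    def learn(self, tokens, category):
        tokens = list(tokens)
        for name, scorer in self.scorers.items():
            # The item is a member for its own category's scorer
            # and a non-member for every other scorer.
            scorer.learn(tokens, name == category)

    def classify(self, tokens):
        tokens = list(tokens)
        scores = {c: s.score(tokens) for c, s in self.scorers.items()}
        return max(scores, key=scores.get), scores
```

Each training item is learned once as a member and N-1 times as a non-member, so training cost grows with the number of categories. The per-category scores are not calibrated against each other, which may be one reason results from this kind of setup can be disappointing.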

----------------------------------------------------------------------

Comment By: Kenny Pitt (kpitt)
Date: 2004-03-09 21:04

Message:
Logged In: YES 
user_id=859086

Classification is performed based on "tokens" in each input, 
and the SpamBayes tokenizer relies on standard e-mail format 
to generate tokens that have proven useful for differentiating 
spam from good messages. You could probably use the 
classifier and storage portions on a different type of input by 
writing a different tokenizer routine, but I can't guarantee 
that those portions are cleanly separated from the 
e-mail-specific code.
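
To give an idea of what "a different tokenizer routine" might look like for plain text, here is a minimal sketch. The function name tokenize_text, the length limits, and the "skip:" placeholder for over-long words are all choices made for this example, not the SpamBayes tokenizer's actual behaviour.

```python
import re

# Runs of lowercase letters, digits and apostrophes count as words.
_WORD = re.compile(r"[a-z0-9']+")


def tokenize_text(text, min_len=3, max_len=12):
    """Yield classifier tokens for a plain-text string.

    Words shorter than min_len are dropped.  Words longer than
    max_len are replaced by a coarse placeholder built from the
    first letter and the length rounded down to a multiple of 10.
    """
    for word in _WORD.findall(text.lower()):
        if len(word) < min_len:
            continue
        if len(word) > max_len:
            yield "skip:%s %d" % (word[0], len(word) // 10 * 10)
        else:
            yield word
```

The only contract that matters here is that the routine turns one input into a stream of string tokens. Which tokens are actually useful depends on the data and has to be found by experiment.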

Also keep in mind that SpamBayes is only designed to support 
two categories. It doesn't matter what the two categories 
are, as long as you give it the training data to define the 
characteristics of each. However, if you need to sort things 
into three or more categories, as POPFile does, then the 
SpamBayes code won't support that.

----------------------------------------------------------------------
