[spambayes-bugs] [ spambayes-Feature Requests-854705 ] Detect "line noise" in subject and body

SourceForge.net noreply at sourceforge.net
Fri Dec 5 07:58:04 EST 2003


Feature Requests item #854705, was opened at 2003-12-05 12:58
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=854705&group_id=61702

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Julian Morrison (julianm)
Assigned to: Nobody/Anonymous (nobody)
Summary: Detect "line noise" in subject and body

Initial Comment:
Spell-check the words in the message subject and body, and
generate tokens for the count of misspellings in each.
Perhaps also generate tokens for the ratio of
incorrect to correct spellings? This could be chunked into
buckets to make it easier to train on, e.g.: all, more than
half, about half, less than half, none. These tokens should be
separate for subject and for body, since garble in the header
is very predictive of spam.
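
A minimal sketch of the bucketing idea in Python, not anything taken from
the SpamBayes code base. The function names, the token format, the
0.4/0.6 cut-offs for "about half", and the use of a plain set of known
words in place of a real spell checker are all assumptions made for
illustration. The ratio here is misspelled words over all words.

```python
import re

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def bucket(misspelled, total):
    """Map a misspelling count onto one of five coarse buckets."""
    if total == 0 or misspelled == 0:
        return "none"
    if misspelled == total:
        return "all"
    ratio = misspelled / total
    # Assumed cut-offs: 0.4 to 0.6 counts as "about half".
    if ratio > 0.6:
        return "more-than-half"
    if ratio >= 0.4:
        return "about-half"
    return "less-than-half"


def misspelling_tokens(text, known_words, section):
    """Yield a count token and a ratio-bucket token for one section.

    known_words is a stand-in for a spell checker: any word not in the
    set is treated as misspelled. section is "subject" or "body", so the
    two parts of the message get separate tokens.
    """
    words = [w.lower() for w in WORD_RE.findall(text)]
    misspelled = sum(1 for w in words if w not in known_words)
    yield f"{section}:misspelled-count:{misspelled}"
    yield f"{section}:misspelled-ratio:{bucket(misspelled, len(words))}"
```

Calling misspelling_tokens() once with section="subject" and once with
section="body" keeps the two sets of tokens apart, as requested above.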

Also, there has to be some way to look for words with
"impossible to pronounce" consonant clusters such as
"dvgkbm". Could spambayes be made to look for
"syllables", e.g. by parsing words into syllables and
generating tokens for each? I'm not sure there's a
parsing technique that's sufficiently
internationalized. Perhaps even just generating tokens
for ASCII consonant clusters would be better than nothing.
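
The simpler ASCII-only fallback could be sketched as below. This is an
illustration under stated assumptions, not an existing SpamBayes
tokenizer: the function name, the token format, the default minimum
cluster length of 5, the cap of 8, and the choice to treat "y" as a
vowel are all hypothetical. Real English words can occasionally trip it
(the "ngths" in "strengths" is five consonants), so the token is meant
as one more clue for the classifier to weigh, not a verdict.

```python
import re

# Assumption: "y" is treated as a vowel, so it ends a cluster.
CONSONANTS = "bcdfghjklmnpqrstvwxz"


def consonant_cluster_tokens(text, section, min_len=5, cap=8):
    """Yield one token per run of at least min_len ASCII consonants.

    The token records the run length, capped at cap so that very long
    runs of garbage all share a single token.
    """
    cluster_re = re.compile("[%s]{%d,}" % (CONSONANTS, min_len))
    for cluster in cluster_re.findall(text.lower()):
        yield f"{section}:consonant-cluster-len:{min(len(cluster), cap)}"
```

For the example from the request, a subject containing "dvgkbm" yields
the single token "subject:consonant-cluster-len:6".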

----------------------------------------------------------------------
