[Spambayes-checkins] spambayes NEWTRICKS.txt,1.3,1.3.2.1
Anthony Baxter
anthonybaxter at users.sourceforge.net
Wed Nov 5 07:44:55 EST 2003
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv2621
Modified Files:
Tag: release_1_0
NEWTRICKS.txt
Log Message:
merge from trunk
Index: NEWTRICKS.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/NEWTRICKS.txt,v
retrieving revision 1.3
retrieving revision 1.3.2.1
diff -C2 -d -r1.3 -r1.3.2.1
*** NEWTRICKS.txt 15 Sep 2003 22:41:31 -0000 1.3
--- NEWTRICKS.txt 5 Nov 2003 12:44:52 -0000 1.3.2.1
***************
*** 20,21 ****
--- 20,38 ----
level. Also a token indicating the ratio of message length to the
number of tokens, and a token indicating the number of tokens.
+ Also, [817813] add a "not in database" token (I'm not sure about this
+ one, but I can't articulate why).
+
+ - A token indicating the ratio of hapax legomena to previously seen
+ tokens in the message.
+
+ - Punctuation sometimes gets inserted in otherwise spammy words or phrases,
+ e.g.: "Ch-eck ou=t ou-r sel)ection _of grea)t R_X -emgffj". It might be
+ helpful to try stripping punctuation. (Idea from Paul Sorenson)
+
+ - Similarly, some letters get replaced by numbers, e.g.: "V1agra" instead of
+ "Viagra". Mapping numbers to suitable letters might help in some
+ situations.
+
+ - [817813] Add a spelling checker and reasonable sized dictionary and generate
+ a "not in dictionary" token.
+
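The punctuation-stripping, digit-mapping, and hapax-ratio ideas added in this revision could be tried out with something like the sketch below. It is not Spambayes tokenizer code: the function names, the digit-to-letter table, the "hapax-ratio:" token format, and the `seen_counts` mapping are all hypothetical, and "hapax legomena" is assumed here to mean tokens seen exactly once in training.

```python
import string

# Assumed digit-to-letter substitutions; the pairs are illustrative only.
DIGIT_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(word):
    """Remove ASCII punctuation inserted into a word, e.g. "Ch-eck"."""
    return word.translate(PUNCT_TABLE)


def map_digits(word):
    """Replace look-alike digits with letters, e.g. "V1agra".

    Words with no letters at all (plain numbers) are returned unchanged.
    """
    if any(c.isalpha() for c in word):
        return word.translate(DIGIT_MAP)
    return word


def hapax_ratio_token(tokens, seen_counts):
    """Return a coarse synthetic token for the hapax ratio of a message.

    seen_counts is a hypothetical mapping of token -> times seen in training.
    The ratio is (tokens seen exactly once) / (tokens seen at least once),
    bucketed into tenths so that similar messages share a token.
    """
    seen = [t for t in tokens if seen_counts.get(t, 0) >= 1]
    if not seen:
        return "hapax-ratio:none"
    hapax = sum(1 for t in seen if seen_counts[t] == 1)
    return "hapax-ratio:%d" % ((10 * hapax) // len(seen))
```

Whether such normalized forms should replace the original tokens or be generated alongside them is left open here; the note itself only suggests that the tricks might help.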