[Spambayes-checkins] spambayes NEWTRICKS.txt,1.3,1.3.2.1

Anthony Baxter anthonybaxter at users.sourceforge.net
Wed Nov 5 07:44:55 EST 2003


Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv2621

Modified Files:
      Tag: release_1_0
	NEWTRICKS.txt 
Log Message:
merge from trunk

Index: NEWTRICKS.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/NEWTRICKS.txt,v
retrieving revision 1.3
retrieving revision 1.3.2.1
diff -C2 -d -r1.3 -r1.3.2.1
*** NEWTRICKS.txt	15 Sep 2003 22:41:31 -0000	1.3
--- NEWTRICKS.txt	5 Nov 2003 12:44:52 -0000	1.3.2.1
***************
*** 20,21 ****
--- 20,38 ----
    level.  Also a token indicating the ratio of message length to the
    number of tokens, and a token indicating the number of tokens.
+   Also, [817813] add a "not in database" token (I'm not sure about this
+   one, but I can't articulate why).
+   
+ - A token indicating the ratio of hapax legomena to previously seen
+   tokens in the message.
+ 
+ - Punctuation sometimes gets inserted in otherwise spammy words or phrases,
+   e.g.: "Ch-eck ou=t ou-r sel)ection _of grea)t R_X -emgffj".  It might be
+   helpful to try stripping punctuation.  (Idea from Paul Sorenson)
+ 
+ - Similarly, some letters get replaced by numbers, e.g.: "V1agra" instead of
+   "Viagra".  Mapping numbers to suitable letters might help in some
+   situations.
+ 
+ - [817813] Add a spelling checker and a reasonably sized dictionary, and
+   generate a "not in dictionary" token.
+ 
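The context lines at the top of the hunk describe a token for the ratio of message length to token count and a token for the number of tokens. The new entry adds a "not in database" token. A minimal Python sketch of all three follows. It assumes a tokenizer has already produced a list of token strings, and that the training database can be read as a mapping from token to the number of training messages containing it. The function names and the "meta:..." token names are hypothetical, not SpamBayes identifiers. Values are bucketed by powers of two so that messages of similar size share a token.

```python
import math
from typing import Iterator, Mapping, Sequence


def log2_bucket(x: float) -> int:
    """Collapse a non-negative quantity into a coarse power-of-two bucket."""
    return int(math.log2(x)) if x >= 1 else 0


def size_tokens(text: str, tokens: Sequence[str],
                counts: Mapping[str, int]) -> Iterator[str]:
    """Yield synthetic tokens describing a message's size and novelty.

    Token names are hypothetical; counts is assumed to map a token to the
    number of trained messages it appeared in.
    """
    n = len(tokens)
    # Number of tokens, bucketed so that e.g. 40 and 50 tokens share a token.
    yield f"meta:ntokens:{log2_bucket(n)}"
    if n:
        # Average characters per token (message length / token count).
        yield f"meta:chars-per-token:{log2_bucket(len(text) / n)}"
    # "Not in database": emitted once if any token was never trained on.
    if any(counts.get(t, 0) == 0 for t in tokens):
        yield "meta:not-in-database"
```

For example, "buy cheap meds now" has 4 tokens and 18 characters, so both buckets come out as 2, and "meds" being untrained adds the "not in database" token.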
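The hapax legomena entry can be read more than one way. One reading, assumed here: among the distinct tokens in a message that the training database has seen before, what fraction were seen in exactly one training message? The sketch computes that fraction and buckets it into tenths. The database shape (token to message count) and the token names are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def hapax_ratio(tokens: Iterable[str],
                counts: Mapping[str, int]) -> Optional[float]:
    """Fraction of previously seen tokens that were seen exactly once.

    Assumes counts maps token -> number of training messages containing it.
    Returns None when no token in the message has been seen before.
    """
    unique = set(tokens)  # count each distinct token once
    seen = [t for t in unique if counts.get(t, 0) >= 1]
    if not seen:
        return None
    hapaxes = sum(1 for t in seen if counts[t] == 1)
    return hapaxes / len(seen)


def hapax_token(tokens: Iterable[str], counts: Mapping[str, int]) -> str:
    """Turn the ratio into a hypothetical synthetic token, bucketed to tenths."""
    ratio = hapax_ratio(tokens, counts)
    if ratio is None:
        return "meta:hapax-ratio:none"
    return f"meta:hapax-ratio:{int(ratio * 10)}"
```

Returning a separate "none" token keeps messages made entirely of unseen tokens distinct from messages whose ratio is genuinely zero.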
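The punctuation-stripping idea can be sketched with one regular expression that deletes every character that is neither a word character nor whitespace, plus the underscore (which Python's \w would otherwise keep). The function names are hypothetical. Stripped words are yielded as extra tokens rather than replacing the originals, on the assumption that legitimate punctuation (as in "e-mail" or in URLs) still carries useful evidence.

```python
import re
from typing import Iterator

# Anything that is neither a word character nor whitespace, plus "_",
# which \w would otherwise keep.
_PUNCT = re.compile(r"[^\w\s]|_")


def strip_punctuation(text: str) -> str:
    """Delete punctuation everywhere, rejoining split-up words."""
    return _PUNCT.sub("", text)


def depunctuated_tokens(text: str) -> Iterator[str]:
    """Yield lowercased words that changed when punctuation was removed.

    Intended as tokens emitted in addition to the normal ones.
    """
    for word in text.split():
        stripped = strip_punctuation(word)
        if stripped and stripped != word:
            yield stripped.lower()
```

Applied to the example from the entry, strip_punctuation turns "Ch-eck ou=t ou-r sel)ection _of grea)t R_X -emgffj" into "Check out our selection of great RX emgffj".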
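Mapping numbers to letters can be sketched with str.maketrans and str.translate. The digit table below is a hypothetical starting point: "1" could just as well stand for "l", so a real table would need tuning against actual spam. To avoid rewriting ordinary numbers such as years, only words that mix letters and digits are touched.

```python
from typing import Iterator

# Hypothetical digit-to-letter table; "1" might equally stand for "l".
_DIGIT_TO_LETTER = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def map_digits(word: str) -> str:
    """Map digits to letters, but only in words that also contain letters."""
    has_alpha = any(c.isalpha() for c in word)
    has_digit = any(c.isdigit() for c in word)
    if has_alpha and has_digit:
        return word.translate(_DIGIT_TO_LETTER)
    return word


def digit_mapped_tokens(text: str) -> Iterator[str]:
    """Yield lowercased remapped words, as extra tokens alongside the originals."""
    for word in text.split():
        mapped = map_digits(word)
        if mapped != word:
            yield mapped.lower()
```

Legitimate words also match the letters-plus-digits test ("mp3" would become "mpe"), which is why the mapped forms are emitted as additional tokens instead of replacing the originals.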
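Generating the "not in dictionary" token needs only a membership test against a word list; spelling suggestions are not required. A minimal sketch follows. It assumes a plain one-word-per-line list, such as the /usr/share/dict/words file that many Unix systems ship. The function names and the token name are hypothetical, and words are taken to be runs of ASCII letters.

```python
import re
from typing import AbstractSet, FrozenSet, Iterator

_WORD = re.compile(r"[A-Za-z]+")


def load_wordlist(path: str) -> FrozenSet[str]:
    """Read a one-word-per-line list into a lowercase set."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return frozenset(line.strip().lower() for line in f if line.strip())


def dictionary_tokens(text: str,
                      dictionary: AbstractSet[str]) -> Iterator[str]:
    """Yield a hypothetical "not in dictionary" token if any word is unknown."""
    for word in _WORD.findall(text):
        if word.lower() not in dictionary:
            yield "meta:not-in-dictionary"
            return  # one token per message is enough
```

Obfuscated words such as "emgffj" or "V1agra" (split here into "V" and "agra") fall outside an ordinary dictionary, so they trigger the token even when the word itself has never been trained on.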