[Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.14,1.15

montanaro@users.sourceforge.net
Tue, 27 Aug 2002 20:45:08 -0700


Update of /cvsroot/python/python/nondist/sandbox/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv4506

Modified Files:
	GBayes.py 
Log Message:
Ehh, the trigram tokenizer actually didn't work all that well; the spurious
report that it did well was pilot error. Besides, Tim's report suggests that a
simple str.split() may be the best tokenizer anyway.
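The difference between a plain str.split() and the regex-based tokenizers is easiest to see on a single line of text. This is a minimal sketch, not code from GBayes.py. `split_tokens`, `regex_tokens` and the sample string are hypothetical. Only the `[\w$-]+` pattern comes from the removed tokenize_trigram docstring in the diff below.

```python
import re

# The token pattern quoted in the removed tokenize_trigram docstring.
_token_re = re.compile(r"[\w$-]+")

def split_tokens(text):
    """Whitespace split: punctuation stays attached to its word."""
    return text.split()

def regex_tokens(text):
    """Runs of word chars, '$' and '-': all other punctuation is dropped."""
    return _token_re.findall(text)

sample = "Buy $100 now!! FREE-offer"
print(split_tokens(sample))  # ['Buy', '$100', 'now!!', 'FREE-offer']
print(regex_tokens(sample))  # ['Buy', '$100', 'now', 'FREE-offer']
```

The split version keeps "now!!" as a token distinct from "now", so emphatic punctuation survives as a feature. The regex version folds it into the bare word.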


Index: GBayes.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/spambayes/GBayes.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** GBayes.py	28 Aug 2002 00:43:44 -0000	1.14
--- GBayes.py	28 Aug 2002 03:45:06 -0000	1.15
***************
*** 108,116 ****
      return tokenize_ngram(string, 15)
  
- def tokenize_trigram(string):
-     r"""tokenize w/ re '[\w$-]+', result squished to 3-char runs"""
-     lst = "".join(_token_re.findall(string))
-     return tokenize_ngram(string, 3)
- 
  # add user-visible string as key and function as value - function's docstring
  # serves as help string when -H is used, so keep it brief!
--- 108,111 ----
***************
*** 124,128 ****
      "split": tokenize_split,
      "split_fold": tokenize_split_foldcase,
-     "trigram": tokenize_trigram,
      "words": tokenize_words,
      "words_fold": tokenize_words_foldcase,
--- 119,122 ----
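Note that the removed tokenize_trigram computes `lst` from the regex matches but then passes the raw `string` to tokenize_ngram. As a result, the regex filtering its docstring describes never took effect. The sketch below shows what that docstring describes, along with the name-to-function registry whose docstrings double as `-H` help. The body of tokenize_ngram is not part of this diff. `ngrams` is a hypothetical stand-in that *assumes* overlapping n-character runs, and `trigram_sketch`, `split_sketch`, `TOKENIZERS` and `print_help` are illustrative names, not GBayes.py's own.

```python
import re

# Pattern from the removed tokenize_trigram docstring.
_token_re = re.compile(r"[\w$-]+")

def ngrams(text, n):
    """overlapping n-char runs (hypothetical stand-in for tokenize_ngram)"""
    # Assumption: runs overlap by n-1 characters; the real helper may differ.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def trigram_sketch(string):
    """re '[\\w$-]+' tokens squished together, then cut into 3-char runs"""
    squished = "".join(_token_re.findall(string))
    # Pass the squished text, not the raw input: the removed version
    # built this value as `lst` and then never used it.
    return ngrams(squished, 3)

def split_sketch(string):
    """plain str.split()"""
    return string.split()

# Registry in the style of the diff: user-visible name -> function,
# with each function's docstring serving as its -H help string.
TOKENIZERS = {
    "split": split_sketch,
    "trigram": trigram_sketch,
}

def print_help(registry):
    """Print one 'name  docstring' line per registered tokenizer."""
    for name in sorted(registry):
        print("%-10s %s" % (name, registry[name].__doc__))

print_help(TOKENIZERS)
```

Dropping a tokenizer from this kind of registry is a one-line change, which is all the second hunk above does for "trigram".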