[Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.14,1.15
montanaro@users.sourceforge.net
Tue, 27 Aug 2002 20:45:08 -0700
Update of /cvsroot/python/python/nondist/sandbox/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv4506
Modified Files:
GBayes.py
Log Message:
Ehh - the trigram tokenizer actually didn't work all that well. The
spurious report that it did well was pilot error. Besides, Tim's report
suggests that a simple str.split() may be the best tokenizer anyway.
Index: GBayes.py
===================================================================
RCS file: /cvsroot/python/python/nondist/sandbox/spambayes/GBayes.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** GBayes.py 28 Aug 2002 00:43:44 -0000 1.14
--- GBayes.py 28 Aug 2002 03:45:06 -0000 1.15
***************
*** 108,116 ****
return tokenize_ngram(string, 15)
- def tokenize_trigram(string):
- r"""tokenize w/ re '[\w$-]+', result squished to 3-char runs"""
- lst = "".join(_token_re.findall(string))
- return tokenize_ngram(string, 3)
-
# add user-visible string as key and function as value - function's docstring
# serves as help string when -H is used, so keep it brief!
--- 108,111 ----
***************
*** 124,128 ****
"split": tokenize_split,
"split_fold": tokenize_split_foldcase,
- "trigram": tokenize_trigram,
"words": tokenize_words,
"words_fold": tokenize_words_foldcase,
--- 119,122 ----
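
For readers comparing the two approaches named in the log message, here is a minimal sketch of a whitespace tokenizer next to a trigram tokenizer. It is an illustration under stated assumptions, not the code in GBayes.py: the bodies of tokenize_split and tokenize_ngram are not shown in this diff, so ngrams() and tokenize_trigram_sketch() below are hypothetical stand-ins, and the regular expression is copied from the removed docstring. The sketch passes the squished string on to the n-gram step, as that docstring describes; the removed function computed it as lst but passed the original string instead.

```python
import re

# Pattern copied from the removed docstring; the real _token_re
# definition is not part of this diff, so this is an assumption.
_token_re = re.compile(r"[\w$-]+")


def tokenize_split(string):
    """Whitespace tokenizer: a plain str.split()."""
    return string.split()


def ngrams(string, n):
    """Hypothetical stand-in for tokenize_ngram: overlapping n-char runs."""
    return [string[i:i + n] for i in range(len(string) - n + 1)]


def tokenize_trigram_sketch(string):
    """Join the regex matches, then cut the result into 3-char runs."""
    squished = "".join(_token_re.findall(string))
    return ngrams(squished, 3)


sample = "Buy $$$ now"
split_tokens = tokenize_split(sample)
trigram_tokens = tokenize_trigram_sketch(sample)
```

On this sample the whitespace tokenizer keeps each word whole, while the trigram version yields runs that straddle word boundaries once the matches are joined.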