[Spambayes-checkins] spambayes/spambayes classifier.py,1.14,1.15

Tim Peters tim_one at users.sourceforge.net
Tue Dec 16 00:19:10 EST 2003


Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv14339/spambayes

Modified Files:
	classifier.py 
Log Message:
_enhance_wordstream():  Simplified and sped up; repaired the docstring;
it now delivers the last token in the input stream too.  NOT TESTED, though.


Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** classifier.py	16 Dec 2003 04:59:58 -0000	1.14
--- classifier.py	16 Dec 2003 05:19:08 -0000	1.15
***************
*** 427,434 ****
  
      def _enhance_wordstream(self, wordstream):
!         """Add bigrams to the wordstream.  This wraps the last token
!         to the first one, so a small number of odd tokens might get
!         generated from that, but it shouldn't be significant.  Note
!         that these are *token* bigrams, and not *word* bigrams - i.e.
          'synthetic' tokens get bigram'ed, too.
  
--- 427,435 ----
  
      def _enhance_wordstream(self, wordstream):
!         """Add bigrams to the wordstream.
! 
!         For example, a b c -> a b "a b" c "b c"
! 
!         Note that these are *token* bigrams, and not *word* bigrams - i.e.
          'synthetic' tokens get bigram'ed, too.
  
***************
*** 438,453 ****
  
          If the experimental "Classifier":"x-use_bigrams" option is
!         removed, this function can be removed, too."""
!         p = None
!         while True:
!             try:
!                 if p:
!                     yield p
!                 q = wordstream.next()
!                 if p:
!                     yield "%s %s" % (p, q)
!                 p = q
!             except StopIteration:
!                 break
  
      def _wordinfokeys(self):
--- 439,451 ----
  
          If the experimental "Classifier":"x-use_bigrams" option is
!         removed, this function can be removed, too.
!         """
! 
!         last = None
!         for token in wordstream:
!             yield token
!             if last:
!                 yield "%s %s" % (last, token)
!             last = token
  
      def _wordinfokeys(self):
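
Since the log message flags the change as untested, here is a minimal
standalone sketch of the revised generator logic, checked against the
docstring's own example. It assumes a module-level function (the name
enhance_wordstream, without the leading underscore and without self, is
hypothetical) and modern Python iteration; it is not the committed method
itself.

```python
def enhance_wordstream(wordstream):
    """Yield every token, each followed by a bigram with its predecessor.

    Mirrors the loop in revision 1.15: a b c -> a b "a b" c "b c".
    """
    last = None
    for token in wordstream:
        yield token
        # As in the diff, a falsy 'last' (None, or an empty-string token)
        # suppresses the bigram.
        if last:
            yield "%s %s" % (last, token)
        last = token


if __name__ == "__main__":
    print(list(enhance_wordstream(["a", "b", "c"])))
    # prints ['a', 'b', 'a b', 'c', 'b c']
```

The truthiness test on last is kept as in the diff; it only matters if
the tokenizer can emit empty-string tokens, in which case an explicit
"is not None" comparison would behave differently.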




