[Spambayes-checkins] spambayes/spambayes classifier.py,1.23,1.24

Tony Meyer anadelonbrin at users.sourceforge.net
Wed Jul 14 09:11:11 CEST 2004


Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30909/spambayes

Modified Files:
	classifier.py 
Log Message:
Update the bigram docstring to describe the new "bi:" token prefix.

When slurping URLs, lower the default socket timeout to 5 seconds for the
fetch so things work faster (this only takes effect with Python >= 2.3).

Avoid using message.setPayload, as this is now deprecated (doesn't work with Python 2.4a1).
Use the new form of makeMessage instead.

Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** classifier.py	6 Feb 2004 21:43:00 -0000	1.23
--- classifier.py	14 Jul 2004 07:11:08 -0000	1.24
***************
*** 527,533 ****
          'synthetic' tokens get bigram'ed, too.
  
!         The bigram token is simply "unigram1 unigram2" - a space should
          be sufficient as a separator, since spaces aren't in any other
!         tokens, apart from 'synthetic' ones.
  
          If the experimental "Classifier":"x-use_bigrams" option is
--- 527,536 ----
          'synthetic' tokens get bigram'ed, too.
  
!         The bigram token is simply "bi:unigram1 unigram2" - a space should
          be sufficient as a separator, since spaces aren't in any other
!         tokens, apart from 'synthetic' ones.  The "bi:" prefix is added
!         to avoid conflict with tokens we generate (like "subject: word",
!         which could be "word" in a subject, or a bigram of "subject:" and
!         "word").
  
          If the experimental "Classifier":"x-use_bigrams" option is
***************
*** 686,689 ****
--- 689,701 ----
                  return ["url:non_html"]
  
+             # Waiting for the default timeout period slows everything
+             # down far too much, so try and reduce it for just this
+             # call (this will only work with Python 2.3 and above).
+             try:
+                 timeout = socket.getdefaulttimeout()
+                 socket.setdefaulttimeout(5)
+             except AttributeError:
+                 # Probably Python 2.2.
+                 pass
              try:
                  if options["globals", "verbose"]:
***************
*** 697,700 ****
--- 709,718 ----
                  self.bad_urls["url:unknown_error"] += (url,)
                  return ["url:unknown_error"]
+             # Restore the timeout
+             try:
+                 socket.setdefaulttimeout(timeout)
+             except AttributeError:
+                 # Probably Python 2.2.
+                 pass
  
              # Anything that isn't text/html is ignored
***************
*** 712,717 ****
              # Retrieving the same messages over and over again will tire
              # us out, so we store them in our own wee cache.
!             message = self.urlCorpus.makeMessage(url_key)
!             message.setPayload(fake_message_string)
              self.urlCorpus.addMessage(message)
          else:
--- 730,735 ----
              # Retrieving the same messages over and over again will tire
              # us out, so we store them in our own wee cache.
!             message = self.urlCorpus.makeMessage(url_key,
!                                                  fake_message_string)
              self.urlCorpus.addMessage(message)
          else:
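
Note on the timeout change: judging only from the hunks above, the restore
step sits after a `return ["url:unknown_error"]`, so a failed fetch may leave
the process-wide default timeout at 5 seconds. The following is a minimal
sketch of one way to guarantee the restore, not the code in classifier.py. It
assumes a modern Python 3 (so the Python 2.2 fallback is omitted), and the
helper name `temporary_default_timeout` is hypothetical.

```python
import socket
from contextlib import contextmanager


@contextmanager
def temporary_default_timeout(seconds):
    """Lower the process-wide default socket timeout for one block of code.

    Hypothetical helper for illustration; it is not part of SpamBayes.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield previous
    finally:
        # Runs on normal exit, on exceptions, and on early returns made
        # from inside the "with" block, so the old value always comes back.
        socket.setdefaulttimeout(previous)
```

A caller would wrap only the fetch, for example
`with temporary_default_timeout(5): ...`, so that sockets created elsewhere
keep the original default.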

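Note on the "bi:" prefix: the updated docstring says a bigram of "subject:"
and "word" would otherwise be indistinguishable from the synthetic token
"subject: word". The sketch below illustrates only that naming rule. The
function `bigram_tokens` is hypothetical and is not the tokenizer code in
classifier.py, which does more than pair adjacent tokens.

```python
def bigram_tokens(tokens):
    """Return "bi:first second" for each adjacent pair in a token list.

    Hypothetical helper for illustration; it is not part of SpamBayes.
    """
    return ["bi:%s %s" % (first, second)
            for first, second in zip(tokens, tokens[1:])]


# A synthetic token as described in the docstring: "word" in a subject line.
synthetic = "subject: word"

# Joining the unigrams "subject:" and "word" with a bare space would yield
# the same string as the synthetic token; the prefix keeps them apart.
unprefixed = "%s %s" % ("subject:", "word")
prefixed = bigram_tokens(["subject:", "word"])[0]
```

Here `unprefixed` equals `synthetic`, while `prefixed` does not, which is the
conflict the prefix is meant to avoid.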

