[Spambayes-checkins] spambayes/spambayes classifier.py,1.23,1.24
Tony Meyer
anadelonbrin at users.sourceforge.net
Wed Jul 14 09:11:11 CEST 2004
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30909/spambayes
Modified Files:
classifier.py
Log Message:
Update a comment.
When slurping, use a lower timeout so things work faster (with Python >=2.3)
Avoid using message.setPayload, as this is now deprecated (doesn't work with Python 2.4a1).
Use the new form of makeMessage instead.
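
The timeout change described above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the code in classifier.py: the name temporary_default_timeout is hypothetical, and it is written for a current Python where socket.getdefaulttimeout() and socket.setdefaulttimeout() always exist (the log message notes they need Python 2.3 or later). Putting the restore in a finally clause means the previous timeout is put back even if the fetch raises or the caller returns early.

```python
import socket
from contextlib import contextmanager


@contextmanager
def temporary_default_timeout(seconds):
    """Lower the global default socket timeout for the duration of a block.

    Hypothetical helper for illustration; not part of spambayes.
    """
    previous = socket.getdefaulttimeout()  # None means "no timeout"
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Runs on normal exit, on exceptions, and on early returns.
        socket.setdefaulttimeout(previous)


# Usage: only sockets created inside the block get the 5 second default.
with temporary_default_timeout(5):
    pass  # fetch the URL here
```

The default timeout is process-wide, so while the block is active the lowered value also applies to sockets created elsewhere in the program.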
Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** classifier.py 6 Feb 2004 21:43:00 -0000 1.23
--- classifier.py 14 Jul 2004 07:11:08 -0000 1.24
***************
*** 527,533 ****
'synthetic' tokens get bigram'ed, too.
! The bigram token is simply "unigram1 unigram2" - a space should
be sufficient as a separator, since spaces aren't in any other
! tokens, apart from 'synthetic' ones.
If the experimental "Classifier":"x-use_bigrams" option is
--- 527,536 ----
'synthetic' tokens get bigram'ed, too.
! The bigram token is simply "bi:unigram1 unigram2" - a space should
be sufficient as a separator, since spaces aren't in any other
! tokens, apart from 'synthetic' ones. The "bi:" prefix is added
! to avoid conflict with tokens we generate (like "subject: word",
! which could be "word" in a subject, or a bigram of "subject:" and
! "word").
If the experimental "Classifier":"x-use_bigrams" option is
***************
*** 686,689 ****
--- 689,701 ----
return ["url:non_html"]
+ # Waiting for the default timeout period slows everything
+ # down far too much, so try and reduce it for just this
+ # call (this will only work with Python 2.3 and above).
+ try:
+     timeout = socket.getdefaulttimeout()
+     socket.setdefaulttimeout(5)
+ except AttributeError:
+     # Probably Python 2.2.
+     pass
try:
if options["globals", "verbose"]:
***************
*** 697,700 ****
--- 709,718 ----
self.bad_urls["url:unknown_error"] += (url,)
return ["url:unknown_error"]
+ # Restore the timeout
+ try:
+     socket.setdefaulttimeout(timeout)
+ except AttributeError:
+     # Probably Python 2.2.
+     pass
# Anything that isn't text/html is ignored
***************
*** 712,717 ****
# Retrieving the same messages over and over again will tire
# us out, so we store them in our own wee cache.
! message = self.urlCorpus.makeMessage(url_key)
! message.setPayload(fake_message_string)
self.urlCorpus.addMessage(message)
else:
--- 730,735 ----
# Retrieving the same messages over and over again will tire
# us out, so we store them in our own wee cache.
! message = self.urlCorpus.makeMessage(url_key,
! fake_message_string)
self.urlCorpus.addMessage(message)
else:
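
The "bi:" prefix described in the updated comment can be illustrated with a short sketch. This is an assumption-laden example, not the spambayes implementation: the function name with_bigrams and the order in which tokens are emitted are hypothetical. It only shows why the prefix keeps a bigram of "subject:" and "word" distinct from the generated token "subject: word".

```python
def with_bigrams(tokens):
    """Yield each unigram, plus a "bi:"-prefixed bigram for each adjacent pair.

    Hypothetical helper for illustration; not part of spambayes.
    """
    previous = None
    for token in tokens:
        yield token
        if previous is not None:
            # A space separates the two unigrams; the prefix marks the
            # result as a bigram rather than a generated token.
            yield "bi:%s %s" % (previous, token)
        previous = token


tokens = list(with_bigrams(["subject:", "word"]))
# The bigram is "bi:subject: word", which cannot collide with a
# generated "subject: word" token.
print(tokens)
```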