From anadelonbrin at users.sourceforge.net Tue Nov 2 07:33:26 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 2 07:33:29 2004
Subject: [Spambayes-checkins] spambayes/spambayes Stats.py,1.6,1.7
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28614/spambayes
Modified Files:
Stats.py
Log Message:
Improve the web interface statistics.
This is the format that was devised by Mark Moraes and Kenny Pitt on spambayes-dev
quite some time ago (but was never checked in - maybe we were feature frozen then?).
This is my own code, though, not the patch that Mark submitted, which added unnecessary
counters.
At some point I'll copy across the Outlook code that lets the number of decimal
places for the percentages be specified. The Outlook stats could also be changed to
look more like this (or the damn code could be centralised), except that there
isn't much room in the dialog for a lot of text. Maybe Kenny has a patch for that?
(A spambayes-dev message indicated that he might.)
The new stats should look something like this:
SpamBayes has classified a total of 1223 messages:
    827 ham (67.6% of total)
    333 spam (27.2% of total)
    63 unsure (5.2% of total)

1125 messages were classified correctly (92.0% of total)
35 messages were classified incorrectly (2.9% of total)
    0 false positives (0.0% of total)
    35 false negatives (2.9% of total)

6 unsures trained as ham (9.5% of unsures)
56 unsures trained as spam (88.9% of unsures)
1 unsure was not trained (1.6% of unsures)

A total of 760 messages have been trained:
    346 ham (98.3% ham, 1.7% unsure, 0.0% false positives)
    414 spam (78.0% spam, 13.5% unsure, 8.5% false negatives)
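
The arithmetic behind the first two groups of that sample can be checked with a small sketch. It is written for Python 3 rather than the Python 2.2-era code in the commit, and the names `percent` and `summarise` are hypothetical, not part of Stats.py. It treats unsure messages as neither correct nor incorrect, which is what the sample figures imply (1223 - 63 - 35 = 1125); the 'incorrect' entry in the diff also adds cls_unsure, so the committed code may not reproduce the sample exactly. Plurals are hard-coded here for brevity.

```python
def percent(part, whole):
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    if not whole:
        return 0.0  # placeholder only; no real percentage exists here
    return 100.0 * part / whole


def summarise(cls_ham, cls_spam, cls_unsure, fp, fn):
    """Build the first two groups of the report from raw counters."""
    total = cls_ham + cls_spam + cls_unsure
    incorrect = fp + fn                       # assumption: unsures are excluded
    correct = total - cls_unsure - incorrect
    return [
        "SpamBayes has classified a total of %d messages:" % total,
        "    %d ham (%.1f%% of total)" % (cls_ham, percent(cls_ham, total)),
        "    %d spam (%.1f%% of total)" % (cls_spam, percent(cls_spam, total)),
        "    %d unsure (%.1f%% of total)" % (cls_unsure, percent(cls_unsure, total)),
        "%d messages were classified correctly (%.1f%% of total)"
        % (correct, percent(correct, total)),
        "%d messages were classified incorrectly (%.1f%% of total)"
        % (incorrect, percent(incorrect, total)),
    ]


report = summarise(cls_ham=827, cls_spam=333, cls_unsure=63, fp=0, fn=35)
print("\n".join(report))
```
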
Index: Stats.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Stats.py,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** Stats.py 15 Feb 2004 02:15:51 -0000 1.6
--- Stats.py 2 Nov 2004 06:33:23 -0000 1.7
***************
*** 25,34 ****
"""
! # This module is part of the spambayes project, which is Copyright 2002-3
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
__author__ = "Tony Meyer "
! __credits__ = "Mark Hammond, all the spambayes folk."
from spambayes.message import msginfoDB
--- 25,42 ----
"""
! # This module is part of the spambayes project, which is Copyright 2002-4
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
__author__ = "Tony Meyer "
! __credits__ = "Kenny Pitt, Mark Hammond, all the spambayes folk."
!
! try:
! True, False
! except NameError:
! # Maintain compatibility with Python 2.2
! True, False = 1, 0
!
! import types
from spambayes.message import msginfoDB
***************
*** 62,81 ****
msginfoDB._getState(m)
if m.c == 's':
self.cls_spam += 1
! if m.t == 0:
self.fp += 1
elif m.c == 'h':
self.cls_ham += 1
! if m.t == 1:
self.fn += 1
elif m.c == 'u':
self.cls_unsure += 1
! if m.t == 0:
self.trn_unsure_ham += 1
! elif m.t == 1:
self.trn_unsure_spam += 1
! if m.t == 1:
self.trn_spam += 1
! elif m.t == 0:
self.trn_ham += 1
--- 70,94 ----
msginfoDB._getState(m)
if m.c == 's':
+ # Classified as spam.
self.cls_spam += 1
! if m.t == False:
! # False positive (classified as spam, trained as ham)
self.fp += 1
elif m.c == 'h':
+ # Classified as ham.
self.cls_ham += 1
! if m.t == True:
! # False negative (classified as ham, trained as spam)
self.fn += 1
elif m.c == 'u':
+ # Classified as unsure.
self.cls_unsure += 1
! if m.t == False:
self.trn_unsure_ham += 1
! elif m.t == True:
self.trn_unsure_spam += 1
! if m.t == True:
self.trn_spam += 1
! elif m.t == False:
self.trn_ham += 1
***************
*** 85,128 ****
chunks = []
push = chunks.append
! perc_ham = 100.0 * self.cls_ham / self.total
! perc_spam = 100.0 * self.cls_spam / self.total
! perc_unsure = 100.0 * self.cls_unsure / self.total
format_dict = {
! 'perc_spam': perc_spam,
! 'perc_ham': perc_ham,
! 'perc_unsure': perc_unsure,
! 'num_seen': self.total
}
format_dict.update(self.__dict__)
# Figure out plurals
! for num, key in [(self.total, "sp1"), (self.trn_ham, "sp2"),
! (self.trn_spam, "sp3"),
! (self.trn_unsure_ham, "sp4"),
! (self.fp, "sp5"), (self.fn, "sp6")]:
! if num == 1:
format_dict[key] = ''
else:
format_dict[key] = 's'
! for num, key in [(self.fp, "wp1"), (self.fn, "wp2")]:
! if num == 1:
! format_dict[key] = 'was a'
else:
format_dict[key] = 'were'
! push("SpamBayes has processed %(num_seen)d message%(sp1)s - " \
! "%(cls_ham)d (%(perc_ham).0f%%) good, " \
! "%(cls_spam)d (%(perc_spam).0f%%) spam " \
! "and %(cls_unsure)d (%(perc_unsure)d%%) unsure." % format_dict)
! push("%(trn_ham)d message%(sp2)s were manually " \
! "classified as good (%(fp)d %(wp1)s false positive%(sp5)s)." \
! % format_dict)
! push("%(trn_spam)d message%(sp3)s were manually " \
! "classified as spam (%(fn)d %(wp2)s false negative%(sp6)s)." \
! % format_dict)
! push("%(trn_unsure_ham)d unsure message%(sp4)s were manually " \
! "identified as good, and %(trn_unsure_spam)d as spam." \
! % format_dict)
return chunks
if __name__=='__main__':
s = Stats()
--- 98,238 ----
chunks = []
push = chunks.append
! not_trn_unsure = self.cls_unsure - self.trn_unsure_ham - \
! self.trn_unsure_spam
! if self.cls_unsure:
! unsure_ham_perc = 100.0 * self.trn_unsure_ham / self.cls_unsure
! unsure_spam_perc = 100.0 * self.trn_unsure_spam / self.cls_unsure
! unsure_not_perc = 100.0 * not_trn_unsure / self.cls_unsure
! else:
! unsure_ham_perc = 0.0 # Not correct, really!
! unsure_spam_perc = 0.0 # Not correct, really!
! unsure_not_perc = 0.0 # Not correct, really!
! if self.trn_ham:
! trn_perc_unsure_ham = 100.0 * self.trn_unsure_ham / \
! self.trn_ham
! trn_perc_fp = 100.0 * self.fp / self.trn_ham
! trn_perc_ham = 100.0 - (trn_perc_unsure_ham + trn_perc_fp)
! else:
! trn_perc_ham = 0.0 # Not correct, really!
! trn_perc_unsure_ham = 0.0 # Not correct, really!
! trn_perc_fp = 0.0 # Not correct, really!
! if self.trn_spam:
! trn_perc_unsure_spam = 100.0 * self.trn_unsure_spam / \
! self.trn_spam
! trn_perc_fn = 100.0 * self.fn / self.trn_spam
! trn_perc_spam = 100.0 - (trn_perc_unsure_spam + trn_perc_fn)
! else:
! trn_perc_spam = 0.0 # Not correct, really!
! trn_perc_unsure_spam = 0.0 # Not correct, really!
! trn_perc_fn = 0.0 # Not correct, really!
format_dict = {
! 'num_seen' : self.total,
! 'correct' : self.total - (self.cls_unsure + self.fp + self.fn),
! 'incorrect' : self.cls_unsure + self.fp + self.fn,
! 'unsure_ham_perc' : unsure_ham_perc,
! 'unsure_spam_perc' : unsure_spam_perc,
! 'unsure_not_perc' : unsure_not_perc,
! 'not_trn_unsure' : not_trn_unsure,
! 'trn_total' : (self.trn_ham + self.trn_spam + \
! self.trn_unsure_ham + self.trn_unsure_spam),
! 'trn_perc_ham' : trn_perc_ham,
! 'trn_perc_unsure_ham' : trn_perc_unsure_ham,
! 'trn_perc_fp' : trn_perc_fp,
! 'trn_perc_spam' : trn_perc_spam,
! 'trn_perc_unsure_spam' : trn_perc_unsure_spam,
! 'trn_perc_fn' : trn_perc_fn,
}
format_dict.update(self.__dict__)
+
+ # Add percentages of everything.
+ for key, val in format_dict.items():
+ perc_key = "perc_" + key
+ if self.total and isinstance(val, types.IntType):
+ format_dict[perc_key] = 100.0 * val / self.total
+ else:
+ format_dict[perc_key] = 0.0 # Not correct, really!
+
# Figure out plurals
! for num, key in [("num_seen", "sp1"),
! ("correct", "sp2"),
! ("incorrect", "sp3"),
! ("fp", "sp4"),
! ("fn", "sp5"),
! ("trn_unsure_ham", "sp6"),
! ("trn_unsure_spam", "sp7"),
! ("not_trn_unsure", "sp8"),
! ("trn_total", "sp9"),
! ]:
! if format_dict[num] == 1:
format_dict[key] = ''
else:
format_dict[key] = 's'
! for num, key in [("correct", "wp1"),
! ("incorrect", "wp2"),
! ("not_trn_unsure", "wp3"),
! ]:
! if format_dict[num] == 1:
! format_dict[key] = 'was'
else:
format_dict[key] = 'were'
! ## Our result should look something like this:
! ## (devised by Mark Moraes and Kenny Pitt)
! ##
! ## SpamBayes has classified a total of 1223 messages:
! ## 827 ham (67.6% of total)
! ## 333 spam (27.2% of total)
! ## 63 unsure (5.2% of total)
! ##
! ## 1125 messages were classified correctly (92.0% of total)
! ## 35 messages were classified incorrectly (2.9% of total)
! ## 0 false positives (0.0% of total)
! ## 35 false negatives (2.9% of total)
! ##
! ## 6 unsures trained as ham (9.5% of unsures)
! ## 56 unsures trained as spam (88.9% of unsures)
! ## 1 unsure was not trained (1.6% of unsures)
! ##
! ## A total of 760 messages have been trained:
! ## 346 ham (98.3% ham, 1.7% unsure, 0.0% false positives)
! ## 414 spam (78.0% spam, 13.5% unsure, 8.5% false negatives)
!
! push("SpamBayes has classified a total of " \
! "%(num_seen)d message%(sp1)s:" \
! "
%(cls_ham)d " \
! "(%(perc_cls_ham).0f%% of total) good" \
! "
%(cls_spam)d " \
! "(%(perc_cls_spam).0f%% of total) spam" \
! "
%(cls_unsure)d " \
! "(%(perc_cls_unsure).0f%% of total) unsure." % \
! format_dict)
! push("%(correct)d message%(sp2)s %(wp1)s classified correctly " \
! "(%(perc_correct).0f%% of total)" \
! "
%(incorrect)d message%(sp3)s %(wp2)s classified " \
! "incorrectly " \
! "(%(perc_incorrect).0f%% of total)" \
! "
%(fp)d false positive%(sp4)s " \
! "(%(perc_fp).0f%% of total)" \
! "
%(fn)d false negative%(sp5)s " \
! "(%(perc_fn).0f%% of total)" % \
! format_dict)
! push("%(trn_unsure_ham)d unsure%(sp6)s trained as good " \
! "(%(unsure_ham_perc).0f%% of unsures)" \
! "
%(trn_unsure_spam)d unsure%(sp7)s trained as spam " \
! "(%(unsure_spam_perc).0f%% of unsures)" \
! "
%(not_trn_unsure)d unsure%(sp8)s %(wp3)s not trained " \
! "(%(unsure_not_perc).0f%% of unsures)" % \
! format_dict)
! push("A total of %(trn_total)d message%(sp9)s have been trained:" \
! "
%(trn_ham)d good " \
! "(%(trn_perc_ham)0.f%% good, %(trn_perc_unsure_ham)0.f%% " \
! "unsure, %(trn_perc_fp).0f%% false positives)" \
! "
%(trn_spam)d spam " \
! "(%(trn_perc_spam)0.f%% spam, %(trn_perc_unsure_spam)0.f%% " \
! "unsure, %(trn_perc_fn).0f%% false negatives)" % \
! format_dict)
return chunks
+
if __name__=='__main__':
s = Stats()
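
The "Add percentages of everything" loop in the diff above relies on Python 2 behaviour: `format_dict.items()` returns a list, so adding keys inside the loop is safe. In Python 3 the same loop raises RuntimeError because the dictionary changes size during iteration. A minimal Python 3 sketch of the same idea follows; `add_percentages` is a hypothetical name, and excluding `bool` values is an extra assumption that the commit does not make.

```python
def add_percentages(values, total):
    """Return a copy of values with a 'perc_<key>' entry for every key."""
    result = dict(values)
    # Iterate over the original mapping, not the copy being extended.
    for key, val in values.items():
        is_count = isinstance(val, int) and not isinstance(val, bool)
        if total and is_count:
            result["perc_" + key] = 100.0 * val / total
        else:
            result["perc_" + key] = 0.0  # placeholder when no percentage applies
    return result


counters = {"fp": 0, "fn": 35, "label": "example"}
formatted = add_percentages(counters, total=1223)
print("%(fn)d false negatives (%(perc_fn).1f%% of total)" % formatted)
```

As in the commit, non-integer entries and the zero-total case get 0.0, which is a display placeholder rather than a meaningful value.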
From anadelonbrin at users.sourceforge.net Tue Nov 2 22:27:46 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 2 22:27:50 2004
Subject: [Spambayes-checkins] spambayes/spambayes i18n.py, NONE,
1.1 Options.py, 1.114, 1.115
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5690/spambayes
Modified Files:
Options.py
Added Files:
i18n.py
Log Message:
Add [ 1052816 ] I18N
This is basically the patch created by Hernan Martinez Foffani, but modified a tad
by me.
--- NEW FILE: i18n.py ---
"""Internationalisation
Classes:
LanguageManager - Interface class for languages.
Abstract:
Manages the internationalisation (i18n) aspects of SpamBayes.
"""
# This module is part of the spambayes project, which is Copyright 2002-4
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
__author__ = "Hernan Martinez Foffani "
__credits__ = "Tony Meyer, All the SpamBayes folk."
try:
True, False
except NameError:
# Maintain compatibility with Python 2.2
True, False = 1, 0
import os
import sys
from locale import getdefaultlocale
from gettext import translation, NullTranslations
# Note, we must not import spambayes.Options, or Outlook will not be happy.
## Set language environment for gettext and for dynamic load of dialogs.
##
## Our directory layout is:
##   spambayes
##     spambayes
##       i18n.py          <--- this file
##     languages          <--- the directory for lang packs
##       es               <-- generic language data
##         DIALOGS
##         LC_MESSAGES
##       es_ES            <-- specific language/country data.
##         DIALOGS        <-- resource dialogs
##         LC_MESSAGES    <-- gettext messages files
##       zn
##       zn_TW
##     Outlook2000
##     utilities
##     ..etc..

class LanguageManager:
    def __init__(self, directory=os.path.dirname(__file__)):
        """Initialisation.

        'directory' is a sibling of the 'languages' directory (the two
        share a parent).  It defaults to the directory of this module."""
        self.current_langs_codes = []
        self.local_dir = os.path.join(directory, "..", "languages")
        self._sys_path_modifications = []

    def set_language(self, lang_code=None):
        """Set a language as the current one."""
        if not lang_code:
            return
        self.current_langs_codes = [ lang_code ]
        self._rebuild_syspath_for_dialogs()
        self._install_gettext()

    def locale_default_lang(self):
        """Get the default language for the locale."""
        # Note that this may return None.
        return getdefaultlocale()[0]

    def add_language(self, lang_code=None):
        """Add a language to the current languages list.

        The list acts as a fallback mechanism, where the first language of
        the list is used if possible, and if not the second one, and so on.
        """
        if not lang_code:
            return
        self.current_langs_codes.insert(0, lang_code)
        self._rebuild_syspath_for_dialogs()
        self._install_gettext()

    def clear_language(self):
        """Clear the current language(s) and set SpamBayes to use
        the default."""
        self.current_langs_codes = []
        self._clear_syspath()
        lang = NullTranslations()
        lang.install()

    def _install_gettext(self):
        """Set the gettext specific environment."""
        lang = translation("outlook_addin", self.local_dir,
                           self.current_langs_codes, fallback=True)
        lang.install()

    def _rebuild_syspath_for_dialogs(self):
        """Add to sys.path the directories of the translated dialogs.

        For each language of the current list, we add two directories,
        one for language code and country and the other for the language
        code only, so we can simulate the fallback procedures."""
        self._clear_syspath()
        for lcode in self.current_langs_codes:
            code_and_country = os.path.join(self.local_dir, lcode,
                                            'DIALOGS')
            code_only = os.path.join(self.local_dir, lcode.split("_")[0],
                                     'DIALOGS')
            if code_and_country not in sys.path:
                sys.path.append(code_and_country)
                self._sys_path_modifications.append(code_and_country)
            if code_only not in sys.path:
                sys.path.append(code_only)
                self._sys_path_modifications.append(code_only)

    def _clear_syspath(self):
        """Clean sys.path of the stuff that we put in it."""
        for path in self._sys_path_modifications:
            sys.path.remove(path)
        self._sys_path_modifications = []

def test():
    lm = LanguageManager()
    print "INIT: len(sys.path): ", len(sys.path)

    print "TEST default lang"
    lm.set_language(lm.locale_default_lang())
    print "\tCurrent Languages: ", lm.current_langs_codes
    print "\tlen(sys.path): ", len(sys.path)
    print "\t", _("Help")

    print "TEST clear_language"
    lm.clear_language()
    print "\tCurrent Languages: ", lm.current_langs_codes
    print "\tlen(sys.path): ", len(sys.path)
    print "\t", _("Help")

    print "TEST set_language"
    for langcode in ["kk_KK", "z", "", "es", None, "es_AR"]:
        print "lang: ", langcode
        lm.set_language(langcode)
        print "\tCurrent Languages: ", lm.current_langs_codes
        print "\tlen(sys.path): ", len(sys.path)
        print "\t", _("Help")

    lm.clear_language()

    print "TEST add_language"
    for langcode in ["kk_KK", "z", "", "es", None, "es_AR"]:
        print "lang: ", langcode
        lm.add_language(langcode)
        print "\tCurrent Languages: ", lm.current_langs_codes
        print "\tlen(sys.path): ", len(sys.path)
        print "\t", _("Help")

if __name__ == '__main__':
    test()
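
The fallback behaviour that `_install_gettext` depends on can be seen in isolation. The sketch below is Python 3 and uses only the standard `gettext` and `tempfile` modules; `load_translator` is a hypothetical helper, and an empty temporary directory stands in for the 'languages' directory so that no catalog can be found. It deliberately does not call `install()`, to avoid touching builtins.

```python
import gettext
import tempfile


def load_translator(domain, locale_dir, lang_codes):
    """Return a translations object for the first usable language code.

    With fallback=True, gettext returns a NullTranslations object (which
    passes messages through unchanged) instead of raising when no catalog
    is found for any of the requested languages.
    """
    return gettext.translation(domain, locale_dir,
                               languages=lang_codes, fallback=True)


# No .mo files exist in this directory, so the fallback path is exercised.
with tempfile.TemporaryDirectory() as empty_dir:
    translator = load_translator("outlook_addin", empty_dir, ["es_AR", "es"])

translate = translator.gettext
print(translate("Help"))
```

This is why the test function above can pass unknown codes such as "kk_KK" without an exception: the untranslated English string is simply returned.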
Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.114
retrieving revision 1.115
diff -C2 -d -r1.114 -r1.115
*** Options.py 30 Sep 2004 05:16:30 -0000 1.114
--- Options.py 2 Nov 2004 21:27:42 -0000 1.115
***************
*** 1126,1129 ****
--- 1126,1134 ----
entered with the server:port form.""",
SERVER, DO_NOT_RESTORE),
+
+ ("language", "User Interface Language", ("en_US",),
+ """If possible, the user interface should use a language from this
+ list (in order of preference).""",
+ r"\w\w(?:_\w\w)?", RESTORE),
),
}
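
The pattern attached to the new "language" option can be tried on its own. This is a Python 3 sketch that assumes the option value must match the pattern in full; how the Options framework actually applies the pattern is not shown in this diff, and `is_language_code` is a hypothetical helper.

```python
import re

# The pattern from the new option: two word characters, optionally
# followed by an underscore and two more.
LANGUAGE_CODE = re.compile(r"\w\w(?:_\w\w)?")


def is_language_code(text):
    """True if the whole string looks like 'xx' or 'xx_YY'."""
    return LANGUAGE_CODE.fullmatch(text) is not None


for candidate in ["en_US", "es", "english", "e"]:
    print(candidate, is_language_code(candidate))
```

Note that `\w` also matches digits and the underscore, so the pattern checks shape only, not whether the code names a real language.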
From anadelonbrin at users.sourceforge.net Tue Nov 2 22:29:42 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 2 22:29:46 2004
Subject: [Spambayes-checkins]
spambayes/Outlook2000/dialogs __init__.py, 1.12, 1.13
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000/dialogs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6166/Outlook2000/dialogs
Modified Files:
__init__.py
Log Message:
Add [ 1052816 ] I18N
This is basically the patch created by Hernan Martinez Foffani, but modified a tad
by me.
Index: __init__.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/__init__.py,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** __init__.py 16 Dec 2003 05:06:33 -0000 1.12
--- __init__.py 2 Nov 2004 21:29:36 -0000 1.13
***************
*** 6,15 ****
base_name = os.path.splitext(rc_name)[0]
mod_name = "dialogs.resources." + base_name
! mod = None
# If we are running from source code, check the .py file is up to date
# wrt the .rc file passed in.
# If we are running from binaries, the rc name is not used at all - we
# assume someone running from source previously generated the .py!
! if not hasattr(sys, "frozen"):
from resources import rc2py
rc_path = os.path.dirname( rc2py.__file__ )
--- 6,23 ----
base_name = os.path.splitext(rc_name)[0]
mod_name = "dialogs.resources." + base_name
!
! # I18N
! # Loads a foreign language dialogs.py file, assuming that sys.path
! # already points to one with the foreign language resources.
! try:
! mod = __import__("i18n_" + base_name)
! except ImportError:
! mod = None
!
# If we are running from source code, check the .py file is up to date
# wrt the .rc file passed in.
# If we are running from binaries, the rc name is not used at all - we
# assume someone running from source previously generated the .py!
! if not hasattr(sys, "frozen") and not mod:
from resources import rc2py
rc_path = os.path.dirname( rc2py.__file__ )
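
The "try the translated module first" pattern in this change can be sketched with `importlib` in Python 3. The helper name `load_dialog_module` is hypothetical, and the standard `json` module is used purely as a stand-in for the default dialogs module so that the example runs anywhere.

```python
import importlib


def load_dialog_module(base_name, default_name):
    """Prefer a module named 'i18n_<base_name>', else import the default.

    Whether the translated module is found depends entirely on sys.path,
    which the caller is assumed to have set up beforehand.
    """
    try:
        return importlib.import_module("i18n_" + base_name)
    except ImportError:
        return importlib.import_module(default_name)


# No 'i18n_no_such_dialogs' module exists, so the default is returned.
mod = load_dialog_module("no_such_dialogs", "json")
print(mod.__name__)
```
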
From anadelonbrin at users.sourceforge.net Tue Nov 2 22:29:42 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 2 22:29:46 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000/dialogs/resources
rc2py.py, 1.6, 1.7 rcparser.py, 1.11, 1.12
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6166/Outlook2000/dialogs/resources
Modified Files:
rc2py.py rcparser.py
Log Message:
Add [ 1052816 ] I18N
This is basically the patch created by Hernan Martinez Foffani, but modified a tad
by me.
Index: rc2py.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources/rc2py.py,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** rc2py.py 26 Aug 2003 10:57:44 -0000 1.6
--- rc2py.py 2 Nov 2004 21:29:36 -0000 1.7
***************
*** 11,15 ****
import rcparser
! def convert(inputFilename = None, outputFilename = None):
"""See the module doc string"""
if inputFilename is None:
--- 11,16 ----
import rcparser
! def convert(inputFilename = None, outputFilename = None,
! enableGettext = True):
"""See the module doc string"""
if inputFilename is None:
***************
*** 17,21 ****
if outputFilename is None:
outputFilename = "test.py"
! rcp = rcparser.ParseDialogs(inputFilename)
in_stat = os.stat(inputFilename)
--- 18,22 ----
if outputFilename is None:
outputFilename = "test.py"
! rcp = rcparser.ParseDialogs(inputFilename, enableGettext)
in_stat = os.stat(inputFilename)
***************
*** 34,39 ****
if __name__=="__main__":
! if len(sys.argv)>1:
! convert(sys.argv[1], sys.argv[2])
else:
convert()
--- 35,42 ----
if __name__=="__main__":
! if len(sys.argv) > 3:
! convert(sys.argv[1], sys.argv[2], bool(sys.argv[3]))
! elif len(sys.argv) > 2:
! convert(sys.argv[1], sys.argv[2], True)
else:
convert()
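
One caveat about `bool(sys.argv[3])` in the rc2py.py change: any non-empty string is true in Python, so passing "0" or "False" on the command line still enables gettext. A minimal Python 3 sketch of an explicit parser follows; `parse_flag` and the accepted spellings are assumptions, not part of rc2py.py.

```python
def parse_flag(text):
    """Interpret a command-line word as a boolean.

    Unknown words raise ValueError rather than silently counting as true.
    """
    lowered = text.strip().lower()
    if lowered in ("1", "true", "yes", "on"):
        return True
    if lowered in ("0", "false", "no", "off"):
        return False
    raise ValueError("not a boolean flag: %r" % text)


print(bool("0"))         # non-empty string, so this is true
print(parse_flag("0"))   # parsed explicitly, so this is false
```
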
Index: rcparser.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources/rcparser.py,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** rcparser.py 16 Dec 2003 05:06:33 -0000 1.11
--- rcparser.py 2 Nov 2004 21:29:37 -0000 1.12
***************
*** 6,9 ****
--- 6,15 ----
__author__="Adam Walker"
+ try:
+ True, False
+ except NameError:
+ # Maintain compatibility with Python 2.2
+ True, False = 1, 0
+
import sys, os, shlex
import win32con
***************
*** 92,95 ****
--- 98,112 ----
+ class gt_str(str):
+ """Change a string to a gettext version of itself."""
+ def __repr__(self):
+ if len(self) > 0:
+ # timeit indicates that addition is faster than interpolation
+ # here
+ return "_(" + super(gt_str, self).__repr__() + ")"
+ else:
+ return super(gt_str, self).__repr__()
+
+
class RCParser:
next_id = 1001
***************
*** 103,106 ****
--- 120,124 ----
self.names = {1:"IDOK", 2:"IDCANCEL", -1:"IDC_STATIC"}
self.bitmaps = {}
+ self.gettexted = False
def debug(self, *args):
***************
*** 293,297 ****
self.token = self.token[1:-1]
self.debug("Caption is:",self.token)
! dlg.caption = self.token
self.getToken()
def dialogFont(self, dlg):
--- 311,319 ----
self.token = self.token[1:-1]
self.debug("Caption is:",self.token)
! if self.gettexted:
! # gettext captions
! dlg.caption = gt_str(self.token)
! else:
! dlg.caption = self.token
self.getToken()
def dialogFont(self, dlg):
***************
*** 313,317 ****
self.getToken()
if self.token[0:1]=='"':
! control.label = self.token[1:-1]
self.getCommaToken()
self.getToken()
--- 335,343 ----
self.getToken()
if self.token[0:1]=='"':
! if self.gettexted:
! # gettext labels
! control.label = gt_str(self.token[1:-1])
! else:
! control.label = self.token[1:-1]
self.getCommaToken()
self.getToken()
***************
*** 352,357 ****
#print control.toString()
dlg.controls.append(control)
! def ParseDialogs(rc_file):
rcp = RCParser()
try:
rcp.loadDialogs(rc_file)
--- 378,385 ----
#print control.toString()
dlg.controls.append(control)
!
! def ParseDialogs(rc_file, gettexted=False):
rcp = RCParser()
+ rcp.gettexted = gettexted
try:
rcp.loadDialogs(rc_file)
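
The effect of `gt_str` is easiest to see through `repr()`. The Python 3 sketch below mirrors the class under the hypothetical name `GettextStr`; it assumes the parsed dialog data is written into the generated module using `repr()`, which this diff suggests but does not show.

```python
class GettextStr(str):
    """A string whose repr() is wrapped in a gettext call."""

    def __repr__(self):
        plain = super().__repr__()
        if self:
            return "_(" + plain + ")"
        return plain  # empty strings are left unwrapped


control = {"label": GettextStr("Help"), "id": "IDOK", "empty": GettextStr("")}
print(repr(control))
```

Because containers call `repr()` on their items, only the wrapped captions and labels come out as `_('...')` calls, while plain strings such as control ids are emitted unchanged.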
From anadelonbrin at users.sourceforge.net Tue Nov 2 22:33:49 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 2 22:33:53 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py, 1.133,
1.134 config.py, 1.31, 1.32 config_wizard.py, 1.9,
1.10 filter.py, 1.38, 1.39 manager.py, 1.97, 1.98 oastats.py,
1.7, 1.8
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7107/Outlook2000
Modified Files:
addin.py config.py config_wizard.py filter.py manager.py
oastats.py
Log Message:
Wrap strings to translate in _(). Leave all log messages untranslated, because logs
are more useful to the spambayes@python.org people than they are to users, and
I don't want to have to translate logs into English to try to help people.
Not sure if the config.py stuff is right or not - will check.
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.133
retrieving revision 1.134
diff -C2 -d -r1.133 -r1.134
*** addin.py 16 Oct 2004 22:37:10 -0000 1.133
--- addin.py 2 Nov 2004 21:33:46 -0000 1.134
***************
*** 539,554 ****
try:
if spam_folder.GetItemCount() > 0:
! message = "Are you sure you want to permanently delete all items " \
! "in the \"%s\" folder?" % spam_folder.name
if mgr.AskQuestion(message):
! mgr.LogDebug(2, "Emptying spam from folder '%s'" % spam_folder.GetFQName())
import manager
spam_folder.EmptyFolder(manager._GetParent())
else:
! mgr.LogDebug(2, "Spam folder '%s' was already empty" % spam_folder.GetFQName())
! message = "The \"%s\" folder is already empty." % spam_folder.name
mgr.ReportInformation(message)
except:
! mgr.LogDebug(0, "Error emptying spam folder '%s'!" % spam_folder.GetFQName())
traceback.print_exc()
--- 539,559 ----
try:
if spam_folder.GetItemCount() > 0:
! message = _("Are you sure you want to permanently delete " \
! "all items in the \"%s\" folder?") \
! % spam_folder.name
if mgr.AskQuestion(message):
! mgr.LogDebug(2, "Emptying spam from folder '%s'" % \
! spam_folder.GetFQName())
import manager
spam_folder.EmptyFolder(manager._GetParent())
else:
! mgr.LogDebug(2, "Spam folder '%s' was already empty" % \
! spam_folder.GetFQName())
! message = _("The \"%s\" folder is already empty.") % \
! spam_folder.name
mgr.ReportInformation(message)
except:
! mgr.LogDebug(0, "Error emptying spam folder '%s'!" % \
! spam_folder.GetFQName())
traceback.print_exc()
***************
*** 578,584 ****
traceback.print_exc()
manager.ReportError(
! "There was an error checking for the latest version\r\n"
! "For specific details on the error, please see the SpamBayes log"
! "\r\n\r\nPlease check your internet connection, or try again later"
)
return
--- 583,589 ----
traceback.print_exc()
manager.ReportError(
! _("There was an error checking for the latest version\r\n"
! "For specific details on the error, please see the SpamBayes log"
! "\r\n\r\nPlease check your internet connection, or try again later")
)
return
***************
*** 587,600 ****
if latest_ver_num > cur_ver_num:
url = get_version_string(app_name, "Download Page", version_dict=latest)
! msg = "You are running %s\r\n\r\nThe latest available version is %s" \
! "\r\n\r\nThe download page for the latest version is\r\n%s" \
! "\r\n\r\nWould you like to visit this page now?" \
! % (cur_ver_string, latest_ver_string, url)
if manager.AskQuestion(msg):
print "Opening browser page", url
os.startfile(url)
else:
! msg = "The latest available version is %s\r\n\r\n" \
! "No later version is available." % latest_ver_string
manager.ReportInformation(msg)
--- 592,605 ----
if latest_ver_num > cur_ver_num:
url = get_version_string(app_name, "Download Page", version_dict=latest)
! msg = _("You are running %s\r\n\r\nThe latest available version is %s" \
! "\r\n\r\nThe download page for the latest version is\r\n%s" \
! "\r\n\r\nWould you like to visit this page now?") \
! % (cur_ver_string, latest_ver_string, url)
if manager.AskQuestion(msg):
print "Opening browser page", url
os.startfile(url)
else:
! msg = _("The latest available version is %s\r\n\r\n" \
! "No later version is available.") % latest_ver_string
manager.ReportInformation(msg)
***************
*** 635,640 ****
if not self.manager.config.filter.enabled:
self.manager.ReportError(
! "You must configure and enable SpamBayes before you can " \
! "mark messages as spam")
return
SetWaitCursor(1)
--- 640,645 ----
if not self.manager.config.filter.enabled:
self.manager.ReportError(
! _("You must configure and enable SpamBayes before you " \
! "can mark messages as spam"))
return
SetWaitCursor(1)
***************
*** 650,655 ****
pass
if spam_folder is None:
! self.manager.ReportError("You must configure the Spam folder",
! "Invalid Configuration")
return
import train
--- 655,660 ----
pass
if spam_folder is None:
! self.manager.ReportError(_("You must configure the Spam folder"),
! _("Invalid Configuration"))
return
import train
***************
*** 694,699 ****
if not self.manager.config.filter.enabled:
self.manager.ReportError(
! "You must configure and enable SpamBayes before you can " \
! "mark messages as not spam")
return
SetWaitCursor(1)
--- 699,704 ----
if not self.manager.config.filter.enabled:
self.manager.ReportError(
! _("You must configure and enable SpamBayes before you " \
! "can mark messages as not spam"))
return
SetWaitCursor(1)
***************
*** 793,803 ****
# Add our "Spam" and "Not Spam" buttons
! tt_text = "Move the selected message to the Spam folder,\n" \
! "and train the system that this is Spam."
self.but_delete_as = self._AddControl(
None,
constants.msoControlButton,
ButtonDeleteAsSpamEvent, (self.manager, self),
! Caption="Spam",
TooltipText = tt_text,
BeginGroup = False,
--- 798,808 ----
# Add our "Spam" and "Not Spam" buttons
! tt_text = _("Move the selected message to the Spam folder,\n" \
! "and train the system that this is Spam.")
self.but_delete_as = self._AddControl(
None,
constants.msoControlButton,
ButtonDeleteAsSpamEvent, (self.manager, self),
! Caption=_("Spam"),
TooltipText = tt_text,
BeginGroup = False,
***************
*** 805,818 ****
image = "delete_as_spam.bmp")
# And again for "Not Spam"
! tt_text = \
! "Recovers the selected item back to the folder\n" \
! "it was filtered from (or to the Inbox if this\n" \
! "folder is not known), and trains the system that\n" \
! "this is a good message\n"
self.but_recover_as = self._AddControl(
None,
constants.msoControlButton,
ButtonRecoverFromSpamEvent, (self.manager, self),
! Caption="Not Spam",
TooltipText = tt_text,
Tag = "SpamBayesCommand.RecoverFromSpam",
--- 810,823 ----
image = "delete_as_spam.bmp")
# And again for "Not Spam"
! tt_text = _(\
! "Recovers the selected item back to the folder\n" \
! "it was filtered from (or to the Inbox if this\n" \
! "folder is not known), and trains the system that\n" \
! "this is a good message\n")
self.but_recover_as = self._AddControl(
None,
constants.msoControlButton,
ButtonRecoverFromSpamEvent, (self.manager, self),
! Caption=_("Not Spam"),
TooltipText = tt_text,
Tag = "SpamBayesCommand.RecoverFromSpam",
***************
*** 828,833 ****
constants.msoControlPopup,
None, None,
! Caption="SpamBayes",
! TooltipText = "SpamBayes anti-spam filters and functions",
Enabled = True,
Tag = "SpamBayesCommand.Popup")
--- 833,838 ----
constants.msoControlPopup,
None, None,
! Caption=_("SpamBayes"),
! TooltipText = _("SpamBayes anti-spam filters and functions"),
Enabled = True,
Tag = "SpamBayesCommand.Popup")
***************
*** 846,851 ****
constants.msoControlButton,
ButtonEvent, (manager.ShowManager,),
! Caption="SpamBayes Manager...",
! TooltipText = "Show the SpamBayes manager dialog.",
Enabled = True,
Visible=True,
--- 851,856 ----
constants.msoControlButton,
ButtonEvent, (manager.ShowManager,),
! Caption=_("SpamBayes Manager..."),
! TooltipText = _("Show the SpamBayes manager dialog."),
Enabled = True,
Visible=True,
***************
*** 874,878 ****
constants.msoControlButton,
ButtonEvent, (ShowClues, self.manager, self),
! Caption="Show spam clues for current message",
Enabled=True,
Visible=True,
--- 879,883 ----
constants.msoControlButton,
ButtonEvent, (ShowClues, self.manager, self),
! Caption=_("Show spam clues for current message"),
Enabled=True,
Visible=True,
***************
*** 881,885 ****
constants.msoControlButton,
ButtonEvent, (manager.ShowFilterNow,),
! Caption="Filter messages...",
Enabled=True,
Visible=True,
--- 886,890 ----
constants.msoControlButton,
ButtonEvent, (manager.ShowFilterNow,),
! Caption=_("Filter messages..."),
Enabled=True,
Visible=True,
***************
*** 888,892 ****
constants.msoControlButton,
ButtonEvent, (EmptySpamFolder, self.manager),
! Caption="Empty Spam Folder",
Enabled=True,
Visible=True,
--- 893,897 ----
constants.msoControlButton,
ButtonEvent, (EmptySpamFolder, self.manager),
! Caption=_("Empty Spam Folder"),
Enabled=True,
Visible=True,
***************
*** 896,900 ****
constants.msoControlButton,
ButtonEvent, (CheckLatestVersion, self.manager,),
! Caption="Check for new version",
Enabled=True,
Visible=True,
--- 901,905 ----
constants.msoControlButton,
ButtonEvent, (CheckLatestVersion, self.manager,),
! Caption=_("Check for new version"),
Enabled=True,
Visible=True,
***************
*** 905,910 ****
constants.msoControlPopup,
None, None,
! Caption="Help",
! TooltipText = "SpamBayes help documents",
Enabled = True,
Tag = "SpamBayesCommand.HelpPopup")
--- 910,915 ----
constants.msoControlPopup,
None, None,
! Caption=_("Help"),
! TooltipText = _("SpamBayes help documents"),
Enabled = True,
Tag = "SpamBayesCommand.HelpPopup")
***************
*** 912,932 ****
helpPopup = CastTo(helpPopup, "CommandBarPopup")
self._AddHelpControl(helpPopup,
! "About SpamBayes",
"about.html",
"SpamBayesCommand.Help.ShowAbout")
self._AddHelpControl(helpPopup,
! "Troubleshooting Guide",
"docs/troubleshooting.html",
"SpamBayesCommand.Help.ShowTroubleshooting")
self._AddHelpControl(helpPopup,
! "SpamBayes Website",
"http://spambayes.sourceforge.net/",
"SpamBayesCommand.Help.ShowSpamBayes Website")
self._AddHelpControl(helpPopup,
! "Frequently Asked Questions",
"http://spambayes.sourceforge.net/faq.html",
"SpamBayesCommand.Help.ShowFAQ")
self._AddHelpControl(helpPopup,
! "SpamBayes Bug Tracker",
"http://sourceforge.net/tracker/?group_id=61702&atid=498103",
"SpamBayesCommand.Help.BugTacker")
--- 917,937 ----
helpPopup = CastTo(helpPopup, "CommandBarPopup")
self._AddHelpControl(helpPopup,
! _("About SpamBayes"),
"about.html",
"SpamBayesCommand.Help.ShowAbout")
self._AddHelpControl(helpPopup,
! _("Troubleshooting Guide"),
"docs/troubleshooting.html",
"SpamBayesCommand.Help.ShowTroubleshooting")
self._AddHelpControl(helpPopup,
! _("SpamBayes Website"),
"http://spambayes.sourceforge.net/",
"SpamBayesCommand.Help.ShowSpamBayes Website")
self._AddHelpControl(helpPopup,
! _("Frequently Asked Questions"),
"http://spambayes.sourceforge.net/faq.html",
"SpamBayesCommand.Help.ShowFAQ")
self._AddHelpControl(helpPopup,
! _("SpamBayes Bug Tracker"),
"http://sourceforge.net/tracker/?group_id=61702&atid=498103",
"SpamBayesCommand.Help.BugTacker")
***************
*** 937,941 ****
constants.msoControlButton,
ButtonEvent, (Tester, self.manager),
! Caption="Execute test suite",
Enabled=True,
Visible=True,
--- 942,946 ----
constants.msoControlButton,
ButtonEvent, (Tester, self.manager),
! Caption=_("Execute test suite"),
Enabled=True,
Visible=True,
***************
*** 1051,1055 ****
sel = explorer.Selection
if sel.Count > 1 and not allow_multi:
! self.manager.ReportError("Please select a single item", "Large selection")
return None
--- 1056,1061 ----
sel = explorer.Selection
if sel.Count > 1 and not allow_multi:
! self.manager.ReportError(_("Please select a single item"),
! _("Large selection"))
return None
***************
*** 1070,1074 ****
if len(ret) == 0:
! self.manager.ReportError("No filterable mail items are selected", "No selection")
return None
if allow_multi:
--- 1076,1081 ----
if len(ret) == 0:
! self.manager.ReportError(_("No filterable mail items are selected"),
! _("No selection"))
return None
if allow_multi:
***************
*** 1140,1149 ****
print "Error finding the MAPI folders for a folder switch event"
# As this happens once per move, we should only display it once.
! self.manager.ReportErrorOnce(
"There appears to be a problem with the SpamBayes"
" configuration\r\n\r\nPlease select the SpamBayes"
" manager, and run the\r\nConfiguration Wizard to"
! " reconfigure the filter.",
! "Invalid SpamBayes Configuration")
traceback.print_exc()
if self.but_recover_as is not None:
--- 1147,1156 ----
print "Error finding the MAPI folders for a folder switch event"
# As this happens once per move, we should only display it once.
! self.manager.ReportErrorOnce(_(
"There appears to be a problem with the SpamBayes"
" configuration\r\n\r\nPlease select the SpamBayes"
" manager, and run the\r\nConfiguration Wizard to"
! " reconfigure the filter."),
! _("Invalid SpamBayes Configuration"))
traceback.print_exc()
if self.but_recover_as is not None:
***************
*** 1255,1258 ****
--- 1262,1267 ----
print "Error connecting to Outlook!"
traceback.print_exc()
+ # We can't translate this string, as we haven't managed to load
+ # the translation tools.
manager.ReportError(
"There was an error initializing the SpamBayes addin\r\n\r\n"
***************
*** 1278,1283 ****
if not self.manager.config.filter.spam_folder_id or \
not self.manager.config.filter.watch_folder_ids:
! msg = "It appears there was an error loading your configuration\r\n\r\n" \
! "Please re-configure SpamBayes via the SpamBayes dropdown"
self.manager.ReportError(msg)
# But continue on regardless.
--- 1287,1292 ----
if not self.manager.config.filter.spam_folder_id or \
not self.manager.config.filter.watch_folder_ids:
! msg = _("It appears there was an error loading your configuration\r\n\r\n" \
! "Please re-configure SpamBayes via the SpamBayes dropdown")
self.manager.ReportError(msg)
# But continue on regardless.
***************
*** 1293,1298 ****
# being enabled. The new Wizard should help, but things can
# still screw up.
! self.manager.LogDebug(0, "*** SpamBayes is NOT enabled, so will " \
! "not filter incoming mail. ***")
# Toolbar and other UI stuff must be setup once startup is complete.
explorers = self.application.Explorers
--- 1302,1307 ----
# being enabled. The new Wizard should help, but things can
# still screw up.
! self.manager.LogDebug(0, _("*** SpamBayes is NOT enabled, so " \
! "will not filter incoming mail. ***"))
# Toolbar and other UI stuff must be setup once startup is complete.
explorers = self.application.Explorers
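
Note on the pattern used throughout the addin.py changes above: every user-visible string is passed through _() so that it can be looked up in a message catalogue at run time. A minimal sketch of that lookup follows. It assumes an in-memory catalogue class (DictTranslations, hypothetical) in place of the real spambayes i18n machinery, and the Spanish strings are made-up sample data, not shipped translations.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalogue; real code would load a .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = catalog

    def gettext(self, message):
        # Fall back to the original (English) text when no translation exists.
        return self._catalog.get(message, message)


# Made-up sample translations, for illustration only.
catalog = {
    "Empty Spam Folder": "Vaciar la carpeta de spam",
    "Please select a single item": "Seleccione un solo elemento",
}
_ = DictTranslations(catalog).gettext

caption = _("Empty Spam Folder")

# Adjacent string literals are joined at compile time, so _() receives
# one complete message id even when the source spans several lines.
message = _("Please select "
            "a single item")

# A string missing from the catalogue comes back unchanged.
untranslated = _("Check for new version")
```

This is why the multi-line messages in the diff wrap the whole group of literals in a single _() call rather than wrapping each fragment separately.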
Index: config.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/config.py,v
retrieving revision 1.31
retrieving revision 1.32
diff -C2 -d -r1.31 -r1.32
*** config.py 1 Oct 2004 14:31:34 -0000 1.31
--- config.py 2 Nov 2004 21:33:46 -0000 1.32
***************
*** 12,15 ****
--- 12,16 ----
import sys, types
+ def _(text): return text
try:
***************
*** 22,27 ****
FOLDER_ID = r"\(\'[a-fA-F0-9]+\', \'[a-fA-F0-9]+\'\)"
FIELD_NAME = r"[a-zA-Z0-9 ]+"
! FILTER_ACTION = "Untouched", "Moved", "Copied"
! MSG_READ_STATE = "None", "Read", "Unread"
from spambayes.OptionsClass import OptionsClass, Option
--- 23,28 ----
FOLDER_ID = r"\(\'[a-fA-F0-9]+\', \'[a-fA-F0-9]+\'\)"
FIELD_NAME = r"[a-zA-Z0-9 ]+"
! FILTER_ACTION = _("Untouched"), _("Moved"), _("Copied")
! MSG_READ_STATE = _("None"), _("Read"), _("Unread")
from spambayes.OptionsClass import OptionsClass, Option
***************
*** 89,109 ****
defaults = {
"General" : (
! ("field_score_name", "The name of the field used to store the spam score", "Spam",
! """SpamBayes stores the spam score for each message in a custom field.
! This option specifies the name of the field""",
FIELD_NAME, RESTORE),
! ("data_directory", "The directory to store the data files.", "",
! """""",
PATH, DO_NOT_RESTORE),
! ("delete_as_spam_message_state", "How the 'read' flag on a message is modified", "None",
! """When the 'Spam' function is used, the message 'read' flag can
! also be set.""",
MSG_READ_STATE, RESTORE),
! ("recover_from_spam_message_state", "How the 'read' flag on a message is modified", "None",
! """When the 'Not Spam' function is used, the message 'read' flag can
! also be set.""",
MSG_READ_STATE, RESTORE),
! ("verbose", "Changes the verbosity of the debug output from the program", 0,
! """Indicates how much information is written to the SpamBayes log file.""",
INTEGER, RESTORE),
),
--- 90,110 ----
defaults = {
"General" : (
! ("field_score_name", _("The name of the field used to store the spam score"), _("Spam"),
! _("""SpamBayes stores the spam score for each message in a custom field.
! This option specifies the name of the field"""),
FIELD_NAME, RESTORE),
! ("data_directory", _("The directory to store the data files."), "",
! _(""""""),
PATH, DO_NOT_RESTORE),
! ("delete_as_spam_message_state", _("How the 'read' flag on a message is modified"), "None",
! _("""When the 'Spam' function is used, the message 'read' flag can
! also be set."""),
MSG_READ_STATE, RESTORE),
! ("recover_from_spam_message_state", _("How the 'read' flag on a message is modified"), "None",
! _("""When the 'Not Spam' function is used, the message 'read' flag can
! also be set."""),
MSG_READ_STATE, RESTORE),
! ("verbose", _("Changes the verbosity of the debug output from the program"), 0,
! _("""Indicates how much information is written to the SpamBayes log file."""),
INTEGER, RESTORE),
),
***************
*** 118,162 ****
("timer_interval", "obsolete", 1000, "", INTEGER, RESTORE),
("timer_only_receive_folders", "obsolete", True, "", BOOLEAN, RESTORE),
),
"Training" : (
(FolderIDOption,
! "ham_folder_ids", "Folders containing known good messages", [],
! """A list of folders known to contain good (ham) messages. When SpamBayes
! is trained, these messages will be used as examples of good messages.""",
FOLDER_ID, DO_NOT_RESTORE),
! ("ham_include_sub", "Does the nominated ham folders include sub-folders?", False,
! """""",
BOOLEAN, DO_NOT_RESTORE),
(FolderIDOption,
! "spam_folder_ids", "Folders containing known bad or spam messages", [],
! """A list of folders known to contain bad (spam) messages. When SpamBayes
! is trained, these messages will be used as examples of messages to filter.""",
FOLDER_ID, DO_NOT_RESTORE),
! ("spam_include_sub", "Does the nominated spam folders include sub-folders?", False,
! """""",
BOOLEAN, DO_NOT_RESTORE),
! ("train_recovered_spam", "Train as good as items are recovered?", True,
! """SpamBayes can detect when a message previously classified as spam
(or unsure) is moved back to the folder from which it was filtered.
If this option is enabled, SpamBayes will automatically train on
! such messages""",
BOOLEAN, RESTORE),
! ("train_manual_spam", "Train as spam items are manually moved?", True,
! """SpamBayes can detect when a message previously classified as good
(or unsure) is manually moved to the Spam folder. If this option is
! enabled, SpamBayes will automatically train on such messages""",
BOOLEAN, RESTORE),
! ("rescore", "Rescore message after training?", True,
! """After the training has completed, should all the messages be
scored for their Spam value. This is particularly useful after
your initial training runs, so you can see how effective your
! sorting of spam and ham was.""",
BOOLEAN, RESTORE),
! ("rebuild", "Rescore message after training?", True,
! """Should the entire database be rebuilt? If enabled, then all
training information is reset, and a complete new database built
from the existing messages in your folders. If disabled, then only
new messages in the folders that have not previously been trained
! on will be processed""",
BOOLEAN, RESTORE),
),
--- 119,166 ----
("timer_interval", "obsolete", 1000, "", INTEGER, RESTORE),
("timer_only_receive_folders", "obsolete", True, "", BOOLEAN, RESTORE),
+ # Rather than fpfnunsure, do tte. DeleteAs/RecoverFrom just move
+ # the message, and a tte update is done on close.
+ ("train_to_exhaustion", "Train to exhaustion", False, "", BOOLEAN, RESTORE),
),
"Training" : (
(FolderIDOption,
! "ham_folder_ids", _("Folders containing known good messages"), [],
! _("""A list of folders known to contain good (ham) messages. When SpamBayes
! is trained, these messages will be used as examples of good messages."""),
FOLDER_ID, DO_NOT_RESTORE),
! ("ham_include_sub", _("Does the nominated ham folders include sub-folders?"), False,
! _(""""""),
BOOLEAN, DO_NOT_RESTORE),
(FolderIDOption,
! "spam_folder_ids", _("Folders containing known bad or spam messages"), [],
! _("""A list of folders known to contain bad (spam) messages. When SpamBayes
! is trained, these messages will be used as examples of messages to filter."""),
FOLDER_ID, DO_NOT_RESTORE),
! ("spam_include_sub", _("Does the nominated spam folders include sub-folders?"), False,
! _(""""""),
BOOLEAN, DO_NOT_RESTORE),
! ("train_recovered_spam", _("Train as good as items are recovered?"), True,
! _("""SpamBayes can detect when a message previously classified as spam
(or unsure) is moved back to the folder from which it was filtered.
If this option is enabled, SpamBayes will automatically train on
! such messages"""),
BOOLEAN, RESTORE),
! ("train_manual_spam", _("Train as spam items are manually moved?"), True,
! _("""SpamBayes can detect when a message previously classified as good
(or unsure) is manually moved to the Spam folder. If this option is
! enabled, SpamBayes will automatically train on such messages"""),
BOOLEAN, RESTORE),
! ("rescore", _("Rescore message after training?"), True,
! _("""After the training has completed, should all the messages be
scored for their Spam value. This is particularly useful after
your initial training runs, so you can see how effective your
! sorting of spam and ham was."""),
BOOLEAN, RESTORE),
! ("rebuild", _("Rescore message after training?"), True,
! _("""Should the entire database be rebuilt? If enabled, then all
training information is reset, and a complete new database built
from the existing messages in your folders. If disabled, then only
new messages in the folders that have not previously been trained
! on will be processed"""),
BOOLEAN, RESTORE),
),
***************
*** 164,269 ****
# These options control how a message is categorized
"Filter" : (
! ("filter_now", "State of 'Filter Now' checkbox", False,
! """Something useful.""",
BOOLEAN, RESTORE),
! ("save_spam_info", "Save spam score", True,
! """Should the spam score and other information be saved in each message
! as it is filtered or scored?""",
BOOLEAN, RESTORE),
(FolderIDOption,
! "watch_folder_ids", "Folders to watch for new messages", [],
! """The list of folders SpamBayes will watch for new messages,
! processing messages as defined by the filters.""",
FOLDER_ID, DO_NOT_RESTORE),
! ("watch_include_sub", "Does the nominated watch folders include sub-folders?", False,
! """""",
BOOLEAN, DO_NOT_RESTORE),
(FolderIDOption,
! "spam_folder_id", "The folder used to track spam", None,
! """The folder SpamBayes moves or copies spam to.""",
FOLDER_ID, DO_NOT_RESTORE),
! ("spam_threshold", "The score necessary to be considered 'certain' spam", 90.0,
! """Any message with a Spam score greater than or equal to this value
! will be considered spam, and processed accordingly.""",
REAL, RESTORE),
! ("spam_action", "The action to take for new spam", "Moved",
! """The action that should be taken as Spam messages arrive.""",
FILTER_ACTION, RESTORE),
! ("spam_mark_as_read", "Should filtered spam also be marked as 'read'", False,
! """Determines if spam messages are marked as 'Read' as they are
filtered. This can be set to 'True' if the new-mail folder counts bother
you when the only new items are spam. It can be set to 'False'
if you use the 'read' state of these messages to determine which
items you are yet to review. This option does not affect the
! new-mail icon in the system tray.""",
BOOLEAN, RESTORE),
(FolderIDOption,
! "unsure_folder_id", "The folder used to track uncertain messages", None,
! """The folder SpamBayes moves or copies uncertain messages to.""",
FOLDER_ID, DO_NOT_RESTORE),
! ("unsure_threshold", "The score necessary to be considered 'unsure'", 15.0,
! """Any message with a Spam score greater than or equal to this value
(but less than the spam threshold) will be considered spam, and
! processed accordingly.""",
REAL, RESTORE),
! ("unsure_action", "The action to take for new uncertain messages", "Moved",
! """The action that should be taken as unsure messages arrive.""",
FILTER_ACTION, RESTORE),
! ("unsure_mark_as_read", "Should filtered uncertain message also be marked as 'read'", False,
! """Determines if unsure messages are marked as 'Read' as they are
! filtered. See 'spam_mark_as_read' for more details.""",
BOOLEAN, RESTORE),
! ("enabled", "Is filtering enabled?", False,
! """""",
BOOLEAN, RESTORE),
# Options that allow the filtering to be done by a timer.
! ("timer_enabled", "Should items be filtered by a timer?", True,
! """Depending on a number of factors, SpamBayes may occasionally miss
messages or conflict with builtin Outlook rules. If this option
is set, SpamBayes will filter all messages in the background. This
generally solves both of these problem, at the cost of having Spam stay
! in your inbox for a few extra seconds.""",
BOOLEAN, RESTORE),
! ("timer_start_delay", "The interval (in seconds) before the timer starts.", 2.0,
! """Once a new item is received in the inbox, SpamBayes will begin
processing messages after the given delay. If a new message arrives
! during this period, the timer will be reset and the delay will start again.""",
REAL, RESTORE),
! ("timer_interval", "The interval between subsequent timer checks (in seconds)", 1.0,
! """Once the new message timer finds a new message, how long should
SpamBayes wait before checking for another new message, assuming no
other new messages arrive. Should a new message arrive during this
process, the timer will reset, meaning that timer_start_delay will
! elapse before the process begins again.""",
REAL, RESTORE),
("timer_only_receive_folders",
! "Should the timer only be used for 'Inbox' type folders?", True,
! """The point of using a timer is to prevent the SpamBayes filter
getting in the way the builtin Outlook rules. Therefore, is it
generally only necessary to use a timer for folders that have new
items being delivered directly to them. Folders that are not inbox
style folders generally are not subject to builtin filtering, so
! generally have no problems filtering messages in 'real time'.""",
BOOLEAN, RESTORE),
),
"Filter_Now": (
! (FolderIDOption, "folder_ids", "Folders to filter in a 'Filter Now' operation", [],
! """The list of folders that will be filtered by this process.""",
FOLDER_ID, DO_NOT_RESTORE),
! ("include_sub", "Does the nominated folders include sub-folders?", False,
! """""",
BOOLEAN, DO_NOT_RESTORE),
! ("only_unread", "Only filter unread messages?", False,
! """When scoring messages, should only messages that are unread be
! considered?""",
BOOLEAN, RESTORE),
! ("only_unseen", "Only filter previously unseen ?", False,
! """When scoring messages, should only messages that have never
! previously Spam scored be considered?""",
BOOLEAN, RESTORE),
! ("action_all", "Perform all filter actions?", True,
! """When scoring the messages, should all items be performed (such as
moving the items based on the score) or should the items only be scored,
! but otherwise untouched.""",
BOOLEAN, RESTORE),
),
--- 168,273 ----
# These options control how a message is categorized
"Filter" : (
! ("filter_now", _("State of 'Filter Now' checkbox"), False,
! _("""Something useful."""),
BOOLEAN, RESTORE),
! ("save_spam_info", _("Save spam score"), True,
! _("""Should the spam score and other information be saved in each message
! as it is filtered or scored?"""),
BOOLEAN, RESTORE),
(FolderIDOption,
! "watch_folder_ids", _("Folders to watch for new messages"), [],
! _("""The list of folders SpamBayes will watch for new messages,
! processing messages as defined by the filters."""),
FOLDER_ID, DO_NOT_RESTORE),
! ("watch_include_sub", _("Does the nominated watch folders include sub-folders?"), False,
! _(""""""),
BOOLEAN, DO_NOT_RESTORE),
(FolderIDOption,
! "spam_folder_id", _("The folder used to track spam"), None,
! _("""The folder SpamBayes moves or copies spam to."""),
FOLDER_ID, DO_NOT_RESTORE),
! ("spam_threshold", _("The score necessary to be considered 'certain' spam"), 90.0,
! _("""Any message with a Spam score greater than or equal to this value
! will be considered spam, and processed accordingly."""),
REAL, RESTORE),
! ("spam_action", _("The action to take for new spam"), "Moved",
! _("""The action that should be taken as Spam messages arrive."""),
FILTER_ACTION, RESTORE),
! ("spam_mark_as_read", _("Should filtered spam also be marked as 'read'"), False,
! _("""Determines if spam messages are marked as 'Read' as they are
filtered. This can be set to 'True' if the new-mail folder counts bother
you when the only new items are spam. It can be set to 'False'
if you use the 'read' state of these messages to determine which
items you are yet to review. This option does not affect the
! new-mail icon in the system tray."""),
BOOLEAN, RESTORE),
(FolderIDOption,
! "unsure_folder_id", _("The folder used to track uncertain messages"), None,
! _("""The folder SpamBayes moves or copies uncertain messages to."""),
FOLDER_ID, DO_NOT_RESTORE),
! ("unsure_threshold", _("The score necessary to be considered 'unsure'"), 15.0,
! _("""Any message with a Spam score greater than or equal to this value
(but less than the spam threshold) will be considered spam, and
! processed accordingly."""),
REAL, RESTORE),
! ("unsure_action", _("The action to take for new uncertain messages"), FILTER_ACTION[1],
! _("""The action that should be taken as unsure messages arrive."""),
FILTER_ACTION, RESTORE),
! ("unsure_mark_as_read", _("Should filtered uncertain message also be marked as 'read'"), False,
! _("""Determines if unsure messages are marked as 'Read' as they are
! filtered. See 'spam_mark_as_read' for more details."""),
BOOLEAN, RESTORE),
! ("enabled", _("Is filtering enabled?"), False,
! _(""""""),
BOOLEAN, RESTORE),
# Options that allow the filtering to be done by a timer.
! ("timer_enabled", _("Should items be filtered by a timer?"), True,
! _("""Depending on a number of factors, SpamBayes may occasionally miss
messages or conflict with builtin Outlook rules. If this option
is set, SpamBayes will filter all messages in the background. This
generally solves both of these problem, at the cost of having Spam stay
! in your inbox for a few extra seconds."""),
BOOLEAN, RESTORE),
! ("timer_start_delay", _("The interval (in seconds) before the timer starts."), 2.0,
! _("""Once a new item is received in the inbox, SpamBayes will begin
processing messages after the given delay. If a new message arrives
! during this period, the timer will be reset and the delay will start again."""),
REAL, RESTORE),
! ("timer_interval", _("The interval between subsequent timer checks (in seconds)"), 1.0,
! _("""Once the new message timer finds a new message, how long should
SpamBayes wait before checking for another new message, assuming no
other new messages arrive. Should a new message arrive during this
process, the timer will reset, meaning that timer_start_delay will
! elapse before the process begins again."""),
REAL, RESTORE),
("timer_only_receive_folders",
! _("Should the timer only be used for 'Inbox' type folders?"), True,
! _("""The point of using a timer is to prevent the SpamBayes filter
getting in the way the builtin Outlook rules. Therefore, is it
generally only necessary to use a timer for folders that have new
items being delivered directly to them. Folders that are not inbox
style folders generally are not subject to builtin filtering, so
! generally have no problems filtering messages in 'real time'."""),
BOOLEAN, RESTORE),
),
"Filter_Now": (
! (FolderIDOption, "folder_ids", _("Folders to filter in a 'Filter Now' operation"), [],
! _("""The list of folders that will be filtered by this process."""),
FOLDER_ID, DO_NOT_RESTORE),
! ("include_sub", _("Does the nominated folders include sub-folders?"), False,
! _(""""""),
BOOLEAN, DO_NOT_RESTORE),
! ("only_unread", _("Only filter unread messages?"), False,
! _("""When scoring messages, should only messages that are unread be
! considered?"""),
BOOLEAN, RESTORE),
! ("only_unseen", _("Only filter previously unseen ?"), False,
! _("""When scoring messages, should only messages that have never
! previously Spam scored be considered?"""),
BOOLEAN, RESTORE),
! ("action_all", _("Perform all filter actions?"), True,
! _("""When scoring the messages, should all items be performed (such as
moving the items based on the score) or should the items only be scored,
! but otherwise untouched."""),
BOOLEAN, RESTORE),
),
***************
*** 306,326 ****
# Migrate some "old" options to "new" options. Can be deleted in
# a few versions :)
! # Binary007 last with experimental timer values.
! delay = options.get("Experimental", "timer_start_delay")
! interval = options.get("Experimental", "timer_interval")
! if delay and interval:
! options.set("Filter", "timer_enabled", True)
! options.set("Filter", "timer_start_delay", float(delay / 1000))
! options.set("Filter", "timer_interval", float(interval / 1000))
! # and reset the old options so they are not written to the new file
! # (actually, resetting isn't enough - must hack and clobber)
! del options._options["Experimental", "timer_start_delay"]
! del options._options["Experimental", "timer_interval"]
!
! torf = options.get("Experimental", "timer_only_receive_folders")
! if not torf:
! options.set("Filter", "timer_only_receive_folders", False)
! # and reset old
! del options._options["Experimental", "timer_only_receive_folders"]
# Old code when we used a pickle. Still needed so old pickles can be
--- 310,314 ----
# Migrate some "old" options to "new" options. Can be deleted in
# a few versions :)
! pass
# Old code when we used a pickle. Still needed so old pickles can be
***************
*** 347,350 ****
--- 335,340 ----
# End of old pickle code.
+ del _
+
if __name__=='__main__':
options = CreateConfig()
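
Note on the config.py changes above: the module defines a do-nothing _() near the top and deletes it at the end, so the strings are only marked for extraction and are stored in English. Translation then has to happen later, when a value is displayed. A minimal sketch of that deferred pattern follows, assuming a hypothetical display_name() helper and a made-up one-entry French catalogue; neither is part of SpamBayes.

```python
def _(text):
    # No-op marker: lets a string-extraction tool find the literal,
    # but leaves the stored value untranslated.
    return text


FILTER_ACTION = _("Untouched"), _("Moved"), _("Copied")

# Drop the marker so it cannot be mistaken for a real translator later.
del _


def display_name(action, translate):
    """Translate a stored English key at display time (hypothetical helper)."""
    return translate(action)


# Made-up sample catalogue, for illustration only.
sample_catalog = {"Moved": "Déplacé"}


def sample_translate(text):
    return sample_catalog.get(text, text)


shown = display_name(FILTER_ACTION[1], sample_translate)
```

Because the stored values remain the English keys, comparisons such as checking an option against "Moved" keep working whatever the interface language. One caveat, hedged: with GNU gettext catalogues, looking up the empty string normally returns the catalogue header rather than an empty string, so the empty help texts marked above are best left out of any later real translation call.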
Index: config_wizard.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/config_wizard.py,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** config_wizard.py 16 Dec 2003 05:06:33 -0000 1.9
--- config_wizard.py 2 Nov 2004 21:33:46 -0000 1.10
***************
*** 83,88 ****
return new_folder
except:
! msg = "There was an error creating the folder named '%s'\r\n" \
! "Please restart Outlook and try again" % name
manager.ReportError(msg)
return None
--- 83,88 ----
return new_folder
except:
! msg = _("There was an error creating the folder named '%s'\r\n" \
! "Please restart Outlook and try again") % name
manager.ReportError(msg)
return None
Index: filter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/filter.py,v
retrieving revision 1.38
retrieving revision 1.39
diff -C2 -d -r1.38 -r1.39
*** filter.py 17 Mar 2004 14:11:22 -0000 1.38
--- filter.py 2 Nov 2004 21:33:46 -0000 1.39
***************
*** 141,148 ****
config = config.filter_now
if not config.folder_ids:
! progress.error("You must specify at least one folder")
return
! progress.set_status("Counting messages")
num_msgs = 0
for f in mgr.message_store.GetFolderGenerator(config.folder_ids, config.include_sub):
--- 141,148 ----
config = config.filter_now
if not config.folder_ids:
! progress.error(_("You must specify at least one folder"))
return
! progress.set_status(_("Counting messages"))
num_msgs = 0
for f in mgr.message_store.GetFolderGenerator(config.folder_ids, config.include_sub):
***************
*** 151,155 ****
dispositions = {}
for f in mgr.message_store.GetFolderGenerator(config.folder_ids, config.include_sub):
! progress.set_status("Filtering folder '%s'" % (f.name))
this_dispositions = filter_folder(f, mgr, config, progress)
for key, val in this_dispositions.items():
--- 151,155 ----
dispositions = {}
for f in mgr.message_store.GetFolderGenerator(config.folder_ids, config.include_sub):
! progress.set_status(_("Filtering folder '%s'") % (f.name))
this_dispositions = filter_folder(f, mgr, config, progress)
for key, val in this_dispositions.items():
***************
*** 160,167 ****
err_text = ""
if dispositions.has_key("Error"):
! err_text = " (%d errors)" % dispositions["Error"]
dget = dispositions.get
! text = "Found %d spam, %d unsure and %d good messages%s" % \
! (dget("Yes",0), dget("Unsure",0), dget("No",0), err_text)
progress.set_status(text)
--- 160,167 ----
err_text = ""
if dispositions.has_key("Error"):
! err_text = _(" (%d errors)") % dispositions["Error"]
dget = dispositions.get
! text = _("Found %d spam, %d unsure and %d good messages%s") % \
! (dget("Yes",0), dget("Unsure",0), dget("No",0), err_text)
progress.set_status(text)
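
Note on the filter.py changes above: the closing parenthesis of _() moves so that the format template is translated first and the values are substituted afterwards. A minimal sketch of why the order matters follows, assuming a dict-based stand-in for the translator and made-up French strings; the named-placeholder variant at the end is an illustration, not what the commit does.

```python
# Made-up sample catalogue, for illustration only.
catalog = {
    "Filtering folder '%s'": "Filtrage du dossier '%s'",
    "Found %(spam)d spam and %(good)d good messages":
        "%(good)d bons messages et %(spam)d spams trouvés",
}


def _(text):
    # Stand-in translator: look the template up, fall back to the original.
    return catalog.get(text, text)


# Translate the template, then interpolate: the lookup key is constant.
right = _("Filtering folder '%s'") % "Inbox"

# Interpolating first produces a key that no catalogue can contain,
# so the text silently stays in English.
wrong = _("Filtering folder '%s'" % "Inbox")

# Named placeholders let a translation reorder the values freely.
summary = _("Found %(spam)d spam and %(good)d good messages") % {
    "spam": 3,
    "good": 40,
}
```

With positional %d placeholders, as in the "Found %d spam, %d unsure and %d good messages%s" string, a translation has to keep the values in the original order.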
Index: manager.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/manager.py,v
retrieving revision 1.97
retrieving revision 1.98
diff -C2 -d -r1.97 -r1.98
*** manager.py 14 Oct 2004 23:36:12 -0000 1.97
--- manager.py 2 Nov 2004 21:33:46 -0000 1.98
***************
*** 107,110 ****
--- 107,111 ----
# stuff, which can include spambayes.Options, and assume sys.path in place.
def import_early_core_spambayes_stuff():
+ global bayes_i18n
try:
from spambayes import OptionsClass
***************
*** 113,119 ****
".."))
sys.path.insert(0, parent)
def import_core_spambayes_stuff(ini_filenames):
! global bayes_classifier, bayes_tokenize, bayes_storage
if "spambayes.Options" in sys.modules:
# The only thing we are worried about here is spambayes.Options
--- 114,122 ----
".."))
sys.path.insert(0, parent)
+ from spambayes import i18n
+ bayes_i18n = i18n
def import_core_spambayes_stuff(ini_filenames):
! global bayes_classifier, bayes_tokenize, bayes_storage, bayes_options
if "spambayes.Options" in sys.modules:
# The only thing we are worried about here is spambayes.Options
***************
*** 146,149 ****
--- 149,154 ----
assert "spambayes.Options" in sys.modules, \
"Expected 'spambayes.Options' to be loaded here"
+ from spambayes.Options import options
+ bayes_options = options
# Function to "safely" save a pickle, only overwriting
***************
*** 340,343 ****
--- 345,354 ----
self.application_directory = os.path.dirname(this_filename)
+
+ # Load the environment for translation.
+ lang_manager = bayes_i18n.LanguageManager(self.application_directory)
+ # Set the system user default language.
+ lang_manager.set_language(lang_manager.locale_default_lang())
+
# where windows would like our data stored (and where
# we do, unless overwritten via a config file)
***************
*** 397,400 ****
--- 408,421 ----
import_core_spambayes_stuff(bayes_option_filenames)
+ # Set interface to use the user language in configuration file.
+ for language in bayes_options["globals", "language"][::-1]:
+ # We leave the default in there as the last option, to fall
+ # back on if necessary.
+ lang_manager.add_language(language)
+ self.LogDebug(1, "Asked to add languages: " + \
+ ", ".join(bayes_options["globals", "language"]))
+ self.LogDebug(1, "Set language to " + \
+ str(lang_manager.current_langs_codes))
+
bayes_base = os.path.join(self.data_directory, "default_bayes_database")
mdb_base = os.path.join(self.data_directory, "default_message_database")
***************
*** 450,458 ****
if not self.reported_startup_error:
self.reported_startup_error = True
! full_message = \
"There was an error initializing the Spam plugin.\r\n\r\n" \
"Spam filtering has been disabled. Please re-configure\r\n" \
"and re-enable this plugin\r\n\r\n" \
! "Error details:\r\n" + message
# Disable the plugin
if self.config is not None:
--- 471,479 ----
if not self.reported_startup_error:
self.reported_startup_error = True
! full_message = _(\
"There was an error initializing the Spam plugin.\r\n\r\n" \
"Spam filtering has been disabled. Please re-configure\r\n" \
"and re-enable this plugin\r\n\r\n" \
! "Error details:\r\n") + message
# Disable the plugin
if self.config is not None:
***************
*** 569,573 ****
# Regarding the property type:
# We originally wanted to use the "Integer" Outlook field,
! # but it seems this property type alone is not expose via the Object
# model. So we resort to olPercent, and live with the % sign
# (which really is OK!)
--- 590,594 ----
# Regarding the property type:
# We originally wanted to use the "Integer" Outlook field,
! # but it seems this property type alone is not exposed via the Object
# model. So we resort to olPercent, and live with the % sign
# (which really is OK!)
***************
*** 640,646 ****
self.options.merge_file(filename)
except:
! msg = "The configuration file named below is invalid.\r\n" \
"Please either correct or remove this file\r\n\r\n" \
! "Filename: " + filename
self.ReportError(msg)
--- 661,667 ----
self.options.merge_file(filename)
except:
! msg = _("The configuration file named below is invalid.\r\n" \
"Please either correct or remove this file\r\n\r\n" \
! "Filename: ") + filename
self.ReportError(msg)
***************
*** 710,717 ****
print "FAILED to load old pickle"
traceback.print_exc()
! msg = "There was an error loading your old\r\n" \
! "SpamBayes configuration file.\r\n\r\n" \
! "It is likely that you will need to re-configure\r\n" \
! "SpamBayes before it will function correctly."
self.ReportError(msg)
# But we can't abort yet - we really should still try and
--- 731,738 ----
print "FAILED to load old pickle"
traceback.print_exc()
! msg = _("There was an error loading your old\r\n" \
! "SpamBayes configuration file.\r\n\r\n" \
! "It is likely that you will need to re-configure\r\n" \
! "SpamBayes before it will function correctly.")
self.ReportError(msg)
# But we can't abort yet - we really should still try and
***************
*** 739,746 ****
os.remove(pickle_filename)
except os.error:
! msg = "There was an error migrating and removing your old\r\n" \
! "SpamBayes configuration file. Configuration changes\r\n" \
! "you make are unlikely to be reflected next\r\n" \
! "time you start Outlook. Please try rebooting."
self.ReportError(msg)
--- 760,767 ----
os.remove(pickle_filename)
except os.error:
! msg = _("There was an error migrating and removing your old\r\n" \
! "SpamBayes configuration file. Configuration changes\r\n" \
! "you make are unlikely to be reflected next\r\n" \
! "time you start Outlook. Please try rebooting.")
self.ReportError(msg)
***************
*** 804,810 ****
# See bug 706520 assert fails in classifier
# For now, just tell the user.
! msg = "It appears your SpamBayes training database is corrupt.\r\n\r\n" \
! "We are working on solving this, but unfortunately you\r\n" \
! "must re-train the system via the SpamBayes manager."
self.ReportErrorOnce(msg)
# and disable the addin, as we are hosed!
--- 825,831 ----
# See bug 706520 assert fails in classifier
# For now, just tell the user.
! msg = _("It appears your SpamBayes training database is corrupt.\r\n\r\n" \
! "We are working on solving this, but unfortunately you\r\n" \
! "must re-train the system via the SpamBayes manager.")
self.ReportErrorOnce(msg)
# and disable the addin, as we are hosed!
***************
*** 819,829 ****
ok_to_enable = operator.truth(config.watch_folder_ids)
if not ok_to_enable:
! return "You must define folders to watch for new messages. " \
! "Select the 'Filtering' tab to define these folders."
ok_to_enable = operator.truth(config.spam_folder_id)
if not ok_to_enable:
! return "You must define the folder to receive your certain spam. " \
! "Select the 'Filtering' tab to define this folders."
# Check that the user hasn't selected the same folder as both
--- 840,850 ----
ok_to_enable = operator.truth(config.watch_folder_ids)
if not ok_to_enable:
! return _("You must define folders to watch for new messages. " \
! "Select the 'Filtering' tab to define these folders.")
ok_to_enable = operator.truth(config.spam_folder_id)
if not ok_to_enable:
! return _("You must define the folder to receive your certain spam. " \
! "Select the 'Filtering' tab to define this folder.")
# Check that the user hasn't selected the same folder as both
***************
*** 835,843 ****
unsure_folder = ms.GetFolder(config.unsure_folder_id)
except ms.MsgStoreException, details:
! return "The unsure folder is invalid: %s" % (details,)
try:
spam_folder = ms.GetFolder(config.spam_folder_id)
except ms.MsgStoreException, details:
! return "The spam folder is invalid: %s" % (details,)
if ok_to_enable:
for folder in ms.GetFolderGenerator(config.watch_folder_ids,
--- 856,864 ----
unsure_folder = ms.GetFolder(config.unsure_folder_id)
except ms.MsgStoreException, details:
! return _("The unsure folder is invalid: %s") % (details,)
try:
spam_folder = ms.GetFolder(config.spam_folder_id)
except ms.MsgStoreException, details:
! return _("The spam folder is invalid: %s") % (details,)
if ok_to_enable:
for folder in ms.GetFolderGenerator(config.watch_folder_ids,
***************
*** 845,857 ****
bad_folder_type = None
if unsure_folder is not None and unsure_folder == folder:
! bad_folder_type = "unsure"
bad_folder_name = unsure_folder.GetFQName()
if spam_folder == folder:
! bad_folder_type = "spam"
bad_folder_name = spam_folder.GetFQName()
if bad_folder_type is not None:
! return "You can not specify folder '%s' as both the " \
! "%s folder, and as being watched." \
! % (bad_folder_name, bad_folder_type)
return None
--- 866,878 ----
bad_folder_type = None
if unsure_folder is not None and unsure_folder == folder:
! bad_folder_type = _("unsure")
bad_folder_name = unsure_folder.GetFQName()
if spam_folder == folder:
! bad_folder_type = _("spam")
bad_folder_name = spam_folder.GetFQName()
if bad_folder_type is not None:
! return _("You can not specify folder '%s' as both the " \
! "%s folder, and as being watched.") \
! % (bad_folder_name, bad_folder_type)
return None
Index: oastats.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/oastats.py,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** oastats.py 20 Oct 2004 00:03:47 -0000 1.7
--- oastats.py 2 Nov 2004 21:33:46 -0000 1.8
***************
*** 91,95 ****
totals["num_unsure"])
if num_seen==0:
! return ["SpamBayes has processed zero messages"]
chunks = []
push = chunks.append
--- 91,95 ----
totals["num_unsure"])
if num_seen==0:
! return [_("SpamBayes has processed zero messages")]
chunks = []
push = chunks.append
***************
*** 130,151 ****
% (decimal_points,)
format_dict["perc"] = "%"
! push(("SpamBayes has processed %(num_seen)d messages - " \
"%(num_ham)d (%(perc_ham_s)s) good, " \
"%(num_spam)d (%(perc_spam_s)s) spam " \
! "and %(num_unsure)d (%(perc_unsure_s)s) unsure" \
% format_dict) % format_dict)
if num_recovered_good:
! push("%(num_recovered_good)d message(s) were manually " \
"classified as good (with %(num_recovered_good_fp)d " \
! "being false positives)" % format_dict)
else:
! push("No messages were manually classified as good")
if num_deleted_spam:
! push("%(num_deleted_spam)d message(s) were manually " \
"classified as spam (with %(num_deleted_spam_fn)d " \
! "being false negatives)" % format_dict)
else:
! push("No messages were manually classified as spam")
return chunks
--- 130,151 ----
% (decimal_points,)
format_dict["perc"] = "%"
! push((_("SpamBayes has processed %(num_seen)d messages - " \
"%(num_ham)d (%(perc_ham_s)s) good, " \
"%(num_spam)d (%(perc_spam_s)s) spam " \
! "and %(num_unsure)d (%(perc_unsure_s)s) unsure") \
% format_dict) % format_dict)
if num_recovered_good:
! push(_("%(num_recovered_good)d message(s) were manually " \
"classified as good (with %(num_recovered_good_fp)d " \
! "being false positives)") % format_dict)
else:
! push(_("No messages were manually classified as good"))
if num_deleted_spam:
! push(_("%(num_deleted_spam)d message(s) were manually " \
"classified as spam (with %(num_deleted_spam_fn)d " \
! "being false negatives)") % format_dict)
else:
! push(_("No messages were manually classified as spam"))
return chunks
From anadelonbrin at users.sourceforge.net Tue Nov 2 22:34:59 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 2 22:35:01 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 msgstore.py,1.87,1.88
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7324/Outlook2000
Modified Files:
msgstore.py
Log Message:
Wrap strings to translate in _(). Leave all log messages untranslated, because logs
are more use to the spambayes@python.org people than they are to users, really, and
I don't want to have to translate logs into English to try and help people.
Also add "X-Exchange-Delivery-Time" to the faked up Exchange headers.
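For reference, a minimal sketch of formatting such a delivery-time header in current Python 3, which spells the module `email.utils` rather than the `email.Utils` used in the diff below. The `delivery_header` helper and the sample timestamp are assumptions for illustration only; the real code takes the value from the MAPI `PR_MESSAGE_DELIVERY_TIME` property and adjusts it by `time.timezone`.

```python
from email.utils import formatdate


def delivery_header(epoch_seconds):
    """Return an X-Exchange-Delivery-Time header line (hypothetical helper)."""
    # localtime=False asks formatdate for a UTC rendering of the timestamp.
    return "X-Exchange-Delivery-Time: " + formatdate(epoch_seconds, localtime=False)


# Sample timestamp (the Unix epoch), chosen only to keep the example fixed.
print(delivery_header(0))
```

The header is only appended when a delivery time is present, matching the `if delivery_time:` guard in the change.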
Index: msgstore.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/msgstore.py,v
retrieving revision 1.87
retrieving revision 1.88
diff -C2 -d -r1.87 -r1.88
*** msgstore.py 16 Jul 2004 15:23:10 -0000 1.87
--- msgstore.py 2 Nov 2004 21:34:56 -0000 1.88
***************
*** 149,163 ****
hr, exc_msg, exc, arg_err = exc_val
if hr == mapi.MAPI_E_TABLE_TOO_BIG:
! err_msg = what + " failed as one of your\r\n" \
"Outlook folders is full. Further operations are\r\n" \
"likely to fail until you clean up this folder.\r\n\r\n" \
"This message will not be reported again until SpamBayes\r\n"\
! "is restarted."
else:
! err_msg = what + " failed due to an unexpected Outlook error.\r\n" \
! + GetCOMExceptionString(exc_val) + "\r\n\r\n" \
! "It is recommended you restart Outlook at the earliest opportunity\r\n\r\n" \
! "This message will not be reported again until SpamBayes\r\n"\
! "is restarted."
manager.ReportErrorOnce(err_msg)
--- 149,163 ----
hr, exc_msg, exc, arg_err = exc_val
if hr == mapi.MAPI_E_TABLE_TOO_BIG:
! err_msg = what + _(" failed as one of your\r\n" \
"Outlook folders is full. Further operations are\r\n" \
"likely to fail until you clean up this folder.\r\n\r\n" \
"This message will not be reported again until SpamBayes\r\n"\
! "is restarted.")
else:
! err_msg = what + _(" failed due to an unexpected Outlook error.\r\n") \
! + GetCOMExceptionString(exc_val) + "\r\n\r\n" + \
! _("It is recommended you restart Outlook at the earliest opportunity\r\n\r\n" \
! "This message will not be reported again until SpamBayes\r\n"\
! "is restarted.")
manager.ReportErrorOnce(err_msg)
***************
*** 976,980 ****
# This is designed to fake up some SMTP headers for messages
# on an exchange server that do not have such headers of their own
! prop_ids = PR_SUBJECT_A, PR_DISPLAY_NAME_A, PR_DISPLAY_TO_A, PR_DISPLAY_CC_A
hr, data = self.mapi_object.GetProps(prop_ids,0)
subject = self._GetPotentiallyLargeStringProp(prop_ids[0], data[0])
--- 976,981 ----
# This is designed to fake up some SMTP headers for messages
# on an exchange server that do not have such headers of their own
! prop_ids = PR_SUBJECT_A, PR_DISPLAY_NAME_A, PR_DISPLAY_TO_A, \
! PR_DISPLAY_CC_A, PR_MESSAGE_DELIVERY_TIME
hr, data = self.mapi_object.GetProps(prop_ids,0)
subject = self._GetPotentiallyLargeStringProp(prop_ids[0], data[0])
***************
*** 982,985 ****
--- 983,987 ----
to = self._GetPotentiallyLargeStringProp(prop_ids[2], data[2])
cc = self._GetPotentiallyLargeStringProp(prop_ids[3], data[3])
+ delivery_time = data[4][1]
headers = ["X-Exchange-Message: true"]
if subject: headers.append("Subject: "+subject)
***************
*** 987,990 ****
--- 989,997 ----
if to: headers.append("To: "+to)
if cc: headers.append("CC: "+cc)
+ if delivery_time:
+ from time import timezone
+ from email.Utils import formatdate
+ headers.append("X-Exchange-Delivery-Time: "+\
+ formatdate(int(delivery_time)-timezone, True))
return "\n".join(headers) + "\n"
***************
*** 1211,1220 ****
self.MoveTo(folder)
except MsgStoreException, details:
! ReportMAPIError(manager, "Moving a message", details.mapi_exception)
def CopyToReportingError(self, manager, folder):
try:
self.MoveTo(folder)
except MsgStoreException, details:
! ReportMAPIError(manager, "Copying a message", details.mapi_exception)
def GetFolder(self):
--- 1218,1229 ----
self.MoveTo(folder)
except MsgStoreException, details:
! ReportMAPIError(manager, _("Moving a message"),
! details.mapi_exception)
def CopyToReportingError(self, manager, folder):
try:
self.MoveTo(folder)
except MsgStoreException, details:
! ReportMAPIError(manager, _("Copying a message"),
! details.mapi_exception)
def GetFolder(self):
From anadelonbrin at users.sourceforge.net Tue Nov 2 22:36:57 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 2 22:37:01 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 train.py,1.38,1.39
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7668/Outlook2000
Modified Files:
train.py
Log Message:
Wrap strings to translate in _(). Leave all log messages untranslated, because logs
are more use to the spambayes@python.org people than they are to users, really, and
I don't want to have to translate logs into English to try and help people.
Index: train.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/train.py,v
retrieving revision 1.38
retrieving revision 1.39
diff -C2 -d -r1.38 -r1.39
*** train.py 15 Oct 2004 02:04:55 -0000 1.38
--- train.py 2 Nov 2004 21:36:54 -0000 1.39
***************
*** 93,97 ****
def real_trainer(classifier_data, config, message_store, progress):
! progress.set_status("Counting messages")
num_msgs = 0
--- 93,97 ----
def real_trainer(classifier_data, config, message_store, progress):
! progress.set_status(_("Counting messages"))
num_msgs = 0
***************
*** 104,108 ****
for f in message_store.GetFolderGenerator(config.training.ham_folder_ids, config.training.ham_include_sub):
! progress.set_status("Processing good folder '%s'" % (f.name,))
train_folder(f, 0, classifier_data, progress)
if progress.stop_requested():
--- 104,108 ----
for f in message_store.GetFolderGenerator(config.training.ham_folder_ids, config.training.ham_include_sub):
! progress.set_status(_("Processing good folder '%s'") % (f.name,))
train_folder(f, 0, classifier_data, progress)
if progress.stop_requested():
***************
*** 110,114 ****
for f in message_store.GetFolderGenerator(config.training.spam_folder_ids, config.training.spam_include_sub):
! progress.set_status("Processing spam folder '%s'" % (f.name,))
train_folder(f, 1, classifier_data, progress)
if progress.stop_requested():
--- 110,114 ----
for f in message_store.GetFolderGenerator(config.training.spam_folder_ids, config.training.spam_include_sub):
! progress.set_status(_("Processing spam folder '%s'") % (f.name,))
train_folder(f, 1, classifier_data, progress)
if progress.stop_requested():
***************
*** 121,125 ****
# Setup the next "stage" in the progress dialog.
progress.set_max_ticks(1)
! progress.set_status("Writing the database...")
classifier_data.Save()
--- 121,125 ----
# Setup the next "stage" in the progress dialog.
progress.set_max_ticks(1)
! progress.set_status(_("Writing the database..."))
classifier_data.Save()
***************
*** 130,134 ****
if not config.training.ham_folder_ids and not config.training.spam_folder_ids:
! progress.error("You must specify at least one spam or one good folder")
return
--- 130,134 ----
if not config.training.ham_folder_ids and not config.training.spam_folder_ids:
! progress.error(_("You must specify at least one spam or one good folder"))
return
***************
*** 155,161 ****
# Saving is really slow sometimes, but we only have 1 tick for that anyway
if rescore:
! stages = ("Training", .3), ("Saving", .1), ("Scoring", .6)
else:
! stages = ("Training", .9), ("Saving", .1)
progress.set_stages(stages)
--- 155,161 ----
# Saving is really slow sometimes, but we only have 1 tick for that anyway
if rescore:
! stages = (_("Training"), .3), (_("Saving"), .1), (_("Scoring"), .6)
else:
! stages = (_("Training"), .9), (_("Saving"), .1)
progress.set_stages(stages)
***************
*** 190,194 ****
bayes = classifier_data.bayes
! progress.set_status("Completed training with %d spam and %d good messages" % (bayes.nspam, bayes.nham))
def main():
--- 190,195 ----
bayes = classifier_data.bayes
! progress.set_status(_("Completed training with %d spam and %d good messages") % (bayes.nspam, bayes.nham))
!
def main():
From anadelonbrin at users.sourceforge.net Tue Nov 2 22:37:39 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 2 22:37:48 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 config.py,1.32,1.33
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7804/Outlook2000
Modified Files:
config.py
Log Message:
Back out an experimental option I committed by mistake.
Index: config.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/config.py,v
retrieving revision 1.32
retrieving revision 1.33
diff -C2 -d -r1.32 -r1.33
*** config.py 2 Nov 2004 21:33:46 -0000 1.32
--- config.py 2 Nov 2004 21:37:37 -0000 1.33
***************
*** 119,125 ****
("timer_interval", "obsolete", 1000, "", INTEGER, RESTORE),
("timer_only_receive_folders", "obsolete", True, "", BOOLEAN, RESTORE),
- # Rather than fpfnunsure, do tte. DeleteAs/RecoverFrom just move
- # the message, and a tte update is done on close.
- ("train_to_exhaustion", "Train to exhaustion", False, "", BOOLEAN, RESTORE),
),
"Training" : (
--- 119,122 ----
From anadelonbrin at users.sourceforge.net Wed Nov 3 02:15:07 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 3 02:15:10 2004
Subject: [Spambayes-checkins] spambayes/spambayes classifier.py,1.28,1.29
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20461/spambayes
Modified Files:
classifier.py
Log Message:
Fix [ 922063 ] Intermittent sb_filter.py failure with URL pickle
This is still ugly experimental code, but it might as well be robust ugly experimental
code. If something goes wrong loading the URL pickles, start with fresh ones
(they are only caches, so that shouldn't hurt). When saving, save to a temp file
first.
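The load-with-fallback and save-via-temp-file idea can be sketched roughly as follows in Python 3. The function names are invented for the example, the set of exceptions caught is broader than in the diff below, and `os.replace` is a modern call that overwrites an existing target, so the remove-and-rename fallback for win32 is not needed here.

```python
import os
import pickle


def load_cache(path, default):
    """Load a pickled cache, or return `default` if it is missing or bad."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (OSError, EOFError, ValueError, pickle.UnpicklingError):
        # The file is only a cache, so starting afresh loses nothing vital.
        return default


def save_cache(path, data):
    """Write to a temp file first, then move it over the real cache."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(data, f)
    # A failed dump leaves the previous cache file untouched.
    os.replace(tmp, path)
```

A caller would use something like `load_cache(name, {})` at start-up and `save_cache(name, data)` on close.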
Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** classifier.py 29 Oct 2004 00:14:42 -0000 1.28
--- classifier.py 3 Nov 2004 01:15:04 -0000 1.29
***************
*** 617,621 ****
if os.path.exists(self.bad_url_cache_name):
b_file = file(self.bad_url_cache_name, "r")
! self.bad_urls = pickle.load(b_file)
b_file.close()
else:
--- 617,630 ----
if os.path.exists(self.bad_url_cache_name):
b_file = file(self.bad_url_cache_name, "r")
! try:
! self.bad_urls = pickle.load(b_file)
! except (IOError, ValueError):
! # Something went wrong loading it (bad pickle,
! # probably). Start afresh.
! if options["globals", "verbose"]:
! print >>sys.stderr, "Bad URL pickle, using new."
! self.bad_urls = {"url:non_resolving": (),
! "url:non_html": (),
! "url:unknown_error": ()}
b_file.close()
else:
***************
*** 627,631 ****
if os.path.exists(self.http_error_cache_name):
h_file = file(self.http_error_cache_name, "r")
! self.http_error_urls = pickle.load(h_file)
h_file.close()
else:
--- 636,647 ----
if os.path.exists(self.http_error_cache_name):
h_file = file(self.http_error_cache_name, "r")
! try:
! self.http_error_urls = pickle.load(h_file)
! except (IOError, ValueError):
! # Something went wrong loading it (bad pickle,
! # probably). Start afresh.
! if options["globals", "verbose"]:
! print >>sys.stderr, "Bad HTTP error pickle, using new."
! self.http_error_urls = {}
h_file.close()
else:
***************
*** 636,645 ****
# XXX be a good thing long-term (if a previously invalid URL
# XXX becomes valid, for example).
! b_file = file(self.bad_url_cache_name, "w")
! pickle.dump(self.bad_urls, b_file)
! b_file.close()
! h_file = file(self.http_error_cache_name, "w")
! pickle.dump(self.http_error_urls, h_file)
! h_file.close()
def slurp(self, proto, url):
--- 652,668 ----
# XXX be a good thing long-term (if a previously invalid URL
# XXX becomes valid, for example).
! for name, data in [(self.bad_url_cache_name, self.bad_urls),
! (self.http_error_cache_name, self.http_error_urls),]:
! # Save to a temp file first, in case something goes wrong.
! cache = open(name + ".tmp", "w")
! pickle.dump(data, cache)
! cache.close()
! try:
! os.rename(name + ".tmp", name)
! except OSError:
! # Atomic replace isn't possible with win32, so just
! # remove and rename.
! os.remove(name)
! os.rename(name + ".tmp", name)
def slurp(self, proto, url):
From anadelonbrin at users.sourceforge.net Wed Nov 3 03:05:52 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 3 03:05:54 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_mboxtrain.py,1.13,1.14
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29252/scripts
Modified Files:
sb_mboxtrain.py
Log Message:
Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf
This appears (from the Python documentation) to be a harmless fix, and shouldn't
cause any problems. Tested on Mandrake 9.1, Cygwin and some sort of Red Hat system,
although in a very limited way.
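A small sketch of the pairing, assuming the lock is taken with `flock` elsewhere in the function (the hunk below shows only the unlock). `with_flock` is a hypothetical helper and the `fcntl` module is Unix-only.

```python
import fcntl


def with_flock(path, action):
    """Run action(f) while holding an exclusive flock on `path`."""
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            return action(f)
        finally:
            # Release with the same call family that acquired the lock.
            fcntl.flock(f, fcntl.LOCK_UN)
```

The point is just that acquire and release go through the same interface, since `flock` and `lockf` locks are not guaranteed to be interchangeable across platforms.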
Index: sb_mboxtrain.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_mboxtrain.py,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** sb_mboxtrain.py 23 Jul 2004 05:00:06 -0000 1.13
--- sb_mboxtrain.py 3 Nov 2004 02:05:49 -0000 1.14
***************
*** 210,214 ****
raise
! fcntl.lockf(f, fcntl.LOCK_UN)
f.close()
if loud:
--- 210,214 ----
raise
! fcntl.flock(f, fcntl.LOCK_UN)
f.close()
if loud:
From anadelonbrin at users.sourceforge.net Wed Nov 3 03:49:33 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 3 03:49:37 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py,1.14,1.15
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3646/scripts
Modified Files:
sb_dbexpimp.py
Log Message:
Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
One little bit of the code (which just printed out the number of words in the db)
assumed that useDBM would be True/False, when it may be other things these days.
Fix that.
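The check could be sketched like this; `uses_dbm` is a made-up helper name, and the accepted values ("dbm", "pickle", True, False) are only those visible in the diff below.

```python
def uses_dbm(storage_type):
    """Return True when the flag selects a DBM store (hypothetical helper)."""
    # Older callers pass True/False, newer ones pass a storage name.
    # A bare truth test would wrongly treat "pickle" as meaning DBM.
    return storage_type in (True, "dbm")


for value in (True, "dbm", False, "pickle"):
    print(repr(value), uses_dbm(value))
```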
Index: sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** sb_dbexpimp.py 30 May 2004 17:01:38 -0000 1.14
--- sb_dbexpimp.py 3 Nov 2004 02:49:30 -0000 1.15
***************
*** 230,234 ****
print "Finished storing database"
! if useDBM:
words = bayes.db.keys()
words.remove(bayes.statekey)
--- 230,234 ----
print "Finished storing database"
! if useDBM == "dbm" or useDBM == True:
words = bayes.db.keys()
words.remove(bayes.statekey)
***************
*** 250,254 ****
sys.exit()
! useDBM = False
newDBM = True
dbFN = None
--- 250,254 ----
sys.exit()
! useDBM = "pickle"
newDBM = True
dbFN = None
From anadelonbrin at users.sourceforge.net Fri Nov 5 03:34:31 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 5 03:34:35 2004
Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_server.py,
NONE, 1.1 test_sb-server.py, 1.7, NONE
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19184/spambayes/test
Added Files:
test_sb_server.py
Removed Files:
test_sb-server.py
Log Message:
I knew this day would come :) I want to import some of the stuff in test_sb-server,
but can't (because there is a '-' in the name). I could factor it all out, but this
file doesn't have much CVS history, so might as well rename it to match the actual
script it's testing.
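To illustrate why the old name could not be used in an `import` statement, here is a quick Python 3 check; `importable_by_statement` is a hypothetical helper, not part of spambayes.

```python
def importable_by_statement(module_name):
    """True if each dotted part of the name is a valid identifier."""
    # "import a-b" parses as "import a" minus "b", which is a syntax
    # error, so every part of a module name must be an identifier.
    return all(part.isidentifier() for part in module_name.split("."))


print(importable_by_statement("test_sb-server"))
print(importable_by_statement("test_sb_server"))
```

The first call reports False and the second True, which is the whole reason for the rename.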
--- NEW FILE: test_sb_server.py ---
#! /usr/bin/env python
"""Test the POP3 proxy is working correctly.
When using the -z command line option, carries out a test that the
POP3 proxy can be connected to, that incoming mail is classified,
that pipelining is removed from the CAPA[bility] query, and that the
web ui is present.
The -t option runs a fake POP3 server on port 8110. This is the
same server that the -z option uses, and may be separately run for
other testing purposes.
Usage:
test_sb_server.py [options]
options:
-z : Runs a self-test and exits.
-t : Runs a fake POP3 server on port 8110 (for testing).
-h : Displays this help message.
"""
# This module is part of the spambayes project, which is Copyright 2002
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
__author__ = "Richie Hindle "
__credits__ = "All the Spambayes folk."
try:
True, False
except NameError:
# Maintain compatibility with Python 2.2
True, False = 1, 0
# This code originally formed a part of pop3proxy.py. If you are examining
# the history of this file, you may need to go back to there.
todo = """
Web training interface:
o Functional tests.
"""
# One example of spam and one of ham - both are used to train, and are
# then classified. Not a good test of the classifier, but a perfectly
# good test of the POP3 proxy. The bodies of these came from the
# spambayes project, and Richie added the headers because the
# originals had no headers.
spam1 = """From: friend@public.com
Subject: Make money fast
Hello tim_chandler , Want to save money ?
Now is a good time to consider refinancing. Rates are low so you can cut
your current payments and save money.
http://64.251.22.101/interest/index%38%30%300%2E%68t%6D
Take off list on site [s5]
"""
good1 = """From: chris@example.com
Subject: ZPT and DTML
Jean Jordaan wrote:
> 'Fraid so ;> It contains a vintage dtml-calendar tag.
> http://www.zope.org/Members/teyc/CalendarTag
>
> Hmm I think I see what you mean: one needn't manually pass on the
> namespace to a ZPT?
Yeah, Page Templates are a bit more clever, sadly, DTML methods aren't :-(
Chris
"""
import asyncore
import socket
import operator
import re
import getopt
import sys, os
import time
import sb_test_support
sb_test_support.fix_sys_path()
from spambayes import Dibbler
from spambayes import tokenizer
from spambayes.UserInterface import UserInterfaceServer
from spambayes.ProxyUI import ProxyUserInterface
from sb_server import BayesProxyListener
from sb_server import state, _recreateState
from spambayes.Options import options
# HEADER_EXAMPLE is the longest possible header - the length of this one
# is added to the size of each message.
HEADER_EXAMPLE = '%s: xxxxxxxxxxxxxxxxxxxx\r\n' % \
options["Headers", "classification_header_name"]
class TestListener(Dibbler.Listener):
"""Listener for TestPOP3Server. Works on port 8110, to co-exist
with real POP3 servers."""
def __init__(self, socketMap=asyncore.socket_map):
Dibbler.Listener.__init__(self, 8110, TestPOP3Server,
(socketMap,), socketMap=socketMap)
class TestPOP3Server(Dibbler.BrighterAsyncChat):
"""Minimal POP3 server, for testing purposes. Doesn't support
UIDL. USER, PASS, APOP, DELE and RSET simply return "+OK"
without doing anything. Also understands the 'KILL' command, to
kill it. The mail content is the example messages above.
"""
def __init__(self, clientSocket, socketMap):
# Grumble: asynchat.__init__ doesn't take a 'map' argument,
# hence the two-stage construction.
Dibbler.BrighterAsyncChat.__init__(self, map=socketMap)
Dibbler.BrighterAsyncChat.set_socket(self, clientSocket, socketMap)
self.maildrop = [spam1, good1]
self.set_terminator('\r\n')
self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP',
'DELE', 'RSET', 'QUIT', 'KILL']
self.handlers = {'CAPA': self.onCapa,
'STAT': self.onStat,
'LIST': self.onList,
'RETR': self.onRetr,
'TOP': self.onTop}
self.push("+OK ready\r\n")
self.request = ''
def collect_incoming_data(self, data):
"""Asynchat override."""
self.request = self.request + data
def found_terminator(self):
"""Asynchat override."""
if ' ' in self.request:
command, args = self.request.split(None, 1)
else:
command, args = self.request, ''
command = command.upper()
if command in self.okCommands:
self.push("+OK (we hope)\r\n")
if command == 'QUIT':
self.close_when_done()
if command == 'KILL':
self.socket.shutdown(2)
self.close()
raise SystemExit
else:
handler = self.handlers.get(command, self.onUnknown)
self.push(handler(command, args)) # Or push_slowly for testing
self.request = ''
def push_slowly(self, response):
"""Useful for testing."""
for c in response:
self.push(c)
time.sleep(0.02)
def onCapa(self, command, args):
"""POP3 CAPA command. This lies about supporting pipelining for
test purposes - the POP3 proxy *doesn't* support pipelining, and
we test that it correctly filters out that capability from the
proxied capability list. Ditto for STLS."""
lines = ["+OK Capability list follows",
"PIPELINING",
"STLS",
"TOP",
".",
""]
return '\r\n'.join(lines)
def onStat(self, command, args):
"""POP3 STAT command."""
maildropSize = reduce(operator.add, map(len, self.maildrop))
maildropSize += len(self.maildrop) * len(HEADER_EXAMPLE)
return "+OK %d %d\r\n" % (len(self.maildrop), maildropSize)
def onList(self, command, args):
"""POP3 LIST command, with optional message number argument."""
if args:
try:
number = int(args)
except ValueError:
number = -1
if 0 < number <= len(self.maildrop):
return "+OK %d\r\n" % len(self.maildrop[number-1])
else:
return "-ERR no such message\r\n"
else:
returnLines = ["+OK"]
for messageIndex in range(len(self.maildrop)):
size = len(self.maildrop[messageIndex])
returnLines.append("%d %d" % (messageIndex + 1, size))
returnLines.append(".")
return '\r\n'.join(returnLines) + '\r\n'
def _getMessage(self, number, maxLines):
"""Implements the POP3 RETR and TOP commands."""
if 0 < number <= len(self.maildrop):
message = self.maildrop[number-1]
headers, body = message.split('\n\n', 1)
bodyLines = body.split('\n')[:maxLines]
message = headers + '\r\n\r\n' + '\n'.join(bodyLines)
return "+OK\r\n%s\r\n.\r\n" % message
else:
return "-ERR no such message\r\n"
def onRetr(self, command, args):
"""POP3 RETR command."""
try:
number = int(args)
except ValueError:
number = -1
return self._getMessage(number, 12345)
def onTop(self, command, args):
"""POP3 TOP command."""
try:
number, lines = map(int, args.split())
except ValueError:
number, lines = -1, -1
return self._getMessage(number, lines)
def onUnknown(self, command, args):
"""Unknown POP3 command."""
return "-ERR Unknown command: %s\r\n" % repr(command)
def test():
"""Runs a self-test using TestPOP3Server, a minimal POP3 server
that serves the example emails above.
"""
# Run a proxy and a test server in separate threads with separate
# asyncore environments.
import threading
state.isTest = True
testServerReady = threading.Event()
def runTestServer():
testSocketMap = {}
TestListener(socketMap=testSocketMap)
testServerReady.set()
asyncore.loop(map=testSocketMap)
proxyReady = threading.Event()
def runUIAndProxy():
httpServer = UserInterfaceServer(8881)
proxyUI = ProxyUserInterface(state, _recreateState)
httpServer.register(proxyUI)
BayesProxyListener('localhost', 8110, ('', 8111))
state.bayes.learn(tokenizer.tokenize(spam1), True)
state.bayes.learn(tokenizer.tokenize(good1), False)
proxyReady.set()
Dibbler.run()
threading.Thread(target=runTestServer).start()
testServerReady.wait()
threading.Thread(target=runUIAndProxy).start()
proxyReady.wait()
# Connect to the proxy and the test server.
proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
proxy.connect(('localhost', 8111))
response = proxy.recv(100)
assert response == "+OK ready\r\n"
pop3Server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
pop3Server.connect(('localhost', 8110))
response = pop3Server.recv(100)
assert response == "+OK ready\r\n"
# Verify that the test server claims to support pipelining.
pop3Server.send("capa\r\n")
response = pop3Server.recv(1000)
assert response.find("PIPELINING") >= 0
# Ask for the capabilities via the proxy, and verify that the proxy
# is filtering out the PIPELINING capability.
proxy.send("capa\r\n")
response = proxy.recv(1000)
assert response.find("PIPELINING") == -1
# Verify that the test server claims to support STLS.
pop3Server.send("capa\r\n")
response = pop3Server.recv(1000)
assert response.find("STLS") >= 0
# Ask for the capabilities via the proxy, and verify that the proxy
# is filtering out the STLS capability.
proxy.send("capa\r\n")
response = proxy.recv(1000)
assert response.find("STLS") == -1
# Stat the mailbox to get the number of messages.
proxy.send("stat\r\n")
response = proxy.recv(100)
count, totalSize = map(int, response.split()[1:3])
assert count == 2
# Loop through the messages ensuring that they have judgement
# headers.
for i in range(1, count+1):
response = ""
proxy.send("retr %d\r\n" % i)
while response.find('\n.\r\n') == -1:
response = response + proxy.recv(1000)
assert response.find(options["Headers", "classification_header_name"]) >= 0
# Smoke-test the HTML UI.
httpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
httpServer.connect(('localhost', 8881))
httpServer.sendall("get / HTTP/1.0\r\n\r\n")
response = ''
while 1:
packet = httpServer.recv(1000)
if not packet: break
response += packet
assert re.search(r"(?s).*SpamBayes proxy.*", response)
# Kill the proxy and the test server.
proxy.sendall("kill\r\n")
proxy.recv(100)
pop3Server.sendall("kill\r\n")
pop3Server.recv(100)
def run():
# Read the arguments.
try:
opts, args = getopt.getopt(sys.argv[1:], 'htz')
except getopt.error, msg:
print >>sys.stderr, str(msg) + '\n\n' + __doc__
sys.exit()
runSelfTest = False
for opt, arg in opts:
if opt == '-h':
print >>sys.stderr, __doc__
sys.exit()
elif opt == '-t':
state.isTest = True
state.runTestServer = True
elif opt == '-z':
state.isTest = True
runSelfTest = True
state.createWorkers()
if runSelfTest:
print "\nRunning self-test...\n"
state.buildServerStrings()
test()
print "Self-test passed." # ...else it would have asserted.
elif state.runTestServer:
print "Running a test POP3 server on port 8110..."
TestListener()
asyncore.loop()
else:
print >>sys.stderr, __doc__
if __name__ == '__main__':
run()
--- test_sb-server.py DELETED ---
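The retrieval loop in the self-test above keeps calling recv() until it sees the POP3 end-of-response marker (a line holding only a dot). A minimal sketch of that pattern follows, written for current Python 3 with bytes rather than the Python 2 strings used in the script. The helper name read_multiline and the fake recv callable are hypothetical, and unlike the original loop the sketch raises an error if the peer closes the connection before the marker arrives.

```python
def read_multiline(recv, bufsize=1000):
    """Accumulate chunks until the POP3 end-of-response marker arrives.

    `recv` is any callable with the shape of socket.recv: it takes a
    buffer size and returns bytes, or b"" once the peer has closed.
    """
    response = b""
    while b"\n.\r\n" not in response:
        chunk = recv(bufsize)
        if not chunk:
            # Without this check the loop would spin forever on a
            # closed connection.
            raise EOFError("connection closed before end-of-response marker")
        response += chunk
    return response


# Stand-in for a socket: hands out a canned response in three pieces.
chunks = iter([b"+OK\r\nSubject: test\r\n", b"\r\nbody\r\n.", b"\r\n"])
message = read_multiline(lambda n: next(chunks, b""))
assert message.endswith(b"\r\n.\r\n")
```

With a real connection, the bound method sock.recv can be passed in directly as the recv argument.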
From anadelonbrin at users.sourceforge.net Fri Nov 5 03:36:28 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 5 03:36:32 2004
Subject: [Spambayes-checkins]
spambayes/spambayes/test test_sb_imapfilter.py, 1.4, 1.5
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19506/spambayes/test
Modified Files:
test_sb_imapfilter.py
Log Message:
Fix typo in import.
The test failed with Python 2.2 - not because sb_imapfilter did, but because there
were lots of "if substr in string" instances in the test script. Replace those with
find() != -1 so that the test can be run with 2.2.
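The substitution works because str.find returns the index of the first match, or -1 when the substring is absent, whereas before Python 2.3 the "in" operator on strings only accepted a single character on its left-hand side. A small sketch (current Python 3 syntax; the helper name contains is hypothetical) showing that the two spellings agree:

```python
def contains(haystack, needle):
    """Substring test that avoids the `needle in haystack` syntax.

    str.find returns the lowest index of the match, or -1 if there is
    no match, so comparing against -1 gives a boolean.
    """
    return haystack.find(needle) != -1


samples = [
    ("UID FETCH 1:* (FLAGS INTERNALDATE)", "FLAGS INTERNALDATE"),
    ("SEARCH UNDELETED", "UNDELETED"),
    ("SEARCH ALL", "UNDELETED"),
    ("", "UID"),
]
for haystack, needle in samples:
    # Both spellings must agree on every sample.
    assert contains(haystack, needle) == (needle in haystack)
```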
Index: test_sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_imapfilter.py,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** test_sb_imapfilter.py 14 Oct 2004 04:01:10 -0000 1.4
--- test_sb_imapfilter.py 5 Nov 2004 02:36:25 -0000 1.5
***************
*** 15,19 ****
from spambayes import Dibbler
from spambayes.Options import options
! from spambayes.classifier import Classifer
from sb_imapfilter import BadIMAPResponseError
from spambayes.message import message_from_string
--- 15,19 ----
from spambayes import Dibbler
from spambayes.Options import options
! from spambayes.classifier import Classifier
from sb_imapfilter import BadIMAPResponseError
from spambayes.message import message_from_string
***************
*** 89,94 ****
class TestListener(Dibbler.Listener):
! """Listener for TestIMAP4Server. Works on port 8143, to co-exist
! with real IMAP4 servers."""
def __init__(self, socketMap=asyncore.socket_map):
Dibbler.Listener.__init__(self, IMAP_PORT, TestIMAP4Server,
--- 89,93 ----
class TestListener(Dibbler.Listener):
! """Listener for TestIMAP4Server."""
def __init__(self, socketMap=asyncore.socket_map):
Dibbler.Listener.__init__(self, IMAP_PORT, TestIMAP4Server,
***************
*** 239,243 ****
args = args.upper()
results = ()
! if "UNDELETED" in args:
for msg_id in UNDELETED_IDS:
if uid:
--- 238,242 ----
args = args.upper()
results = ()
! if args.find("UNDELETED") != -1:
for msg_id in UNDELETED_IDS:
if uid:
***************
*** 259,263 ****
for msg in msg_nums:
response[msg] = []
! if "UID" in msg_parts:
if uid:
for msg in msg_nums:
--- 258,262 ----
for msg in msg_nums:
response[msg] = []
! if msg_parts.find("UID") != -1:
if uid:
for msg in msg_nums:
***************
*** 267,271 ****
response[msg].append("FETCH (UID %s)" %
(IMAP_UIDS[int(msg)]))
! if "BODY.PEEK[]" in msg_parts:
for msg in msg_nums:
if uid:
--- 266,270 ----
response[msg].append("FETCH (UID %s)" %
(IMAP_UIDS[int(msg)]))
! if msg_parts.find("BODY.PEEK[]") != -1:
for msg in msg_nums:
if uid:
***************
*** 276,280 ****
(len(IMAP_MESSAGES[msg_uid])),
IMAP_MESSAGES[msg_uid]))
! if "RFC822.HEADER" in msg_parts:
for msg in msg_nums:
if uid:
--- 275,279 ----
(len(IMAP_MESSAGES[msg_uid])),
IMAP_MESSAGES[msg_uid]))
! if msg_parts.find("RFC822.HEADER") != -1:
for msg in msg_nums:
if uid:
***************
*** 286,290 ****
response[msg].append(("FETCH (RFC822.HEADER {%s}" %
(len(headers),), headers))
! if "FLAGS INTERNALDATE" in msg_parts:
# We make up flags & dates.
for msg in msg_nums:
--- 285,289 ----
response[msg].append(("FETCH (RFC822.HEADER {%s}" %
(len(headers),), headers))
! if msg_parts.find("FLAGS INTERNALDATE") != -1:
# We make up flags & dates.
for msg in msg_nums:
***************
*** 514,518 ****
# message 103 is replaced with one that does, this will fail with
# Python 2.4/email 3.0.
! has_header = "X-Spambayes-Exception: " in new_msg.as_string()
has_defect = hasattr(new_msg, "defects") and len(new_msg.defects) > 0
self.assert_(has_header or has_defect)
--- 513,517 ----
# message 103 is replaced with one that does, this will fail with
# Python 2.4/email 3.0.
! has_header = new_msg.as_string().find("X-Spambayes-Exception: ") != -1
has_defect = hasattr(new_msg, "defects") and len(new_msg.defects) > 0
self.assert_(has_header or has_defect)
From anadelonbrin at users.sourceforge.net Fri Nov 5 03:37:36 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 5 03:37:39 2004
Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_pop3dnd.py,
NONE, 1.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19708/spambayes/test
Added Files:
test_sb_pop3dnd.py
Log Message:
Initial unittests for sb_pop3dnd.
--- NEW FILE: test_sb_pop3dnd.py ---
# Test sb_pop3dnd script.
import sys
import email
import time
import thread
import imaplib
import unittest
import sb_test_support
sb_test_support.fix_sys_path()
from spambayes import Dibbler
from spambayes.Options import options
from spambayes.classifier import Classifier
from spambayes.message import message_from_string
from sb_pop3dnd import IMAPMessage, DynamicIMAPMessage, IMAPFileMessage
from sb_pop3dnd import IMAPFileMessageFactory
# We borrow the dummy POP3 server that test_sb_server uses.
# And also the test messages.
from test_sb_server import TestListener, good1, spam1
POP_PORT = 8110
class IMAPMessageTest(unittest.TestCase):
def testIMAPMessage(self):
msg = IMAPMessage()
self.assertEqual(msg.date, None)
msg = IMAPMessage("fake date")
self.assertEqual(msg.date, "fake date")
for att in ["date", "deleted", "flagged", "seen", "draft",
"recent", "answered"]:
self.assert_(att in msg.stored_attributes)
for flag in ["deleted", "answered", "flagged", "seen", "draft",
"recent"]:
self.assertEqual(getattr(msg, flag), False)
def testGetAllHeaders(self):
msg = email.message_from_string(good1, _class=IMAPMessage)
correct_msg = email.message_from_string(good1)
# Without passing a list, we should get them all.
# We get them in lowercase, because this is a twisted
# requirement.
headers = msg.getHeaders(False)
for k, v in correct_msg.items():
self.assertEqual(headers[k.lower()], v)
# Should work the same with negate
headers = msg.getHeaders(True)
for k, v in correct_msg.items():
self.assertEqual(headers[k.lower()], v)
def testGetIndividualHeaders(self):
msg = email.message_from_string(good1, _class=IMAPMessage)
correct_msg = email.message_from_string(good1)
# We get them in lowercase, because this is a twisted
# requirement. We pass them in uppercase, because this
# is a twisted requirement. It's not called twisted for
# nothing!
headers = msg.getHeaders(False, "SUBJECT")
self.assertEqual(headers["subject"], correct_msg["Subject"])
# Negate should get all the other headers.
headers = msg.getHeaders(True, "SUBJECT")
self.assert_("subject" not in headers)
for k, v in correct_msg.items():
if k == "Subject":
continue
self.assertEqual(headers[k.lower()], v)
def testGetFlags(self):
msg = IMAPMessage()
all_flags = ["deleted", "answered", "flagged", "seen", "draft",
"recent"]
for flag in all_flags:
setattr(msg, flag, True)
flags = list(msg.getFlags())
for flag in all_flags:
self.assert_("\\%s" % (flag.upper(),) in flags)
for flag in all_flags:
setattr(msg, flag, False)
flags = list(msg.getFlags())
self.assertEqual(flags, [])
def testGetInternalDate(self):
msg = IMAPMessage()
self.assertRaises(AssertionError, msg.getInternalDate)
msg = IMAPMessage("fake date")
self.assertEqual(msg.getInternalDate(), "fake date")
def testGetBodyFile(self):
msg = email.message_from_string(spam1, _class=IMAPMessage)
correct_msg = email.message_from_string(spam1)
body = msg.getBodyFile()
# Our messages are designed for transmittal, so have
# \r\n rather than \n as end-of-line.
self.assertEqual(body.read().replace('\r\n', '\n'),
correct_msg.get_payload())
def testGetSize(self):
msg = email.message_from_string(spam1, _class=IMAPMessage)
correct_msg = email.message_from_string(spam1)
# Our messages are designed for transmittal, so have
# \r\n rather than \n as end-of-line.
self.assertEqual(msg.getSize(),
len(correct_msg.as_string().replace('\n', '\r\n')))
def testGetUID(self):
msg = IMAPMessage()
msg.id = "fake id" # Heh
self.assertEqual(msg.getUID(), "fake id")
def testIsMultipart(self):
msg = IMAPMessage()
self.assertEqual(msg.isMultipart(), False)
def testGetSubPart(self):
msg = IMAPMessage()
self.assertRaises(NotImplementedError, msg.getSubPart, None)
def testClearFlags(self):
msg = IMAPMessage()
all_flags = ["deleted", "answered", "flagged", "seen", "draft",
"recent"]
for flag in all_flags:
setattr(msg, flag, True)
msg.clear_flags()
for flag in all_flags:
self.assertEqual(getattr(msg, flag), False)
def testFlags(self):
msg = IMAPMessage()
all_flags = ["deleted", "answered", "flagged", "seen", "draft",
"recent"]
for flag in all_flags:
setattr(msg, flag, True)
flags = list(msg.flags())
for flag in all_flags:
self.assert_("\\%s" % (flag.upper(),) in flags)
for flag in all_flags:
setattr(msg, flag, False)
flags = list(msg.flags())
self.assertEqual(flags, [])
def testTrain(self):
# XXX To do
pass
def testStructure(self):
# XXX To do
pass
def testBody(self):
msg = email.message_from_string(good1, _class=IMAPMessage)
correct_msg = email.message_from_string(good1)
body = msg.body()
# Our messages are designed for transmittal, so have
# \r\n rather than \n as end-of-line.
self.assertEqual(body.replace('\r\n', '\n'),
correct_msg.get_payload())
def testHeaders(self):
msg = email.message_from_string(good1, _class=IMAPMessage)
correct_msg = email.message_from_string(good1)
headers = msg.headers()
correct_headers = "\r\n".join(["%s: %s" % (k, v) \
for k, v in correct_msg.items()])
class DynamicIMAPMessageTest(unittest.TestCase):
def setUp(self):
def fakemsg(body=False, headers=False):
msg = []
if headers:
msg.append("Header: Fake")
if body:
msg.append("\r\n")
if body:
msg.append("Fake Body")
return "\r\n".join(msg)
self.msg = DynamicIMAPMessage(fakemsg)
def testDate(self):
date = imaplib.Time2Internaldate(time.time())[1:-1]
self.assertEqual(self.msg.date, date)
def testLoad(self):
self.assertEqual(self.msg.as_string(),
"Header: Fake\r\n\r\nFake Body")
class IMAPFileMessageTest(unittest.TestCase):
def setUp(self):
self.msg = IMAPFileMessage("filename", "directory")
def testID(self):
self.assertEqual(self.msg.id, "filename")
def testDate(self):
date = imaplib.Time2Internaldate(time.time())[1:-1]
self.assertEqual(self.msg.date, date)
class IMAPFileMessageFactoryTest(unittest.TestCase):
def testCreateNoContent(self):
factory = IMAPFileMessageFactory()
msg = factory.create("key", "directory")
self.assertEqual(msg.id, "key")
self.assert_(isinstance(msg, type(IMAPFileMessage())))
def suite():
suite = unittest.TestSuite()
for cls in (IMAPMessageTest,
DynamicIMAPMessageTest,
IMAPFileMessageTest,
):
suite.addTest(unittest.makeSuite(cls))
return suite
if __name__=='__main__':
def runTestServer():
import asyncore
asyncore.loop()
TestListener()
thread.start_new_thread(runTestServer, ())
sb_test_support.unittest_main(argv=sys.argv + ['suite'])
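The behaviour that testGetAllHeaders and testGetIndividualHeaders expect from getHeaders can be sketched as a stand-alone function. This is an illustration in current Python 3, not the sb_pop3dnd implementation: the name select_headers is hypothetical, and it takes a list of (name, value) pairs instead of a message object. Requested names arrive in uppercase, returned keys are lowercase, and negate inverts the selection.

```python
def select_headers(items, negate, *names):
    """Pick headers from (name, value) pairs.

    With no names, every header is returned.  Otherwise a header is
    kept when its membership in `names` differs from `negate`: listed
    headers when negate is False, all the others when it is True.
    """
    headers = {}
    for header, value in items:
        listed = header.upper() in names
        if names == () or listed != negate:
            headers[header.lower()] = value
    return headers


items = [("Subject", "Hello"), ("From", "someone@example.com")]
assert select_headers(items, False, "SUBJECT") == {"subject": "Hello"}
assert select_headers(items, True, "SUBJECT") == {"from": "someone@example.com"}
```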
From anadelonbrin at users.sourceforge.net Fri Nov 5 04:00:00 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 5 04:00:04 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_upload.py,1.4,1.5
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27894/scripts
Modified Files:
sb_upload.py
Log Message:
Correct docstring.
Index: sb_upload.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_upload.py,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** sb_upload.py 7 Oct 2004 06:09:57 -0000 1.4
--- sb_upload.py 5 Nov 2004 02:59:51 -0000 1.5
***************
*** 8,16 ****
interface, which will save the message in the 'unknown' cache, ready
for you to classify it. It does not do any training, just saves it
! ready for you to classify.
usage: %(progname)s [-h] [-n] [-s server] [-p port] [-r N]
[-o section:option:value]
! [-t (ham|spam)] [-o section:option:value]
Options:
--- 8,16 ----
interface, which will save the message in the 'unknown' cache, ready
for you to classify it. It does not do any training, just saves it
! ready for you to classify (unless you use the -t switch).
usage: %(progname)s [-h] [-n] [-s server] [-p port] [-r N]
[-o section:option:value]
! [-t (ham|spam)]
Options:
From anadelonbrin at users.sourceforge.net Fri Nov 5 04:03:15 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 5 04:03:19 2004
Subject: [Spambayes-checkins] spambayes/spambayes Stats.py, 1.7,
1.8 mboxutils.py, 1.8, 1.9 message.py, 1.55, 1.56
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28691/spambayes
Modified Files:
Stats.py mboxutils.py message.py
Log Message:
Make message.insert_exception_string optionally put in the id header, for compatibility
with sb_server and sb_pop3dnd.
Make use of HTML in Stats.GetStats optional.
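The use_html switch amounts to choosing the line-break and indent tokens once and interpolating them into every format string. A reduced sketch of the idea in current Python 3 follows. The function name format_stats, its parameters and the exact HTML tokens are assumptions made for illustration; the real GetStats builds a much larger dictionary.

```python
def format_stats(num_ham, num_spam, use_html=True):
    """Render a short summary with indented detail lines."""
    if use_html:
        # Assumed HTML tokens: a line break and four non-breaking spaces.
        fmt = {"br": "<br/>", "tab": "&nbsp;" * 4}
    else:
        # Plain-text equivalents for output that is not rendered as HTML.
        fmt = {"br": "\r\n", "tab": "\t"}
    fmt.update(ham=num_ham, spam=num_spam, total=num_ham + num_spam)
    return ("Classified a total of %(total)d messages:"
            "%(br)s%(tab)s%(ham)d ham"
            "%(br)s%(tab)s%(spam)d spam" % fmt)


print(format_stats(827, 333, use_html=False))
```

Keeping the tokens in the dictionary means the format strings themselves are identical for both output styles.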
Index: Stats.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Stats.py,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** Stats.py 2 Nov 2004 06:33:23 -0000 1.7
--- Stats.py 5 Nov 2004 03:03:00 -0000 1.8
***************
*** 93,97 ****
self.trn_ham += 1
! def GetStats(self):
if self.total == 0:
return ["SpamBayes has processed zero messages"]
--- 93,97 ----
self.trn_ham += 1
! def GetStats(self, use_html=True):
if self.total == 0:
return ["SpamBayes has processed zero messages"]
***************
*** 176,179 ****
--- 176,186 ----
else:
format_dict[key] = 'were'
+ # Possibly use HTML for breaks/tabs.
+ if use_html:
+ format_dict["br"] = "<br/>"
+ format_dict["tab"] = "&nbsp;&nbsp;&nbsp;&nbsp;"
+ else:
+ format_dict["br"] = "\r\n"
+ format_dict["tab"] = "\t"
## Our result should look something like this:
***************
*** 200,232 ****
push("SpamBayes has classified a total of " \
"%(num_seen)d message%(sp1)s:" \
! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(cls_ham)d " \
"(%(perc_cls_ham).0f%% of total) good" \
! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(cls_spam)d " \
"(%(perc_cls_spam).0f%% of total) spam" \
! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(cls_unsure)d " \
"(%(perc_cls_unsure).0f%% of total) unsure." % \
format_dict)
push("%(correct)d message%(sp2)s %(wp1)s classified correctly " \
"(%(perc_correct).0f%% of total)" \
! "<br/>%(incorrect)d message%(sp3)s %(wp2)s classified " \
"incorrectly " \
"(%(perc_incorrect).0f%% of total)" \
! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(fp)d false positive%(sp4)s " \
"(%(perc_fp).0f%% of total)" \
! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(fn)d false negative%(sp5)s " \
"(%(perc_fn).0f%% of total)" % \
format_dict)
push("%(trn_unsure_ham)d unsure%(sp6)s trained as good " \
"(%(unsure_ham_perc).0f%% of unsures)" \
! "<br/>%(trn_unsure_spam)d unsure%(sp7)s trained as spam " \
"(%(unsure_spam_perc).0f%% of unsures)" \
! "<br/>%(not_trn_unsure)d unsure%(sp8)s %(wp3)s not trained " \
"(%(unsure_not_perc).0f%% of unsures)" % \
format_dict)
push("A total of %(trn_total)d message%(sp9)s have been trained:" \
! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(trn_ham)d good " \
"(%(trn_perc_ham)0.f%% good, %(trn_perc_unsure_ham)0.f%% " \
"unsure, %(trn_perc_fp).0f%% false positives)" \
! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(trn_spam)d spam " \
"(%(trn_perc_spam)0.f%% spam, %(trn_perc_unsure_spam)0.f%% " \
"unsure, %(trn_perc_fn).0f%% false negatives)" % \
--- 207,239 ----
push("SpamBayes has classified a total of " \
"%(num_seen)d message%(sp1)s:" \
! "%(br)s%(tab)s%(cls_ham)d " \
"(%(perc_cls_ham).0f%% of total) good" \
! "%(br)s%(tab)s%(cls_spam)d " \
"(%(perc_cls_spam).0f%% of total) spam" \
! "%(br)s%(tab)s%(cls_unsure)d " \
"(%(perc_cls_unsure).0f%% of total) unsure." % \
format_dict)
push("%(correct)d message%(sp2)s %(wp1)s classified correctly " \
"(%(perc_correct).0f%% of total)" \
! "%(br)s%(incorrect)d message%(sp3)s %(wp2)s classified " \
"incorrectly " \
"(%(perc_incorrect).0f%% of total)" \
! "%(br)s%(tab)s%(fp)d false positive%(sp4)s " \
"(%(perc_fp).0f%% of total)" \
! "%(br)s%(tab)s%(fn)d false negative%(sp5)s " \
"(%(perc_fn).0f%% of total)" % \
format_dict)
push("%(trn_unsure_ham)d unsure%(sp6)s trained as good " \
"(%(unsure_ham_perc).0f%% of unsures)" \
! "%(br)s%(trn_unsure_spam)d unsure%(sp7)s trained as spam " \
"(%(unsure_spam_perc).0f%% of unsures)" \
! "%(br)s%(not_trn_unsure)d unsure%(sp8)s %(wp3)s not trained " \
"(%(unsure_not_perc).0f%% of unsures)" % \
format_dict)
push("A total of %(trn_total)d message%(sp9)s have been trained:" \
! "%(br)s%(tab)s%(trn_ham)d good " \
"(%(trn_perc_ham)0.f%% good, %(trn_perc_unsure_ham)0.f%% " \
"unsure, %(trn_perc_fp).0f%% false positives)" \
! "%(br)s%(tab)s%(trn_spam)d spam " \
"(%(trn_perc_spam)0.f%% spam, %(trn_perc_unsure_spam)0.f%% " \
"unsure, %(trn_perc_fn).0f%% false negatives)" % \
Index: mboxutils.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/mboxutils.py,v
retrieving revision 1.8
retrieving revision 1.9
diff -C2 -d -r1.8 -r1.9
*** mboxutils.py 25 May 2004 23:16:40 -0000 1.8
--- mboxutils.py 5 Nov 2004 03:03:00 -0000 1.9
***************
*** 117,121 ****
function is imported by tokenizer, and our message class imports
tokenizer, so we get a circular import problem. In any case, this
! function does need anything that our message class offers, so that
shouldn't matter.
"""
--- 117,121 ----
function is imported by tokenizer, and our message class imports
tokenizer, so we get a circular import problem. In any case, this
! function does not need anything that our message class offers, so that
shouldn't matter.
"""
Index: message.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v
retrieving revision 1.55
retrieving revision 1.56
diff -C2 -d -r1.55 -r1.56
*** message.py 1 Oct 2004 00:03:19 -0000 1.55
--- message.py 5 Nov 2004 03:03:00 -0000 1.56
***************
*** 495,499 ****
# This is used by both sb_server and sb_imapfilter, so it's handy to have
# it available separately.
! def insert_exception_header(string_msg, msg_id):
"""Insert an exception header into the given RFC822 message (as text).
--- 495,499 ----
# This is used by both sb_server and sb_imapfilter, so it's handy to have
# it available separately.
! def insert_exception_header(string_msg, msg_id=None):
"""Insert an exception header into the given RFC822 message (as text).
***************
*** 510,520 ****
header = email.Header.Header(dottedDetails, header_name=headerName)
! # Insert the exception header, and also insert the id header,
# otherwise we might keep doing this message over and over again.
# We also ensure that the line endings are \r\n as RFC822 requires.
headers, body = re.split(r'\n\r?\n', string_msg, 1)
header = re.sub(r'\r?\n', '\r\n', str(header))
! headers += "\n%s: %s\r\n%s: %s\r\n\r\n" % \
! (headerName, header,
! options["Headers", "mailid_header_name"], msg_id)
return (headers + body, details)
--- 510,522 ----
header = email.Header.Header(dottedDetails, header_name=headerName)
! # Insert the exception header, and optionally also insert the id header,
# otherwise we might keep doing this message over and over again.
# We also ensure that the line endings are \r\n as RFC822 requires.
headers, body = re.split(r'\n\r?\n', string_msg, 1)
header = re.sub(r'\r?\n', '\r\n', str(header))
! headers += "\n%s: %s\r\n" % \
! (headerName, header)
! if msg_id:
! headers += "%s: %s\r\n" % \
! (options["Headers", "mailid_header_name"], msg_id)
return (headers + body, details)
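The optional id header can be illustrated with a simplified stand-alone function. This is a sketch in current Python 3 under stated assumptions, not the spambayes code: the name add_exception_header and both default header names are hypothetical stand-ins for values the real function reads from the options, and the exception details are passed in as plain text. Because re.split consumes the blank line between headers and body, the sketch puts that separator back explicitly.

```python
import re


def add_exception_header(string_msg, details, msg_id=None,
                         header_name="X-Spambayes-Exception",
                         id_header_name="X-Spambayes-MailId"):
    """Append an exception header, and optionally an id header, to the
    header block of an RFC822 message given as text."""
    # Split at the first blank line; the separator itself is dropped.
    headers, body = re.split(r'\n\r?\n', string_msg, maxsplit=1)
    lines = [headers.rstrip('\r'), "%s: %s" % (header_name, details)]
    if msg_id:
        # Only callers that track message ids ask for this header.
        lines.append("%s: %s" % (id_header_name, msg_id))
    # Re-insert the blank line that separates headers from body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


original = "Subject: hi\r\n\r\nbody\r\n"
print(add_exception_header(original, "ValueError: bad header", msg_id="42"))
```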
From anadelonbrin at users.sourceforge.net Fri Nov 5 04:10:06 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 5 04:10:11 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_pop3dnd.py,1.10,1.11
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29921/scripts
Modified Files:
sb_pop3dnd.py
Log Message:
Fix docstring.
Stop using the web interface. This is against the point of the module, and everything
except configuration is now provided via the IMAP server itself.
Fix bug in getHeaders where negation wouldn't work correctly.
Fix use of assert.
Remove code duplication in flags()
Fix loading of dynamic messages to correctly generate the headers, so that envelope
works.
Change Factory and FileMessage to fit the new style.
Change the fake email addresses to the same format as the notate_to option (i.e. @spambayes.invalid)
Improve the "about" message to include the docstring.
Add a dynamic stats message.
Improve the dynamic status message to include everything that would normally be on
the web interface.
Add a "train as spam" folder, to separate out training and classifying as spam.
Use the message.insert_exception_header utility function.
Use twisted.Application in the new style to avoid deprecation warnings.
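The "Fix use of assert" item refers to a common pitfall: wrapping the condition and the message in parentheses makes assert test a two-element tuple, which is always true, so the check can never fail. A short sketch in current Python 3 (both function names are hypothetical, and the tuple is bound to a variable first so the problem is explicit):

```python
def get_date_buggy(date):
    # Equivalent to the parenthesised form assert(cond, "message"):
    # a non-empty tuple is always truthy, so this never raises.
    check = (date is not None, "Must set date to use IMAPMessage instance.")
    assert check
    return date


def get_date_fixed(date):
    # The condition and the message are separated by a comma, with no
    # enclosing parentheses.
    assert date is not None, "Must set date to use IMAPMessage instance."
    return date


assert get_date_buggy(None) is None  # the broken check lets None through
try:
    get_date_fixed(None)
except AssertionError as exc:
    print("caught:", exc)
```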
Index: sb_pop3dnd.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_pop3dnd.py,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** sb_pop3dnd.py 14 Jul 2004 07:16:59 -0000 1.10
--- sb_pop3dnd.py 5 Nov 2004 03:10:04 -0000 1.11
***************
*** 1,6 ****
#!/usr/bin/env python
- from __future__ import generators
-
"""POP3DND - provides drag'n'drop training ability for POP3 clients.
--- 1,4 ----
***************
*** 9,15 ****
other POP3 proxy). While messages classified as ham are simply passed
through the proxy, messages that are classified as spam or unsure are
! intercepted and passed to the IMAP server. The IMAP server offers three
folders - one where messages classified as spam end up, one for messages
! it is unsure about, and one for training ham.
In other words, to use this application, setup your mail client to connect
--- 7,13 ----
other POP3 proxy). While messages classified as ham are simply passed
through the proxy, messages that are classified as spam or unsure are
! intercepted and passed to the IMAP server. The IMAP server offers four
folders - one where messages classified as spam end up, one for messages
! it is unsure about, one for training ham, and one for training spam.
In other words, to use this application, setup your mail client to connect
***************
*** 20,64 ****
spam and one for unsure messages.
! To train SpamBayes, use the spam folder, and the 'train_as_ham' folder.
! Any messages in these folders will be trained appropriately. This means
! that all messages that SpamBayes classifies as spam will also be trained
! as such. If you receive any 'false positives' (ham classified as spam),
! you *must* copy the message into the 'train_as_ham' folder to correct the
! training. You may also place any saved spam messages you have into this
! folder.
!
! So that SpamBayes knows about ham as well as spam, you will also need to
! move or copy mail into the 'train_as_ham' folder. These may come from
! the unsure folder, or from any other mail you have saved. It is a good
! idea to leave messages in the 'train_as_ham' and 'spam' folders, so that
! you can retrain from scratch if required. (However, you should always
! clear out your unsure folder, preferably moving or copying the messages
! into the appropriate training folder).
This SpamBayes application is designed to work with Outlook Express, and
provide the same sort of ease of use as the Outlook plugin. Although the
! majority of development and testing has been done with Outlook Express and
! Eudora, any mail client that supports both IMAP and POP3 should be able to
! use this application - if the client enables the user to work with an IMAP
! account and POP3 account side-by-side (and move messages between them),
! then it should work equally as well.
!
! This module includes the following classes:
! o IMAPMessage
! o DynamicIMAPMessage
! o IMAPFileMessage
! o IMAPFileMessageFactory
! o IMAPMailbox
! o SpambayesMailbox
! o SpambayesInbox
! o Trainer
! o SpambayesAccount
! o SpambayesIMAPServer
! o OneParameterFactory
! o MyBayesProxy
! o MyBayesProxyListener
! o IMAPState
"""
todo = """
o The RECENT flag should be unset at some point, but when? The
--- 18,35 ----
spam and one for unsure messages.
! To train SpamBayes, use the 'train_as_spam' and 'train_as_ham' folders.
! Any messages in these folders will be trained appropriately.
This SpamBayes application is designed to work with Outlook Express, and
provide the same sort of ease of use as the Outlook plugin. Although the
! majority of development and testing has been done with Outlook Express,
! Eudora and Thunderbird, any mail client that supports both IMAP and POP3
! should be able to use this application - if the client enables the user to
! work with an IMAP account and POP3 account side-by-side (and move messages
! between them), then it should work equally as well.
"""
+ from __future__ import generators
+
todo = """
o The RECENT flag should be unset at some point, but when? The
***************
*** 75,88 ****
(with the <> operands), or get a part of a MIME message (by
prepending a number). This should be added!
- o If the user clicks the 'save and shutdown' button on the web
- interface, this will only kill the POP3 proxy and web interface
- threads, and not the IMAP server. We need to monitor the thread
- that we kick off, and if it dies, we should die too. Need to figure
- out how to do this in twisted.
- o Apparently, twisted.internet.app is deprecated, and we should
- use twisted.application instead. Need to figure out what that means!
- o We could have a distinction between messages classified as spam
- and messages to train as spam. At the moment we force users into
- the 'incremental training' system available with the Outlook plug-in.
o Suggestions?
"""
--- 46,49 ----
***************
*** 108,122 ****
import errno
import types
import thread
import getopt
import imaplib
import operator
- import StringIO
import email.Utils
from twisted import cred
from twisted.internet import defer
from twisted.internet import reactor
! from twisted.internet.app import Application
from twisted.internet.defer import maybeDeferred
from twisted.internet.protocol import ServerFactory
--- 69,89 ----
import errno
import types
+ import email
import thread
import getopt
import imaplib
import operator
import email.Utils
+ try:
+ import cStringIO as StringIO
+ except ImportError:
+ import StringIO
+
from twisted import cred
+ import twisted.application.app
from twisted.internet import defer
from twisted.internet import reactor
! from twisted.internet import win32eventreactor
from twisted.internet.defer import maybeDeferred
from twisted.internet.protocol import ServerFactory
***************
*** 129,138 ****
from spambayes import message
from spambayes.Options import options
from spambayes.tokenizer import tokenize
from spambayes import FileCorpus, Dibbler
from spambayes.Version import get_version_string
- from spambayes.ServerUI import ServerUserInterface
- from spambayes.UserInterface import UserInterfaceServer
from sb_server import POP3ProxyBase, State, _addressPortStr, _recreateState
--- 96,104 ----
from spambayes import message
+ from spambayes.Stats import Stats
from spambayes.Options import options
from spambayes.tokenizer import tokenize
from spambayes import FileCorpus, Dibbler
from spambayes.Version import get_version_string
from sb_server import POP3ProxyBase, State, _addressPortStr, _recreateState
***************
*** 168,172 ****
headers = {}
for header, value in self.items():
! if (header.lower() in names and not negate) or names == ():
headers[header.lower()] = value
return headers
--- 134,139 ----
headers = {}
for header, value in self.items():
! if (header.upper() in names and not negate) or \
! (header.upper() not in names and negate) or names == ():
headers[header.lower()] = value
return headers
***************
*** 192,202 ****
def getInternalDate(self):
"""Retrieve the date internally associated with this message."""
! assert(self.date is not None,
! "Must set date to use IMAPMessage instance.")
return self.date
def getBodyFile(self):
"""Retrieve a file object containing the body of this message."""
! # Note only body, not headers!
s = StringIO.StringIO()
s.write(self.body())
--- 159,169 ----
def getInternalDate(self):
"""Retrieve the date internally associated with this message."""
! assert self.date is not None, \
! "Must set date to use IMAPMessage instance."
return self.date
def getBodyFile(self):
"""Retrieve a file object containing the body of this message."""
! # Note: only body, not headers!
s = StringIO.StringIO()
s.write(self.body())
***************
*** 256,273 ****
def flags(self):
"""Return the message flags."""
! all_flags = []
! if self.deleted:
! all_flags.append("\\DELETED")
! if self.answered:
! all_flags.append("\\ANSWERED")
! if self.flagged:
! all_flags.append("\\FLAGGED")
! if self.seen:
! all_flags.append("\\SEEN")
! if self.draft:
! all_flags.append("\\DRAFT")
! if self.draft:
! all_flags.append("\\RECENT")
! return all_flags
def train(self, classifier, isSpam):
--- 223,227 ----
def flags(self):
"""Return the message flags."""
! return list(self._flags_iter())
def train(self, classifier, isSpam):
***************
*** 340,344 ****
self.load()
def load(self):
! self.set_payload(self.func(body=True, headers=True))
--- 294,303 ----
self.load()
def load(self):
! # This only works for simple messages (non multi-part).
! self.set_payload(self.func(body=True))
! # This only works for simple headers (no continuations).
! for headerstr in self.func(headers=True).split('\r\n'):
! header, value = headerstr.split(':', 1)
! self[header] = value.strip()
***************
*** 346,350 ****
'''IMAP Message that persists as a file system artifact.'''
! def __init__(self, file_name, directory):
"""Constructor(message file name, corpus directory name)."""
date = imaplib.Time2Internaldate(time.time())[1:-1]
--- 305,309 ----
'''IMAP Message that persists as a file system artifact.'''
! def __init__(self, file_name=None, directory=None):
"""Constructor(message file name, corpus directory name)."""
date = imaplib.Time2Internaldate(time.time())[1:-1]
***************
*** 352,363 ****
FileCorpus.FileMessage.__init__(self, file_name, directory)
self.id = file_name
- self.directory = directory
class IMAPFileMessageFactory(FileCorpus.FileMessageFactory):
'''MessageFactory for IMAPFileMessage objects'''
! def create(self, key, directory):
'''Create a message object from a filename in a directory'''
! return IMAPFileMessage(key, directory)
--- 311,328 ----
FileCorpus.FileMessage.__init__(self, file_name, directory)
self.id = file_name
class IMAPFileMessageFactory(FileCorpus.FileMessageFactory):
'''MessageFactory for IMAPFileMessage objects'''
! def create(self, key, directory, content=None):
'''Create a message object from a filename in a directory'''
! if content is None:
! return IMAPFileMessage(key, directory)
! msg = email.message_from_string(content, _class=IMAPFileMessage,
! strict=False)
! msg.id = key
! msg.file_name = key
! msg.directory = directory
! return msg
***************
*** 396,401 ****
self.nextUID = long(self.storage.keys()[-1]) + 1
# Calculate initial recent and unseen counts
- # XXX Note that this will always end up with zero counts
- # XXX until the flags are persisted.
self.unseen_count = 0
self.recent_count = 0
--- 361,364 ----
***************
*** 416,420 ****
def getUID(self, msg):
"""Return the UID of a message in the mailbox."""
! # Note that IMAP messages are 1-based, our messages are 0-based
d = self.storage
return long(d.keys()[msg - 1])
--- 379,383 ----
def getUID(self, msg):
"""Return the UID of a message in the mailbox."""
! # Note that IMAP messages are 1-based, our messages are 0-based.
d = self.storage
return long(d.keys()[msg - 1])
***************
*** 528,531 ****
--- 491,496 ----
def _messagesIter(self, messages, uid):
if uid:
+ if not self.storage.keys():
+ return
messages.last = long(self.storage.keys()[-1])
else:
***************
*** 591,601 ****
msg = []
if headers:
! msg.append("Subject:SpamBayes Status")
! msg.append('From:"SpamBayes" ')
if body:
msg.append('\r\n')
if body:
state.buildStatusStrings()
! msg.append(state.warning or "SpamBayes operating correctly.")
return "\r\n".join(msg)
--- 556,602 ----
msg = []
if headers:
! msg.append("Subject: SpamBayes Status")
! msg.append('From: "SpamBayes" ')
if body:
msg.append('\r\n')
if body:
state.buildStatusStrings()
! msg.append("POP3 proxy running on %s, proxying to %s." % \
! (state.proxyPortsString, state.serversString))
! msg.append("Active POP3 conversations: %s." % \
! (state.activeSessions,))
! msg.append("POP3 conversations this session: %s." % \
! (state.totalSessions,))
! msg.append("IMAP server running on %s." % \
! (state.serverPortString,))
! msg.append("Active IMAP4 conversations: %s." % \
! (state.activeIMAPSessions,))
! msg.append("IMAP4 conversations this session: %s." % \
! (state.totalIMAPSessions,))
! msg.append("Emails classified this session: %s spam, %s ham, "
! "%s unsure." % (state.numSpams, state.numHams,
! state.numUnsure))
! msg.append("Total emails trained: Spam: %s Ham: %s" % \
! (state.bayes.nspam, state.bayes.nham))
! msg.append(state.warning or "SpamBayes is operating correctly.\r\n")
! return "\r\n".join(msg)
!
! def buildStatisticsMessage(self, body=False, headers=False):
! """Build a message containing the current statistics.
!
! If body is True, then return the body; if headers is True
! return the headers. If both are true, then return both
! (and insert a newline between them).
! """
! msg = []
! if headers:
! msg.append("Subject: SpamBayes Statistics")
! msg.append('From: "SpamBayes" \r\n\r\n' \
! 'See .\r\n'
date = imaplib.Time2Internaldate(time.time())[1:-1]
msg = email.message_from_string(about, _class=IMAPMessage,
--- 604,611 ----
"""Create the special messages that live in this mailbox."""
state.buildStatusStrings()
! state.buildServerStrings()
! about = 'Subject: About SpamBayes / POP3DND\r\n' \
! 'From: "SpamBayes" \r\n\r\n' \
! '%s\r\nSee .\r\n' % (__doc__,)
date = imaplib.Time2Internaldate(time.time())[1:-1]
msg = email.message_from_string(about, _class=IMAPMessage,
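
The status-message builder changed in the hunk above assembles header lines and body lines into one list and joins them with CRLF. A rough Python 3 sketch of that idea follows. The function name, the `status_lines` parameter and the `From` address are assumptions made for illustration (the real address is not visible in this archive). The sketch appends an empty string as the separator, whereas the commit appends `'\r\n'`, which produces one extra blank line at the top of the body.

```python
CRLF = "\r\n"


def build_status_message(status_lines, body=False, headers=False):
    """Return the headers, the body, or both, using CRLF line endings."""
    msg = []
    if headers:
        msg.append("Subject: SpamBayes Status")
        # Placeholder address, assumed for this sketch only.
        msg.append('From: "SpamBayes" <spambayes@example.invalid>')
        if body:
            # An empty element becomes exactly one blank separator line.
            msg.append("")
    if body:
        msg.extend(status_lines)
    return CRLF.join(msg)
```

Passing the status lines in as a parameter keeps the sketch free of the global `state` object that the real code reads from.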
***************
*** 614,621 ****
msg = DynamicIMAPMessage(self.buildStatusMessage)
self.addMessage(msg)
# XXX Add other messages here, for example
- # XXX statistics
- # XXX information from sb_server homepage about number
- # XXX of messages classified etc.
# XXX one with a link to the configuration page
# XXX (or maybe even the configuration page itself,
--- 615,621 ----
msg = DynamicIMAPMessage(self.buildStatusMessage)
self.addMessage(msg)
+ msg = DynamicIMAPMessage(self.buildStatisticsMessage)
+ self.addMessage(msg)
# XXX Add other messages here, for example
# XXX one with a link to the configuration page
# XXX (or maybe even the configuration page itself,
***************
*** 679,687 ****
"""Account for Spambayes server."""
! def __init__(self, id, ham, spam, unsure, inbox):
MemoryAccount.__init__(self, id)
self.mailboxes = {"SPAM" : spam,
"UNSURE" : unsure,
"TRAIN_AS_HAM" : ham,
"INBOX" : inbox}
--- 679,688 ----
"""Account for Spambayes server."""
! def __init__(self, id, ham, spam, unsure, train_spam, inbox):
MemoryAccount.__init__(self, id)
self.mailboxes = {"SPAM" : spam,
"UNSURE" : unsure,
"TRAIN_AS_HAM" : ham,
+ "TRAIN_AS_SPAM" : train_spam,
"INBOX" : inbox}
***************
*** 745,749 ****
! class MyBayesProxy(POP3ProxyBase):
"""Proxies between an email client and a POP3 server, redirecting
mail to the imap server as necessary. It acts on the following
--- 746,750 ----
! class RedirectingBayesProxy(POP3ProxyBase):
"""Proxies between an email client and a POP3 server, redirecting
mail to the imap server as necessary. It acts on the following
***************
*** 759,763 ****
# information about who the message was from, or what the subject
# was, if people thought that would be a good idea.
! intercept_message = 'From: "Spambayes" \r\n' \
'Subject: Spambayes Intercept\r\n\r\nA message ' \
'was intercepted by Spambayes (it scored %s).\r\n' \
--- 760,764 ----
# information about who the message was from, or what the subject
# was, if people thought that would be a good idea.
! intercept_message = 'From: "Spambayes" \r\n' \
'Subject: Spambayes Intercept\r\n\r\nA message ' \
'was intercepted by Spambayes (it scored %s).\r\n' \
***************
*** 831,834 ****
--- 832,839 ----
evidence=True)
+ # Note that the X-SpamBayes-MailID header will be worthless
+ # because we don't know the message id at this point. It's
+ # not necessary for anything anyway, so just don't set the
+ # [Headers] add_unique_id option.
msg.addSBHeaders(prob, clues)
***************
*** 864,878 ****
messageText = self.intercept_message % (prob,)
except:
! stream = cStringIO.StringIO()
! traceback.print_exc(None, stream)
! details = stream.getvalue()
! detailLines = details.strip().split('\n')
! dottedDetails = '\n.'.join(detailLines)
! headerName = 'X-Spambayes-Exception'
! header = Header(dottedDetails, header_name=headerName)
! headers, body = re.split(r'\n\r?\n', messageText, 1)
! header = re.sub(r'\r?\n', '\r\n', str(header))
! headers += "\n%s: %s\r\n\r\n" % (headerName, header)
! messageText = headers + body
print >>sys.stderr, details
retval = ok + "\n" + messageText
--- 869,876 ----
messageText = self.intercept_message % (prob,)
except:
! messageText, details = \
! message.insert_exception_header(messageText)
!
! # Print the exception and a traceback.
print >>sys.stderr, details
retval = ok + "\n" + messageText
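
The inline traceback handling removed above now lives behind `message.insert_exception_header()`, which, going by the call site, returns the modified message text together with the traceback details. The following Python 3 sketch shows roughly what such a helper could do. It is not the function from `spambayes/message.py`: it folds the traceback into header continuation lines rather than using `email.Header` and dot-prefixed lines as the old code did.

```python
import re
import traceback

HEADER_NAME = "X-Spambayes-Exception"


def insert_exception_header(message_text):
    """Add the current traceback to message_text as an extra header.

    Must be called from inside an ``except`` block.  Returns a tuple of
    (new message text, traceback details).
    """
    details = traceback.format_exc()
    # Drop blank lines so the folded header cannot end the header block.
    lines = [line.strip() for line in details.splitlines() if line.strip()]
    # Continuation lines start with whitespace (here, a tab).
    folded = "\r\n\t".join(lines)
    parts = re.split(r"\r?\n\r?\n", message_text, maxsplit=1)
    headers = parts[0]
    body = parts[1] if len(parts) > 1 else ""
    new_text = "%s\r\n%s: %s\r\n\r\n%s" % (headers, HEADER_NAME, folded, body)
    return new_text, details
```

Returning the details alongside the text lets the caller keep printing the traceback to `sys.stderr`, as the hunk above still does.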
***************
*** 889,900 ****
! class MyBayesProxyListener(Dibbler.Listener):
"""Listens for incoming email client connections and spins off
! MyBayesProxy objects to serve them.
"""
-
def __init__(self, serverName, serverPort, proxyPort, spam, unsure):
proxyArgs = (serverName, serverPort, spam, unsure)
! Dibbler.Listener.__init__(self, proxyPort, MyBayesProxy, proxyArgs)
print 'Listener on port %s is proxying %s:%d' % \
(_addressPortStr(proxyPort), serverName, serverPort)
--- 887,898 ----
! class RedirectingBayesProxyListener(Dibbler.Listener):
"""Listens for incoming email client connections and spins off
! RedirectingBayesProxy objects to serve them.
"""
def __init__(self, serverName, serverPort, proxyPort, spam, unsure):
proxyArgs = (serverName, serverPort, spam, unsure)
! Dibbler.Listener.__init__(self, proxyPort, RedirectingBayesProxy,
! proxyArgs)
print 'Listener on port %s is proxying %s:%d' % \
(_addressPortStr(proxyPort), serverName, serverPort)
***************
*** 923,985 ****
def setup():
! # Setup state, app, boxes, trainers and account
state.createWorkers()
proxyListeners = []
- app = Application("SpambayesIMAPServer")
! spam_box = SpambayesMailbox("Spam", 0, options["Storage",
! "spam_cache"])
! unsure_box = SpambayesMailbox("Unsure", 1, options["Storage",
! "unknown_cache"])
ham_train_box = SpambayesMailbox("TrainAsHam", 2,
options["Storage", "ham_cache"])
! inbox = SpambayesInbox(3)
! spam_trainer = Trainer(spam_box, True)
ham_trainer = Trainer(ham_train_box, False)
! spam_box.addListener(spam_trainer)
ham_train_box.addListener(ham_trainer)
user_account = SpambayesAccount(options["imapserver", "username"],
ham_train_box, spam_box, unsure_box,
! inbox)
! # add IMAP4 server
f = OneParameterFactory()
f.protocol = SpambayesIMAPServer
f.parameter = user_account
! state.imap_port = options["imapserver", "port"]
! app.listenTCP(state.imap_port, f)
! # add POP3 proxy
for (server, serverPort), proxyPort in zip(state.servers,
state.proxyPorts):
! listener = MyBayesProxyListener(server, serverPort, proxyPort,
! spam_box, unsure_box)
proxyListeners.append(listener)
state.buildServerStrings()
- # add web interface
- httpServer = UserInterfaceServer(state.uiPort)
- serverUI = ServerUserInterface(state, _recreateState)
- httpServer.register(serverUI)
-
- return app
-
def run():
# Read the arguments.
try:
! opts, args = getopt.getopt(sys.argv[1:], 'hbd:D:u:o:')
except getopt.error, msg:
print >>sys.stderr, str(msg) + '\n\n' + __doc__
sys.exit()
- launchUI = False
for opt, arg in opts:
if opt == '-h':
print >>sys.stderr, __doc__
sys.exit()
- elif opt == '-b':
- launchUI = True
elif opt == '-o':
options.set_from_cmdline(arg, sys.stderr)
--- 921,977 ----
def setup():
! # Setup state, server, boxes, trainers and account.
! state.imap_port = options["imapserver", "port"]
state.createWorkers()
proxyListeners = []
! spam_box = SpambayesMailbox("Spam", 0,
! options["Storage", "spam_cache"])
! unsure_box = SpambayesMailbox("Unsure", 1,
! options["Storage", "unknown_cache"])
ham_train_box = SpambayesMailbox("TrainAsHam", 2,
options["Storage", "ham_cache"])
! # We don't have a third cache location in the directory, so make one up.
! spam_train_cache = os.path.join(options["Storage", "ham_cache"], "..",
! "spam_to_train")
! spam_train_box = SpambayesMailbox("TrainAsSpam", 3, spam_train_cache)
! inbox = SpambayesInbox(4)
! spam_trainer = Trainer(spam_train_box, True)
ham_trainer = Trainer(ham_train_box, False)
! spam_train_box.addListener(spam_trainer)
ham_train_box.addListener(ham_trainer)
user_account = SpambayesAccount(options["imapserver", "username"],
ham_train_box, spam_box, unsure_box,
! spam_train_box, inbox)
! # Add IMAP4 server.
f = OneParameterFactory()
f.protocol = SpambayesIMAPServer
f.parameter = user_account
! reactor.listenTCP(state.imap_port, f)
! # Add POP3 proxy.
for (server, serverPort), proxyPort in zip(state.servers,
state.proxyPorts):
! listener = RedirectingBayesProxyListener(server, serverPort,
! proxyPort, spam_box,
! unsure_box)
proxyListeners.append(listener)
state.buildServerStrings()
def run():
# Read the arguments.
try:
! opts, args = getopt.getopt(sys.argv[1:], 'ho:')
except getopt.error, msg:
print >>sys.stderr, str(msg) + '\n\n' + __doc__
sys.exit()
for opt, arg in opts:
if opt == '-h':
print >>sys.stderr, __doc__
sys.exit()
elif opt == '-o':
options.set_from_cmdline(arg, sys.stderr)
***************
*** 988,1001 ****
print get_version_string("IMAP Server")
print get_version_string("POP3 Proxy")
! print "and engine %s," % (get_version_string(),)
from twisted.copyright import version as twisted_version
! print "with twisted version %s.\n" % (twisted_version,)
! # setup everything
! app = setup()
! # kick things off
! thread.start_new_thread(Dibbler.run, (launchUI,))
! app.run(save=False)
if __name__ == "__main__":
--- 980,994 ----
print get_version_string("IMAP Server")
print get_version_string("POP3 Proxy")
! print get_version_string()
from twisted.copyright import version as twisted_version
! print "Twisted version %s.\n" % (twisted_version,)
! # Setup everything.
! setup()
! # Kick things off. The asyncore stuff doesn't play nicely
! # with twisted (or vice-versa), so put them in separate threads.
! thread.start_new_thread(Dibbler.run, ())
! reactor.run()
if __name__ == "__main__":
From anadelonbrin at users.sourceforge.net Mon Nov 8 02:20:39 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 8 02:20:42 2004
Subject: [Spambayes-checkins] website quotes.ht,1.10,1.11
Message-ID:
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15058
Modified Files:
quotes.ht
Log Message:
Add another quote.
Index: quotes.ht
===================================================================
RCS file: /cvsroot/spambayes/website/quotes.ht,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** quotes.ht 9 Aug 2004 06:18:41 -0000 1.10
--- quotes.ht 8 Nov 2004 01:20:35 -0000 1.11
***************
*** 90,93 ****
--- 90,100 ----
+
+ If you use Outlook, drop everything and get SpamBayes.
+ Scott Spanbauer with sage advice in a
+ PCWorld
+ article.
+
+
Spamotomy users have a
bit to say, too!
From anadelonbrin at users.sourceforge.net Mon Nov 8 02:23:02 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 8 02:23:12 2004
Subject: [Spambayes-checkins] website faq.txt,1.81,1.82
Message-ID:
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15516
Modified Files:
faq.txt
Log Message:
Update FAQ in various ways:
1. Get rid of things specific to certain alpha versions. People shouldn't be using
those any more, and you can't download them anywhere.
2. Update training material to point to the wiki, and stop encouraging people to do
lots of pretraining.
3. Correct a few entries that refer to ways things were done in old versions that
have since changed.
Index: faq.txt
===================================================================
RCS file: /cvsroot/spambayes/website/faq.txt,v
retrieving revision 1.81
retrieving revision 1.82
diff -C2 -d -r1.81 -r1.82
*** faq.txt 11 Aug 2004 04:50:42 -0000 1.81
--- faq.txt 8 Nov 2004 01:22:59 -0000 1.82
***************
*** 33,48 ****
train it on representative samples of email you receive. After it's been
trained, you use SpamBayes to classify new mail according to its spamminess
! and hamminess qualities.
!
! To train SpamBayes (which you don't need to do if you're going to be using
! the POP3 proxy to classify messages, but you'll get better results from the
! outset if you do) you need to save your incoming email for awhile,
! segregating it into two piles, known spam and known ham (ham is our nickname
! for good mail). It's best to train on recent email, because your interests
! and the nature of what spam looks like change over time. Once you've
! collected a fair portion of each, you can tell SpamBayes, "Here's my
! ham and my spam". It will then process that mail and save information about
! different patterns which appear in ham and spam. That information is then
! used during the filtering stage.
When SpamBayes filters your email, it compares each unclassified message
--- 33,38 ----
train it on representative samples of email you receive. After it's been
trained, you use SpamBayes to classify new mail according to its spamminess
! and hamminess qualities. It's best to train on recent email, because your
! interests and the nature of what spam looks like change over time.
When SpamBayes filters your email, it compares each unclassified message
***************
*** 163,168 ****
give it messages, tell it whether those messages are ham or spam, and it
adjusts its probabilities accordingly. How to train it is covered below.
! By default it lives in a file called "hammie.db", "statistics_database.db"
! or (for the Outlook plugin) "default_bayes_database".
2. The tokenizer/classifier. This is the core engine of the system. The
--- 153,158 ----
give it messages, tell it whether those messages are ham or spam, and it
adjusts its probabilities accordingly. How to train it is covered below.
! By default it lives in a file called "hammie.db", or (for the Outlook
! plugin) "default_bayes_database".
2. The tokenizer/classifier. This is the core engine of the system. The
***************
*** 547,552 ****
! Will SpamBayes work with Outlook 2000 connecting to an Exchange 2000 server?
! ----------------------------------------------------------------------------
Yes.
--- 537,542 ----
! Will SpamBayes work with Outlook connecting to an Exchange server?
! ------------------------------------------------------------------
Yes.
***************
*** 555,561 ****
--------------------------------------------------------
! Yes, in version 008 and above of the plugin. You can find this on the
! filtering tab of the SpamBayes manager dialog. However, you should also
! see the `envelope icon question`_.
.. _`envelope icon question`: #how-can-I-get-rid-of-the-envelope-tray-icon-for-spam
--- 545,550 ----
--------------------------------------------------------
! Yes. You can find this on the filtering tab of the SpamBayes manager
! dialog. However, you should also see the `envelope icon question`_.
.. _`envelope icon question`: #how-can-I-get-rid-of-the-envelope-tray-icon-for-spam
***************
*** 575,583 ****
back in to recover from a corrupted database, or for any other reason.
! This directory is located in the "Application Data" directory. If you have
! version 008 of the plug-in, or higher, you can locate this directory by
! using the `Show Data Folder` button on the `Advanced` tab of the main
! `SpamBayes` manager dialog. If you need to locate it by hand, on Windows
! 2000/XP, it will probably be:
C:\\Documents and Settings\\[username]\\Application Data\\Spambayes
--- 564,571 ----
back in to recover from a corrupted database, or for any other reason.
! This directory is located in the "Application Data" directory. You can
! locate this directory by using the `Show Data Folder` button on the
! `Advanced` tab of the main `SpamBayes` manager dialog. If you need to
! locate it by hand, on Windows 2000/XP, it will probably be:
C:\\Documents and Settings\\[username]\\Application Data\\Spambayes
***************
*** 612,624 ****
you need to have done these things to enable the button:
! 1. Trained at least 5 ham and 5 spam
!
! 2. Set at least one folder to watch
!
! 3. Set folders to move spam to, and to move unsures to
! 4. Changed the action to "copy" or "move", rather than "untouched"
! 5. Ticked the "enable SpamBayes" checkbox on the first tab of the dialog.
--- 600,608 ----
you need to have done these things to enable the button:
! 1. Set at least one folder (not your unsure or spam folder) to watch
! 2. Set folders to move spam to, and to move unsures to
! 3. Ticked the "enable SpamBayes" checkbox on the first tab of the dialog.
***************
*** 764,770 ****
Basically, you need to create a file "default_configuration.ini", and
! put it either in the directory that SpamBayes was installed into, or in
! the default data directory (the `backup question`_ has instructions for
! finding this directory).
Inside this file, you need to have a section "General", and an option
--- 748,754 ----
Basically, you need to create a file "default_configuration.ini", and
! put it either in the bin directory in the directory that SpamBayes was
! installed into, or in the default data directory (the `backup question`_
! has instructions for finding this directory).
Inside this file, you need to have a section "General", and an option
***************
*** 831,840 ****
select the SpamBayes toolbar and click "Delete".
- With the 008.1 and earlier versions of the plug-in, some entries may be left
- in the registry. These should be harmless, but if they bother you (and you
- are confident mucking about with the registry, which we do *not* recommend),
- then you can remove those keys yourself. Newer versions of the installer
- correctly remove these entries.
-
.. _`backup question`: #can-i-back-up-the-outlook-database-should-i-do-this
.. _`a bug with the plug-in`: http://sourceforge.net/tracker/index.php?func=detail&aid=675811&group_id=61702&atid=498103
--- 815,818 ----
***************
*** 894,907 ****
Follow the "Review messages" link and you'll see a list of the emails that
the system has seen so far. Check the appropriate boxes and hit Train. The
! messages disappear (eventually you'll be able to get back to them, for
! instance to correct any training mistakes) and if you go back to the home
! page you'll see that the "Total emails trained" has increased.
Once you've done this on a few spams and a few hams, you'll find that the
X-Spambayes-Classification header is getting it right most of the time. The
! more you train it the more accurate it gets. There's no need to train it on
! every message you receive, but you should train on a few spams and a few
! hams on a regular basis. You should also try to train it on about the same
! number of spams as hams.
You can train it on lots of messages in one go by either using the sb_filter
--- 872,883 ----
Follow the "Review messages" link and you'll see a list of the emails that
the system has seen so far. Check the appropriate boxes and hit Train. The
! messages disappear and if you go back to the home page you'll see that the
! "Total emails trained" has increased.
Once you've done this on a few spams and a few hams, you'll find that the
X-Spambayes-Classification header is getting it right most of the time. The
! more you train it the more accurate it gets, but note that you should try to
! train it on about the same number of spams as hams. The `SpamBayes wiki`_
! has some `information about training`_ that you may wish to read.
You can train it on lots of messages in one go by either using the sb_filter
***************
*** 911,914 ****
--- 887,893 ----
using Outlook Express dbx files.
+ .. _`SpamBayes wiki`: http://entrian.com/sbwiki
+ .. _`information about training`: http://entrian.com/sbwiki/TrainingIdeas
+
How do I train SpamBayes (forward/bounce method)?
***************
*** 936,940 ****
containing nothing but ham, you can train SpamBayes using a command like::
! sb_mboxtrain.py -g ~/tmp/newham -s ~/tmp/newspam
The above command is OS-centric (e.g., UNIX, or Windows command prompt).
--- 915,919 ----
containing nothing but ham, you can train SpamBayes using a command like::
! python sb_mboxtrain.py -g ~/tmp/newham -s ~/tmp/newspam
The above command is OS-centric (e.g., UNIX, or Windows command prompt).
***************
*** 1016,1020 ****
2. It is quite important that you have trained on roughly equal numbers of
! ham and spam (don't go above a 2::1 ratio, for example).
3. Have you trained on a reasonable number of hams and spams? You should
--- 995,999 ----
2. It is quite important that you have trained on roughly equal numbers of
! ham and spam (don't go above a 4::1 ratio, for example).
3. Have you trained on a reasonable number of hams and spams? You should
***************
*** 1170,1174 ****
Sadly, not much is done in the way of testing these days. Hopefully this
! will change, though, and if you're interested it's definately an option.
Check out the README-DEVEL for information about how to get started. This is the
way to go if you have a new idea, too - even if you convince someone else to
--- 1149,1153 ----
Sadly, not much is done in the way of testing these days. Hopefully this
! will change, though, and if you're interested it's definitely an option.
Check out the README-DEVEL for information about how to get started. This is the
way to go if you have a new idea, too - even if you convince someone else to
***************
*** 1213,1218 ****
couple other tools, `POPFile `_ and `CRM114
`_. A demonstration script which performs
! n-way classification was also recently added to the ``contrib`` directory of
! the SpamBayes CVS repository.
--- 1192,1197 ----
couple other tools, `POPFile `_ and `CRM114
`_. A demonstration script which performs
! n-way classification is also in the ``contrib`` directory of the SpamBayes
! source.
***************
*** 1228,1246 ****
To use a pickle, set the option "persistent_use_database" to False in your
`configuration file <#how-do-i-configure-spambayes>`_,
! in the section "Storage" (if you have been using SpamBayes for a while,
! check that you don't have an old version of this option elsewhere in your
! configuration file, in the pop3proxy or hammiefilter sections). You may
! also wish to change the name of the storage file (to end with "pck", for
! example), but this is not necessary - to do so, change the
! "persistent_storage_file" option (also in the "Storage" section).
If you specify your database on the command line ("sb_server.py -d hammie.db",
! for example), then you should use the "-D" switch instead. Note, however,
! that it is likely that these switches will change in a future release, and
! using the configuration file is a much safer option.
Note that if you have an existing database, which is not a pickle, you can
not keep using it - this will cause errors. You need to either retrain
! from scratch, or use the dbExpImp script to convert it to a pickle.
--- 1207,1221 ----
To use a pickle, set the option "persistent_use_database" to False in your
`configuration file <#how-do-i-configure-spambayes>`_,
! in the section "Storage". You may also wish to change the name of the
! storage file (to end with "pck", for example), but this is not necessary
! - to do so, change the "persistent_storage_file" option (also in the
! "Storage" section).
If you specify your database on the command line ("sb_server.py -d hammie.db",
! for example), then you should use the "-p" switch instead.
Note that if you have an existing database, which is not a pickle, you can
not keep using it - this will cause errors. You need to either retrain
! from scratch, or use the sb_dbexpimp.py script to convert it to a pickle.
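
As an illustration of the pickle settings described in the hunk above, the sketch below holds a configuration fragment and reads it back with Python's standard `configparser`. The file name `hammie.pck` is only an example of a name ending in "pck", and `read_storage_options` is a hypothetical helper; SpamBayes loads its configuration through its own options code, not through this function.

```python
import configparser

# Example fragment for the "Storage" section; the file name is illustrative.
FRAGMENT = """\
[Storage]
persistent_use_database: False
persistent_storage_file: hammie.pck
"""


def read_storage_options(text):
    """Return (use_database, storage_file) parsed from an ini fragment."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    storage = parser["Storage"]
    return (storage.getboolean("persistent_use_database"),
            storage["persistent_storage_file"])
```

Reading the fragment back is just a quick way to check that the section and option names are spelled as the FAQ entry gives them.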
***************
*** 1255,1271 ****
for example.
! In releases up to and including 1.0a4, you need to edit the
! bayescustomize.ini script (the configuration page on the website tells you
! where this is located). In the ``[html_ui]`` section (create one if there
! isn't one already), add the line: ``allow_remote_connections:True``. If
! you can, you might want to firewall outside access to port 8880, to stop
! unauthorised users from messing with the web interface.
!
! In versions after 1.0a4, you can specify IP addresses or ranges that you
! want to be allowed access (two or three machines, for example). You can
! also do this via the web configuration, without having to alter the
! configuration file manually. The option you are after is called
! ``Allowed remote connections``. In versions after 1.0a5, you can also
! set the interface to use HTTP-AUTH, either Basic or Digest.
--- 1230,1238 ----
for example.
! You can specify IP addresses or ranges that you
! want to be allowed access (two or three machines, for example), via the web
! configuration. The option you are after is called
! ``Allowed remote connections``. You can also set the interface to use
! HTTP-AUTH, either Basic or Digest.
***************
*** 1284,1324 ****
============================
- Pop3proxy doesn't work with fetchmail.
- --------------------------------------
-
- This is a known problem in releases up to and including 1.0a4, fixed in CVS
- on 28th July 2003. To work around it, use fetchmail's ``fetchall`` option.
-
My database keeps getting corrupted.
------------------------------------
! You may be using the 'dumbdbm' system for your database.
! 'dumbdbm' is the default database system - the one that gets fallen back on
! when nothing else is available. It is not usually a good choice, and in
! SpamBayes' case, always the wrong one. Some versions of dumbdbm have a bug
! that will cause database corruption, but you shouldn't be using it anyway,
! as it is very inefficient. Instead, either
! `use a pickle <#how-do-i-use-a-pickle-for-storage>`_ or install `pybsddb`_
! (bsddb3) and use that instead.
If you are not sure which database systems
you have available, and/or which one you are currently using, there is a
script in the utilities folder called `which_database.py`_ that will display
! this information (Windows users should run it from a command prompt). This
! file is only included in releases after 1.0a4 - if you are using an earlier
! version, you can download it from cvs, or just go by the name(s) of the
! database file(s). If you have a single file, probably called ``hammie.db``,
! then you are probably not using dumbdbm. If you have three files (probably
! called ``hammie.db.dir``, ``hammie.db.dat`` and ``hammie.db.bak``), then you
! most likely are using dumbdbm, and should stop. Note that users of the
! pop3proxy_service can not currently use which_database.py.
- Support for dumbdbm has been dropped since release 1.0a6.
-
- Note that none of this applies to the Outlook plug-in, which avoids it
- on your behalf.
.. _which_database.py: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/utilities/which_database.py?rev=HEAD&content-type=text/plain
! .. _pybsddb: http://pybsddb.sourceforge.net/
--- 1251,1281 ----
============================
My database keeps getting corrupted.
------------------------------------
! Despite the efforts of the developers, there are still occasional problems
! with database corruption. Known potential causes include:
!
! 1. Accessing the database files from more than one process concurrently.
!
! 2. Interrupting SpamBayes in the midst of training (through a program or
! machine crash, for example).
!
! If you experience consistent corruption, or can provide a set of steps that
! will consistently cause the database to be corrupted, please email
! the `mailing list`_, describing your situation.
!
! Otherwise, you should simply retrain from scratch. You may wish to change
! to an alternative database system to try and avoid these problems.
If you are not sure which database systems
you have available, and/or which one you are currently using, there is a
script in the utilities folder called `which_database.py`_ that will display
! this information (Windows users should run it from a command prompt). Note
! that users of the pop3proxy_service can not currently use which_database.py.
.. _which_database.py: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/utilities/which_database.py?rev=HEAD&content-type=text/plain
! .. _mailing list: mailto:spambayes@python.org
***************
*** 1328,1333 ****
If you get a message that looks like:
DBRunRecoveryError: (-30982, 'DB_RUNRECOVERY: Fatal error, run database
! recovery -- fatal region error detected; run recovery')
! This, sadly, means that your training database is corrupted, and you have
no choice but to delete it and train again from scratch. We don't know what
causes this to happen, but we are trying to fix it. If you find it happens
--- 1285,1290 ----
If you get a message that looks like:
DBRunRecoveryError: (-30982, 'DB_RUNRECOVERY: Fatal error, run database
! recovery -- fatal region error detected; run recovery'),
! this, sadly, means that your training database is corrupted, and you have
no choice but to delete it and train again from scratch. We don't know what
causes this to happen, but we are trying to fix it. If you find it happens
***************
*** 1337,1340 ****
--- 1294,1301 ----
reproduce the problem, so tracking it down is proving very difficult.
+ Note that the "database recovery" that you are told to run does not apply.
+ This is a message provided by the underlying bsddb database system, and
+ cannot be used in this case.
+
If you don't want to risk it happening again, switch to using the pickle
storage (web interface: Configuration / Advanced Configuration /
***************
*** 1359,1424 ****
- The readme says that I can delete the files after doing "setup.py install", but then I can't find pop3proxy_service.py or pop3proxy_tray.py.
- --------------------------------------------------------------------------------------------------------------------------------------------
-
- This is a mistake in either the readme or setup.py in the 1.0a6 release.
- It's fixed in the 1.0a7 release, so that pop3proxy_service.py and
- pop3proxy_tray.py will also be installed to the Python scripts directory
- (if you are running Windows).
-
-
- I can't train via the web interface in 1.0a6!
- ---------------------------------------------
-
- There is a known problem with the 1.0a6 release, which is fixed in 1.0a7.
- Download the newer release from the download page.
-
- To workaround the problem if you're stuck on 1.0a6: you can't use the
- database after making any changes via the web interface configuration pages.
- To work around this, either restart SpamBayes after using the configuration
- pages, or upgrade to 1.0a7.
-
- The '500' error you receive will end with "Object does not support item
- assignment". It may also show up on other pages than the review messages
- one, such as looking up a word in the database.
-
-
- sb_imapfilter prints out "Skipping unparseable message", but the message vanishes!
- ----------------------------------------------------------------------------------
-
- This is a known problem with the 1.0a9 (0.9) release, and will be fixed in
- the next release. Unless you have something set to expunge/purge the IMAP
- folder, the original message will still be there, marked as deleted, so you
- can get it back, although malformed messages are most likely to be spam,
- anyway.
-
- If you need a fix for this before the next release, you can get sb_imapfilter.py
- from CVS (revision 1.26), and use it instead of the one included with 1.0a9 (0.9).
- You should also get message.py (revision 1.46), and replace the message.py
- in your Python Lib/site-packages/spambayes folder with it.
-
- Note that in addition to the message disappearing, you'll find a new message
- (almost certainly unsure) which is blank, apart from the SpamBayes headers.
- You may safely delete these messages. If you are training and come across
- one of these messages, you'll also have the ham/spam count in your database
- increase, without any tokens increasing their count, but that shouldn't have
- any effect, as long as it doesn't happen regularly.
-
-
- The 1.0a9 (0.9) installer is missing the pop3proxy_service file.
- ----------------------------------------------------------------
-
- There are two bugs here - one is that the readme_proxy.html file installed
- by the 1.0a9 (0.9) installer talks about a directory that doesn't exist,
- namely {Program Files}/SpamBayes/Proxy. This should be {Program Files}/SpamBayes/bin,
- but that won't help you, because the executable that you need to install
- the service isn't installed.
-
- This will be fixed in the next release (in the 'bin' directory will be a
- file called "sb_server.exe"). Until then, if you want to install sb_server
- as a service, you will need to do this from source. You can, of course,
- run sb_server.exe or sb_tray.exe, without having the service installed.
-
-
Why does the spambayes@python.org mailing list get spam?
--------------------------------------------------------
--- 1320,1323 ----
***************
*** 1535,1539 ****
results than a more general approach that just generates tokens and throws
them at the classifier. See also the file NEWTRICKS.txt in the source
! distribution - we're filing neat ideas here.
If you're interested in trying out other people's cool ideas, as well as your
--- 1434,1438 ----
results than a more general approach that just generates tokens and throws
them at the classifier. See also the file NEWTRICKS.txt in the source
! distribution - we're filing neat ideas here, and also check out the `wiki`_.
If you're interested in trying out other people's cool ideas, as well as your
***************
*** 1542,1545 ****
--- 1441,1446 ----
and give us some feedback about how they work for you.
+ .. _wiki: http://entrian.com/sbwiki
+
Are there plans to develop a server-side SpamBayes solution?
From anadelonbrin at users.sourceforge.net Mon Nov 8 03:01:19 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 8 03:01:21 2004
Subject: [Spambayes-checkins] spambayes/spambayes Options.py,1.115,1.116
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv23992/spambayes
Modified Files:
Options.py
Log Message:
Clarify the help text for "Hammie":"train_on_filter" to make it clearer that it doesn't
apply to the POP3 proxy or IMAP filter. (It is exposed via the sb_server web interface
as you can use that to configure sb_filter if you want - particularly if you're using
sb_upload as well).
Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.115
retrieving revision 1.116
diff -C2 -d -r1.115 -r1.116
*** Options.py 2 Nov 2004 21:27:42 -0000 1.115
--- Options.py 8 Nov 2004 02:01:14 -0000 1.116
***************
*** 494,498 ****
with a procmail-based solution. If you do enable this, please make
sure to retrain any mistakes. Otherwise, your word database will
! slowly become useless.""",
BOOLEAN, RESTORE),
),
--- 494,500 ----
with a procmail-based solution. If you do enable this, please make
sure to retrain any mistakes. Otherwise, your word database will
! slowly become useless. Note that this option is only used by
! sb_filter, and will have no effect on sb_server's POP3 proxy, or
! the IMAP filter.""",
BOOLEAN, RESTORE),
),
From anadelonbrin at users.sourceforge.net Mon Nov 8 05:57:41 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 8 05:57:45 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py,1.134,1.135
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28846/Outlook2000
Modified Files:
addin.py
Log Message:
Add two extra items to the "spam clues" for the message:
1. Score/class when the message was last filtered. This is useful if you don't have
the spam field displayed, and will be useful for copies received on the mailing list.
2. Whether or not the message has been trained (and if so, as what). This is possibly
useful for the user, but could definitely be useful for copies received on the mailing
list.
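As a rough illustration of the first item, the class can be derived from a stored score by comparing it with the two thresholds. This is only a sketch in plain Python, not the add-in's code: `classify_score`, `describe` and the default threshold values are hypothetical, the score and thresholds are assumed to be on the same 0-100 scale, and an unfiltered message is assumed to have no score (None), which is checked before any comparison.

```python
def classify_score(score, spam_threshold=90.0, unsure_threshold=15.0):
    """Return 'spam', 'unsure', 'good', or None for an unfiltered message.

    Hypothetical helper: the thresholds are illustrative defaults and are
    assumed to be on the same scale as the score (0-100 here).
    """
    if score is None:
        # Check for "never filtered" before doing any comparisons.
        return None
    if score >= spam_threshold:
        return "spam"
    if score >= unsure_threshold:
        return "unsure"
    return "good"


def describe(score):
    """Build the sentence shown to the user for a given stored score."""
    original_class = classify_score(score)
    if original_class is None:
        return "This message has not been filtered."
    return ("When this message was last filtered, it was classified "
            "as %s (it scored %d%%)." % (original_class, round(score)))
```

Testing for None first keeps the "not filtered" case from being reported as "good".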
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.134
retrieving revision 1.135
diff -C2 -d -r1.134 -r1.135
*** addin.py 2 Nov 2004 21:33:46 -0000 1.134
--- addin.py 8 Nov 2004 04:57:39 -0000 1.135
***************
*** 460,463 ****
--- 460,485 ----
push("# ham trained on: %d\n" % c.nham)
push("# spam trained on: %d\n" % c.nspam)
+ # Score when the message was classified - this will hopefully help
+ # people realise that it may not necessarily be the same, and will
+ # help diagnosing any 'wrong' scoring reported.
+ original_score = msgstore_message.GetField(mgr.config.general.field_score_name)
+ if original_score >= mgr.config.filter.spam_threshold:
+ original_class = "spam"
+ elif original_score >= mgr.config.filter.unsure_threshold:
+ original_class = "unsure"
+ else:
+ original_class = "good"
+ push("\n")
+ if original_score is None:
+ push("This message has not been filtered.")
+ else:
+ push("When this message was last filtered, it was classified " \
+ "as %s (it scored %d%%)." % (original_class, original_score*100))
+ # Report whether this message has been trained or not.
+ push("\n")
+ trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
+ push("This message has %sbeen trained%s." % \
+ {0 : ("", "as ham"), 1 : ("", "as spam"), None : ("not ", "")}
+ [trained_as])
# Format the clues.
push("%s Significant Tokens\n" % len(clues))
***************
*** 666,671 ****
# Must train before moving, else we lose the message!
subject = msgstore_message.GetSubject()
! print "Moving and spam training message '%s' - " % (subject,),
! TrainAsSpam(msgstore_message, self.manager, save_db = False)
# Do the new message state if necessary.
try:
--- 688,693 ----
# Must train before moving, else we lose the message!
subject = msgstore_message.GetSubject()
! print "Moving and spam training message '%s' - " % (subject,),
! TrainAsSpam(msgstore_message, self.manager, save_db = False)
# Do the new message state if necessary.
try:
***************
*** 729,734 ****
self.manager.score(msgstore_message))
# Must train before moving, else we lose the message!
! print "Recovering to folder '%s' and ham training message '%s' - " % (restore_folder.name, subject),
! TrainAsHam(msgstore_message, self.manager, save_db = False)
# Do the new message state if necessary.
try:
--- 751,756 ----
self.manager.score(msgstore_message))
# Must train before moving, else we lose the message!
! print "Recovering to folder '%s' and ham training message '%s' - " % (restore_folder.name, subject),
! TrainAsHam(msgstore_message, self.manager, save_db = False)
# Do the new message state if necessary.
try:
From anadelonbrin at users.sourceforge.net Mon Nov 8 06:02:12 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 8 06:02:14 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py,1.135,1.136
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29733/Outlook2000
Modified Files:
addin.py
Log Message:
Add a missing space to the last checkin.
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.135
retrieving revision 1.136
diff -C2 -d -r1.135 -r1.136
*** addin.py 8 Nov 2004 04:57:39 -0000 1.135
--- addin.py 8 Nov 2004 05:02:09 -0000 1.136
***************
*** 480,484 ****
trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
push("This message has %sbeen trained%s." % \
! {0 : ("", "as ham"), 1 : ("", "as spam"), None : ("not ", "")}
[trained_as])
# Format the clues.
--- 480,484 ----
trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
push("This message has %sbeen trained%s." % \
! {0 : ("", " as ham"), 1 : ("", " as spam"), None : ("not ", "")}
[trained_as])
# Format the clues.
From anadelonbrin at users.sourceforge.net Tue Nov 9 01:46:14 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 01:46:21 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py,1.41,1.42
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14302/scripts
Modified Files:
sb_imapfilter.py
Log Message:
Update some comments.
Improve the order of an if statement condition.
Implement [ 940547 ] imapfilter interface available when using -l switch
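The idea behind the -l change is that the interface keeps running in the background while the filter loop carries on in the main thread. The following is a minimal sketch of that pattern using the standard `threading` module rather than the `thread` module used in the diff below; `serve_interface`, `run_filter_passes` and `main` are hypothetical stand-ins, not SpamBayes or Dibbler functions.

```python
import threading
import time


def serve_interface(stop_event, log):
    """Stand-in for a blocking web-server loop (hypothetical)."""
    while not stop_event.is_set():
        log.append("serving")
        stop_event.wait(0.01)


def run_filter_passes(passes, sleep_seconds, log):
    """Stand-in for the periodic classify/train loop (hypothetical)."""
    for n in range(passes):
        log.append("filter pass %d" % n)
        time.sleep(sleep_seconds)


def main(sleep_seconds=0.02, passes=3):
    log = []
    stop_event = threading.Event()
    # Run the interface in its own thread, as we have more work to do.
    ui = threading.Thread(target=serve_interface, args=(stop_event, log))
    ui.daemon = True
    ui.start()
    # The main thread stays free for the periodic filtering work.
    run_filter_passes(passes, sleep_seconds, log)
    stop_event.set()
    ui.join()
    return log
```

Keeping both in one process means only one process touches the database, which is the concern described in the comment added to the script.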
Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** sb_imapfilter.py 13 Oct 2004 02:42:04 -0000 1.41
--- sb_imapfilter.py 9 Nov 2004 00:46:12 -0000 1.42
***************
*** 27,33 ****
-l minutes : period of time between filtering operations
-b : Launch a web browser showing the user interface.
- (If not specified, and neither the -c or -t
- options are used, then this will default to the
- value in your configuration file).
-o section:option:value :
set [section, option] in the options database
--- 27,30 ----
***************
*** 58,69 ****
todo = """
- o IMAPMessage and IMAPFolder currently carry out very simple checks
- of responses received from IMAP commands, but if the response is not
- "OK", then the filter terminates. Handling of these errors could be
- much nicer.
- o Develop a test script, like spambayes/test/test_pop3proxy.py that
- runs through some tests (perhaps with a *real* imap server, rather
- than a dummy one). This would make it easier to carry out the tests
- against each server whenever a change is made.
o IMAP supports authentication via other methods than the plain-text
password method that we are using at the moment. Neither of the
--- 55,58 ----
***************
*** 76,85 ****
"""
! # This module is part of the spambayes project, which is Copyright 2002-4
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
__author__ = "Tony Meyer , Tim Stone"
! __credits__ = "All the Spambayes folk."
from __future__ import generators
--- 65,74 ----
"""
! # This module is part of the SpamBayes project, which is Copyright 2002-4
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
__author__ = "Tony Meyer , Tim Stone"
! __credits__ = "All the SpamBayes folk."
from __future__ import generators
***************
*** 98,101 ****
--- 87,91 ----
import getopt
import types
+ import thread
import traceback
import email
***************
*** 174,178 ****
SelectFolder, rather than here, for purposes of speed."""
# We may never have logged in, in which case we do nothing.
! if self.do_expunge and self.logged_in:
# Expunge messages from the ham, spam and unsure folders.
for fol in ["spam_folder",
--- 164,168 ----
SelectFolder, rather than here, for purposes of speed."""
# We may never have logged in, in which case we do nothing.
! if self.connected and self.logged_in and self.do_expunge:
# Expunge messages from the ham, spam and unsure folders.
for fol in ["spam_folder",
***************
*** 940,949 ****
print "and engine %s.\n" % (get_version_string(),)
- if (launchUI and (doClassify or doTrain)):
- print """-b option is exclusive with -c and -t options.
- The user interface will be launched, but no classification
- or training will be performed.
- """
-
if options["globals", "verbose"]:
print "Loading database %s..." % (bdbname),
--- 930,933 ----
***************
*** 988,993 ****
imap_filter = IMAPFilter(classifier)
! # Web interface
! if not (doClassify or doTrain):
if server == "":
imap = None
--- 972,988 ----
imap_filter = IMAPFilter(classifier)
! # Web interface. We have changed the rules about this many times.
! # With 1.0.x, the rule is that the interface is served if we are
! # not classifying or training. However, this runs into the problem
! # that if we run with -l, we might still want to edit the options,
! # and we don't want to start a separate instance, because then the
! # database is accessed from two processes.
! # With 1.1.x, the rule is that the interface is also served if the
! # -l option is used, which means it is only not served if we are
! # doing a one-off classification/train. In that case, there would
! # probably not be enough time to get to the interface and interact
! # with it (and we don't want it to die halfway through!), and we
! # don't want to slow classification/training down, either.
! if sleepTime or not (doClassify or doTrain):
if server == "":
imap = None
***************
*** 997,1003 ****
httpServer.register(IMAPUserInterface(classifier, imap, pwd,
IMAPSession))
! Dibbler.run(launchBrowser=launchUI or options["html_ui",
! "launch_browser"])
! else:
while True:
imap = IMAPSession(server, port, imapDebug, doExpunge)
--- 992,1003 ----
httpServer.register(IMAPUserInterface(classifier, imap, pwd,
IMAPSession))
! launchBrowser=launchUI or options["html_ui", "launch_browser"]
! if sleepTime:
! # Run in a separate thread, as we have more work to do.
! thread.start_new_thread(Dibbler.run, (),
! {"launchBrowser":launchBrowser})
! else:
! Dibbler.run(launchBrowser=launchBrowser)
! if doClassify or doTrain:
while True:
imap = IMAPSession(server, port, imapDebug, doExpunge)
From anadelonbrin at users.sourceforge.net Tue Nov 9 03:30:36 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 03:30:40 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py, 1.42,
1.43 sb_pop3dnd.py, 1.11, 1.12
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6343/scripts
Modified Files:
sb_imapfilter.py sb_pop3dnd.py
Log Message:
Use email.message_from_string(text, _class) rather than our wrapper functions, to
avoid the Python 2.4 DeprecationWarnings about the strict argument.
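For reference, a small sketch of calling `email.message_from_string` with a `_class` argument directly. `TaggedMessage` is a hypothetical subclass standing in for classes such as SBHeaderMessage or IMAPMessage, and the header name and message text are purely illustrative.

```python
import email
import email.message


class TaggedMessage(email.message.Message):
    """Hypothetical Message subclass standing in for SBHeaderMessage."""

    def classification(self):
        # Read back a header, returning None when it is absent.
        return self.get("X-Spambayes-Classification")


raw = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "X-Spambayes-Classification: ham\n"
    "\n"
    "Body text.\n"
)

# The parser builds an instance of the given class, so no wrapper
# function (and no 'strict' argument) is needed.
msg = email.message_from_string(raw, _class=TaggedMessage)
```

The parsed object is an instance of the subclass, so its extra methods are available straight away.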
Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.42
retrieving revision 1.43
diff -C2 -d -r1.42 -r1.43
*** sb_imapfilter.py 9 Nov 2004 00:46:12 -0000 1.42
--- sb_imapfilter.py 9 Nov 2004 02:30:33 -0000 1.43
***************
*** 604,611 ****
self.uid = new_id
- # This performs a similar function to email.message_from_string()
- def imapmessage_from_string(s, _class=IMAPMessage, strict=False):
- return email.message_from_string(s, _class, strict)
-
class IMAPFolder(object):
--- 604,607 ----
Index: sb_pop3dnd.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_pop3dnd.py,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** sb_pop3dnd.py 5 Nov 2004 03:10:04 -0000 1.11
--- sb_pop3dnd.py 9 Nov 2004 02:30:33 -0000 1.12
***************
*** 827,831 ****
try:
! msg = message.sbheadermessage_from_string(messageText)
# Now find the spam disposition and add the header.
(prob, clues) = state.bayes.spamprob(msg.asTokens(),\
--- 827,832 ----
try:
! msg = email.message_from_string(messageText,
! _class=message.SBHeaderMessage)
# Now find the spam disposition and add the header.
(prob, clues) = state.bayes.spamprob(msg.asTokens(),\
From anadelonbrin at users.sourceforge.net Tue Nov 9 03:30:36 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 03:30:40 2004
Subject: [Spambayes-checkins] spambayes/spambayes message.py, 1.56,
1.57 smtpproxy.py, 1.7, 1.8
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6343/spambayes
Modified Files:
message.py smtpproxy.py
Log Message:
Use email.message_from_string(text, _class) rather than our wrapper functions, to
avoid the Python 2.4 DeprecationWarnings about the strict argument.
Index: message.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v
retrieving revision 1.56
retrieving revision 1.57
diff -C2 -d -r1.56 -r1.57
*** message.py 5 Nov 2004 03:03:00 -0000 1.56
--- message.py 9 Nov 2004 02:30:33 -0000 1.57
***************
*** 237,247 ****
# non-persistent state includes all of email.Message.Message state
! # This function (and it's hackishness) can be avoided by using the
! # message_from_string and sbheadermessage_from_string functions
! # at the end of the module. i.e. instead of doing this:
# >>> msg = spambayes.message.SBHeaderMessage()
# >>> msg.setPayload(substance)
# you do this:
! # >>> msg = sbheadermessage_from_string(substance)
# imapfilter has an example of this in action
def setPayload(self, payload):
--- 237,247 ----
# non-persistent state includes all of email.Message.Message state
! # This function (and it's hackishness) can be avoided by using
! # email.message_from_string(text, _class=SBHeaderMessage)
! # i.e. instead of doing this:
# >>> msg = spambayes.message.SBHeaderMessage()
# >>> msg.setPayload(substance)
# you do this:
! # >>> msg = email.message_from_string(substance, _class=SBHeaderMessage)
# imapfilter has an example of this in action
def setPayload(self, payload):
***************
*** 485,495 ****
del self[options['Headers','trained_header_name']]
- # These perform similar functions to email.message_from_string()
- def message_from_string(s, _class=Message, strict=False):
- return email.message_from_string(s, _class, strict)
-
- def sbheadermessage_from_string(s, _class=SBHeaderMessage, strict=False):
- return email.message_from_string(s, _class, strict)
-
# Utility function to insert an exception header into the given RFC822 text.
# This is used by both sb_server and sb_imapfilter, so it's handy to have
--- 485,488 ----
Index: smtpproxy.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/smtpproxy.py,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** smtpproxy.py 16 Mar 2004 05:08:31 -0000 1.7
--- smtpproxy.py 9 Nov 2004 02:30:33 -0000 1.8
***************
*** 128,135 ****
import sys
import os
from spambayes import Dibbler
from spambayes import storage
! from spambayes.message import sbheadermessage_from_string
from spambayes.tokenizer import textparts
from spambayes.tokenizer import try_to_repair_damaged_base64
--- 128,136 ----
import sys
import os
+ import email
from spambayes import Dibbler
from spambayes import storage
! from spambayes import message
from spambayes.tokenizer import textparts
from spambayes.tokenizer import try_to_repair_damaged_base64
***************
*** 385,389 ****
def extractSpambayesID(self, data):
! msg = sbheadermessage_from_string(data)
# The nicest MUA is one that forwards the header intact.
--- 386,390 ----
def extractSpambayesID(self, data):
! msg = email.message_from_string(data, _class=message.SBHeaderMessage)
# The nicest MUA is one that forwards the header intact.
***************
*** 436,440 ****
self.train_cached_message(id, isSpam)
# Otherwise, train on the forwarded/bounced message.
! msg = sbheadermessage_from_string(msg)
id = msg.setIdFromPayload()
msg.delSBHeaders()
--- 437,441 ----
self.train_cached_message(id, isSpam)
# Otherwise, train on the forwarded/bounced message.
! msg = email.message_from_string(msg, _class=message.SBHeaderMessage)
id = msg.setIdFromPayload()
msg.delSBHeaders()
From anadelonbrin at users.sourceforge.net Tue Nov 9 03:37:43 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 03:37:46 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.27,1.28
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7591/scripts
Modified Files:
sb_server.py
Log Message:
Implement [ 870524 ] Make the message-proxy timeout configurable
Also add a test for it in test_sb_server (this does vastly increase the time that
that test script takes to run, because it has to wait for the timeout).
Use email.message_from_string(text, _class) rather than our wrapper functions, to
avoid the Python 2.4 DeprecationWarnings about the strict argument.
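The timeout rule itself is small enough to sketch on its own. In the sketch below, `should_stop_waiting` and its parameters are hypothetical names; in the script the value comes from the "pop3proxy":"retrieval_timeout" option and the check is made inline.

```python
import time

# Only message-retrieval commands are subject to the timeout.
RETRIEVAL_COMMANDS = ("TOP", "RETR")


def should_stop_waiting(command, seen_all_headers, start_time,
                        retrieval_timeout=30.0, now=None):
    """Return True when the proxy should stop buffering the message.

    Hypothetical helper: 'now' can be supplied to make the check
    deterministic; otherwise the current time is used.
    """
    if now is None:
        now = time.time()
    return (command in RETRIEVAL_COMMANDS
            and seen_all_headers
            and now > start_time + retrieval_timeout)
```

For example, `should_stop_waiting("RETR", True, start_time=0.0, now=31.0)` is true, while the same call with `now=29.0`, with the headers still incomplete, or with a command such as "LIST" is false.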
Index: sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v
retrieving revision 1.27
retrieving revision 1.28
diff -C2 -d -r1.27 -r1.28
*** sb_server.py 10 Aug 2004 06:48:09 -0000 1.27
--- sb_server.py 9 Nov 2004 02:37:40 -0000 1.28
***************
*** 65,69 ****
o Deployment: Windows executable? atlaxwin and ctypes? Or just
webbrowser?
- o Save the stats (num classified, etc.) between sessions.
o "Reload database" button.
--- 65,68 ----
***************
*** 98,102 ****
"""
! import os, sys, re, errno, getopt, time, traceback, socket, cStringIO
from thread import start_new_thread
from email.Header import Header
--- 97,101 ----
"""
! import os, sys, re, errno, getopt, time, traceback, socket, cStringIO, email
from thread import start_new_thread
from email.Header import Header
***************
*** 240,248 ****
self.response = ''
! # Time out after 30 seconds for message-retrieval commands if
! # all the headers are down. The rest of the message will proxy
! # straight through.
if self.command in ['TOP', 'RETR'] and \
! self.seenAllHeaders and time.time() > self.startTime + 30:
self.onResponse()
self.response = ''
--- 239,249 ----
self.response = ''
! # Time out after some seconds (30 by default) for message-retrieval
! # commands if all the headers are down. The rest of the message
! # will proxy straight through.
! # See also [ 870524 ] Make the message-proxy timeout configurable
if self.command in ['TOP', 'RETR'] and \
! self.seenAllHeaders and time.time() > \
! self.startTime + options["pop3proxy", "retrieval_timeout"]:
self.onResponse()
self.response = ''
***************
*** 469,473 ****
try:
! msg = spambayes.message.sbheadermessage_from_string(messageText)
msg.setId(state.getNewMessageName())
# Now find the spam disposition and add the header.
--- 470,475 ----
try:
! msg = email.message_from_string(messageText,
! _class=spambayes.message.SBHeaderMessage)
msg.setId(state.getNewMessageName())
# Now find the spam disposition and add the header.
From anadelonbrin at users.sourceforge.net Tue Nov 9 03:37:44 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 03:37:46 2004
Subject: [Spambayes-checkins] spambayes/spambayes Options.py, 1.116,
1.117 ProxyUI.py, 1.51, 1.52
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7591/spambayes
Modified Files:
Options.py ProxyUI.py
Log Message:
Implement [ 870524 ] Make the message-proxy timeout configurable
Also add a test for it in test_sb_server (this does vastly increase the time that
that test script takes to run, because it has to wait for the timeout).
Use email.message_from_string(text, _class) rather than our wrapper functions, to
avoid the Python 2.4 DeprecationWarnings about the strict argument.
Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.116
retrieving revision 1.117
diff -C2 -d -r1.116 -r1.117
*** Options.py 8 Nov 2004 02:01:14 -0000 1.116
--- Options.py 9 Nov 2004 02:37:41 -0000 1.117
***************
*** 771,774 ****
--- 771,783 ----
field to trust this only address.""",
IP_LIST, RESTORE),
+
+ ("retrieval_timeout", "Retrieval timeout", 30,
+ """When proxying messages, time out after this length of time if
+ all the headers have been received. The rest of the message will
+ proxy straight through. Some clients have a short timeout period,
+ and will give up on waiting for the message if this is too long.
+ Note that the shorter this is, the less of long messages will be
+ used for classifications (i.e. results may be affected).""",
+ REAL, RESTORE),
),
Index: ProxyUI.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/ProxyUI.py,v
retrieving revision 1.51
retrieving revision 1.52
diff -C2 -d -r1.51 -r1.52
*** ProxyUI.py 29 Oct 2004 00:14:42 -0000 1.51
--- ProxyUI.py 9 Nov 2004 02:37:41 -0000 1.52
***************
*** 154,157 ****
--- 154,159 ----
('pop3proxy', 'allow_remote_connections'),
('smtpproxy', 'allow_remote_connections'),
+ ('POP3 Proxy Options', None),
+ ('pop3proxy', 'retrieval_timeout'),
)
From anadelonbrin at users.sourceforge.net Tue Nov 9 03:37:44 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 03:37:49 2004
Subject: [Spambayes-checkins]
spambayes/spambayes/test test_sb_server.py, 1.1, 1.2
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7591/spambayes/test
Modified Files:
test_sb_server.py
Log Message:
Implement [ 870524 ] Make the message-proxy timeout configurable
Also add a test for it in test_sb_server (this does vastly increase the time that
that test script takes to run, because it has to wait for the timeout).
Use email.message_from_string(text, _class) rather than our wrapper functions, to
avoid the Python 2.4 DeprecationWarnings about the strict argument.
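To see why the new test is slow, it may help to look at how a slow server could pace its output. This is a sketch of the idea only, written as a pure generator rather than the test server's method: `slow_chunks` is a hypothetical name, and the delay is assumed to be proportional to the length of each line, as in the diff below.

```python
def slow_chunks(response, push_delay):
    """Yield (chunk, delay) pairs; every chunk ends with CRLF.

    Hypothetical helper: the caller would push each chunk and then
    sleep for the paired delay (in seconds).
    """
    for line in response.split("\n"):
        if line.endswith("\r"):
            # The line already carried its '\r'; restore the '\n'.
            chunk = line + "\n"
        else:
            # Add the '\r' so the receiver sees a complete line.
            chunk = line + "\r\n"
        yield chunk, push_delay * len(line)
```

With a delay of one second per character, the headers of any message longer than the timeout cannot all arrive before the timeout expires, which is what the test relies on.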
Index: test_sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_server.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** test_sb_server.py 5 Nov 2004 02:34:28 -0000 1.1
--- test_sb_server.py 9 Nov 2004 02:37:41 -0000 1.2
***************
*** 81,84 ****
--- 81,85 ----
import operator
import re
+ import time
import getopt
import sys, os
***************
*** 113,117 ****
UIDL. USER, PASS, APOP, DELE and RSET simply return "+OK"
without doing anything. Also understands the 'KILL' command, to
! kill it. The mail content is the example messages above.
"""
--- 114,119 ----
UIDL. USER, PASS, APOP, DELE and RSET simply return "+OK"
without doing anything. Also understands the 'KILL' command, to
! kill it, and a 'SLOW' command, to change to really slow retrieval.
! The mail content is the example messages above.
"""
***************
*** 123,127 ****
self.maildrop = [spam1, good1]
self.set_terminator('\r\n')
! self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP',
'DELE', 'RSET', 'QUIT', 'KILL']
self.handlers = {'CAPA': self.onCapa,
--- 125,129 ----
self.maildrop = [spam1, good1]
self.set_terminator('\r\n')
! self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP', 'SLOW',
'DELE', 'RSET', 'QUIT', 'KILL']
self.handlers = {'CAPA': self.onCapa,
***************
*** 132,135 ****
--- 134,138 ----
self.push("+OK ready\r\n")
self.request = ''
+ self.push_delay = 0.0 # 0.02 is a useful value for testing.
def collect_incoming_data(self, data):
***************
*** 148,165 ****
if command == 'QUIT':
self.close_when_done()
! if command == 'KILL':
self.socket.shutdown(2)
self.close()
raise SystemExit
else:
handler = self.handlers.get(command, self.onUnknown)
! self.push(handler(command, args)) # Or push_slowly for testing
self.request = ''
def push_slowly(self, response):
! """Useful for testing."""
! for c in response:
! self.push(c)
! time.sleep(0.02)
def onCapa(self, command, args):
--- 151,179 ----
if command == 'QUIT':
self.close_when_done()
! elif command == 'KILL':
self.socket.shutdown(2)
self.close()
raise SystemExit
+ elif command == 'SLOW':
+ self.push_delay = 1.0
else:
handler = self.handlers.get(command, self.onUnknown)
! self.push_slowly(handler(command, args))
self.request = ''
def push_slowly(self, response):
! """Sometimes we push out the response slowly to try and generate
! timeouts. If the delay is 0, this just does a regular push."""
! if self.push_delay:
! for c in response.split('\n'):
! if c and c[-1] == '\r':
! self.push(c + '\n')
! else:
! # We want to trigger onServerLine, so need the '\r',
! # so modify the message just a wee bit.
! self.push(c + '\r\n')
! time.sleep(self.push_delay * len(c))
! else:
! self.push(response)
def onCapa(self, command, args):
***************
*** 291,295 ****
# Ask for the capabilities via the proxy, and verify that the proxy
! # is filtering out the PIPELINING capability.
proxy.send("capa\r\n")
response = proxy.recv(1000)
--- 305,309 ----
# Ask for the capabilities via the proxy, and verify that the proxy
! # is filtering out the STLS capability.
proxy.send("capa\r\n")
response = proxy.recv(1000)
***************
*** 311,314 ****
--- 325,341 ----
assert response.find(options["Headers", "classification_header_name"]) >= 0
+ # Check that the proxy times out when it should.
+ options["pop3proxy", "retrieval_timeout"] = 30
+ options["Headers", "include_evidence"] = False
+ assert spam1.find('\n\n') > options["pop3proxy", "retrieval_timeout"]
+ print "This test is rather slow..."
+ proxy.send("slow\r\n")
+ response = proxy.recv(100)
+ assert response.find("OK") != -1
+ proxy.send("retr 1\r\n")
+ response = proxy.recv(1000)
+ assert len(response) < len(spam1)
+ print "Slow test done. Thanks for waiting!"
+
# Smoke-test the HTML UI.
httpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
From anadelonbrin at users.sourceforge.net Tue Nov 9 04:13:03 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 04:13:07 2004
Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.47,1.48
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14948
Modified Files:
CHANGELOG.txt
Log Message:
Bring up-to-date.
Index: CHANGELOG.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v
retrieving revision 1.47
retrieving revision 1.48
diff -C2 -d -r1.47 -r1.48
*** CHANGELOG.txt 12 Oct 2004 23:53:24 -0000 1.47
--- CHANGELOG.txt 9 Nov 2004 03:13:00 -0000 1.48
***************
*** 3,6 ****
--- 3,36 ----
Release 1.1a1
=============
+ Tony Meyer 09/11/2004 Implement [ 870524 ] Make the message-proxy timeout configurable
+ Tony Meyer 09/11/2004 Use email.message_from_string(text, _class) rather than our wrapper functions.
+ Tony Meyer 09/11/2004 Implement [ 940547 ] imapfilter interface available when using -l switch
+ Tony Meyer 08/11/2004 Outlook: Add two extra items to the "spam clues" for the message: last filtered score/class and if it has been trained.
+ Tony Meyer 05/11/2004 Add unittests for sb_pop3dnd.py
+ Tony Meyer 05/11/2004 sb_pop3dnd: remove use of the web interface
+ Tony Meyer 05/11/2004 sb_pop3dnd: fix bug in getHeaders where negation wouldn't work correctly
+ Tony Meyer 05/11/2004 sb_pop3dnd: fix loading of dynamic messages to correctly generate the headers, so that envelope works.
+ Tony Meyer 05/11/2004 sb_pop3dnd: change the fake email addresses to the same format as the notate_to option (i.e. @spambayes.invalid)
+ Tony Meyer 05/11/2004 sb_pop3dnd: improve the "about" message to include the docstring.
+ Tony Meyer 05/11/2004 sb_pop3dnd: add a dynamic stats message.
+ Tony Meyer 05/11/2004 sb_pop3dnd: improve the dynamic status message to include everything that would normally be on the web interface.
+ Tony Meyer 05/11/2004 sb_pop3dnd: add a "train as spam" folder, to separate out training and classifying as spam.
+ Tony Meyer 05/11/2004 sb_pop3dnd: use twisted.Application in the new style to avoid deprecation warnings.
+ Tony Meyer 03/11/2004 Add [ 1052816 ] I18N - mostly the patch from Hernan Martinez Foffani
+ Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
+ Tony Meyer 03/11/2004 Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf
+ Tony Meyer 03/11/2004 Fix [ 922063 ] Intermittent sb_filter.py faliure with URL pickle
+ Tony Meyer 03/11/2004 Outlook: Also add an "X-Exchange-Delivery-Time" header to the faked up Exchange headers.
+ Tony Meyer 02/11/2004 Improve the web interface statistics
+ Tony Meyer 29/10/2004 If possible, use the builtin (faster, C-implemented) set class, falling back to sets.Set, then back to our compatsets.Set
+ Tony Meyer 28/10/2004 Add [ 715248 ] Pickle classifier should save to a temp file first
+ Tony Meyer 28/10/2004 Add [ 938992 ] Allow longer background filtering delays
+ Tony Meyer 27/10/2004 Add a variety of improvements to sb_culler.py contributed by Andrew Dalke
+ Tony Meyer 27/10/2004 Update sb_culler.py to match current open_storage() usage
+ Tony Meyer 21/10/2004 Fix [ 1051081 ] uncaught socket timeoutexception slurping URLs
+ Tony Meyer 20/10/2004 Outlook: Let the statistics have a variable number of decimal places for the percentages (1 by default).
+ Tony Meyer 18/10/2004 Make msgs.Msg objects pickleable
+ Tony Meyer 18/10/2004 Copy Skip's -o command line option (available in all the regular scripts) to timcv.py.
+ Tony Meyer 18/10/2004 TestDriver: If show_histograms was False, then the global ham/spam histogram never had the stats computed, but this gets used later, so the script would die with an AtrributeError. Fix that.
Tony Meyer 13/10/2004 Add Classifier.use_bigrams option to the Advanced options page for sb_server and imapfilter.
Tony Meyer 13/10/2004 Fix mySQL storage option for the case where the server does not support rollbacks.
From anadelonbrin at users.sourceforge.net Tue Nov 9 22:47:01 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 22:47:05 2004
Subject: [Spambayes-checkins] spambayes/windows .cvsignore,1.3,1.3.4.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/windows
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14158/windows
Modified Files:
Tag: release_1_0-branch
.cvsignore
Log Message:
Backport ignoring *pyc
Index: .cvsignore
===================================================================
RCS file: /cvsroot/spambayes/spambayes/windows/.cvsignore,v
retrieving revision 1.3
retrieving revision 1.3.4.1
diff -C2 -d -r1.3 -r1.3.4.1
*** .cvsignore 12 Feb 2004 21:11:26 -0000 1.3
--- .cvsignore 9 Nov 2004 21:46:53 -0000 1.3.4.1
***************
*** 1,2 ****
--- 1,3 ----
SpamBayes-Setup.exe
spambayes-*.exe
+ *.pyc
From anadelonbrin at users.sourceforge.net Tue Nov 9 22:48:21 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 22:48:25 2004
Subject: [Spambayes-checkins]
spambayes/windows/docs/images .cvsignore, NONE, 1.1.2.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/windows/docs/images
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14373/windows/docs/images
Added Files:
Tag: release_1_0-branch
.cvsignore
Log Message:
Ignore Windows thumbs.db file.
--- NEW FILE: .cvsignore ---
Thumbs.db
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:03:31 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:03:35 2004
Subject: [Spambayes-checkins] spambayes/spambayes classifier.py, 1.23,
1.23.4.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19075/spambayes
Modified Files:
Tag: release_1_0-branch
classifier.py
Log Message:
Backport:
Fix [ 922063 ] Intermittent sb_filter.py failure with URL pickle
Fix [ 1051081 ] uncaught socket timeoutexception slurping URLs
Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v
retrieving revision 1.23
retrieving revision 1.23.4.1
diff -C2 -d -r1.23 -r1.23.4.1
*** classifier.py 6 Feb 2004 21:43:00 -0000 1.23
--- classifier.py 9 Nov 2004 22:03:27 -0000 1.23.4.1
***************
*** 527,533 ****
'synthetic' tokens get bigram'ed, too.
! The bigram token is simply "unigram1 unigram2" - a space should
be sufficient as a separator, since spaces aren't in any other
! tokens, apart from 'synthetic' ones.
If the experimental "Classifier":"x-use_bigrams" option is
--- 527,536 ----
'synthetic' tokens get bigram'ed, too.
! The bigram token is simply "bi:unigram1 unigram2" - a space should
be sufficient as a separator, since spaces aren't in any other
! tokens, apart from 'synthetic' ones. The "bi:" prefix is added
! to avoid conflict with tokens we generate (like "subject: word",
! which could be "word" in a subject, or a bigram of "subject:" and
! "word").
If the experimental "Classifier":"x-use_bigrams" option is
***************
*** 607,611 ****
if os.path.exists(self.bad_url_cache_name):
b_file = file(self.bad_url_cache_name, "r")
! self.bad_urls = pickle.load(b_file)
b_file.close()
else:
--- 610,623 ----
if os.path.exists(self.bad_url_cache_name):
b_file = file(self.bad_url_cache_name, "r")
! try:
! self.bad_urls = pickle.load(b_file)
! except IOError, ValueError:
! # Something went wrong loading it (bad pickle,
! # probably). Start afresh.
! if options["globals", "verbose"]:
! print >>sys.stderr, "Bad URL pickle, using new."
! self.bad_urls = {"url:non_resolving": (),
! "url:non_html": (),
! "url:unknown_error": ()}
b_file.close()
else:
***************
*** 617,621 ****
if os.path.exists(self.http_error_cache_name):
h_file = file(self.http_error_cache_name, "r")
! self.http_error_urls = pickle.load(h_file)
h_file.close()
else:
--- 629,640 ----
if os.path.exists(self.http_error_cache_name):
h_file = file(self.http_error_cache_name, "r")
! try:
! self.http_error_urls = pickle.load(h_file)
! except IOError, ValueError:
! # Something went wrong loading it (bad pickle,
! # probably). Start afresh.
! if options["globals", "verbose"]:
! print >>sys.stderr, "Bad HHTP error pickle, using new."
! self.http_error_urls = {}
h_file.close()
else:
***************
*** 626,635 ****
# XXX be a good thing long-term (if a previously invalid URL
# XXX becomes valid, for example).
! b_file = file(self.bad_url_cache_name, "w")
! pickle.dump(self.bad_urls, b_file)
! b_file.close()
! h_file = file(self.http_error_cache_name, "w")
! pickle.dump(self.http_error_urls, h_file)
! h_file.close()
def slurp(self, proto, url):
--- 645,661 ----
# XXX be a good thing long-term (if a previously invalid URL
# XXX becomes valid, for example).
! for name, data in [(self.bad_url_cache_name, self.bad_urls),
! (self.http_error_cache_name, self.http_error_urls),]:
! # Save to a temp file first, in case something goes wrong.
! cache = open(name + ".tmp", "w")
! pickle.dump(data, cache)
! cache.close()
! try:
! os.rename(name + ".tmp", name)
! except OSError:
! # Atomic replace isn't possible with win32, so just
! # remove and rename.
! os.remove(name)
! os.rename(name + ".tmp", name)
def slurp(self, proto, url):
***************
*** 698,711 ****
return ["url:unknown_error"]
! # Anything that isn't text/html is ignored
! content_type = f.info().get('content-type')
! if content_type is None or \
! not content_type.startswith("text/html"):
! self.bad_urls["url:non_html"] += (url,)
! return ["url:non_html"]
! page = f.read()
! headers = str(f.info())
! f.close()
fake_message_string = headers + "\r\n" + page
--- 724,743 ----
return ["url:unknown_error"]
! try:
! # Anything that isn't text/html is ignored
! content_type = f.info().get('content-type')
! if content_type is None or \
! not content_type.startswith("text/html"):
! self.bad_urls["url:non_html"] += (url,)
! return ["url:non_html"]
! page = f.read()
! headers = str(f.info())
! f.close()
! except socket.error:
! # This is probably a temporary error, like a timeout.
! # For now, just bail out.
! return []
!
fake_message_string = headers + "\r\n" + page
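A note on the cache-loading hunks above: in Python 2, "except IOError, ValueError:" catches only IOError and binds the caught exception to the name ValueError; catching both types needs a parenthesised tuple. The following is a minimal sketch of the same fallback idea, written for Python 3 rather than the Python 2 of the check-in. The helper name load_cache and its parameters are hypothetical and not part of classifier.py, and unlike the diff it also treats a missing file as "use the default".

```python
import pickle
import sys


def load_cache(path, default, verbose=False):
    """Return the unpickled cache at `path`, or `default` if it is unusable."""
    try:
        with open(path, "rb") as cache_file:
            return pickle.load(cache_file)
    except (OSError, ValueError, EOFError, pickle.UnpicklingError):
        # The tuple matters: it is what makes every listed type be caught.
        # A missing, empty, or corrupt file all end up here.
        if verbose:
            print("Bad cache pickle %r, using new." % path, file=sys.stderr)
        return default
```

A caller would pass the same starting values the diff uses, for example an empty dict for the HTTP error cache.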
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:07:32 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:07:35 2004
Subject: [Spambayes-checkins] spambayes/spambayes storage.py,1.41,1.41.4.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20118/spambayes
Modified Files:
Tag: release_1_0-branch
storage.py
Log Message:
Backport:
[ 715248 ] Pickle classifier should save to a temp file first
Fix mySQL storage option for the case where the server does not support rollbacks.
Index: storage.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/storage.py,v
retrieving revision 1.41
retrieving revision 1.41.4.1
diff -C2 -d -r1.41 -r1.41.4.1
*** storage.py 2 Apr 2004 18:10:52 -0000 1.41
--- storage.py 9 Nov 2004 22:07:29 -0000 1.41.4.1
***************
*** 63,66 ****
--- 63,67 ----
return not not val
+ import os
import sys
import types
***************
*** 138,144 ****
print >> sys.stderr, 'Persisting',self.db_name,'as a pickle'
! fp = open(self.db_name, 'wb')
! pickle.dump(self, fp, PICKLE_TYPE)
! fp.close()
def close(self):
--- 139,167 ----
print >> sys.stderr, 'Persisting',self.db_name,'as a pickle'
! # Be as defensive as possible; keep always a safe copy.
! tmp = self.db_name + '.tmp'
! try:
! fp = open(tmp, 'wb')
! pickle.dump(self, fp, PICKLE_TYPE)
! fp.close()
! except IOError, e:
! if options["globals", "verbose"]:
! print 'Failed update: ' + str(e)
! if fp is not None:
! os.remove(tmp)
! raise
! try:
! # With *nix we can just rename, and (as long as permissions
! # are correct) the old file will vanish. With win32, this
! # won't work - the Python help says that there may not be
! # a way to do an atomic replace, so we rename the old one,
! # put the new one there, and then delete the old one. If
! # something goes wrong, there is at least a copy of the old
! # one.
! os.rename(tmp, self.db_name)
! except OSError:
! os.rename(self.db_name, self.db_name + '.bak')
! os.rename(tmp, self.db_name)
! os.remove(self.db_name + '.bak')
def close(self):
***************
*** 535,539 ****
c.execute("select count(*) from bayes")
except MySQLdb.ProgrammingError:
! self.db.rollback()
self.create_bayes()
--- 558,568 ----
c.execute("select count(*) from bayes")
except MySQLdb.ProgrammingError:
! try:
! self.db.rollback()
! except MySQLdb.NotSupportedError:
! # Server doesn't support rollback, so just assume that
! # we can keep going and create the db. This should only
! # happen once, anyway.
! pass
self.create_bayes()
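The "save to a temp file first" change in the pickle storage hunk above can be sketched as follows. This is a Python 3 illustration under stated assumptions, not the code that was checked in: save_atomically is a hypothetical name, and it relies on os.replace (available from Python 3.3), which overwrites an existing target on both POSIX and Windows, so the rename-to-.bak fallback from the diff is not reproduced here.

```python
import os
import pickle


def save_atomically(obj, path):
    """Pickle `obj` to `path` by way of a temporary file beside it."""
    tmp = path + ".tmp"
    try:
        with open(tmp, "wb") as fp:
            pickle.dump(obj, fp)
    except OSError:
        # Remove a partial temp file if one was created, then re-raise so
        # the caller still sees the failure. The old file is untouched.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    # Swap the finished temp file into place in a single step.
    os.replace(tmp, path)
```

Checking os.path.exists(tmp) in the error path avoids depending on whether the open() call itself succeeded.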
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:09:51 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:09:54 2004
Subject: [Spambayes-checkins] spambayes/spambayes TestDriver.py,1.4,1.4.6.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20570/spambayes
Modified Files:
Tag: release_1_0-branch
TestDriver.py
Log Message:
Backport:
TestDriver: If show_histograms was False, then the global ham/spam histogram never had the stats computed, but this gets used later, so the script would die with an AttributeError. Fix that.
Index: TestDriver.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/TestDriver.py,v
retrieving revision 1.4
retrieving revision 1.4.6.1
diff -C2 -d -r1.4 -r1.4.6.1
*** TestDriver.py 5 Sep 2003 01:15:28 -0000 1.4
--- TestDriver.py 9 Nov 2004 22:09:48 -0000 1.4.6.1
***************
*** 206,209 ****
--- 206,211 ----
besthamcut = options["Categorization", "ham_cutoff"]
bestspamcut = options["Categorization", "spam_cutoff"]
+ self.global_ham_hist.compute_stats()
+ self.global_spam_hist.compute_stats()
nham = self.global_ham_hist.n
nspam = self.global_spam_hist.n
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:27:27 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:27:29 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py, 1.30,
1.30.4.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24210/scripts
Modified Files:
Tag: release_1_0-branch
sb_imapfilter.py
Log Message:
Backport:
Fix [ 959937 ] "Invalid server" message not always correct
Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.30
retrieving revision 1.30.4.1
diff -C2 -d -r1.30 -r1.30.4.1
*** sb_imapfilter.py 3 May 2004 02:12:32 -0000 1.30
--- sb_imapfilter.py 9 Nov 2004 22:27:23 -0000 1.30.4.1
***************
*** 203,212 ****
try:
BaseIMAP.__init__(self, server, port)
! except:
! # A more specific except would be good here, but I get
! # (in Python 2.2) a generic 'error' and a 'gaierror'
! # if I pass a valid domain that isn't an IMAP server
! # or invalid domain (respectively)
! print "Invalid server or port, please check these settings."
sys.exit(-1)
self.debug = debug
--- 203,208 ----
try:
BaseIMAP.__init__(self, server, port)
! except (BaseIMAP.error, socket.gaierror, socket.error):
! print "Cannot connect to server %s on port %s" % (server, port)
sys.exit(-1)
self.debug = debug
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:38:03 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:38:06 2004
Subject: [Spambayes-checkins]
spambayes/windows/py2exe setup_all.py, 1.17.4.2, 1.17.4.3
Message-ID:
Update of /cvsroot/spambayes/spambayes/windows/py2exe
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26721/windows/py2exe
Modified Files:
Tag: release_1_0-branch
setup_all.py
Log Message:
Backport:
Fix [941639] and [986353]. Use a non-standard extension for our py2exe created zip to get around Windows extensions that automatically expand zip files.
Index: setup_all.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/windows/py2exe/setup_all.py,v
retrieving revision 1.17.4.2
retrieving revision 1.17.4.3
diff -C2 -d -r1.17.4.2 -r1.17.4.3
*** setup_all.py 26 Jun 2004 03:38:41 -0000 1.17.4.2
--- setup_all.py 9 Nov 2004 22:37:47 -0000 1.17.4.3
***************
*** 161,164 ****
data_files = outlook_data_files + proxy_data_files + common_data_files,
options = {"py2exe" : py2exe_options},
! zipfile = "lib/spambayes.zip",
)
--- 161,164 ----
data_files = outlook_data_files + proxy_data_files + common_data_files,
options = {"py2exe" : py2exe_options},
! zipfile = "lib/spambayes.modules",
)
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:41:16 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:41:18 2004
Subject: [Spambayes-checkins] spambayes/spambayes Version.py, 1.31.4.2,
1.31.4.3
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27449/spambayes
Modified Files:
Tag: release_1_0-branch
Version.py
Log Message:
Backport:
For proxy handler for version checking, the proxy port needs to be an integer, not a string.
Index: Version.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Version.py,v
retrieving revision 1.31.4.2
retrieving revision 1.31.4.3
diff -C2 -d -r1.31.4.2 -r1.31.4.3
*** Version.py 8 Jul 2004 23:51:24 -0000 1.31.4.2
--- Version.py 9 Nov 2004 22:41:13 -0000 1.31.4.3
***************
*** 134,144 ****
if ':' in server:
server, port = server.split(':', 1)
else:
port = 8080
! username = options["globals", "proxy_username"]
! password = options["globals", "proxy_password"]
proxy_support = urllib2.ProxyHandler({"http" :
! "http://%s:%s@%s:%d" % \
! (username, password, server,
port)})
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
--- 134,149 ----
if ':' in server:
server, port = server.split(':', 1)
+ port = int(port)
else:
port = 8080
! if options["globals", "proxy_username"]:
! user_pass_string = "%s:%s" % \
! (options["globals", "proxy_username"],
! options["globals", "proxy_password"])
! else:
! user_pass_string = ""
proxy_support = urllib2.ProxyHandler({"http" :
! "http://%s@%s:%d" % \
! (user_pass_string, server,
port)})
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
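For illustration, the proxy URL assembly in the Version.py hunk above could be written as a small standalone function. This is a sketch only: build_proxy_url and its parameters are hypothetical names, and it differs from the check-in in one respect, namely that it leaves out the "@" entirely when no username is configured, whereas the diff's format string always emits it. Whether urllib2 accepts an empty user part is not something this excerpt shows.

```python
def build_proxy_url(server, username="", password="", default_port=8080):
    """Return an http:// proxy URL from "host" or "host:port" plus credentials."""
    if ":" in server:
        server, port = server.split(":", 1)
        port = int(port)  # "%d" below needs an integer, not a string
    else:
        port = default_port
    if username:
        credentials = "%s:%s@" % (username, password)
    else:
        credentials = ""
    return "http://%s%s:%d" % (credentials, server, port)
```

The sketch does no escaping, so usernames or passwords containing characters such as "@" or ":" would need URL-quoting first.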
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:49:00 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:49:04 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py, 1.30.4.1,
1.30.4.2
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29194/scripts
Modified Files:
Tag: release_1_0-branch
sb_imapfilter.py
Log Message:
Backport:
imapfilter: Quote the search string that tries to find the message again that was just saved.
Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.30.4.1
retrieving revision 1.30.4.2
diff -C2 -d -r1.30.4.1 -r1.30.4.2
*** sb_imapfilter.py 9 Nov 2004 22:27:23 -0000 1.30.4.1
--- sb_imapfilter.py 9 Nov 2004 22:48:41 -0000 1.30.4.2
***************
*** 520,526 ****
# have to use it for IMAP operations.
imap.SelectFolder(self.folder.name)
! response = imap.uid("SEARCH", "(UNDELETED HEADER " + \
! options["Headers", "mailid_header_name"] + \
! " " + self.id + ")")
self._check(response, 'search')
new_id = response[1][0]
--- 520,526 ----
# have to use it for IMAP operations.
imap.SelectFolder(self.folder.name)
! response = imap.uid("SEARCH", "(UNDELETED HEADER %s \"%s\")" % \
! (options["Headers", "mailid_header_name"],
! self.id.replace('\\',r'\\').replace('"',r'\"')))
self._check(response, 'search')
new_id = response[1][0]
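The quoting rule applied in the imapfilter hunk above (escape backslashes first, then double quotes, then wrap the value in double quotes) can be sketched on its own. The function names and the header name used in the usage note are hypothetical; the real code builds the string inline and reads the header name from the options.

```python
def quote_imap_string(value):
    """Return `value` as an IMAP quoted string."""
    # Backslashes must be escaped before quotes, otherwise the backslash
    # added in front of each quote would itself be doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def build_search_criteria(header_name, message_id):
    """Build the SEARCH criteria used to find a just-saved message again."""
    return "(UNDELETED HEADER %s %s)" % (header_name,
                                         quote_imap_string(message_id))
```

For example, build_search_criteria("X-Example-MailId", "1099 42") keeps the space-containing id as a single search argument.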
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:51:09 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:51:11 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_mboxtrain.py, 1.11.4.2,
1.11.4.3
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29812/scripts
Modified Files:
Tag: release_1_0-branch
sb_mboxtrain.py
Log Message:
Backport:
Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf
Index: sb_mboxtrain.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_mboxtrain.py,v
retrieving revision 1.11.4.2
retrieving revision 1.11.4.3
diff -C2 -d -r1.11.4.2 -r1.11.4.3
*** sb_mboxtrain.py 15 Oct 2004 05:45:41 -0000 1.11.4.2
--- sb_mboxtrain.py 9 Nov 2004 22:51:06 -0000 1.11.4.3
***************
*** 210,214 ****
raise
! fcntl.lockf(f, fcntl.LOCK_UN)
f.close()
if loud:
--- 210,214 ----
raise
! fcntl.flock(f, fcntl.LOCK_UN)
f.close()
if loud:
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:53:04 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:53:07 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py, 1.12.4.1,
1.12.4.2
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30099/scripts
Modified Files:
Tag: release_1_0-branch
sb_dbexpimp.py
Log Message:
Backport:
Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
Index: sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v
retrieving revision 1.12.4.1
retrieving revision 1.12.4.2
diff -C2 -d -r1.12.4.1 -r1.12.4.2
*** sb_dbexpimp.py 10 Jun 2004 05:17:12 -0000 1.12.4.1
--- sb_dbexpimp.py 9 Nov 2004 22:53:02 -0000 1.12.4.2
***************
*** 230,234 ****
print "Finished storing database"
! if useDBM:
words = bayes.db.keys()
words.remove(bayes.statekey)
--- 230,234 ----
print "Finished storing database"
! if useDBM == "dbm" or useDBM == True:
words = bayes.db.keys()
words.remove(bayes.statekey)
***************
*** 250,254 ****
sys.exit()
! useDBM = False
newDBM = True
dbFN = None
--- 250,254 ----
sys.exit()
! useDBM = "pickle"
newDBM = True
dbFN = None
From anadelonbrin at users.sourceforge.net Tue Nov 9 23:53:58 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 9 23:54:02 2004
Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.44.4.2,1.44.4.3
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30267
Modified Files:
Tag: release_1_0-branch
CHANGELOG.txt
Log Message:
Bring up-to-date.
Index: CHANGELOG.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v
retrieving revision 1.44.4.2
retrieving revision 1.44.4.3
diff -C2 -d -r1.44.4.2 -r1.44.4.3
*** CHANGELOG.txt 19 Jul 2004 03:21:45 -0000 1.44.4.2
--- CHANGELOG.txt 9 Nov 2004 22:53:55 -0000 1.44.4.3
***************
*** 1,4 ****
--- 1,26 ----
[Note that all dates are in English, not American format - i.e. day/month/year]
+ Release 1.0.1
+ =============
+ Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
+ Tony Meyer 03/11/2004 Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf
+ Tony Meyer 03/11/2004 Fix [ 922063 ] Intermittent sb_filter.py failure with URL pickle
+ Tony Meyer 29/09/2004 Fix [ 1036601 ] typo on advanced config web page
+ Tony Meyer 28/10/2004 Add [ 715248 ] Pickle classifier should save to a temp file first
+ Tony Meyer 21/10/2004 Fix [ 1051081 ] uncaught socket timeoutexception slurping URLs
+ Tony Meyer 18/10/2004 TestDriver: If show_histograms was False, then the global ham/spam histogram never had the stats computed, but this gets used later, so the script would die with an AtrributeError. Fix that.
+ Tony Meyer 13/10/2004 Fix mySQL storage option for the case where the server does not support rollbacks.
+ Sjoerd Mullender 02/10/2004 imapfilter: Quote the search string that tries to find the message again that was just saved.
+ Tony Meyer 30/09/2004 Fix [ 903905 ] IMAP Configuration Error
+ Tony Meyer 15/09/2004 sb_upload: Clarify docstring so that it's more clear what this script does. The -n / --null command line option didn't actually do anything; change it so that it does.
+ Tony Meyer 23/07/2004 For proxy handler for version checking, the proxy port needs to be an integer, not a string.
+ Tony Meyer 19/07/2004 Fix [ 990700 ] Changes to asyncore in Python 2.4 break ServerLineReader
+ Kenny Pitt 17/07/2004 Fix [941639] and [986353]. Use a non-standard extension for our py2exe created zip to get around Windows extensions that automatically expand zip files.
+ Tony Meyer 14/07/2004 Fix [ 790757 ] signal handler created with wrong # of args
+ Tony Meyer 14/07/2004 Fix [ 944109 ] notate_to/subject option valid values should be dynamic
+ Tony Meyer 14/07/2004 Fix [ 959937 ] "Invalid server" message not always correct
+ Skip Montanaro 10/07/2004 tte.py: 2.3 compatibility: add reversed() function
+ Tony Meyer 09/07/2004 Using -u with sb_server had been broken. Fix this.
+
1.0 Final
=========
From anadelonbrin at users.sourceforge.net Wed Nov 10 23:08:47 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 10 23:08:52 2004
Subject: [Spambayes-checkins] spambayes/windows spambayes.iss,1.17,1.18
Message-ID:
Update of /cvsroot/spambayes/spambayes/windows
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12448/windows
Modified Files:
spambayes.iss
Log Message:
I emailed spambayes-dev about this on Oct 22, but then clean forgot about checking
the change in! Thankfully a spambayes@python.org message reminded me, or this bug
would have made it through to 1.0.1...
It appears that our installers aren't offering to create a startup icon for
sb_server users as they should be.
The problem is the "Check: InstallingProxy" line in the Inno script.
Although the selection code has been run by the time the tasks are offered,
it still has the default (False) value. I've played around with the code,
but can't figure a way around this (although my Pascal is extremely rusty). Maybe
it's an Inno bug or something.
We can fix it by removing that Check. Outlook users don't get that page
anyway, so they won't see the option (this is what happens with the desktop
icon). However, since we want it checked by default, it will appear in the
text box of additional tasks, even though it doesn't happen. I say this doesn't matter,
and the vast quiet on spambayes-dev tells me people either agree or don't care.
Index: spambayes.iss
===================================================================
RCS file: /cvsroot/spambayes/spambayes/windows/spambayes.iss,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** spambayes.iss 10 Jun 2004 04:38:26 -0000 1.17
--- spambayes.iss 10 Nov 2004 22:08:44 -0000 1.18
***************
*** 53,57 ****
[Tasks]
! Name: startup; Description: "Execute SpamBayes each time Windows starts"; Check: InstallingProxy
Name: desktop; Description: "Add an icon to the desktop"; Flags: unchecked;
--- 53,57 ----
[Tasks]
! Name: startup; Description: "Execute SpamBayes each time Windows starts";
Name: desktop; Description: "Add an icon to the desktop"; Flags: unchecked;
From anadelonbrin at users.sourceforge.net Wed Nov 10 23:15:39 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 10 23:15:43 2004
Subject: [Spambayes-checkins] spambayes/windows spambayes.iss, 1.15.4.3,
1.15.4.4
Message-ID:
Update of /cvsroot/spambayes/spambayes/windows
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13954/windows
Modified Files:
Tag: release_1_0-branch
spambayes.iss
Log Message:
Backport fix for installation of startup icon.
Index: spambayes.iss
===================================================================
RCS file: /cvsroot/spambayes/spambayes/windows/spambayes.iss,v
retrieving revision 1.15.4.3
retrieving revision 1.15.4.4
diff -C2 -d -r1.15.4.3 -r1.15.4.4
*** spambayes.iss 21 Sep 2004 08:04:39 -0000 1.15.4.3
--- spambayes.iss 10 Nov 2004 22:15:37 -0000 1.15.4.4
***************
*** 5,11 ****
[Setup]
; Version specific constants
! AppVerName=SpamBayes 1.0
! AppVersion=1.0
! OutputBaseFilename=spambayes-1.0
; Normal constants. Be careful about changing 'AppName'
AppName=SpamBayes
--- 5,11 ----
[Setup]
; Version specific constants
! AppVerName=SpamBayes 1.0.1
! AppVersion=1.0.1
! OutputBaseFilename=spambayes-1.0.1
; Normal constants. Be careful about changing 'AppName'
AppName=SpamBayes
***************
*** 53,57 ****
[Tasks]
! Name: startup; Description: "Execute SpamBayes each time Windows starts"; Check: InstallingProxy
Name: desktop; Description: "Add an icon to the desktop"; Flags: unchecked;
--- 53,57 ----
[Tasks]
! Name: startup; Description: "Execute SpamBayes each time Windows starts";
Name: desktop; Description: "Add an icon to the desktop"; Flags: unchecked;
***************
*** 118,122 ****
'If this message persists, you may need to log off from Windows, and try again.'
Result := CheckNoAppMutex('InternetMailTransport', closeit);
- end;
// And finally, the SpamBayes server
if Result then begin
--- 118,121 ----
***************
*** 149,153 ****
Prompts, Values: array of String;
begin
-
// First open the custom wizard page
ScriptDlgPageOpen();
--- 148,151 ----
From anadelonbrin at users.sourceforge.net Thu Nov 11 02:47:03 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Thu Nov 11 02:47:06 2004
Subject: [Spambayes-checkins] spambayes/src sb_bnfilter.c,1.1,1.2
Message-ID:
Update of /cvsroot/spambayes/spambayes/src
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27946/src
Added Files:
sb_bnfilter.c
Log Message:
Merge Toby's bnfilter_in_c branch, since I can't see any reason why this can't be
in 1.1.
From anadelonbrin at users.sourceforge.net Thu Nov 11 02:48:45 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Thu Nov 11 02:48:47 2004
Subject: [Spambayes-checkins] spambayes MANIFEST.in,1.9,1.10
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28238
Modified Files:
MANIFEST.in
Log Message:
Add the new src directory (for c files) to the manifest.
Index: MANIFEST.in
===================================================================
RCS file: /cvsroot/spambayes/spambayes/MANIFEST.in,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** MANIFEST.in 5 Nov 2003 12:45:23 -0000 1.9
--- MANIFEST.in 11 Nov 2004 01:48:43 -0000 1.10
***************
*** 1,3 ****
--- 1,4 ----
recursive-include spambayes/resources *.html *.psp *.gif
+ recursive-include spambayes/src *.c
recursive-include spambayes *.py *.txt
recursive-include pspam *.py *.txt *.ini *.sh
From anadelonbrin at users.sourceforge.net Thu Nov 11 22:21:59 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Thu Nov 11 22:22:03 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py,1.136,1.137
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17984/Outlook2000
Modified Files:
addin.py
Log Message:
Update timer checks to match what the dialog currently allows.
Correct addition to 'show spam clues' to give the right classification.
Fix indentation error that I introduced recently - sorry!
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.136
retrieving revision 1.137
diff -C2 -d -r1.136 -r1.137
*** addin.py 8 Nov 2004 05:02:09 -0000 1.136
--- addin.py 11 Nov 2004 21:21:55 -0000 1.137
***************
*** 288,292 ****
elif start_delay < 0.4 or interval < 0.4:
too = "too often"
! elif start_delay > 30 or interval > 30:
too = "too infrequently"
if too:
--- 288,292 ----
elif start_delay < 0.4 or interval < 0.4:
too = "too often"
! elif start_delay > 60 or interval > 60:
too = "too infrequently"
if too:
***************
*** 463,467 ****
# people realise that it may not necessarily be the same, and will
# help diagnosing any 'wrong' scoring reported.
! original_score = msgstore_message.GetField(mgr.config.general.field_score_name)
if original_score >= mgr.config.filter.spam_threshold:
original_class = "spam"
--- 463,468 ----
# people realise that it may not necessarily be the same, and will
# help diagnosing any 'wrong' scoring reported.
! original_score = 100 * msgstore_message.GetField(\
! mgr.config.general.field_score_name)
if original_score >= mgr.config.filter.spam_threshold:
original_class = "spam"
***************
*** 475,479 ****
else:
push("When this message was last filtered, it was classified " \
! "as %s (it scored %d%%)." % (original_class, original_score*100))
# Report whether this message has been trained or not.
push("
\n")
--- 476,480 ----
else:
push("When this message was last filtered, it was classified " \
! "as %s (it scored %d%%)." % (original_class, original_score))
# Report whether this message has been trained or not.
push("
\n")
***************
*** 688,693 ****
# Must train before moving, else we lose the message!
subject = msgstore_message.GetSubject()
! print "Moving and spam training message '%s' - " % (subject,),
! TrainAsSpam(msgstore_message, self.manager, save_db = False)
# Do the new message state if necessary.
try:
--- 689,694 ----
# Must train before moving, else we lose the message!
subject = msgstore_message.GetSubject()
! print "Moving and spam training message '%s' - " % (subject,),
! TrainAsSpam(msgstore_message, self.manager, save_db = False)
# Do the new message state if necessary.
try:
***************
*** 751,756 ****
self.manager.score(msgstore_message))
# Must train before moving, else we lose the message!
! print "Recovering to folder '%s' and ham training message '%s' - " % (restore_folder.name, subject),
! TrainAsHam(msgstore_message, self.manager, save_db = False)
# Do the new message state if necessary.
try:
--- 752,757 ----
self.manager.score(msgstore_message))
# Must train before moving, else we lose the message!
! print "Recovering to folder '%s' and ham training message '%s' - " % (restore_folder.name, subject),
! TrainAsHam(msgstore_message, self.manager, save_db = False)
# Do the new message state if necessary.
try:
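The score hunk above (old line 463) multiplies the stored score by 100 once, so that the same value is both compared with the spam threshold and printed as a percentage. A minimal sketch in modern Python, assuming the stored score lies in 0.0-1.0 and the thresholds are percentages; classify_percent and its unsure_threshold parameter are hypothetical names for illustration, not code from addin.py:

```python
def classify_percent(raw_score, spam_threshold, unsure_threshold):
    """Return (class_name, percent) for a stored score.

    Assumption: raw_score is in the range 0.0-1.0 and both thresholds
    are percentages in the range 0-100.
    """
    percent = 100 * raw_score  # scale once, then reuse for compare and display
    if percent >= spam_threshold:
        name = "spam"
    elif percent >= unsure_threshold:
        name = "unsure"
    else:
        name = "good"
    return name, percent


name, percent = classify_percent(0.75, 70.0, 15.0)
print("it was classified as %s (it scored %d%%)." % (name, percent))
```

Comparing the unscaled 0.75 against a threshold of 70.0 would never report "spam", which is the kind of mismatch the hunk appears to address.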
From kpitt at users.sourceforge.net Thu Nov 11 22:55:49 2004
From: kpitt at users.sourceforge.net (Kenny Pitt)
Date: Thu Nov 11 22:55:52 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000/dialogs dialog_map.py,
1.40, 1.41
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000/dialogs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25701/Outlook2000/dialogs
Modified Files:
dialog_map.py
Log Message:
Add a separate Statistics tab to make room for more detailed statistics.
Index: dialog_map.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/dialog_map.py,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** dialog_map.py 28 Oct 2004 04:29:00 -0000 1.40
--- dialog_map.py 11 Nov 2004 21:55:40 -0000 1.41
***************
*** 404,408 ****
(TabProcessor, "IDC_TAB",
"""IDD_GENERAL IDD_FILTER IDD_TRAINING
! IDD_ADVANCED"""),
(CommandButtonProcessor, "IDC_ABOUT_BTN", ShowAbout, ()),
),
--- 404,408 ----
(TabProcessor, "IDC_TAB",
"""IDD_GENERAL IDD_FILTER IDD_TRAINING
! IDD_STATISTICS IDD_ADVANCED"""),
(CommandButtonProcessor, "IDC_ABOUT_BTN", ShowAbout, ()),
),
***************
*** 473,476 ****
--- 473,479 ----
),
+ "IDD_STATISTICS" : (
+ (StatsProcessor, "IDC_STATISTICS"),
+ ),
"IDD_ADVANCED" : (
(BoolButtonProcessor, "IDC_BUT_TIMER_ENABLED", "Filter.timer_enabled",
***************
*** 481,485 ****
(EditNumberProcessor, "IDC_DELAY2_TEXT IDC_DELAY2_SLIDER", "Filter.timer_interval", 0, 10, 20, 60),
(BoolButtonProcessor, "IDC_INBOX_TIMER_ONLY", "Filter.timer_only_receive_folders"),
- (StatsProcessor, "IDC_STATISTICS"),
(CommandButtonProcessor, "IDC_SHOW_DATA_FOLDER", ShowDataFolder, ()),
(DialogCommand, "IDC_BUT_SHOW_DIAGNOSTICS", "IDD_DIAGNOSTIC"),
--- 484,487 ----
From kpitt at users.sourceforge.net Thu Nov 11 22:55:49 2004
From: kpitt at users.sourceforge.net (Kenny Pitt)
Date: Thu Nov 11 22:55:52 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000/dialogs/resources
dialogs.h, 1.21, 1.22 dialogs.rc, 1.47, 1.48
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25701/Outlook2000/dialogs/resources
Modified Files:
dialogs.h dialogs.rc
Log Message:
Add a separate Statistics tab to make room for more detailed statistics.
Index: dialogs.h
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources/dialogs.h,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** dialogs.h 29 Sep 2003 02:14:26 -0000 1.21
--- dialogs.h 11 Nov 2004 21:55:46 -0000 1.22
***************
*** 9,12 ****
--- 9,13 ----
#define IDD_FOLDER_SELECTOR 105
#define IDD_ADVANCED 106
+ #define IDD_STATISTICS 107
#define IDD_GENERAL 108
#define IDD_FILTER_SPAM 110
Index: dialogs.rc
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources/dialogs.rc,v
retrieving revision 1.47
retrieving revision 1.48
diff -C2 -d -r1.47 -r1.48
*** dialogs.rc 1 Oct 2004 14:37:37 -0000 1.47
--- dialogs.rc 11 Nov 2004 21:55:46 -0000 1.48
***************
*** 52,58 ****
"Button",BS_AUTOCHECKBOX | WS_TABSTOP,16,12,162,10
PUSHBUTTON "Diagnostics...",IDC_BUT_SHOW_DIAGNOSTICS,171,190,70,14
! GROUPBOX "Statistics",IDC_STATIC,7,125,234,58
LTEXT "some stats\nand some more\nline 3\nline 4\nline 5",
! IDC_STATISTICS,12,134,223,43,SS_SUNKEN
END
--- 52,65 ----
"Button",BS_AUTOCHECKBOX | WS_TABSTOP,16,12,162,10
PUSHBUTTON "Diagnostics...",IDC_BUT_SHOW_DIAGNOSTICS,171,190,70,14
! END
!
! IDD_STATISTICS DIALOGEX 0, 0, 248, 209
! STYLE DS_SETFONT | WS_CHILD
! CAPTION "Statistics"
! FONT 8, "Tahoma", 400, 0, 0x0
! BEGIN
! GROUPBOX "Statistics",IDC_STATIC,7,3,234,201
LTEXT "some stats\nand some more\nline 3\nline 4\nline 5",
! IDC_STATISTICS,12,12,223,186
END
From anadelonbrin at users.sourceforge.net Fri Nov 12 03:48:29 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 12 03:48:32 2004
Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_dbexpimp.py,
NONE, 1.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30228/spambayes/test
Added Files:
test_sb_dbexpimp.py
Log Message:
Unit tests for the sb_dbexpimp.py script.
--- NEW FILE: test_sb_dbexpimp.py ---
# Test sb_dbexpimp script.

import os
import sys
import unittest

from spambayes.tokenizer import tokenize
from spambayes.storage import open_storage
from spambayes.storage import PickledClassifier, DBDictClassifier

import sb_test_support
sb_test_support.fix_sys_path()

import sb_dbexpimp

# We borrow the test messages that test_sb_server uses.
from test_sb_server import good1, spam1

# WARNING!
# If these files exist when running this test, they will be deleted.
TEMP_PICKLE_NAME = os.path.join(os.path.dirname(__file__), "temp.pik")
TEMP_CSV_NAME = os.path.join(os.path.dirname(__file__), "temp.csv")
TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm")

class dbexpimpTest(unittest.TestCase):
    def tearDown(self):
        try:
            os.remove(TEMP_PICKLE_NAME)
            os.remove(TEMP_CSV_NAME)
            os.remove(TEMP_DBM_NAME)
        except OSError:
            pass

    def test_csv_import(self):
        """Check that we don't import the old object craft csv module."""
        self.assert_(hasattr(sb_dbexpimp.csv, "reader"))

    def test_pickle_export(self):
        # Create a pickled classifier to export.
        bayes = PickledClassifier(TEMP_PICKLE_NAME)
        # Stuff some messages in it so it's not empty.
        bayes.learn(tokenize(spam1), True)
        bayes.learn(tokenize(good1), False)
        # Save.
        bayes.store()
        # Export.
        sb_dbexpimp.runExport(TEMP_PICKLE_NAME, "pickle", TEMP_CSV_NAME)
        # Verify that the CSV holds all the original data (and, by using
        # the CSV module to open it, that it is valid CSV data).
        fp = open(TEMP_CSV_NAME, "rb")
        reader = sb_dbexpimp.csv.reader(fp)
        (nham, nspam) = reader.next()
        self.assertEqual(int(nham), bayes.nham)
        self.assertEqual(int(nspam), bayes.nspam)
        for (word, hamcount, spamcount) in reader:
            word = sb_dbexpimp.uunquote(word)
            self.assert_(word in bayes._wordinfokeys())
            wi = bayes._wordinfoget(word)
            self.assertEqual(int(hamcount), wi.hamcount)
            self.assertEqual(int(spamcount), wi.spamcount)

    def test_dbm_export(self):
        # Create a dbm classifier to export.
        bayes = DBDictClassifier(TEMP_DBM_NAME)
        # Stuff some messages in it so it's not empty.
        bayes.learn(tokenize(spam1), True)
        bayes.learn(tokenize(good1), False)
        # Save & Close.
        bayes.store()
        bayes.close()
        # Export.
        sb_dbexpimp.runExport(TEMP_DBM_NAME, "dbm", TEMP_CSV_NAME)
        # Reopen the original.
        bayes = open_storage(TEMP_DBM_NAME, "dbm")
        # Verify that the CSV holds all the original data (and, by using
        # the CSV module to open it, that it is valid CSV data).
        fp = open(TEMP_CSV_NAME, "rb")
        reader = sb_dbexpimp.csv.reader(fp)
        (nham, nspam) = reader.next()
        self.assertEqual(int(nham), bayes.nham)
        self.assertEqual(int(nspam), bayes.nspam)
        for (word, hamcount, spamcount) in reader:
            word = sb_dbexpimp.uunquote(word)
            self.assert_(word in bayes._wordinfokeys())
            wi = bayes._wordinfoget(word)
            self.assertEqual(int(hamcount), wi.hamcount)
            self.assertEqual(int(spamcount), wi.spamcount)

    def test_import_to_pickle(self):
        # Create a CSV file to import.
        temp = open(TEMP_CSV_NAME, "wb")
        temp.write("3,4\n")
        csv_data = {"this":(2,1), "is":(0,1), "a":(3,4), 'test':(1,1),
                    "of":(1,0), "the":(1,2), "import":(3,1)}
        for word, (ham, spam) in csv_data.items():
            temp.write("%s,%s,%s\n" % (word, ham, spam))
        temp.close()
        sb_dbexpimp.runImport(TEMP_PICKLE_NAME, "pickle", True,
                              TEMP_CSV_NAME)
        # Open the converted file and verify that it has all the data from
        # the CSV file (and by opening it, that it is a valid pickle).
        bayes = open_storage(TEMP_PICKLE_NAME, "pickle")
        self.assertEqual(bayes.nham, 3)
        self.assertEqual(bayes.nspam, 4)
        for word, (ham, spam) in csv_data.items():
            word = sb_dbexpimp.uquote(word)
            self.assert_(word in bayes._wordinfokeys())
            wi = bayes._wordinfoget(word)
            self.assertEqual(wi.hamcount, ham)
            self.assertEqual(wi.spamcount, spam)

    def test_import_to_dbm(self):
        # Create a CSV file to import.
        temp = open(TEMP_CSV_NAME, "wb")
        temp.write("3,4\n")
        csv_data = {"this":(2,1), "is":(0,1), "a":(3,4), 'test':(1,1),
                    "of":(1,0), "the":(1,2), "import":(3,1)}
        for word, (ham, spam) in csv_data.items():
            temp.write("%s,%s,%s\n" % (word, ham, spam))
        temp.close()
        sb_dbexpimp.runImport(TEMP_DBM_NAME, "dbm", True, TEMP_CSV_NAME)
        # Open the converted file and verify that it has all the data from
        # the CSV file (and by opening it, that it is a valid dbm file).
        bayes = open_storage(TEMP_DBM_NAME, "dbm")
        self.assertEqual(bayes.nham, 3)
        self.assertEqual(bayes.nspam, 4)
        for word, (ham, spam) in csv_data.items():
            word = sb_dbexpimp.uquote(word)
            self.assert_(word in bayes._wordinfokeys())
            wi = bayes._wordinfoget(word)
            self.assertEqual(wi.hamcount, ham)
            self.assertEqual(wi.spamcount, spam)


def suite():
    suite = unittest.TestSuite()
    for cls in (dbexpimpTest,
                ):
        suite.addTest(unittest.makeSuite(cls))
    return suite

if __name__=='__main__':
    sb_test_support.unittest_main(argv=sys.argv + ['suite'])
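The tests above imply the layout of the exported CSV file: a first row holding nham and nspam, followed by one row per token with its ham and spam counts. A minimal sketch of reading that layout in modern Python with the standard csv module; read_export is a hypothetical helper, not part of sb_dbexpimp, and the token unquoting that uunquote performs is left out:

```python
import csv
import io


def read_export(fp):
    """Parse 'nham,nspam' followed by 'word,hamcount,spamcount' rows."""
    reader = csv.reader(fp)
    # The first row carries the message totals.
    nham, nspam = (int(n) for n in next(reader))
    counts = {}
    for word, ham, spam in reader:
        counts[word] = (int(ham), int(spam))
    return nham, nspam, counts


# Same shape as the data written by test_import_to_pickle, shortened.
sample = io.StringIO("3,4\nthis,2,1\nis,0,1\n")
print(read_export(sample))
```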
From anadelonbrin at users.sourceforge.net Mon Nov 15 07:19:16 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 15 07:19:19 2004
Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_dbexpimp.py,
1.1, 1.2
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13734/spambayes/test
Modified Files:
test_sb_dbexpimp.py
Log Message:
Add tests for merging.
Rather than just a comment in the script, ensure that the temp testing files don't
exist before running the test script.
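The idea of the pre-flight check can be sketched as follows in modern Python. This is an illustration only: existing_files is a hypothetical helper, and the demonstration uses a throwaway directory rather than the test directory.

```python
import os
import tempfile


def existing_files(paths):
    """Return the subset of paths that already exist on disk."""
    return [p for p in paths if os.path.exists(p)]


# Demonstration in a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as d:
    names = [os.path.join(d, n) for n in ("temp.pik", "temp.csv", "temp.dbm")]
    print(existing_files(names))       # nothing exists yet
    open(names[1], "w").close()        # simulate a leftover file
    print(existing_files(names))       # the leftover is reported
```

A test script would refuse to run (for example, by exiting with a non-zero status) whenever the returned list is not empty, so that no pre-existing file is overwritten or deleted.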
Index: test_sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_dbexpimp.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** test_sb_dbexpimp.py 12 Nov 2004 02:48:27 -0000 1.1
--- test_sb_dbexpimp.py 15 Nov 2004 06:19:14 -0000 1.2
***************
*** 15,25 ****
# We borrow the test messages that test_sb_server uses.
from test_sb_server import good1, spam1
- # WARNING!
- # If these files exist when running this test, they will be deleted.
TEMP_PICKLE_NAME = os.path.join(os.path.dirname(__file__), "temp.pik")
TEMP_CSV_NAME = os.path.join(os.path.dirname(__file__), "temp.csv")
TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm")
class dbexpimpTest(unittest.TestCase):
--- 15,38 ----
# We borrow the test messages that test_sb_server uses.
+ # I doubt it really makes much difference, but if we wanted more than
+ # one message of each type (the tests should all handle this ok) then
+ # Richie's hammer.py script has code for generating any number of
+ # randomly composed email messages.
from test_sb_server import good1, spam1
TEMP_PICKLE_NAME = os.path.join(os.path.dirname(__file__), "temp.pik")
TEMP_CSV_NAME = os.path.join(os.path.dirname(__file__), "temp.csv")
TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm")
+ # The chances of anyone having files with these names in the test
+ # directory is minute, but we don't want to wipe anything, so make
+ # sure that they don't already exist. Our tearDown code gets rid
+ # of our copies (whether the tests pass or fail) so they shouldn't
+ # be ours.
+ for fn in [TEMP_PICKLE_NAME, TEMP_CSV_NAME, TEMP_DBM_NAME]:
+ if os.path.exists(fn):
+ print fn, "already exists. Please remove this file before " \
+ "running these tests (a file by that name will be " \
+ "created and destroyed as part of the tests)."
+ sys.exit(1)
class dbexpimpTest(unittest.TestCase):
***************
*** 32,36 ****
pass
! def test_csv_import(self):
"""Check that we don't import the old object craft csv module."""
self.assert_(hasattr(sb_dbexpimp.csv, "reader"))
--- 45,49 ----
pass
! def test_csv_module_import(self):
"""Check that we don't import the old object craft csv module."""
self.assert_(hasattr(sb_dbexpimp.csv, "reader"))
***************
*** 132,135 ****
--- 145,232 ----
self.assertEqual(wi.spamcount, spam)
+ def test_merge_to_pickle(self):
+ # Create a pickled classifier to merge with.
+ bayes = PickledClassifier(TEMP_PICKLE_NAME)
+ # Stuff some messages in it so it's not empty.
+ bayes.learn(tokenize(spam1), True)
+ bayes.learn(tokenize(good1), False)
+ # Save.
+ bayes.store()
+ # Create a CSV file to import.
+ nham, nspam = 3,4
+ temp = open(TEMP_CSV_NAME, "wb")
+ temp.write("%d,%d\n" % (nham, nspam))
+ csv_data = {"this":(2,1), "is":(0,1), "a":(3,4), 'test':(1,1),
+ "of":(1,0), "the":(1,2), "import":(3,1)}
+ for word, (ham, spam) in csv_data.items():
+ temp.write("%s,%s,%s\n" % (word, ham, spam))
+ temp.close()
+ sb_dbexpimp.runImport(TEMP_PICKLE_NAME, "pickle", False,
+ TEMP_CSV_NAME)
+ # Open the converted file and verify that it has all the data from
+ # the CSV file (and by opening it, that it is a valid pickle),
+ # and the data from the original pickle.
+ bayes2 = open_storage(TEMP_PICKLE_NAME, "pickle")
+ self.assertEqual(bayes2.nham, nham + bayes.nham)
+ self.assertEqual(bayes2.nspam, nspam + bayes.nspam)
+ words = bayes._wordinfokeys()
+ words.extend(csv_data.keys())
+ for word in words:
+ word = sb_dbexpimp.uquote(word)
+ self.assert_(word in bayes2._wordinfokeys())
+ h, s = csv_data.get(word, (0,0))
+ wi = bayes._wordinfoget(word)
+ if wi:
+ h += wi.hamcount
+ s += wi.spamcount
+ wi2 = bayes2._wordinfoget(word)
+ self.assertEqual(h, wi2.hamcount)
+ self.assertEqual(s, wi2.spamcount)
+
+ def test_merge_to_dbm(self):
+ # Create a dbm classifier to merge with.
+ bayes = DBDictClassifier(TEMP_DBM_NAME)
+ # Stuff some messages in it so it's not empty.
+ bayes.learn(tokenize(spam1), True)
+ bayes.learn(tokenize(good1), False)
+ # Save data to check against.
+ original_nham = bayes.nham
+ original_nspam = bayes.nspam
+ original_data = {}
+ for key in bayes._wordinfokeys():
+ original_data[key] = bayes._wordinfoget(key)
+ # Save & Close.
+ bayes.store()
+ bayes.close()
+ # Create a CSV file to import.
+ nham, nspam = 3,4
+ temp = open(TEMP_CSV_NAME, "wb")
+ temp.write("%d,%d\n" % (nham, nspam))
+ csv_data = {"this":(2,1), "is":(0,1), "a":(3,4), 'test':(1,1),
+ "of":(1,0), "the":(1,2), "import":(3,1)}
+ for word, (ham, spam) in csv_data.items():
+ temp.write("%s,%s,%s\n" % (word, ham, spam))
+ temp.close()
+ sb_dbexpimp.runImport(TEMP_DBM_NAME, "dbm", False, TEMP_CSV_NAME)
+ # Open the converted file and verify that it has all the data from
+ # the CSV file (and by opening it, that it is a valid dbm file),
+ # and the data from the original dbm database.
+ bayes2 = open_storage(TEMP_DBM_NAME, "dbm")
+ self.assertEqual(bayes2.nham, nham + original_nham)
+ self.assertEqual(bayes2.nspam, nspam + original_nspam)
+ words = original_data.keys()[:]
+ words.extend(csv_data.keys())
+ for word in words:
+ word = sb_dbexpimp.uquote(word)
+ self.assert_(word in bayes2._wordinfokeys())
+ h, s = csv_data.get(word, (0,0))
+ wi = original_data.get(word, None)
+ if wi:
+ h += wi.hamcount
+ s += wi.spamcount
+ wi2 = bayes2._wordinfoget(word)
+ self.assertEqual(h, wi2.hamcount)
+ self.assertEqual(s, wi2.spamcount)
+
def suite():
From anadelonbrin at users.sourceforge.net Mon Nov 15 07:22:07 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 15 07:22:10 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py,1.15,1.16
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14184/scripts
Modified Files:
sb_dbexpimp.py
Log Message:
Unittest is paying for itself already! (I really didn't expect to actually find a
bug!)
Because wordinfo might be just a cache with dbm classifiers, a merged import would
lose data for any 'singletons'. Need to use _wordinfoget() instead.
Also fail if the CSV file that we are trying to import from doesn't exist, rather than
carrying on, which made no sense.
And while I'm here, also stop bothering to remove the .dat and .dir files that dumbdbm
creates (it's a long time since they were supported), and remove the verbose flag, which
doesn't actually do anything.
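Why reading the cache directly loses data during a merge can be shown with a toy model. This is a sketch in modern Python under stated assumptions: CachedStore and merge are hypothetical stand-ins, counts are plain tuples rather than WordInfo objects, and only the names wordinfo and _wordinfoget come from the log message.

```python
class CachedStore:
    """Toy stand-in for a dbm-backed classifier whose wordinfo is a cache."""

    def __init__(self, backing):
        self.backing = dict(backing)  # stands in for the on-disk database
        self.wordinfo = {}            # cache; may not hold every stored word

    def _wordinfoget(self, word):
        # Look in the cache first, then fall back to the backing store.
        if word in self.wordinfo:
            return self.wordinfo[word]
        counts = self.backing.get(word)
        if counts is not None:
            self.wordinfo[word] = counts
        return counts

    def _wordinfoset(self, word, counts):
        self.wordinfo[word] = counts
        self.backing[word] = counts


def merge(store, rows, use_cache_only):
    """Add (word, ham, spam) rows to the counts already in the store."""
    for word, ham, spam in rows:
        if use_cache_only:
            old = store.wordinfo.get(word)   # misses words not in the cache
        else:
            old = store._wordinfoget(word)   # consults the backing store too
        if old is None:
            old = (0, 0)
        store._wordinfoset(word, (old[0] + ham, old[1] + spam))


buggy = CachedStore({"test": (1, 1)})
merge(buggy, [("test", 2, 3)], use_cache_only=True)
fixed = CachedStore({"test": (1, 1)})
merge(fixed, [("test", 2, 3)], use_cache_only=False)
print(buggy.backing["test"], fixed.backing["test"])
```

In the cache-only variant the existing counts for "test" are overwritten instead of added to, which matches the data loss described above.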
Index: sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** sb_dbexpimp.py 3 Nov 2004 02:49:30 -0000 1.15
--- sb_dbexpimp.py 15 Nov 2004 06:22:05 -0000 1.16
***************
*** 47,51 ****
-e : export
-i : import
- -v : verbose mode (some additional diagnostic messages)
-f: FN : flat file to export to or import from
-p: FN : name of pickled database file to use
--- 47,50 ----
***************
*** 177,198 ****
pass
- try:
- os.unlink(dbFN+".dat")
- except OSError:
- pass
-
- try:
- os.unlink(dbFN+".dir")
- except OSError:
- pass
-
bayes = spambayes.storage.open_storage(dbFN, useDBM)
! try:
! fp = open(inFN, 'rb')
! except IOError, e:
! if e.errno != errno.ENOENT:
! raise
!
rdr = csv.reader(fp)
(nham, nspam) = rdr.next()
--- 176,182 ----
pass
bayes = spambayes.storage.open_storage(dbFN, useDBM)
! fp = open(inFN, 'rb')
rdr = csv.reader(fp)
(nham, nspam) = rdr.next()
***************
*** 215,221 ****
word = uunquote(word)
! try:
! wi = bayes.wordinfo[word]
! except KeyError:
wi = bayes.WordInfoClass()
--- 199,206 ----
word = uunquote(word)
! # Can't use wordinfo[word] here, because wordinfo
! # is only a cache with dbm! Need to use _wordinfoget instead.
! wi = bayes._wordinfoget(word)
! if wi is None:
wi = bayes.WordInfoClass()
***************
*** 269,274 ****
elif opt == '-m':
newDBM = False
- elif opt == '-v':
- options["globals", "verbose"] = True
elif opt in ('-o', '--option'):
options.set_from_cmdline(arg, sys.stderr)
--- 254,257 ----
From anadelonbrin at users.sourceforge.net Wed Nov 17 01:01:23 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 17 01:01:26 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py,1.137,1.138
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11031/Outlook2000
Modified Files:
addin.py
Log Message:
Fix bug identified by 'DUI-DWI'.
Not sure how this got past the testing, but the messageinfo database
uses '0' and '1' as keys, not 0 and 1, so showing clues for a trained
message would fail.
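The failure mode is an ordinary dictionary lookup with the wrong key type. A small sketch in modern Python, with the two label tables copied from the diff below and trained_message as a hypothetical wrapper around the lookup:

```python
OLD_LABELS = {0: ("", " as ham"), 1: ("", " as spam"), None: ("not ", "")}
NEW_LABELS = {'0': ("", " as ham"), '1': ("", " as spam"), None: ("not ", "")}


def trained_message(trained_as, labels):
    """Format the 'trained' sentence for a value read from the database."""
    return "This message has %sbeen trained%s." % labels[trained_as]


# The database hands back strings, so the string-keyed table works...
print(trained_message('1', NEW_LABELS))
print(trained_message(None, NEW_LABELS))

# ...while the integer-keyed table raises KeyError for the same value.
try:
    trained_message('1', OLD_LABELS)
except KeyError as exc:
    print("KeyError:", exc)
```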
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.137
retrieving revision 1.138
diff -C2 -d -r1.137 -r1.138
*** addin.py 11 Nov 2004 21:21:55 -0000 1.137
--- addin.py 17 Nov 2004 00:01:06 -0000 1.138
***************
*** 481,485 ****
trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
push("This message has %sbeen trained%s." % \
! {0 : ("", " as ham"), 1 : ("", " as spam"), None : ("not ", "")}
[trained_as])
# Format the clues.
--- 481,485 ----
trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
push("This message has %sbeen trained%s." % \
! {'0' : ("", " as ham"), '1' : ("", " as spam"), None : ("not ", "")}
[trained_as])
# Format the clues.
From anadelonbrin at users.sourceforge.net Mon Nov 22 01:02:46 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:02:49 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py,1.43,1.44
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15368/scripts
Modified Files:
sb_imapfilter.py
Log Message:
Fix typo found by Thomas Heller.
Switch from using msg.asTokens to msg.tokenize.
Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.43
retrieving revision 1.44
diff -C2 -d -r1.43 -r1.44
*** sb_imapfilter.py 9 Nov 2004 02:30:33 -0000 1.43
--- sb_imapfilter.py 22 Nov 2004 00:02:28 -0000 1.44
***************
*** 520,524 ****
command = "append %s %s %s %s" % (self.folder.name, flgs, tme,
self.as_string)
! raise BadIMAPReponseError(command)
if self.previous_folder is None:
--- 520,524 ----
command = "append %s %s %s %s" % (self.folder.name, flgs, tme,
self.as_string)
! raise BadIMAPResponseError(command)
if self.previous_folder is None:
***************
*** 710,714 ****
continue
msg.delSBHeaders()
! classifier.unlearn(msg.asTokens(), not isSpam)
# Once the message has been untrained, it's training memory
--- 710,714 ----
continue
msg.delSBHeaders()
! classifier.unlearn(msg.tokenize(), not isSpam)
# Once the message has been untrained, it's training memory
***************
*** 723,727 ****
saved_headers = msg.currentSBHeaders()
msg.delSBHeaders()
! classifier.learn(msg.asTokens(), isSpam)
num_trained += 1
msg.RememberTrained(isSpam)
--- 723,727 ----
saved_headers = msg.currentSBHeaders()
msg.delSBHeaders()
! classifier.learn(msg.tokenize(), isSpam)
num_trained += 1
msg.RememberTrained(isSpam)
***************
*** 754,758 ****
# the errors and move it soon enough.
continue
! (prob, clues) = classifier.spamprob(msg.asTokens(),
evidence=True)
# Add headers and remember classification.
--- 754,758 ----
# the errors and move it soon enough.
continue
! (prob, clues) = classifier.spamprob(msg.tokenize(),
evidence=True)
# Add headers and remember classification.
From anadelonbrin at users.sourceforge.net Mon Nov 22 01:10:19 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:10:21 2004
Subject: [Spambayes-checkins] spambayes/contrib README,1.2,1.3
Message-ID:
Update of /cvsroot/spambayes/spambayes/contrib
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17293/contrib
Modified Files:
README
Log Message:
Bring a bit more up-to-date.
Index: README
===================================================================
RCS file: /cvsroot/spambayes/spambayes/contrib/README,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** README 25 Mar 2004 19:53:15 -0000 1.2
--- README 22 Nov 2004 00:10:15 -0000 1.3
***************
*** 18,30 ****
mod_spambayes.py - Plugin for Amit Patel's proxy3 web proxy.
- mkzip.py - ???
-
spamcounts.py - print spam and ham counts and spam probability for a
messages for for select tokens
! sb_bnfilter.py - alternative to sb_filter that avoids re-initialising
! spambayes for consecutive requests using a short-lived server process.
! This is intended to give the performance advantages of sb_xmlrpcserver,
! without the administrative complications.
! sb_bnserver.py - component of sb_bnfilter.py
--- 18,29 ----
mod_spambayes.py - Plugin for Amit Patel's proxy3 web proxy.
spamcounts.py - print spam and ham counts and spam probability for a
messages for for select tokens
! findbest.py - Find the next "best" unsure message to train on.
! pycksum.py - A fuzzy checksum program designed for email messages.
!
! sb_culler.py - Andrew Dalke's POP3 culler.
!
! tte.py - A utility script for 'train to exhaustion'.
\ No newline at end of file
From anadelonbrin at users.sourceforge.net Mon Nov 22 01:11:55 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:11:58 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py,1.16,1.17
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17695/scripts
Modified Files:
sb_dbexpimp.py
Log Message:
Update docstring.
Index: sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v
retrieving revision 1.16
retrieving revision 1.17
diff -C2 -d -r1.16 -r1.17
*** sb_dbexpimp.py 15 Nov 2004 06:22:05 -0000 1.16
--- sb_dbexpimp.py 22 Nov 2004 00:11:52 -0000 1.17
***************
*** 3,17 ****
"""sb_dbexpimp.py - Bayes database export/import
- Classes:
-
-
- Abstract:
-
This utility has the primary function of exporting and importing
! a spambayes database into/from a flat file. This is useful in a number
of scenarios.
! Platform portability of database - flat files can be exported and
! imported across platforms (winduhs and linux, for example)
Database implementation changes - databases can survive database
--- 3,12 ----
"""sb_dbexpimp.py - Bayes database export/import
This utility has the primary function of exporting and importing
! a spambayes database into/from a CSV file. This is useful in a number
of scenarios.
! Platform portability of database - CSV files can be exported and
! imported across platforms (Windows and Linux, for example).
Database implementation changes - databases can survive database
***************
*** 21,25 ****
Database reorganization - an export followed by an import reorgs an
existing database, improving performance, at least in
! some database implementations
Database sharing - it is possible to distribute particular databases
--- 16,20 ----
Database reorganization - an export followed by an import reorgs an
existing database, improving performance, at least in
! some database implementations.
Database sharing - it is possible to distribute particular databases
***************
*** 29,43 ****
Database merging - multiple databases can be merged into one quite
easily by specifying -m on an import. This will add the two database
! nham and nspams together (assuming the two databases do not share
! corpora) and for wordinfo conflicts, will add spamcount and hamcount
! together.
!
! Spambayes software release migration - an export can be executed before
! a release upgrade, as part of the installation script. Then, after the
! new software is installed, an import can be executed, which will
! effectively preserve existing training. This eliminates the need for
! retraining every time a release is installed.
!
! Others? I'm sure I haven't thought of everything...
Usage:
--- 24,29 ----
Database merging - multiple databases can be merged into one quite
easily by specifying -m on an import. This will add the two database
! nham and nspams together and for wordinfo conflicts, will add spamcount
! and hamcount together.
Usage:
***************
*** 60,66 ****
-h : help
Examples:
! Export pickled mybayes.db into mybayes.db.export as a csv flat file
sb_dbexpimp -e -p mybayes.db -f mybayes.db.export
--- 46,56 ----
-h : help
+ If neither -p nor -d is specified, then the values in your configuration
+ file (or failing that, the defaults) will be used. In this way, you may
+ convert to and from storage formats other than pickle and dbm.
+
Examples:
! Export pickled mybayes.db into mybayes.db.export as a CSV file
sb_dbexpimp -e -p mybayes.db -f mybayes.db.export
***************
*** 78,88 ****
sb_dbexpimp -i -d newbayes.db -f abayes.export
sb_dbexpimp -i -m -d newbayes.db -f bbayes.export
-
- To Do:
- o Suggestions?
-
"""
! # This module is part of the spambayes project, which is Copyright 2002
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
--- 68,74 ----
sb_dbexpimp -i -d newbayes.db -f abayes.export
sb_dbexpimp -i -m -d newbayes.db -f bbayes.export
"""
! # This module is part of the spambayes project, which is Copyright 2002-5
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
***************
*** 225,230 ****
-
-
if __name__ == '__main__':
--- 211,214 ----
From anadelonbrin at users.sourceforge.net Mon Nov 22 01:13:46 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:13:50 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_filter.py,1.14,1.15
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18240/scripts
Modified Files:
sb_filter.py
Log Message:
Remove the "experimental" marking in the docstring for the training functions. Various
people have used these for some time, and I can't see anything in the code that is
particularly worrying. If I'm wrong and these should still be experimental with
1.1, please let me know why and I'll try and modify the tests to remove concern.
Index: sb_filter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_filter.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** sb_filter.py 4 May 2004 13:02:51 -0000 1.14
--- sb_filter.py 22 Nov 2004 00:13:43 -0000 1.15
***************
*** 31,46 ****
filter (default if no processing options are given)
* -g
! [EXPERIMENTAL] (re)train as a good (ham) message
* -s
! [EXPERIMENTAL] (re)train as a bad (spam) message
* -t
! [EXPERIMENTAL] filter and train based on the result -- you must
make sure to untrain all mistakes later. Not recommended.
* -G
! [EXPERIMENTAL] untrain ham (only use if you've already trained
! this message)
* -S
! [EXPERIMENTAL] untrain spam (only use if you've already trained
! this message)
-o section:option:value
--- 31,44 ----
filter (default if no processing options are given)
* -g
! (re)train as a good (ham) message
* -s
! (re)train as a bad (spam) message
* -t
! filter and train based on the result -- you must
make sure to untrain all mistakes later. Not recommended.
* -G
! untrain ham (only use if you've already trained this message)
* -S
! untrain spam (only use if you've already trained this message)
-o section:option:value
From anadelonbrin at users.sourceforge.net Mon Nov 22 01:16:44 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:16:47 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_pop3dnd.py,1.12,1.13
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18873/scripts
Modified Files:
sb_pop3dnd.py
Log Message:
Switch from using msg.asTokens to msg.tokenize.
Play nicer with win32 gui (preparation for a taskbar app for this script).
Don't use the deprecated 'strict' kwarg for email messages.
Add appropriate state createworkers function & call.
Modify to have the prepare/start/stop API that sb_server has, to make a taskbar app
more straightforward.
Index: sb_pop3dnd.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_pop3dnd.py,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** sb_pop3dnd.py 9 Nov 2004 02:30:33 -0000 1.12
--- sb_pop3dnd.py 22 Nov 2004 00:16:39 -0000 1.13
***************
*** 72,75 ****
--- 72,76 ----
import thread
import getopt
+ import socket
import imaplib
import operator
***************
*** 85,89 ****
from twisted.internet import defer
from twisted.internet import reactor
- from twisted.internet import win32eventreactor
from twisted.internet.defer import maybeDeferred
from twisted.internet.protocol import ServerFactory
--- 86,89 ----
***************
*** 95,98 ****
--- 95,99 ----
from twisted.protocols.imap4 import IMailboxListener, collapseNestedLists
+ from spambayes import storage
from spambayes import message
from spambayes.Stats import Stats
***************
*** 227,234 ****
def train(self, classifier, isSpam):
if self.GetTrained() == (not isSpam):
! classifier.unlearn(self.asTokens(), not isSpam)
self.RememberTrained(None)
if self.GetTrained() is None:
! classifier.learn(self.asTokens(), isSpam)
self.RememberTrained(isSpam)
classifier.store()
--- 228,235 ----
def train(self, classifier, isSpam):
if self.GetTrained() == (not isSpam):
! classifier.unlearn(self.tokenize(), not isSpam)
self.RememberTrained(None)
if self.GetTrained() is None:
! classifier.learn(self.tokenize(), isSpam)
self.RememberTrained(isSpam)
classifier.store()
***************
*** 319,324 ****
if content is None:
return IMAPFileMessage(key, directory)
! msg = email.message_from_string(content, _class=IMAPFileMessage,
! strict=False)
msg.id = key
msg.file_name = key
--- 320,324 ----
if content is None:
return IMAPFileMessage(key, directory)
! msg = email.message_from_string(content, _class=IMAPFileMessage)
msg.id = key
msg.file_name = key
***************
*** 609,614 ****
'%s\r\nSee .\r\n' % (__doc__,)
date = imaplib.Time2Internaldate(time.time())[1:-1]
! msg = email.message_from_string(about, _class=IMAPMessage,
! strict=False)
msg.date = date
self.addMessage(msg)
--- 609,613 ----
'%s\r\nSee .\r\n' % (__doc__,)
date = imaplib.Time2Internaldate(time.time())[1:-1]
! msg = email.message_from_string(about, _class=IMAPMessage)
msg.date = date
self.addMessage(msg)
***************
*** 618,624 ****
self.addMessage(msg)
# XXX Add other messages here, for example
! # XXX one with a link to the configuration page
! # XXX (or maybe even the configuration page itself,
! # XXX in html!)
def isWriteable(self):
--- 617,621 ----
self.addMessage(msg)
# XXX Add other messages here, for example
! # XXX help and other documentation.
def isWriteable(self):
***************
*** 830,834 ****
_class=message.SBHeaderMessage)
# Now find the spam disposition and add the header.
! (prob, clues) = state.bayes.spamprob(msg.asTokens(),\
evidence=True)
--- 827,831 ----
_class=message.SBHeaderMessage)
# Now find the spam disposition and add the header.
! (prob, clues) = state.bayes.spamprob(msg.tokenize(),\
evidence=True)
***************
*** 908,911 ****
--- 905,917 ----
self.activeIMAPSessions = 0
+ def createWorkers(self):
+ """There aren't many workers in an IMAP State - most of the
+ work is done elsewhere. We do need to load the classifier,
+ though, and build the status strings."""
+ if not hasattr(self, "DBName"):
+ self.DBName, self.useDB = storage.database_type([])
+ self.bayes = storage.open_storage(self.DBName, self.useDB)
+ self.buildStatusStrings()
+
def buildServerStrings(self):
"""After the server details have been set up, this creates string
***************
*** 921,925 ****
# ===================================================================
! def setup():
# Setup state, server, boxes, trainers and account.
state.imap_port = options["imapserver", "port"]
--- 927,931 ----
# ===================================================================
! def prepare():
# Setup state, server, boxes, trainers and account.
state.imap_port = options["imapserver", "port"]
***************
*** 961,965 ****
unsure_box)
proxyListeners.append(listener)
! state.buildServerStrings()
def run():
--- 967,998 ----
unsure_box)
proxyListeners.append(listener)
! state.prepare()
!
! def start():
! assert state.prepared, "Must prepare before starting"
! # The asyncore stuff doesn't play nicely with twisted (or vice-versa),
! # so put them in separate threads.
! thread.start_new_thread(Dibbler.run, ())
! reactor.run()
!
! def stop():
! # Save the classifier, although that should not be necessary.
! state.bayes.store()
! # Explicitly closing the db is a good idea, though.
! state.bayes.close()
!
! # Stop the POP3 proxy.
! if state.proxyPorts:
! killer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
! try:
! killer.connect(('localhost', state.proxyPorts[0][1]))
! killer.send('KILL\r\n')
! killer.close()
! except socket.error:
! # Well, we did our best to shut down gracefully. Warn the user
! # and just die when the thread we are in does.
! print "Could not shut down POP3 proxy gracefully."
! # Stop the IMAP4 server.
! reactor.stop()
def run():
***************
*** 986,995 ****
# Setup everything.
! setup()
- # Kick things off. The asyncore stuff doesn't play nicely
- # with twisted (or vice-versa), so put them in separate threads.
- thread.start_new_thread(Dibbler.run, ())
- reactor.run()
if __name__ == "__main__":
--- 1019,1027 ----
# Setup everything.
! prepare()
!
! # Kick things off.
! start()
if __name__ == "__main__":
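The prepare/start/stop split described in the log message can be sketched roughly as follows. This is a minimal illustration in modern Python 3, not the script's actual code: the `Service` class and its `events` list are hypothetical stand-ins for the module-level state, and no real servers are started. The diff guards `start()` with an `assert`; the sketch raises an exception instead so the check also survives `python -O`.

```python
class Service:
    """Hypothetical stand-in for the script's module-level state."""

    def __init__(self):
        self.prepared = False
        self.running = False
        self.events = []

    def prepare(self):
        # Read options and open resources, but do not block.
        self.events.append("prepare")
        self.prepared = True

    def start(self):
        # Refuse to run unless prepare() has been called first.
        if not self.prepared:
            raise RuntimeError("Must prepare before starting")
        self.events.append("start")
        self.running = True

    def stop(self):
        # Persist state first, then shut the servers down.
        self.events.append("stop")
        self.running = False


# A caller such as a taskbar app would drive the three phases in order.
service = Service()
service.prepare()
service.start()
service.stop()

# Starting an unprepared service is rejected.
fresh = Service()
try:
    fresh.start()
    refused = False
except RuntimeError:
    refused = True
```

Keeping the blocking call in `start()` and the cleanup in `stop()` lets a GUI wrapper run `start()` in its own thread and call `stop()` from an event handler.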
From anadelonbrin at users.sourceforge.net Mon Nov 22 01:22:57 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:23:00 2004
Subject: [Spambayes-checkins] spambayes/spambayes/test .cvsignore, NONE,
1.1 test_message.py, NONE, 1.1 test_sb_filter.py, NONE,
1.1 test_sb_dbexpimp.py, 1.2, 1.3 test_sb_imapfilter.py, 1.5, 1.6
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20443/spambayes/test
Modified Files:
test_sb_dbexpimp.py test_sb_imapfilter.py
Added Files:
.cvsignore test_message.py test_sb_filter.py
Log Message:
Ignore .pyc and .pyo and the _pop3proxy.log file.
Add tests for message.py and sb_filter.py
Use email.message_from_string not our versions in imapfilter test.
Add shells of SF bugs tests for imapfilter.
Fix removing of temp files in test for sb_dbexpimp.py
--- NEW FILE: .cvsignore ---
*.py[co]
_pop3proxy.log
--- NEW FILE: test_message.py ---
# Test spambayes.message module.
import os
import sys
import math
import email
import unittest
import sb_test_support
sb_test_support.fix_sys_path()
from spambayes.Options import options
from spambayes.tokenizer import tokenize
from spambayes.classifier import Classifier
from spambayes.message import MessageInfoDB, insert_exception_header
from spambayes.message import Message, SBHeaderMessage, MessageInfoPickle
# We borrow the test messages that test_sb_server uses.
# I doubt it really makes much difference, but if we wanted more than
# one message of each type (the tests should all handle this ok) then
# Richie's hammer.py script has code for generating any number of
# randomly composed email messages.
from test_sb_server import good1, spam1
TEMP_PICKLE_NAME = os.path.join(os.path.dirname(__file__), "temp.pik")
TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm")
# The chance of anyone having files with these names in the test
# directory is minute, but we don't want to wipe anything, so make
# sure that they don't already exist. Our tearDown code gets rid
# of our copies (whether the tests pass or fail) so they shouldn't
# be ours.
for fn in [TEMP_PICKLE_NAME, TEMP_DBM_NAME]:
if os.path.exists(fn):
print fn, "already exists. Please remove this file before " \
"running these tests (a file by that name will be " \
"created and destroyed as part of the tests)."
sys.exit(1)
class MessageTest(unittest.TestCase):
def setUp(self):
self.msg = email.message_from_string(spam1, _class=Message)
def test_persistent_state(self):
self.assertEqual(self.msg.stored_attributes, ['c', 't'])
def test_initialisation(self):
self.assertEqual(self.msg.id, None)
self.assertEqual(self.msg.c, None)
self.assertEqual(self.msg.t, None)
def test_setId(self):
# Verify that you can't change the id.
self.msg.id = "test"
self.assertRaises(ValueError, self.msg.setId, "test2")
# Verify that you can't set the id to None.
self.msg.id = None
self.assertRaises(ValueError, self.msg.setId, None)
# Verify that id must be a string.
self.assertRaises(TypeError, self.msg.setId, 1)
self.assertRaises(TypeError, self.msg.setId, False)
self.assertRaises(TypeError, self.msg.setId, [])
id = "Test"
self.msg.setId(id)
self.assertEqual(self.msg.id, id)
# Check info db load_msg is called.
self.msg.id = None
saved = self.msg.message_info_db.load_msg
self.done = False
try:
self.msg.message_info_db.load_msg = self._fake_setState
self.msg.setId(id)
self.assertEqual(self.done, True)
finally:
self.msg.message_info_db.load_msg = saved
def test_getId(self):
self.assertEqual(self.msg.getId(), None)
id = "test"
self.msg.id = id
self.assertEqual(self.msg.getId(), id)
def test_tokenize(self):
toks = self.msg.tokenize()
self.assertEqual(tuple(tokenize(spam1)), tuple(toks))
def test_force_CRLF(self):
self.assert_('\r' not in good1)
lines = self.msg._force_CRLF(good1).split('\n')
for line in lines:
if line:
self.assert_(line.endswith('\r'))
def test_as_string_endings(self):
self.assert_('\r' not in spam1)
lines = self.msg.as_string().split('\n')
for line in lines:
if line:
self.assert_(line.endswith('\r'))
def _fake_setState(self, state):
self.done = True
def test_modified(self):
saved = self.msg.message_info_db.store_msg
try:
self.msg.message_info_db.store_msg = self._fake_setState
self.done = False
self.msg.modified()
self.assertEqual(self.done, False)
self.msg.id = "Test"
self.msg.modified()
self.assertEqual(self.done, True)
finally:
self.msg.message_info_db.store_msg = saved
def test_GetClassification(self):
self.msg.c = 's'
self.assertEqual(self.msg.GetClassification(),
options['Headers','header_spam_string'])
self.msg.c = 'h'
self.assertEqual(self.msg.GetClassification(),
options['Headers','header_ham_string'])
self.msg.c = 'u'
self.assertEqual(self.msg.GetClassification(),
options['Headers','header_unsure_string'])
self.msg.c = 'a'
self.assertEqual(self.msg.GetClassification(), None)
def test_RememberClassification(self):
self.msg.RememberClassification(options['Headers',
'header_spam_string'])
self.assertEqual(self.msg.c, 's')
self.msg.RememberClassification(options['Headers',
'header_ham_string'])
self.assertEqual(self.msg.c, 'h')
self.msg.RememberClassification(options['Headers',
'header_unsure_string'])
self.assertEqual(self.msg.c, 'u')
self.assertRaises(ValueError, self.msg.RememberClassification, "a")
# Check that self.msg.modified is called.
saved = self.msg.modified
self.done = False
try:
self.msg.modified = self._fake_modified
self.msg.RememberClassification(options['Headers',
'header_unsure_string'])
self.assertEqual(self.done, True)
finally:
self.msg.modified = saved
def _fake_modified(self):
self.done = True
def test_GetAndRememberTrained(self):
t = "test"
saved = self.msg.modified
self.done = False
try:
self.msg.modified = self._fake_modified
self.msg.RememberTrained(t)
self.assertEqual(self.done, True)
finally:
self.msg.modified = saved
self.assertEqual(self.msg.GetTrained(), t)
class SBHeaderMessageTest(unittest.TestCase):
def setUp(self):
self.msg = email.message_from_string(spam1, _class=SBHeaderMessage)
# Get a prob and some clues.
c = Classifier()
self.u_prob, clues = c.spamprob(tokenize(good1), True)
c.learn(tokenize(good1), False)
self.g_prob, clues = c.spamprob(tokenize(good1), True)
c.unlearn(tokenize(good1), False)
c.learn(tokenize(spam1), True)
self.s_prob, self.clues = c.spamprob(tokenize(spam1), True)
self.ham = options['Headers','header_ham_string']
self.spam = options['Headers','header_spam_string']
self.unsure = options['Headers','header_unsure_string']
self.to = "tony.meyer@gmail.com;ta-meyer@ihug.co.nz"
self.msg["to"] = self.to
def test_setIdFromPayload(self):
id = self.msg.setIdFromPayload()
self.assertEqual(id, None)
self.assertEqual(self.msg.id, None)
msgid = "test"
msg = "".join((options['Headers','mailid_header_name'], ": ",
msgid, "\r\n", good1))
msg = email.message_from_string(msg, _class=SBHeaderMessage)
id = msg.setIdFromPayload()
self.assertEqual(id, msgid)
self.assertEqual(msg.id, msgid)
def test_disposition_header_ham(self):
name = options['Headers','classification_header_name']
self.msg.addSBHeaders(self.g_prob, self.clues)
self.assertEqual(self.msg[name], self.ham)
self.assertEqual(self.msg.GetClassification(), self.ham)
def test_disposition_header_spam(self):
name = options['Headers','classification_header_name']
self.msg.addSBHeaders(self.s_prob, self.clues)
self.assertEqual(self.msg[name], self.spam)
self.assertEqual(self.msg.GetClassification(), self.spam)
def test_disposition_header_unsure(self):
name = options['Headers','classification_header_name']
self.msg.addSBHeaders(self.u_prob, self.clues)
self.assertEqual(self.msg[name], self.unsure)
self.assertEqual(self.msg.GetClassification(), self.unsure)
def test_score_header_off(self):
options['Headers','include_score'] = False
self.msg.addSBHeaders(self.g_prob, self.clues)
self.assertEqual(self.msg[options['Headers', 'score_header_name']],
None)
def test_score_header(self):
options['Headers','include_score'] = True
options["Headers", "header_score_digits"] = 21
options["Headers", "header_score_logarithm"] = False
self.msg.addSBHeaders(self.g_prob, self.clues)
self.assertEqual(self.msg[options['Headers', 'score_header_name']],
"%.21f" % (self.g_prob,))
def test_score_header_log(self):
options['Headers','include_score'] = True
options["Headers", "header_score_digits"] = 21
options["Headers", "header_score_logarithm"] = True
self.msg.addSBHeaders(self.s_prob, self.clues)
self.assert_(self.msg[options['Headers', 'score_header_name']].\
startswith("%.21f" % (self.s_prob,)))
self.assert_(self.msg[options['Headers', 'score_header_name']].\
endswith(" (%d)" % (-math.log10(1.0-self.s_prob),)))
def test_thermostat_header_off(self):
options['Headers','include_thermostat'] = False
self.msg.addSBHeaders(self.u_prob, self.clues)
self.assertEqual(self.msg[options['Headers',
'thermostat_header_name']], None)
def test_thermostat_header_unsure(self):
options['Headers','include_thermostat'] = True
self.msg.addSBHeaders(self.u_prob, self.clues)
self.assertEqual(self.msg[options['Headers',
'thermostat_header_name']],
"*****")
def test_thermostat_header_spam(self):
options['Headers','include_thermostat'] = True
self.msg.addSBHeaders(self.s_prob, self.clues)
self.assertEqual(self.msg[options['Headers',
'thermostat_header_name']],
"*********")
def test_thermostat_header_ham(self):
options['Headers','include_thermostat'] = True
self.msg.addSBHeaders(self.g_prob, self.clues)
self.assertEqual(self.msg[options['Headers',
'thermostat_header_name']], "")
def test_evidence_header(self):
options['Headers', 'include_evidence'] = True
options['Headers', 'clue_mailheader_cutoff'] = 0.5 # all
self.msg.addSBHeaders(self.g_prob, self.clues)
header = self.msg[options['Headers', 'evidence_header_name']]
header_clues = [s.split(':') for s in \
[s.strip() for s in header.split(';')]]
header_clues = dict([(":".join(clue[:-1])[1:-1], float(clue[-1])) \
for clue in header_clues])
for word, score in self.clues:
self.assert_(word in header_clues)
self.assertEqual(round(score, 2), header_clues[word])
def test_evidence_header_partial(self):
options['Headers', 'include_evidence'] = True
options['Headers', 'clue_mailheader_cutoff'] = 0.1
self.msg.addSBHeaders(self.g_prob, self.clues)
header = self.msg[options['Headers', 'evidence_header_name']]
header_clues = [s.split(':') for s in \
[s.strip() for s in header.split(';')]]
header_clues = dict([(":".join(clue[:-1])[1:-1], float(clue[-1])) \
for clue in header_clues])
for word, score in self.clues:
if score <= 0.1 or score >= 0.9:
self.assert_(word in header_clues)
self.assertEqual(round(score, 2), header_clues[word])
else:
self.assert_(word not in header_clues)
def test_evidence_header_empty(self):
options['Headers', 'include_evidence'] = True
options['Headers', 'clue_mailheader_cutoff'] = 0.0
self.msg.addSBHeaders(self.g_prob, self.clues)
header = self.msg[options['Headers','evidence_header_name']]
header_clues = [s.split(':') for s in \
[s.strip() for s in header.split(';')]]
header_clues = dict([(":".join(clue[:-1])[1:-1], float(clue[-1])) \
for clue in header_clues])
for word, score in self.clues:
if word == "*H*" or word == "*S*":
self.assert_(word in header_clues)
self.assertEqual(round(score, 2), header_clues[word])
else:
self.assert_(word not in header_clues)
def test_evidence_header_off(self):
options['Headers', 'include_evidence'] = False
self.msg.addSBHeaders(self.g_prob, self.clues)
self.assertEqual(self.msg[options['Headers',
'evidence_header_name']], None)
def test_notate_to_off(self):
options["Headers", "notate_to"] = ()
self.msg.addSBHeaders(self.g_prob, self.clues)
self.msg.addSBHeaders(self.u_prob, self.clues)
self.msg.addSBHeaders(self.s_prob, self.clues)
self.assertEqual(self.msg["To"], self.to)
def test_notate_to_ham(self):
options["Headers", "notate_to"] = (self.ham,)
self.msg.addSBHeaders(self.g_prob, self.clues)
disp, orig = self.msg["To"].split(';', 1)
self.assertEqual(orig, self.to)
self.assertEqual(disp, "%s@spambayes.invalid" % (self.ham,))
def test_notate_to_unsure(self):
options["Headers", "notate_to"] = (self.ham, self.unsure)
self.msg.addSBHeaders(self.u_prob, self.clues)
disp, orig = self.msg["To"].split(';', 1)
self.assertEqual(orig, self.to)
self.assertEqual(disp, "%s@spambayes.invalid" % (self.unsure,))
def test_notate_to_spam(self):
options["Headers", "notate_to"] = (self.ham, self.spam, self.unsure)
self.msg.addSBHeaders(self.s_prob, self.clues)
disp, orig = self.msg["To"].split(';', 1)
self.assertEqual(orig, self.to)
self.assertEqual(disp, "%s@spambayes.invalid" % (self.spam,))
def test_notate_subject_off(self):
subject = self.msg["Subject"]
options["Headers", "notate_subject"] = ()
self.msg.addSBHeaders(self.g_prob, self.clues)
self.msg.addSBHeaders(self.u_prob, self.clues)
self.msg.addSBHeaders(self.s_prob, self.clues)
self.assertEqual(self.msg["Subject"], subject)
def test_notate_subject_ham(self):
subject = self.msg["Subject"]
options["Headers", "notate_subject"] = (self.ham,)
self.msg.addSBHeaders(self.g_prob, self.clues)
disp, orig = self.msg["Subject"].split(',', 1)
self.assertEqual(orig, subject)
self.assertEqual(disp, self.ham)
def test_notate_subject_unsure(self):
subject = self.msg["Subject"]
options["Headers", "notate_subject"] = (self.ham, self.unsure)
self.msg.addSBHeaders(self.u_prob, self.clues)
disp, orig = self.msg["Subject"].split(',', 1)
self.assertEqual(orig, subject)
self.assertEqual(disp, self.unsure)
def test_notate_subject_spam(self):
subject = self.msg["Subject"]
options["Headers", "notate_subject"] = (self.ham, self.spam,
self.unsure)
self.msg.addSBHeaders(self.s_prob, self.clues)
disp, orig = self.msg["Subject"].split(',', 1)
self.assertEqual(orig, subject)
self.assertEqual(disp, self.spam)
def test_notate_to_changed(self):
saved_ham = options["Headers", "header_ham_string"]
notate_to = options.get_option("Headers", "notate_to")
saved_to = notate_to.allowed_values
try:
options["Headers", "header_ham_string"] = "bacon"
header_strings = (options["Headers", "header_ham_string"],
options["Headers", "header_spam_string"],
options["Headers", "header_unsure_string"])
notate_to = options.get_option("Headers", "notate_to")
notate_to.allowed_values = header_strings
self.ham = options["Headers", "header_ham_string"]
result = self.test_notate_to_ham()
# Just be sure that it's using the new value.
self.assertEqual(self.msg["To"].split(';', 1)[0],
"bacon@spambayes.invalid")
finally:
# If we leave these changed, then lots of other tests will
# fail.
options["Headers", "header_ham_string"] = saved_ham
self.ham = saved_ham
notate_to.allowed_values = saved_to
return result
def test_id_header(self):
options['Headers','add_unique_id'] = True
id = "test"
self.msg.id = id
self.msg.addSBHeaders(self.g_prob, self.clues)
self.assertEqual(self.msg[options['Headers',
'mailid_header_name']], id)
def test_id_header_off(self):
options['Headers','add_unique_id'] = False
id = "test"
self.msg.id = id
self.msg.addSBHeaders(self.g_prob, self.clues)
self.assertEqual(self.msg[options['Headers',
'mailid_header_name']], None)
def test_currentSBHeaders(self):
sbheaders = self.msg.currentSBHeaders()
self.assertEqual({}, sbheaders)
headers = {options['Headers', 'classification_header_name'] : '1',
options['Headers', 'mailid_header_name'] : '2',
options['Headers',
'classification_header_name'] + "-ID" : '3',
options['Headers', 'thermostat_header_name'] : '4',
options['Headers', 'evidence_header_name'] : '5',
options['Headers', 'score_header_name'] : '6',
options['Headers', 'trained_header_name'] : '7',
}
for name, val in headers.items():
self.msg[name] = val
sbheaders = self.msg.currentSBHeaders()
self.assertEqual(headers, sbheaders)
def test_delSBHeaders(self):
headers = (options['Headers', 'classification_header_name'],
options['Headers', 'mailid_header_name'],
options['Headers',
'classification_header_name'] + "-ID",
options['Headers', 'thermostat_header_name'],
options['Headers', 'evidence_header_name'],
options['Headers', 'score_header_name'],
options['Headers', 'trained_header_name'],)
for header in headers:
self.msg[header] = "test"
for header in headers:
self.assert_(header in self.msg.keys())
self.msg.delSBHeaders()
for header in headers:
self.assert_(header not in self.msg.keys())
class MessageInfoBaseTest(unittest.TestCase):
def setUp(self, fn=TEMP_PICKLE_NAME):
self.db = self.klass(fn, self.mode)
def test_mode(self):
self.assertEqual(self.mode, self.db.mode)
def test_load_msg_missing(self):
msg = email.message_from_string(good1, _class=Message)
msg.id = "Test"
dummy_values = "a", "b"
msg.c, msg.t = dummy_values
self.db.load_msg(msg)
self.assertEqual((msg.c, msg.t), dummy_values)
def test_load_msg_compat(self):
msg = email.message_from_string(good1, _class=Message)
msg.id = "Test"
dummy_values = "a", "b"
self.db.db[msg.id] = dummy_values
self.db.load_msg(msg)
self.assertEqual((msg.c, msg.t), dummy_values)
def test_load_msg(self):
msg = email.message_from_string(good1, _class=Message)
msg.id = "Test"
dummy_values = [('a', 1), ('b', 2)]
self.db.db[msg.id] = dummy_values
self.db.load_msg(msg)
for att, val in dummy_values:
self.assertEqual(getattr(msg, att), val)
def test_store_msg(self):
msg = email.message_from_string(good1, _class=Message)
msg.id = "Test"
saved = self.db.store
self.done = False
try:
self.db.store = self._fake_store
self.db.store_msg(msg)
finally:
self.db.store = saved
self.assertEqual(self.done, True)
correct = [(att, getattr(msg, att)) \
for att in msg.stored_attributes]
self.assertEqual(self.db.db[msg.id], correct)
def _fake_store(self):
self.done = True
def test_remove_msg(self):
msg = email.message_from_string(good1, _class=Message)
msg.id = "Test"
self.db.db[msg.id] = "test"
saved = self.db.store
self.done = False
try:
self.db.store = self._fake_store
self.db.remove_msg(msg)
finally:
self.db.store = saved
self.assertEqual(self.done, True)
self.assertRaises(KeyError, self.db.db.__getitem__, msg.id)
def test_load(self):
# Create a db to try and load.
data = {"1" : ('a', 'b', 'c'),
"2" : ('d', 'e', 'f'),
"3" : "test"}
for k, v in data.items():
self.db.db[k] = v
self.db.store()
fn = self.db.db_name
self.db.close()
db2 = self.klass(fn, self.mode)
try:
self.assertEqual(len(db2.db.keys()), len(data.keys()))
for k, v in data.items():
self.assertEqual(db2.db[k], v)
finally:
db2.close()
def test_load_new(self):
# Load from a non-existing db (i.e. create new).
self.assertEqual(self.db.db.keys(), [])
class MessageInfoPickleTest(MessageInfoBaseTest):
def setUp(self):
self.mode = 1
self.klass = MessageInfoPickle
MessageInfoBaseTest.setUp(self, TEMP_PICKLE_NAME)
def tearDown(self):
try:
os.remove(TEMP_PICKLE_NAME)
except OSError:
pass
def store(self):
if self.db is not None:
self.db.sync()
class MessageInfoDBTest(MessageInfoBaseTest):
def setUp(self):
self.mode = 'c'
self.klass = MessageInfoDB
MessageInfoBaseTest.setUp(self, TEMP_DBM_NAME)
def tearDown(self):
self.db.close()
try:
os.remove(TEMP_DBM_NAME)
except OSError:
pass
def store(self):
if self.db is not None:
self.db.sync()
def _fake_close(self):
self.done += 1
def test_close(self):
saved_db = self.db.db.close
saved_dbm = self.db.dbm.close
try:
self.done = 0
self.db.db.close = self._fake_close
self.db.dbm.close = self._fake_close
self.db.close()
self.assertEqual(self.done, 2)
finally:
# If we don't put these back (whatever happens), then
# the db isn't closed and can't be deleted in tearDown.
self.db.db.close = saved_db
self.db.dbm.close = saved_dbm
class UtilitiesTest(unittest.TestCase):
def _verify_details(self, details):
loc = details.find(__file__)
self.assertNotEqual(loc, -1)
loc = details.find("Exception: Test")
self.assertNotEqual(loc, -1)
def _verify_exception_header(self, msg, details):
msg = email.message_from_string(msg)
details = "\r\n.".join(details.strip().split('\n'))
headerName = 'X-Spambayes-Exception'
header = email.Header.Header(details, header_name=headerName)
self.assertEqual(msg[headerName].replace('\n', '\r\n'),
str(header).replace('\n', '\r\n'))
def test_insert_exception_header(self):
# Cause an exception to insert.
try:
raise Exception("Test")
except Exception:
pass
msg, details = insert_exception_header(good1)
self._verify_details(details)
self._verify_exception_header(msg, details)
def test_insert_exception_header_and_id(self):
# Cause an exception to insert.
try:
raise Exception("Test")
except Exception:
pass
id = "Message ID"
msg, details = insert_exception_header(good1, id)
self._verify_details(details)
self._verify_exception_header(msg, details)
# Check that ID header is inserted.
msg = email.message_from_string(msg)
headerName = options["Headers", "mailid_header_name"]
header = email.Header.Header(id, header_name=headerName)
self.assertEqual(msg[headerName], str(header).replace('\n', '\r\n'))
def suite():
suite = unittest.TestSuite()
for cls in (MessageTest,
SBHeaderMessageTest,
MessageInfoPickleTest,
MessageInfoDBTest,
UtilitiesTest,
):
suite.addTest(unittest.makeSuite(cls))
return suite
if __name__=='__main__':
sb_test_support.unittest_main(argv=sys.argv + ['suite'])
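The evidence-header tests above all parse the header the same way: split on ';', split each clue on ':', rejoin everything but the last field (a token may itself contain ':'), strip the surrounding quotes, and read the last field as a float. The sketch below walks through that parsing on a made-up header value. The sample string is an assumption inferred from the parsing code, not output captured from SpamBayes, and `parse_evidence` is a hypothetical helper name. It is written for Python 3.

```python
def parse_evidence(header):
    """Turn "'word': 0.50; 'other': 0.99" into {"word": 0.5, "other": 0.99}."""
    clues = {}
    for item in header.split(';'):
        parts = item.strip().split(':')
        # Everything before the final ':' is the quoted token; rejoin it
        # so that tokens such as 'subject:free' keep their own colon.
        word = ":".join(parts[:-1])[1:-1]
        clues[word] = float(parts[-1])
    return clues


# Hypothetical header value, shaped the way the tests expect.
sample = "'*H*': 0.00; '*S*': 1.00; 'subject:free': 0.99"
parsed = parse_evidence(sample)
```

Rejoining with ':' before stripping the quotes is what keeps header-style tokens intact; splitting on the first ':' only would cut them in half.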
--- NEW FILE: test_sb_filter.py ---
# Test sb_filter script.
import os
import sys
import email
import unittest
import sb_test_support
sb_test_support.fix_sys_path()
from spambayes.Options import options
from spambayes.tokenizer import tokenize
from spambayes.storage import open_storage
import sb_filter
# We borrow the test messages that test_sb_server uses.
# I doubt it really makes much difference, but if we wanted more than
# one message of each type (the tests should all handle this ok) then
# Richie's hammer.py script has code for generating any number of
# randomly composed email messages.
from test_sb_server import good1, spam1
good1 = email.message_from_string(good1)
spam1 = email.message_from_string(spam1)
TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm")
# The chance of anyone having a file with this name in the test
# directory is minute, but we don't want to wipe anything, so make
# sure that it doesn't already exist. Our tearDown code gets rid
# of our copy (whether the tests pass or fail) so it shouldn't
# be ours.
if os.path.exists(TEMP_DBM_NAME):
print TEMP_DBM_NAME, "already exists. Please remove this file " \
"before running these tests (a file by that name will be " \
"created and destroyed as part of the tests)."
sys.exit(1)
class HammieFilterTest(unittest.TestCase):
def setUp(self):
self.h = sb_filter.HammieFilter()
self.h.dbname = TEMP_DBM_NAME
self.h.usedb = "dbm"
def tearDown(self):
if self.h.h:
self.h.close()
try:
os.remove(TEMP_DBM_NAME)
except OSError:
pass
def _fake_store(self):
self.done = True
def test_open(self):
mode = 'c'
self.h.open(mode)
self.assertEqual(self.h.mode, mode)
# Check the underlying classifier exists.
self.assert_(self.h.h is not None)
# This can also be called when there is an
# existing classifier, but we want to change
# mode. Verify that we store the old database
# first if we were not in readonly mode.
self.done = False
self.h.h.store = self._fake_store
mode = 'r'
self.h.open(mode)
self.assertEqual(self.h.mode, mode)
self.assert_(self.done)
def test_close_readonly(self):
# Must open with 'c' first, because otherwise it doesn't exist.
self.h.open('c')
self.h.open('r')
self.done = False
self.h.h.store = self._fake_store
# Verify that the classifier is not stored if we are
# in readonly mode.
self.h.close()
self.assert_(not self.done)
self.assertEqual(self.h.h, None)
def test_close(self):
self.h.open('c')
self.done = False
self.h.h.store = self._fake_store
# Verify that the classifier is stored if we are
# not in readonly mode.
self.h.close()
self.assert_(self.done)
self.assertEqual(self.h.h, None)
def test_newdb(self):
# Create an existing classifier.
b = open_storage(TEMP_DBM_NAME, "dbm")
b.learn(tokenize(spam1), True)
b.learn(tokenize(good1), False)
b.store()
b.close()
# Create the fresh classifier.
self.h.newdb()
# Verify that the classifier isn't open.
self.assertEqual(self.h.h, None)
# Verify that any existing classifier with the same name
# is overwritten.
b = open_storage(TEMP_DBM_NAME, "dbm")
self.assertEqual(b.nham, 0)
self.assertEqual(b.nspam, 0)
b.close()
def test_filter(self):
# Verify that the msg has the classification header added.
self.h.open('c')
self.h.h.bayes.learn(tokenize(good1), False)
self.h.h.bayes.learn(tokenize(spam1), True)
self.h.h.store()
result = email.message_from_string(self.h.filter(spam1))
self.assert_(result[options["Headers",
"classification_header_name"]].\
startswith(options["Headers", "header_spam_string"]))
result = email.message_from_string(self.h.filter(good1))
self.assert_(result[options["Headers",
"classification_header_name"]].\
startswith(options["Headers", "header_ham_string"]))
def test_filter_train(self):
# Verify that the msg has the classification header
# added, and that it was correctly trained.
self.h.open('c')
self.h.h.bayes.learn(tokenize(good1), False)
self.h.h.bayes.learn(tokenize(spam1), True)
self.h.h.store()
result = email.message_from_string(self.h.filter_train(spam1))
self.assert_(result[options["Headers",
"classification_header_name"]].\
startswith(options["Headers", "header_spam_string"]))
self.assertEqual(self.h.h.bayes.nspam, 2)
result = email.message_from_string(self.h.filter_train(good1))
self.assert_(result[options["Headers",
"classification_header_name"]].\
startswith(options["Headers", "header_ham_string"]))
self.assertEqual(self.h.h.bayes.nham, 2)
def test_train_ham(self):
# Verify that the classifier gets trained with the message.
self.h.open('c')
self.h.train_ham(good1)
self.assertEqual(self.h.h.bayes.nham, 1)
self.assertEqual(self.h.h.bayes.nspam, 0)
for token in tokenize(good1):
wi = self.h.h.bayes._wordinfoget(token)
self.assertEqual(wi.hamcount, 1)
self.assertEqual(wi.spamcount, 0)
def test_train_spam(self):
# Verify that the classifier gets trained with the message.
self.h.open('c')
self.h.train_spam(spam1)
self.assertEqual(self.h.h.bayes.nham, 0)
self.assertEqual(self.h.h.bayes.nspam, 1)
for token in tokenize(spam1):
wi = self.h.h.bayes._wordinfoget(token)
self.assertEqual(wi.hamcount, 0)
self.assertEqual(wi.spamcount, 1)
def test_untrain_ham(self):
self.h.open('c')
# Put a message in the classifier to be removed.
self.h.h.bayes.learn(tokenize(good1), False)
# Verify that the classifier gets untrained with the message.
self.h.untrain_ham(good1)
self.assertEqual(self.h.h.bayes.nham, 0)
self.assertEqual(self.h.h.bayes.nspam, 0)
for token in tokenize(good1):
wi = self.h.h.bayes._wordinfoget(token)
self.assertEqual(wi, None)
def test_untrain_spam(self):
self.h.open('c')
# Put a message in the classifier to be removed.
self.h.h.bayes.learn(tokenize(spam1), True)
# Verify that the classifier gets untrained with the message.
self.h.untrain_spam(spam1)
self.assertEqual(self.h.h.bayes.nham, 0)
self.assertEqual(self.h.h.bayes.nspam, 0)
for token in tokenize(spam1):
wi = self.h.h.bayes._wordinfoget(token)
self.assertEqual(wi, None)
def suite():
suite = unittest.TestSuite()
for cls in (HammieFilterTest,
):
suite.addTest(unittest.makeSuite(cls))
return suite
if __name__=='__main__':
sb_test_support.unittest_main(argv=sys.argv + ['suite'])
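Several of the tests above swap a method such as `store` for a fake that only sets a flag, then check the flag. One way to package that save/replace/restore pattern is a small context manager, sketched here in Python 3. The names `patched` and `FakeClassifier` are hypothetical and do not appear in SpamBayes.

```python
from contextlib import contextmanager


@contextmanager
def patched(obj, name, replacement):
    """Temporarily replace obj.<name>, restoring it even if the body fails."""
    saved = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield
    finally:
        setattr(obj, name, saved)


class FakeClassifier:
    """Hypothetical object standing in for a classifier with a store()."""

    def store(self):
        return "stored"


calls = []
classifier = FakeClassifier()

# While patched, store() only records that it was called.
with patched(classifier, "store", lambda: calls.append("store")):
    classifier.store()

# Afterwards the original behaviour is back.
restored = classifier.store()
```

The `finally` clause plays the same role as the try/finally blocks in test_message.py: a failing assertion inside the block cannot leave the fake in place for later tests.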
Index: test_sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_dbexpimp.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** test_sb_dbexpimp.py 15 Nov 2004 06:19:14 -0000 1.2
--- test_sb_dbexpimp.py 22 Nov 2004 00:22:54 -0000 1.3
***************
*** 40,44 ****
--- 40,50 ----
try:
os.remove(TEMP_PICKLE_NAME)
+ except OSError:
+ pass
+ try:
os.remove(TEMP_CSV_NAME)
+ except OSError:
+ pass
+ try:
os.remove(TEMP_DBM_NAME)
except OSError:
Index: test_sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_imapfilter.py,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** test_sb_imapfilter.py 5 Nov 2004 02:36:25 -0000 1.5
--- test_sb_imapfilter.py 22 Nov 2004 00:22:55 -0000 1.6
***************
*** 3,6 ****
--- 3,7 ----
import sys
import time
+ import email
import types
import socket
***************
*** 13,21 ****
sb_test_support.fix_sys_path()
from spambayes import Dibbler
from spambayes.Options import options
from spambayes.classifier import Classifier
from sb_imapfilter import BadIMAPResponseError
- from spambayes.message import message_from_string
from sb_imapfilter import IMAPSession, IMAPMessage, IMAPFolder, IMAPFilter
--- 14,22 ----
sb_test_support.fix_sys_path()
+ from spambayes import message
from spambayes import Dibbler
from spambayes.Options import options
from spambayes.classifier import Classifier
from sb_imapfilter import BadIMAPResponseError
from sb_imapfilter import IMAPSession, IMAPMessage, IMAPFolder, IMAPFilter
***************
*** 543,547 ****
for msg in self.folder:
msg = msg.get_full_message()
! msg_correct = message_from_string(IMAP_MESSAGES[int(keys[0])])
id_header_name = options["Headers", "mailid_header_name"]
if msg_correct[id_header_name] is None:
--- 544,549 ----
for msg in self.folder:
msg = msg.get_full_message()
! msg_correct = email.message_from_string(IMAP_MESSAGES[int(keys[0])],
! _class=message.Message)
id_header_name = options["Headers", "mailid_header_name"]
if msg_correct[id_header_name] is None:
***************
*** 563,567 ****
self.assertEqual(msg1.id, SB_ID_1)
msg1 = msg1.get_full_message()
! msg1_correct = message_from_string(IMAP_MESSAGES[101])
self.assertNotEqual(msg1[id_header_name], None)
msg1_correct[id_header_name] = SB_ID_1
--- 565,570 ----
self.assertEqual(msg1.id, SB_ID_1)
msg1 = msg1.get_full_message()
! msg1_correct = email.message_from_string(IMAP_MESSAGES[101],
! message.Message)
self.assertNotEqual(msg1[id_header_name], None)
msg1_correct[id_header_name] = SB_ID_1
***************
*** 584,588 ****
msg3 = self.folder[104]
self.assertNotEqual(msg3[id_header_name], None)
! msg_correct = message_from_string(IMAP_MESSAGES[104])
msg_correct[id_header_name] = msg3.id
self.assertEqual(msg3.as_string(), msg_correct.as_string())
--- 587,592 ----
msg3 = self.folder[104]
self.assertNotEqual(msg3[id_header_name], None)
! msg_correct = email.message_from_string(IMAP_MESSAGES[104],
! message.Message)
msg_correct[id_header_name] = msg3.id
self.assertEqual(msg3.as_string(), msg_correct.as_string())
***************
*** 628,631 ****
--- 632,667 ----
+ class SFBugsTest(BaseIMAPFilterTest):
+ def test_802545(self):
+ # Test that the filter selects each folder before expunging,
+ # and that it was logged in in the first place.
+ pass
+
+ def test_816400(self):
+ # Test that bad dates don't cause an error in appending.
+ # (also sf #890645)
+ # e.g. 31-Dec-1969 16:00:18 +0100
+ # Date: Mon, 06 May 0102 10:51:16 -0100
+ # Date: Sat, 08 Jun 0102 19:44:54 -0700
+ # Date: 16 Mar 80 8:16:44 AM
+ pass
+
+ def test_818552(self):
+ # Test that, when saving, we remove the RECENT flag including
+ # the space after it.
+ pass
+
+ def test_842984(self):
+ # Confirm that if webbrowser.open_new() fails, we print a
+ # message saying "Please point your web browser at
+ # http://localhost:8880/" rather than bombing out.
+ pass
+
+ def test_886133(self):
+ # Check that folder names with characters not allowed in XML
+ # are correctly handled for the web interface.
+ pass
+
+
def suite():
suite = unittest.TestSuite()
***************
*** 634,637 ****
--- 670,674 ----
IMAPFolderTest,
IMAPFilterTest,
+ SFBugsTest,
):
suite.addTest(unittest.makeSuite(cls))
From anadelonbrin at users.sourceforge.net Mon Nov 22 01:26:47 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:26:50 2004
Subject: [Spambayes-checkins] spambayes/spambayes Options.py, 1.117,
1.118 storage.py, 1.43, 1.44
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21225/spambayes
Modified Files:
Options.py storage.py
Log Message:
Add new storage types:
CDBClassifier
ZODBClassifier
ZEOClassifier
ZODB and ZEO need ZODB installed, obviously. ZODB seems to work, but I'm only 50%
sure that ZEO is working correctly. I'll keep working on this as I can.
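As a rough illustration of how a ZEO data source string might be split into a server address and a database name, here is a minimal Python 3 sketch modelled on the ZEOClassifier constructor in the storage.py diff further down. The function name parse_zeo_source is hypothetical, and the "=" separator is an assumption: the checked-in code only skips one character after each keyword.

```python
def parse_zeo_source(data_source_name):
    """Split e.g. 'host=example.org port=8090 dbname=Mail' into parts.

    Hypothetical helper; defaults mirror those in the checked-in code.
    """
    host = "localhost"
    port = None
    db_name = "SpamBayes"
    for info in data_source_name.split():
        if info.startswith("host"):
            host = info[5:]          # skip 'host' plus one separator char
        elif info.startswith("port"):
            port = int(info[5:])     # skip 'port' plus one separator char
        elif info.startswith("dbname"):
            db_name = info[7:]       # skip 'dbname' plus one separator char
    # With no port, only the host is used as the address.
    addr = (host, port) if port else host
    return addr, db_name
```

Calling parse_zeo_source("") falls back to the defaults, which matches the behaviour of the constructor when no keywords are given.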
Add code to allow persistent_storage_name to not be expanded into an absolute path
with certain storage types (e.g. the SQL ones).
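The idea of expanding the storage name only for file-backed types can be sketched as follows. This is a simplified Python 3 illustration, not the checked-in code: the two store classes and resolve_name are hypothetical, and os.path.abspath stands in for whatever get_pathname_option does in SpamBayes.

```python
import os


class FileBackedStore:
    """Hypothetical stand-in for a classifier stored in a local file."""
    def __init__(self, name, mode=None):
        self.name, self.mode = name, mode


class ServerBackedStore:
    """Hypothetical stand-in for a classifier reached via a server."""
    def __init__(self, name):
        self.name = name


# type name -> (class, accepts a mode argument, argument is a pathname)
STORAGE_TYPES = {
    "dbm": (FileBackedStore, True, True),
    "pickle": (FileBackedStore, False, True),
    "zeo": (ServerBackedStore, False, False),
}


def resolve_name(name, db_type):
    """Expand name to an absolute path only if db_type is file-backed."""
    _klass, _supports_mode, is_path = STORAGE_TYPES[db_type]
    if is_path:
        return os.path.abspath(os.path.expanduser(name))
    # Connection strings (SQL, ZEO) must be passed through untouched.
    return name
```

Keeping the "is a pathname" flag in the same table as the class means a new storage type only has to be registered in one place.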
Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.117
retrieving revision 1.118
diff -C2 -d -r1.117 -r1.118
*** Options.py 9 Nov 2004 02:37:41 -0000 1.117
--- Options.py 22 Nov 2004 00:26:44 -0000 1.118
***************
*** 518,522 ****
with the default.""",
# True == "dbm", False == "pickle", "True" == "dbm", "False" == "pickle"
! ("mysql", "pgsql", "dbm", "pickle", "True", "False", True, False), RESTORE),
("persistent_storage_file", "Storage file name", "hammie.db",
--- 518,522 ----
with the default.""",
# True == "dbm", False == "pickle", "True" == "dbm", "False" == "pickle"
! ("zeo", "zodb", "cdb", "mysql", "pgsql", "dbm", "pickle", "True", "False", True, False), RESTORE),
("persistent_storage_file", "Storage file name", "hammie.db",
Index: storage.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/storage.py,v
retrieving revision 1.43
retrieving revision 1.44
diff -C2 -d -r1.43 -r1.44
*** storage.py 28 Oct 2004 05:11:19 -0000 1.43
--- storage.py 22 Nov 2004 00:26:44 -0000 1.44
***************
*** 8,11 ****
--- 8,14 ----
PGClassifier - Classifier that uses postgres
mySQLClassifier - Classifier that uses mySQL
+ CDBClassifier - Classifier that uses CDB
+ ZODBClassifier - Classifier that uses ZODB
+ ZEOClassifier - Classifier that uses ZEO
Trainer - Classifier training observer
SpamTrainer - Trainer for spam
***************
*** 36,40 ****
To Do:
- o ZODBClassifier
o Would Trainer.trainall really want to train with the whole corpus,
or just a random subset?
--- 39,42 ----
***************
*** 43,47 ****
'''
! # This module is part of the spambayes project, which is Copyright 2002
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
--- 45,49 ----
'''
! # This module is part of the spambayes project, which is Copyright 2002-5
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
***************
*** 71,74 ****
--- 73,77 ----
import errno
import shelve
+ from spambayes import cdb
from spambayes import dbmstorage
***************
*** 147,151 ****
except IOError, e:
if options["globals", "verbose"]:
! print 'Failed update: ' + str(e)
if fp is not None:
os.remove(tmp)
--- 150,154 ----
except IOError, e:
if options["globals", "verbose"]:
! print >> sys.stderr, 'Failed update: ' + str(e)
if fp is not None:
os.remove(tmp)
***************
*** 595,598 ****
--- 598,761 ----
+ class CDBClassifier(classifier.Classifier):
+ """A classifier that uses a CDB database.
+
+ A CDB wordinfo database is quite small and fast but is slow to update.
+ It is appropriate if training is done rarely (e.g. monthly or weekly
+ using archived ham and spam).
+ """
+ def __init__(self, db_name):
+ classifier.Classifier.__init__(self)
+ self.db_name = db_name
+ self.statekey = STATE_KEY
+ self.load()
+
+ def _WordInfoFactory(self, counts):
+ # For whatever reason, WordInfo's cannot be created with
+ # constructor ham/spam counts, so we do the work here.
+ # Since we're doing the work, we accept the ham/spam count
+ # in the form of a comma-delimited string, as that's what
+ # we get.
+ ham, spam = counts.split(',')
+ wi = classifier.WordInfo()
+ wi.hamcount = int(ham)
+ wi.spamcount = int(spam)
+ return wi
+
+ def load(self):
+ if os.path.exists(self.db_name):
+ db = open(self.db_name, "rb")
+ data = dict(cdb.Cdb(db))
+ db.close()
+ self.nham, self.nspam = [int(i) for i in \
+ data[self.statekey].split(',')]
+ self.wordinfo = dict([(k, self._WordInfoFactory(v)) \
+ for k, v in data.iteritems() \
+ if k != self.statekey])
+ if options["globals", "verbose"]:
+ print >> sys.stderr, ('%s is an existing CDB,'
+ ' with %d ham and %d spam') \
+ % (self.db_name, self.nham,
+ self.nspam)
+ else:
+ if options["globals", "verbose"]:
+ print >> sys.stderr, self.db_name, 'is a new CDB'
+ self.wordinfo = {}
+ self.nham = 0
+ self.nspam = 0
+
+ def store(self):
+ items = [(self.statekey, "%d,%d" % (self.nham, self.nspam))]
+ for word, wi in self.wordinfo.iteritems():
+ items.append((word, "%d,%d" % (wi.hamcount, wi.spamcount)))
+ db = open(self.db_name, "wb")
+ cdb.cdb_make(db, items)
+ db.close()
+
+ def close(self):
+ # We keep no resources open - nothing to do.
+ pass
+
+
+ # If ZODB isn't available, then this class won't be useable, but we
+ # still need to be able to import this module. So we pretend that all
+ # is ok.
+ try:
+ Persistent
+ except NameError:
+ Persistent = object
+ class _PersistentClassifier(classifier.Classifier, Persistent):
+ def __init__(self):
+ import ZODB
+ from BTrees.OOBTree import OOBTree
+
+ classifier.Classifier.__init__(self)
+ self.wordinfo = OOBTree()
+
+ class ZODBClassifier(object):
+ def __init__(self, db_name):
+ self.statekey = STATE_KEY
+ self.db_name = db_name
+ self.load()
+
+ def __getattr__(self, att):
+ # We pretend that we are a classifier subclass.
+ if hasattr(self.classifier, att):
+ return getattr(self.classifier, att)
+ raise AttributeError("ZODBClassifier object has no attribute '%s'"
+ % (att,))
+
+ def __setattr__(self, att, value):
+ # For some attributes, we change the classifier instead.
+ if att in ["nham", "nspam"]:
+ setattr(self.classifier, att, value)
+ else:
+ object.__setattr__(self, att, value)
+
+ def create_storage(self):
+ import ZODB
+ from ZODB.FileStorage import FileStorage
+ self.storage = FileStorage(self.db_name)
+
+ def load(self):
+ import ZODB
+ self.create_storage()
+ self.db = ZODB.DB(self.storage)
+ root = self.db.open().root()
+ self.classifier = root.get(self.db_name)
+ if self.classifier is None:
+ # There is no classifier, so create one.
+ if options["globals", "verbose"]:
+ print >> sys.stderr, self.db_name, 'is a new ZODB'
+ self.classifier = root[self.db_name] = _PersistentClassifier()
+ get_transaction().commit()
+ else:
+ # It seems to me that the persistent classifier should store
+ # the nham and nspam values, but that doesn't appear to be the
+ # case, so work around that. This can be removed once I figure
+ # out the problem.
+ self.nham, self.nspam = self.classifier.wordinfo[self.statekey]
+ if options["globals", "verbose"]:
+ print >> sys.stderr, '%s is an existing ZODB, with %d ' \
+ 'ham and %d spam' % (self.db_name, self.nham,
+ self.nspam)
+
+ def store(self):
+ # It seems to me that the persistent classifier should store
+ # the nham and nspam values, but that doesn't appear to be the
+ # case, so work around that. This can be removed once I figure
+ # out the problem.
+ self.classifier.wordinfo[self.statekey] = (self.nham, self.nspam)
+ get_transaction().commit()
+
+ def close(self):
+ self.db.close()
+ self.storage.close()
+
+
+ class ZEOClassifier(ZODBClassifier):
+ def __init__(self, data_source_name):
+ source_info = data_source_name.split()
+ self.host = "localhost"
+ self.port = None
+ db_name = "SpamBayes"
+ for info in source_info:
+ if info.startswith("host"):
+ self.host = info[5:]
+ elif info.startswith("port"):
+ self.port = int(info[5:])
+ elif info.startswith("dbname"):
+ db_name = info[7:]
+ ZODBClassifier.__init__(self, db_name)
+
+ def create_storage(self):
+ from ZEO.ClientStorage import ClientStorage
+ if self.port:
+ addr = self.host, self.port
+ else:
+ addr = self.host
+ self.storage = ClientStorage(addr)
+
+
# Flags that the Trainer will recognise. These should be or'able integer
# values (i.e. 1, 2, 4, 8, etc.).
***************
*** 683,692 ****
return "Only one type of database can be specified"
! # values are classifier class and True if it accepts a mode
! # arg, False otherwise
! _storage_types = {"dbm" : (DBDictClassifier, True),
! "pickle" : (PickledClassifier, False),
! "pgsql" : (PGClassifier, False),
! "mysql" : (mySQLClassifier, False),
}
--- 846,858 ----
return "Only one type of database can be specified"
! # values are classifier class, True if it accepts a mode
! # arg, and True if the argument is a pathname
! _storage_types = {"dbm" : (DBDictClassifier, True, True),
! "pickle" : (PickledClassifier, False, True),
! "pgsql" : (PGClassifier, False, False),
! "mysql" : (mySQLClassifier, False, False),
! "cdb" : (CDBClassifier, False, True),
! "zodb" : (ZODBClassifier, False, True),
! "zeo" : (ZEOClassifier, False, False),
}
***************
*** 696,705 ****
By centralizing this code here, all the applications will behave
the same given the same options.
-
- db_type must be one of the following strings:
- dbm, pickle, pgsql, mysql
"""
try:
! klass, supports_mode = _storage_types[db_type]
except KeyError:
raise NoSuchClassifierError(db_type)
--- 862,868 ----
By centralizing this code here, all the applications will behave
the same given the same options.
"""
try:
! klass, supports_mode, unused = _storage_types[db_type]
except KeyError:
raise NoSuchClassifierError(db_type)
***************
*** 727,731 ****
}
! def database_type(opts):
"""Return the name of the database and the type to use. The output of
this function can be used as the db_type parameter for the open_storage
--- 890,895 ----
}
! def database_type(opts, default_type=("Storage", "persistent_use_database"),
! default_name=("Storage", "persistent_storage_file")):
"""Return the name of the database and the type to use. The output of
this function can be used as the db_type parameter for the open_storage
***************
*** 752,761 ****
raise MutuallyExclusiveError()
if nm is None and typ is None:
! typ = options["Storage", "persistent_use_database"]
if typ is True or typ == "True":
typ = "dbm"
elif typ is False or typ == "False":
typ = "pickle"
! nm = get_pathname_option("Storage", "persistent_storage_file")
return nm, typ
--- 916,933 ----
raise MutuallyExclusiveError()
if nm is None and typ is None:
! typ = options[default_type]
! # Backwards compatibility crud.
if typ is True or typ == "True":
typ = "dbm"
elif typ is False or typ == "False":
typ = "pickle"
! try:
! unused, unused, is_path = _storage_types[typ]
! except KeyError:
! raise NoSuchClassifierError(typ)
! if is_path:
! nm = get_pathname_option(*default_name)
! else:
! nm = options[default_name]
return nm, typ
From anadelonbrin at users.sourceforge.net Mon Nov 22 01:27:55 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:27:57 2004
Subject: [Spambayes-checkins] spambayes/spambayes smtpproxy.py,1.8,1.9
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21577/spambayes
Modified Files:
smtpproxy.py
Log Message:
Switch from using msg.asTokens to msg.tokenize.
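For readers unfamiliar with the training flow that this change touches, the following Python 3 sketch shows the retrain pattern visible in the diff below: unlearn if the message was previously trained the other way, then learn. All names here (Message, CountingClassifier, train, the toy tokenizer) are hypothetical simplifications, not the SpamBayes classes.

```python
def tokenize(msg):
    # Toy tokenizer for illustration; the real one is far more elaborate.
    return msg.text.lower().split()


class Message:
    def __init__(self, text):
        self.text = text
        self.trained = None  # None, True (spam) or False (ham)

    def tokenize(self):
        # The method simply delegates to the module-level function.
        return tokenize(self)


class CountingClassifier:
    """Tracks only message counts; token statistics are omitted."""
    def __init__(self):
        self.nham = 0
        self.nspam = 0

    def learn(self, tokens, is_spam):
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def unlearn(self, tokens, is_spam):
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1


def train(classifier, msg, is_spam):
    # Undo an earlier training in the opposite direction first.
    if msg.trained == (not is_spam):
        classifier.unlearn(msg.tokenize(), not is_spam)
        msg.trained = None
    # Only learn if the message is not already trained this way.
    if msg.trained is None:
        classifier.learn(msg.tokenize(), is_spam)
        msg.trained = is_spam
```

Training the same message twice in the same direction is a no-op in this sketch, because the second call finds msg.trained already set.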
Index: smtpproxy.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/smtpproxy.py,v
retrieving revision 1.8
retrieving revision 1.9
diff -C2 -d -r1.8 -r1.9
*** smtpproxy.py 9 Nov 2004 02:30:33 -0000 1.8
--- smtpproxy.py 22 Nov 2004 00:27:52 -0000 1.9
***************
*** 447,457 ****
# mean that we didn't need to store the id with the message)
# but that might be a little unreliable.
! self.classifier.learn(msg.asTokens(), isSpam)
else:
if msg.GetTrained() == (not isSpam):
! self.classifier.unlearn(msg.asTokens(), not isSpam)
msg.RememberTrained(None)
if msg.GetTrained() is None:
! self.classifier.learn(msg.asTokens(), isSpam)
msg.RememberTrained(isSpam)
--- 447,457 ----
# mean that we didn't need to store the id with the message)
# but that might be a little unreliable.
! self.classifier.learn(msg.tokenize(), isSpam)
else:
if msg.GetTrained() == (not isSpam):
! self.classifier.unlearn(msg.tokenize(), not isSpam)
msg.RememberTrained(None)
if msg.GetTrained() is None:
! self.classifier.learn(msg.tokenize(), isSpam)
msg.RememberTrained(isSpam)
***************
*** 491,500 ****
msg.get_substance()
msg.delSBHeaders()
! self.classifier.unlearn(msg.asTokens(), not isSpam)
msg.RememberTrained(None)
if msg.GetTrained() is None:
msg.get_substance()
msg.delSBHeaders()
! self.classifier.learn(msg.asTokens(), isSpam)
msg.RememberTrained(isSpam)
self.classifier.store()
--- 491,500 ----
msg.get_substance()
msg.delSBHeaders()
! self.classifier.unlearn(msg.tokenize(), not isSpam)
msg.RememberTrained(None)
if msg.GetTrained() is None:
msg.get_substance()
msg.delSBHeaders()
! self.classifier.learn(msg.tokenize(), isSpam)
msg.RememberTrained(isSpam)
self.classifier.store()
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:34:48 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:34:51 2004
Subject: [Spambayes-checkins] spambayes/spambayes Stats.py, 1.8,
1.9 message.py, 1.57, 1.58
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26368/spambayes
Modified Files:
Stats.py message.py
Log Message:
Tidy up docstring.
Change MessageInfoBase's methods so that recording & retrieving a message are not
private methods and are more clearly named.
Change so that the messageinfodb doesn't get created/opened on import, but rather
through utility functions like those in spambayes.storage.
Change Stats.py to use the new methods rather than the old global.
Remove the asTokens function in favour of the existing tokenize function.
Fix the include_evidence header to check for *H* and *S* explicitly rather than any
token starting with *.
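The last point can be illustrated with a small Python 3 sketch. It assumes clues is a list of (word, score) pairs and that the two cutoffs are plain floats; the function name select_evidence and its parameter names are hypothetical.

```python
def select_evidence(clues, ham_cutoff, spam_cutoff):
    """Keep the *H* and *S* pseudo-tokens and any strongly scored word.

    A word that merely starts with '*' (for example '*bold*') is no
    longer kept unless its score is beyond one of the cutoffs.
    """
    return [
        (word, score)
        for word, score in clues
        if word in ("*H*", "*S*")
        or score <= ham_cutoff
        or score >= spam_cutoff
    ]
```

With cutoffs of 0.2 and 0.8, a clue such as ("*bold*", 0.5) is dropped, whereas the old test on word[0] == '*' would have kept it.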
Index: Stats.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Stats.py,v
retrieving revision 1.8
retrieving revision 1.9
diff -C2 -d -r1.8 -r1.9
*** Stats.py 5 Nov 2004 03:03:00 -0000 1.8
--- Stats.py 22 Nov 2004 23:34:43 -0000 1.9
***************
*** 40,44 ****
import types
! from spambayes.message import msginfoDB
class Stats(object):
--- 40,44 ----
import types
! from spambayes.message import database_type, open_storage
class Stats(object):
***************
*** 64,67 ****
--- 64,69 ----
def CalculateStats(self):
self.Reset()
+ nm, typ = database_type()
+ msginfoDB = open_storage(nm, typ)
for msg in msginfoDB.db.keys():
self.total += 1
Index: message.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v
retrieving revision 1.57
retrieving revision 1.58
diff -C2 -d -r1.57 -r1.58
*** message.py 9 Nov 2004 02:30:33 -0000 1.57
--- message.py 22 Nov 2004 23:34:43 -0000 1.58
***************
*** 11,29 ****
MessageInfoDB is a simple shelve persistency class for the persistent
! state of a Message obect. For the moment, the db name is hard-coded,
! but we'll have to do this a different way. Mark Hammond's idea is to
! have a master database, that simply keeps track of the names and instances
! of other databases, such as the wordinfo and msginfo databases. The
! MessageInfoDB currently does not provide iterators, but should at some
! point. This would allow us to, for example, see how many messages
! have been trained differently than their classification, for fp/fn
! assessment purposes.
Message is an extension of the email package Message class, to
include persistent message information. The persistent state
! -currently- consists of the message id, its current
! classification, and its current training. The payload is not
! persisted. Payload persistence is left to whatever mail client
! software is being used.
SBHeaderMessage extends Message to include spambayes header specific
--- 11,23 ----
MessageInfoDB is a simple shelve persistency class for the persistent
! state of a Message object. The MessageInfoDB currently does not provide
! iterators, but should at some point. This would allow us to, for
! example, see how many messages have been trained differently than their
! classification, for fp/fn assessment purposes.
Message is an extension of the email package Message class, to
include persistent message information. The persistent state
! currently consists of the message id, its current classification, and
! its current training. The payload is not persisted.
SBHeaderMessage extends Message to include spambayes header specific
***************
*** 33,38 ****
A typical classification usage pattern would be something like:
! >>> msg = spambayes.message.SBHeaderMessage()
! >>> msg.setPayload(substance) # substance comes from somewhere else
>>> id = msg.setIdFromPayload()
--- 27,33 ----
A typical classification usage pattern would be something like:
! >>> import email
! >>> # substance comes from somewhere else
! >>> msg = email.message_from_string(substance, _class=SBHeaderMessage)
>>> id = msg.setIdFromPayload()
***************
*** 50,55 ****
A typical usage pattern to train as spam would be something like:
! >>> msg = spambayes.message.SBHeaderMessage()
! >>> msg.setPayload(substance) # substance comes from somewhere else
>>> id = msg.setId(msgid) # id is a fname, outlook msg id, something...
--- 45,51 ----
A typical usage pattern to train as spam would be something like:
! >>> import email
! >>> # substance comes from somewhere else
! >>> msg = email.message_from_string(substance, _class=SBHeaderMessage)
>>> id = msg.setId(msgid) # id is a fname, outlook msg id, something...
***************
*** 64,75 ****
To Do:
- o Master DB module, or at least make the msginfodb name an options parm
- o Figure out how to safely add message id to body (or if it can be done
- at all...)
o Suggestions?
! """
!
! # This module is part of the spambayes project, which is Copyright 2002-3
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
--- 60,67 ----
To Do:
o Suggestions?
+ """
! # This module is part of the spambayes project, which is Copyright 2002-5
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
***************
*** 102,105 ****
--- 94,98 ----
import email.Header
+ from spambayes import storage
from spambayes import dbmstorage
from spambayes.Options import options, get_pathname_option
***************
*** 117,121 ****
self.db_name = db_name
! def _getState(self, msg):
if self.db is not None:
try:
--- 110,114 ----
self.db_name = db_name
! def load_msg(self, msg):
if self.db is not None:
try:
***************
*** 132,136 ****
setattr(msg, att, val)
! def _setState(self, msg):
if self.db is not None:
attributes = []
--- 125,129 ----
setattr(msg, att, val)
! def store_msg(self, msg):
if self.db is not None:
attributes = []
***************
*** 140,144 ****
self.store()
! def _delState(self, msg):
if self.db is not None:
del self.db[msg.getId()]
--- 133,137 ----
self.store()
! def remove_msg(self, msg):
if self.db is not None:
del self.db[msg.getId()]
***************
*** 205,228 ****
self.db.sync()
! # This should come from a Mark Hammond idea of a master db
! # For the moment, we get the name of another file from the options,
! # so that these files don't litter lots of working directories.
! # Once there is a master db, this option can be removed.
! message_info_db_name = get_pathname_option("Storage", "messageinfo_storage_file")
! if options["Storage", "persistent_use_database"] is True or \
! options["Storage", "persistent_use_database"] == "dbm":
! msginfoDB = MessageInfoDB(message_info_db_name)
! elif options["Storage", "persistent_use_database"] is False or \
! options["Storage", "persistent_use_database"] == "pickle":
! msginfoDB = MessageInfoPickle(message_info_db_name)
! else:
! # Ah - now, what? Maybe the user has mysql or pgsql or zeo,
! # or some other newfangled thing! We don't know what to do
! # in that case, so just use a pickle, since it's the safest
! # option.
! msginfoDB = MessageInfoPickle(message_info_db_name)
class Message(email.Message.Message):
! '''An email.Message.Message extended for Spambayes'''
def __init__(self):
--- 198,236 ----
self.db.sync()
! # values are classifier class, True if it accepts a mode
! # arg, and True if the argument is a pathname
! _storage_types = {"dbm" : (MessageInfoDB, True, True),
! "pickle" : (MessageInfoPickle, False, True),
! ## "pgsql" : (MessageInfoPG, False, False),
! ## "mysql" : (MessageInfoMySQL, False, False),
! ## "cdb" : (MessageInfoCDB, False, True),
! ## "zodb" : (MessageInfoZODB, False, True),
! ## "zeo" : (MessageInfoZEO, False, False),
! }
!
! def open_storage(data_source_name, db_type="dbm", mode=None):
! """Return a storage object appropriate to the given parameters."""
! try:
! klass, supports_mode, unused = _storage_types[db_type]
! except KeyError:
! raise storage.NoSuchClassifierError(db_type)
! if supports_mode and mode is not None:
! return klass(data_source_name, mode)
! else:
! return klass(data_source_name)
!
! def database_type():
! dn = ("Storage", "messageinfo_storage_file")
! # The storage options here may lag behind those in storage.py,
! # so we try and be more robust. If we can't use the same storage
! # method, then we fall back to pickle.
! nm, typ = storage.database_type((), default_name=dn)
! if typ not in _storage_types.keys():
! typ = "pickle"
! return nm, typ
!
class Message(email.Message.Message):
! '''An email.Message.Message extended for SpamBayes'''
def __init__(self):
***************
*** 230,233 ****
--- 238,243 ----
# persistent state
+ nm, typ = database_type()
+ self.message_info_db = open_storage(nm, typ)
self.stored_attributes = ['c', 't',]
self.id = None
***************
*** 271,284 ****
self.id = id
! msginfoDB._getState(self)
def getId(self):
return self.id
- def asTokens(self):
- return tokenize(self)
-
def tokenize(self):
! return self.asTokens()
def _force_CRLF(self, data):
--- 281,291 ----
self.id = id
! self.message_info_db.load_msg(self)
def getId(self):
return self.id
def tokenize(self):
! return tokenize(self)
def _force_CRLF(self, data):
***************
*** 303,307 ****
def modified(self):
if self.id: # only persist if key is present
! msginfoDB._setState(self)
def GetClassification(self):
--- 310,314 ----
def modified(self):
if self.id: # only persist if key is present
! self.message_info_db.store_msg(self)
def GetClassification(self):
***************
*** 348,356 ****
class SBHeaderMessage(Message):
! '''Message class that is cognizant of Spambayes headers.
! Adds routines to add/remove headers for Spambayes'''
!
! def __init__(self):
! Message.__init__(self)
def setIdFromPayload(self):
--- 355,360 ----
class SBHeaderMessage(Message):
! '''Message class that is cognizant of SpamBayes headers.
! Adds routines to add/remove headers for SpamBayes'''
def setIdFromPayload(self):
***************
*** 396,400 ****
evd = []
for word, score in clues:
! if (word[0] == '*' or score <= hco or score >= sco):
if isinstance(word, types.UnicodeType):
word = email.Header.Header(word,
--- 400,405 ----
evd = []
for word, score in clues:
! if (word == '*H*' or word == '*S*' \
! or score <= hco or score >= sco):
if isinstance(word, types.UnicodeType):
word = email.Header.Header(word,
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:37:15 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:37:19 2004
Subject: [Spambayes-checkins] spambayes/spambayes/test test_storage.py, 1.5,
1.5.4.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27019/spambayes/test
Modified Files:
Tag: release_1_0-branch
test_storage.py
Log Message:
Backport fix for test_storage (True->"dbm")
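The fix matters because boolean values for the database type are a legacy spelling. A minimal Python 3 sketch of the backwards-compatibility mapping, following the logic of database_type in storage.py (the function name normalise_db_type is hypothetical):

```python
def normalise_db_type(typ):
    """Map legacy boolean-style values onto storage type names."""
    if typ is True or typ == "True":
        return "dbm"
    if typ is False or typ == "False":
        return "pickle"
    # Anything else is assumed to already be a type name.
    return typ
```

Code that calls open_storage directly, as the test does, bypasses this mapping and so needs to pass the type name itself.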
Index: test_storage.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_storage.py,v
retrieving revision 1.5
retrieving revision 1.5.4.1
diff -C2 -d -r1.5 -r1.5.4.1
*** test_storage.py 24 Dec 2003 17:16:38 -0000 1.5
--- test_storage.py 22 Nov 2004 23:37:12 -0000 1.5.4.1
***************
*** 152,156 ****
try:
try:
! open_storage(db_name, True)
except SystemExit:
pass
--- 152,156 ----
try:
try:
! open_storage(db_name, "dbm")
except SystemExit:
pass
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:38:37 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:38:40 2004
Subject: [Spambayes-checkins] spambayes/spambayes message.py, 1.49.4.4,
1.49.4.5
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27234/spambayes
Modified Files:
Tag: release_1_0-branch
message.py
Log Message:
Backport docstring fix.
Fix usage of StringIO
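The StringIO problem is the classic one of calling a module instead of the class inside it. The checked-in code is Python 2 (StringIO.StringIO); the sketch below shows the same mistake and its fix in Python 3 terms, using io.StringIO and the standard email package. The sample payload is made up for illustration.

```python
import email
import io

payload = "Subject: hello\n\nbody text\n"

# Calling the module itself fails: modules are not callable.
try:
    io(payload)
    module_call_failed = False
except TypeError:
    module_call_failed = True

# Calling the class inside the module gives a file-like object
# that the email parser can read from.
fp = io.StringIO(payload)
msg = email.message_from_file(fp)
subject = msg["Subject"]
body = msg.get_payload()
```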
Index: message.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v
retrieving revision 1.49.4.4
retrieving revision 1.49.4.5
diff -C2 -d -r1.49.4.4 -r1.49.4.5
*** message.py 22 Oct 2004 05:00:51 -0000 1.49.4.4
--- message.py 22 Nov 2004 23:38:34 -0000 1.49.4.5
***************
*** 11,17 ****
MessageInfoDB is a simple shelve persistency class for the persistent
! state of a Message obect. For the moment, the db name is hard-coded,
! but we'll have to do this a different way. Mark Hammond's idea is to
! have a master database, that simply keeps track of the names and instances
of other databases, such as the wordinfo and msginfo databases. The
MessageInfoDB currently does not provide iterators, but should at some
--- 11,16 ----
MessageInfoDB is a simple shelve persistency class for the persistent
! state of a Message object. Mark Hammond's idea is to have a master
! database, that simply keeps track of the names and instances
of other databases, such as the wordinfo and msginfo databases. The
MessageInfoDB currently does not provide iterators, but should at some
***************
*** 22,29 ****
Message is an extension of the email package Message class, to
include persistent message information. The persistent state
! -currently- consists of the message id, its current
classification, and its current training. The payload is not
! persisted. Payload persistence is left to whatever mail client
! software is being used.
SBHeaderMessage extends Message to include spambayes header specific
--- 21,27 ----
Message is an extension of the email package Message class, to
include persistent message information. The persistent state
! currently consists of the message id, its current
classification, and its current training. The payload is not
! persisted.
SBHeaderMessage extends Message to include spambayes header specific
***************
*** 246,250 ****
def setPayload(self, payload):
prs = email.Parser.Parser()
! fp = StringIO(payload)
# this is kindof a hack, due to the fact that the parser creates a
# new message object, and we already have the message object
--- 244,248 ----
def setPayload(self, payload):
prs = email.Parser.Parser()
! fp = StringIO.StringIO(payload)
# this is kindof a hack, due to the fact that the parser creates a
# new message object, and we already have the message object
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:39:14 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:39:18 2004
Subject: [Spambayes-checkins] spambayes/spambayes __init__.py, 1.11.4.2,
1.11.4.3
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27358/spambayes
Modified Files:
Tag: release_1_0-branch
__init__.py
Log Message:
Prepare for 1.0.1
Index: __init__.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/__init__.py,v
retrieving revision 1.11.4.2
retrieving revision 1.11.4.3
diff -C2 -d -r1.11.4.2 -r1.11.4.3
*** __init__.py 8 Jul 2004 23:51:24 -0000 1.11.4.2
--- __init__.py 22 Nov 2004 23:39:12 -0000 1.11.4.3
***************
*** 1,3 ****
# package marker.
! __version__ = '1.0'
--- 1,3 ----
# package marker.
! __version__ = '1.0.1'
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:40:33 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:40:36 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py, 1.12.4.2, 1.12.4.3
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27579/scripts
Modified Files:
Tag: release_1_0-branch
sb_dbexpimp.py
Log Message:
Backport fix for merging into a dbm database, and fix for opening a nonexistent
csv file.
Index: sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v
retrieving revision 1.12.4.2
retrieving revision 1.12.4.3
diff -C2 -d -r1.12.4.2 -r1.12.4.3
*** sb_dbexpimp.py 9 Nov 2004 22:53:02 -0000 1.12.4.2
--- sb_dbexpimp.py 22 Nov 2004 23:40:29 -0000 1.12.4.3
***************
*** 189,198 ****
bayes = spambayes.storage.open_storage(dbFN, useDBM)
! try:
! fp = open(inFN, 'rb')
! except IOError, e:
! if e.errno != errno.ENOENT:
! raise
!
rdr = csv.reader(fp)
(nham, nspam) = rdr.next()
--- 189,193 ----
bayes = spambayes.storage.open_storage(dbFN, useDBM)
! fp = open(inFN, 'rb')
rdr = csv.reader(fp)
(nham, nspam) = rdr.next()
***************
*** 215,221 ****
word = uunquote(word)
! try:
! wi = bayes.wordinfo[word]
! except KeyError:
wi = bayes.WordInfoClass()
--- 210,217 ----
word = uunquote(word)
! # Can't use wordinfo[word] here, because wordinfo
! # is only a cache with dbm! Need to use _wordinfoget instead.
! wi = bayes._wordinfoget(word)
! if wi is None:
wi = bayes.WordInfoClass()
***************
*** 240,245 ****
-
-
if __name__ == '__main__':
--- 236,239 ----
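Note on the sb_dbexpimp.py change above: the new comment says that wordinfo is only a cache when the storage is dbm-backed, so a KeyError from wordinfo[word] does not prove the word is untrained. The sketch below is a toy model of that situation, assuming a plain dict as the "on-disk" store. CachedStore and merge_counts are hypothetical names for illustration; only the attribute name wordinfo and the method name _wordinfoget are taken from the diff, and their real implementations may differ.

```python
class CachedStore:
    """Toy stand-in for a dbm-backed classifier (not the SpamBayes class)."""

    def __init__(self, backing):
        self.db = backing      # plays the role of the on-disk dbm file
        self.wordinfo = {}     # in-memory cache, empty until words are used

    def _wordinfoget(self, word):
        # Look in the cache first, then fall back to the backing store.
        if word in self.wordinfo:
            return self.wordinfo[word]
        record = self.db.get(word)
        if record is not None:
            self.wordinfo[word] = record
        return record


def merge_counts(store, word, spam, ham, cache_only):
    """Add counts for word and return the merged (spam, ham) totals."""
    if cache_only:
        # Old approach: a cache miss is mistaken for a brand-new word.
        old = store.wordinfo.get(word)
    else:
        # New approach: consult the backing store before starting from zero.
        old = store._wordinfoget(word)
    if old is None:
        old = (0, 0)
    merged = (old[0] + spam, old[1] + ham)
    store.wordinfo[word] = merged
    return merged
```

With a word already stored as (10, 0), the cache-only path merges one new spam count into (1, 0) and so drops the earlier training, while the fall-through path gives (11, 0). This is presumably the data loss the log message describes.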
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:41:43 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:41:48 2004
Subject: [Spambayes-checkins] spambayes README-DEVEL.txt,1.12,1.12.4.1
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27808
Modified Files:
Tag: release_1_0-branch
README-DEVEL.txt
Log Message:
Backport updates about the build process.
Index: README-DEVEL.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/README-DEVEL.txt,v
retrieving revision 1.12
retrieving revision 1.12.4.1
diff -C2 -d -r1.12 -r1.12.4.1
*** README-DEVEL.txt 8 Feb 2004 02:45:37 -0000 1.12
--- README-DEVEL.txt 22 Nov 2004 23:41:40 -0000 1.12.4.1
***************
*** 505,511 ****
o Now commit spambayes/__init__.py and tag the whole checkout - see the
existing tag names for the tag name format.
! o Update the website News, Download and Application sections.
o Update reply.txt in the website repository as needed (it specifies the
! latest version). Then let Tim, Barry or Skip know that they need to
update the autoresponder.
--- 505,511 ----
o Now commit spambayes/__init__.py and tag the whole checkout - see the
existing tag names for the tag name format.
! o Update the website News, Download, Windows and Application sections.
o Update reply.txt in the website repository as needed (it specifies the
! latest version). Then let Tim, Barry, Tony, or Skip know that they need to
update the autoresponder.
***************
*** 525,526 ****
--- 525,555 ----
else is left alone.
+ Making a binary release
+ =======================
+
+ The binary release includes both sb_server and the Outlook plug-in and
+ is an installer for Windows (98 and above) systems. In order to have
+ COM typelibs that work with Outlook 2000, 2002 and 2003, you need to
+ build the installer on a system that has Outlook 2000 (not a more recent
+ version). You also need to have InnoSetup, resourcepackage and py2exe
+ installed.
+
+ o Get hold of a fresh copy of the source (Windows line endings,
+ presumably).
+ o Run sb_server and open the web interface. This gets resourcepackage
+ to generate the needed files.
+ o Replace the __init__.py file in spambayes/spambayes/resources with
+ a blank file to disable resourcepackage.
+ o Ensure that the version numbers in spambayes/spambayes/__init__.py
+ and spambayes/spambayes/Version.py are up-to-date.
+ o Ensure that you don't have any other copies of spambayes in your
+ PYTHONPATH, or py2exe will pick these up! If in doubt, run
+ setup.py install.
+ o Run the "setup_all.py" script in the spambayes/windows/py2exe/
+ directory. This uses py2exe to create the files that Inno will install.
+ o Open (in InnoSetup) the spambayes.iss file in the spambayes/windows/
+ directory. Change the version number in the AppVerName and
+ OutputBaseFilename lines to the new number.
+ o Compile the spambayes.iss script to get the executable.
+ o You can now follow the steps in the source release description above,
+ from the testing step.
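Note on the binary release steps above: one step warns that py2exe will pick up any other copy of spambayes found on PYTHONPATH. A quick way to look for stray copies before building might be something like the sketch below. It is an illustration only and not part of the SpamBayes build scripts; the function name find_package_copies is hypothetical.

```python
import os
import sys


def find_package_copies(name, search_path=None):
    """Return every directory on the path that holds a package called name."""
    if search_path is None:
        search_path = sys.path
    found = []
    for entry in search_path:
        # An empty entry on sys.path means the current directory.
        candidate = os.path.join(entry or os.curdir, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            real = os.path.realpath(candidate)
            if real not in found:
                found.append(real)
    return found
```

Running find_package_copies("spambayes") in the build environment and getting more than one path back would suggest that an older copy needs to be removed, or that setup.py install should be run, before calling setup_all.py.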
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:48:39 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:48:41 2004
Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.44.4.3,1.44.4.4
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29218
Modified Files:
Tag: release_1_0-branch
CHANGELOG.txt
Log Message:
Bring up-to-date.
Index: CHANGELOG.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v
retrieving revision 1.44.4.3
retrieving revision 1.44.4.4
diff -C2 -d -r1.44.4.3 -r1.44.4.4
*** CHANGELOG.txt 9 Nov 2004 22:53:55 -0000 1.44.4.3
--- CHANGELOG.txt 22 Nov 2004 23:48:28 -0000 1.44.4.4
***************
*** 3,6 ****
--- 3,7 ----
Release 1.0.1
=============
+ Tony Meyer 11/11/2004 The installer wasn't offered to install a startup items shortcut, so fix that. This is a non-ideal patch, but appears to be the only way Inno will work.
Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
Tony Meyer 03/11/2004 Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf
***************
*** 21,24 ****
--- 22,26 ----
Tony Meyer 14/07/2004 Fix [ 959937 ] "Invalid server" message not always correct
Skip Montanaro 10/07/2004 tte.py: 2.3 compatibility: add reversed() function
+ Tony Meyer 09/07/2004 Update test_storage.py test to reflect (current) correct way to call open_storage. Fixes part of [ 981970 ] tests failing.
Tony Meyer 09/07/2004 Using -u with sb_server had been broken. Fix this.
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:49:35 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:49:38 2004
Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.44.4.4,1.44.4.5
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29458
Modified Files:
Tag: release_1_0-branch
CHANGELOG.txt
Log Message:
Bring up-to-date.
Index: CHANGELOG.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v
retrieving revision 1.44.4.4
retrieving revision 1.44.4.5
diff -C2 -d -r1.44.4.4 -r1.44.4.5
*** CHANGELOG.txt 22 Nov 2004 23:48:28 -0000 1.44.4.4
--- CHANGELOG.txt 22 Nov 2004 23:49:32 -0000 1.44.4.5
***************
*** 3,6 ****
--- 3,8 ----
Release 1.0.1
=============
+ Tony Meyer 15/11/2004 Fix a bug in sb_dbexpimp.py where merging into an existing dbm file might lose training data.
+ Tony Meyer 15/11/2004 sb_dbexpimp.py: Fail if the csv file doesn't exist that we are trying to import from rather than keeping going, which made no sense.
Tony Meyer 11/11/2004 The installer wasn't offered to install a startup items shortcut, so fix that. This is a non-ideal patch, but appears to be the only way Inno will work.
Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:50:01 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:50:04 2004
Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.48,1.49
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29564
Modified Files:
CHANGELOG.txt
Log Message:
Bring up-to-date.
Index: CHANGELOG.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v
retrieving revision 1.48
retrieving revision 1.49
diff -C2 -d -r1.48 -r1.49
*** CHANGELOG.txt 9 Nov 2004 03:13:00 -0000 1.48
--- CHANGELOG.txt 22 Nov 2004 23:49:58 -0000 1.49
***************
*** 3,6 ****
--- 3,23 ----
Release 1.1a1
=============
+ Tony Meyer 23/11/2004 message.py: Change MessageInfoBase's methods so that recording & retrieving a message are not private methods and are more clearly named.
+ Tony Meyer 23/11/2004 message.py: Change so that the messageinfodb doesn't get created/opened on import, but rather through utility functions like those in spambayes.storage.
+ Tony Meyer 23/11/2004 message.py: Remove the asTokens method in favour of the existing tokenize function.
+ Tony Meyer 23/11/2004 message.py: Fix the include_evidence header to check for *H* and *S* explicitly rather than any token starting with *.
+ Tony Meyer 22/11/2004 Add new storage types: CBDClassifier, ZODBClassifier, ZEOClassifier
+ Tony Meyer 22/11/2004 Add code to allow persistent_storage_name to not be expanded into an absolute path with certain storage types (e.g. the SQL ones).
+ Tony Meyer 22/11/2004 sb_pop3dnd: Play nicer with win32 gui
+ Tony Meyer 22/11/2004 sb_pop3dnd: Don't use the deprecated 'strict' kwarg for email messages.
+ Tony Meyer 22/11/2004 sb_pop3dnd: Add appropriate state createworkers function & call.
+ Tony Meyer 22/11/2004 sb_pop3dnd: Modify to have the prepare/start/stop API that sb_server has.
+ Tony Meyer 22/11/2004 sb_filter: Remove the "experimental" marking in the docstring for the training functions.
+ Tony Meyer 15/11/2004 Fix a bug in sb_dbexpimp.py where merging into an existing dbm file might lose training data.
+ Tony Meyer 15/11/2004 sb_dbexpimp.py: Fail if the csv file doesn't exist that we are trying to import from rather than keeping going, which made no sense.
+ Tony Meyer 15/11/2004 sb_dbexpimp.py: Stop bothering to remove the .dat and .dir files that dumbdbm create (long time since they were supported), and remove the verbose flag, which doesn't actually do anything.
+ Kenny Pitt 12/11/2004 Add a separate Statistics tab to make room for more detailed statistics.
+ Toby Dickenson 11/11/2004 Add a version of sb_bnfilter in C (for speed).
+ Tony Meyer 11/11/2004 The installer wasn't offered to install a startup items shortcut, so fix that. This is a non-ideal patch, but appears to be the only way Inno will work.
Tony Meyer 09/11/2004 Implement [ 870524 ] Make the message-proxy timeout configurable
Tony Meyer 09/11/2004 Use email.message_from_string(text, _class) rather than our wrapper functions.
***************
*** 20,24 ****
Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
Tony Meyer 03/11/2004 Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf
! Tony Meyer 03/11/2004 Fix [ 922063 ] Intermittent sb_filter.py faliure with URL pickle
Tony Meyer 03/11/2004 Outlook: Also add an "X-Exchange-Delivery-Time" header to the faked up Exchange headers.
Tony Meyer 02/11/2004 Improve the web interface statistics
--- 37,41 ----
Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
Tony Meyer 03/11/2004 Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf
! Tony Meyer 03/11/2004 Fix [ 922063 ] Intermittent sb_filter.py failure with URL pickle
Tony Meyer 03/11/2004 Outlook: Also add an "X-Exchange-Delivery-Time" header to the faked up Exchange headers.
Tony Meyer 02/11/2004 Improve the web interface statistics
***************
*** 33,36 ****
--- 50,55 ----
Tony Meyer 18/10/2004 Copy Skip's -o command line option (available in all the regular scripts) to timcv.py.
Tony Meyer 18/10/2004 TestDriver: If show_histograms was False, then the global ham/spam histogram never had the stats computed, but this gets used later, so the script would die with an AtrributeError. Fix that.
+ Tony Meyer 15/10/2004 Outlook: Add persistent statistics
+ Tony Meyer 13/10/2004 Implement [ 1039057 ] Diffs for IMAP login problems...
Tony Meyer 13/10/2004 Add Classifier.use_bigrams option to the Advanced options page for sb_server and imapfilter.
Tony Meyer 13/10/2004 Fix mySQL storage option for the case where the server does not support rollbacks.
***************
*** 42,46 ****
Tony Meyer 30/09/2004 Fix [ 903905 ] IMAP Configuration Error
Tony Meyer 29/09/2004 Fix [ 1036601 ] typo on advanced config web page
! Tony Meyer 15/09/2004 sb_upload: Clarify docstring so that it's mroe clear what this script does. The -n / --null command line option didn't actually do anything; change it so that it does.
Sjoerd Mullender 20/08/2004 imapfilter: Fix the regular expression to match the Message-ID header by stopping on newline.
Skip Montanaro 18/08/2004 tte.py: Seems better to try and alternate ham/spam scoring instead of scoring all the hams in a batch and all the spams.
--- 61,65 ----
Tony Meyer 30/09/2004 Fix [ 903905 ] IMAP Configuration Error
Tony Meyer 29/09/2004 Fix [ 1036601 ] typo on advanced config web page
! Tony Meyer 15/09/2004 sb_upload: Clarify docstring so that it's more clear what this script does. The -n / --null command line option didn't actually do anything; change it so that it does.
Sjoerd Mullender 20/08/2004 imapfilter: Fix the regular expression to match the Message-ID header by stopping on newline.
Skip Montanaro 18/08/2004 tte.py: Seems better to try and alternate ham/spam scoring instead of scoring all the hams in a batch and all the spams.
***************
*** 84,88 ****
Tony Meyer 04/07/2004 Fix [ 933473 ] Unnecessary spam folder hook.
Neil Schemenauer 30/06/2004 New script, hammie2cdb.py, that converts hammie databases into cdb databases (usable by CdbClassifier).
! Skip Montanaro 29/06/2004 tte.py: Worm around the extremely rare case during verbose most where the message sneaks through without either a message-id or a subject.
Skip Montanaro 26/06/2004 New script, postfixproxy.py, a first cut proxy filter for use with PostFix 2.1's content filter stuff.
Skip Montanaro 26/06/2004 hammie: Rename filter() to score_and_filter() and return both the spamprob and the modified message.
--- 103,107 ----
Tony Meyer 04/07/2004 Fix [ 933473 ] Unnecessary spam folder hook.
Neil Schemenauer 30/06/2004 New script, hammie2cdb.py, that converts hammie databases into cdb databases (usable by CdbClassifier).
! Skip Montanaro 29/06/2004 tte.py: Worm around the extremely rare case during verbose mode where the message sneaks through without either a message-id or a subject.
Skip Montanaro 26/06/2004 New script, postfixproxy.py, a first cut proxy filter for use with PostFix 2.1's content filter stuff.
Skip Montanaro 26/06/2004 hammie: Rename filter() to score_and_filter() and return both the spamprob and the modified message.
***************
*** 342,346 ****
Alpha Release 8
===============
! There is no Alpha Release 8.
Alpha Release 7
--- 361,365 ----
Alpha Release 8
===============
! There was no Alpha Release 8.
Alpha Release 7
From anadelonbrin at users.sourceforge.net Tue Nov 23 00:57:31 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 00:57:34 2004
Subject: [Spambayes-checkins] website faq.txt,1.82,1.83
Message-ID:
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31119
Modified Files:
faq.txt
Log Message:
Add extra information to the "Does it work with Exchange" question based on a suggestion
by Scott L Miller.
Index: faq.txt
===================================================================
RCS file: /cvsroot/spambayes/website/faq.txt,v
retrieving revision 1.82
retrieving revision 1.83
diff -C2 -d -r1.82 -r1.83
*** faq.txt 8 Nov 2004 01:22:59 -0000 1.82
--- faq.txt 22 Nov 2004 23:57:29 -0000 1.83
***************
*** 542,545 ****
--- 542,551 ----
Yes.
+ The SpamBayes Outlook plug-in simply watches the folders that you have
+ instructed it to for new mail. When new mail is received, Outlook informs
+ SpamBayes, which then scores the message and performs the actions you have
+ asked it to, depending on the message score. Thus it isn't involved in
+ the delivery of mail, and so has no idea that it is coming from Exchange.
+
Can mail marked as spam automatically be marked as read?
--------------------------------------------------------
***************
*** 1246,1250 ****
``sb_server.py -u 8881 -b`` (or ``sb_imapfilter.py -u 8881 -b``), or another
port that you know is free and available on your machine.
!
Known Problems & Workarounds
--- 1252,1256 ----
``sb_server.py -u 8881 -b`` (or ``sb_imapfilter.py -u 8881 -b``), or another
port that you know is free and available on your machine.
!
Known Problems & Workarounds
From anadelonbrin at users.sourceforge.net Tue Nov 23 01:12:47 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 01:12:51 2004
Subject: [Spambayes-checkins] website faq.txt,1.83,1.84
Message-ID:
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2133
Modified Files:
faq.txt
Log Message:
Add a FAQ about the "no filterable messages" Outlook problem.
Index: faq.txt
===================================================================
RCS file: /cvsroot/spambayes/website/faq.txt,v
retrieving revision 1.83
retrieving revision 1.84
diff -C2 -d -r1.83 -r1.84
*** faq.txt 22 Nov 2004 23:57:29 -0000 1.83
--- faq.txt 23 Nov 2004 00:12:44 -0000 1.84
***************
*** 1420,1423 ****
--- 1420,1449 ----
+ I get an error message "No filterable messages are selected".
+ -------------------------------------------------------------
+
+ This applies to the Outlook plug-in only. SpamBayes only lets you train
+ on messages that have been received (these are the only messages that
+ should be trained on). This means that you cannot train on sent messages,
+ drafts, notes, calendar items, tasks, and so on.
+
+ To check whether a message has been received, SpamBayes checks some of the
+ Outlook properties for the message. Very seldomly, these can result in a
+ false classification, where the message has been received, but SpamBayes
+ does not believe it has. The best move here is to simply move the message
+ yourself. If this is a recurring problem, please add comments to the
+ `appropriate SourceForge tracker`_.
+
+ Note that one cause of this problem is that with some versions of Outlook
+ and Outlook Express, moving mail from Outlook Express to Outlook will strip
+ the mail of all Internet headers, which means the messages are not able to
+ be filtered/trained. However, this is not a problem with SpamBayes - you
+ can either work around the export/import problem, or simply not use those
+ messages for training (we do not recommend pre-training in bulk, in any
+ case).
+
+ .. _appropriate SourceForge tracker: http://sourceforge.net/tracker/index.php?func=detail&aid=854547&group_id=61702&atid=498103
+
+
Development
===========
From anadelonbrin at users.sourceforge.net Tue Nov 23 01:15:46 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 01:15:49 2004
Subject: [Spambayes-checkins] spambayes/spambayes oe_mailbox.py,1.9,1.10
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2913/spambayes
Modified Files:
oe_mailbox.py
Log Message:
Add modifications/improvements mostly from:
[ 800671 ] Windows GUI for easy Outlook Express mailboxes training
Index: oe_mailbox.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/oe_mailbox.py,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** oe_mailbox.py 19 Jan 2004 17:58:28 -0000 1.9
--- oe_mailbox.py 23 Nov 2004 00:15:42 -0000 1.10
***************
*** 1,30 ****
from __future__ import generators
! # This module is part of the spambayes project, which is Copyright 2002-3
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
! # Simple Python library for Outlook Express mailboxes handling
! # Based on C++ work by Arne Schloh
!
! __author__ = "Romain Guy"
__credits__ = "All the SpamBayes folk"
import binascii
import os
import struct
import msgs
! import StringIO
import sys
! from time import gmtime, strftime
try:
import win32api
import win32con
from win32com.shell import shell, shellcon
except ImportError:
! # Not win32, or win32all not installed.
# Some functions will not work, but some will.
! win32api = win32con = shell = shellcon = None
###########################################################################
--- 1,58 ----
+ """
+ Simple Python library for Outlook Express mailbox handling, and some
+ other Outlook Express utility functions.
+
+ Functions:
+ getDBXFilesList()
+ Returns a list containing the DBX file names for current user
+ getMbox(dbxPath)
+ Returns an mbox converted from a DBX file
+ getRegistryKey()
+ Returns the root key for current user's Outlook Express settings
+ getStorePath()
+ Returns the path where DBX files are stored for current user
+ train(dbxPath, isSpam)
+ Trains a DBX file as spam or ham through Hammie
+ """
+
from __future__ import generators
! # This module is part of the spambayes project, which is Copyright 2002-5
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
! __author__ = "Romain Guy "
__credits__ = "All the SpamBayes folk"
+ # Based on C++ work by Arne Schloh
+
import binascii
import os
+ import re
import struct
+ import mailbox
import msgs
! try:
! import cStringIO as StringIO
! except ImportError:
! import StringIO
import sys
! from time import *
try:
import win32api
import win32con
+ import win32gui
from win32com.shell import shell, shellcon
except ImportError:
! # Not win32, or pywin32 not installed.
# Some functions will not work, but some will.
! win32api = win32con = win32gui = shell = shellcon = None
!
! import hammie
! import oe_mailbox
! import mboxutils
!
! from spambayes.Options import options
###########################################################################
***************
*** 453,457 ****
if address and entries:
tree = dbxTree(dbxStream, address, entries)
! dbxBuffer = ""
for i in range(entries):
--- 481,485 ----
if address and entries:
tree = dbxTree(dbxStream, address, entries)
! dbxBuffer = []
for i in range(entries):
***************
*** 468,475 ****
# data from the message itself, as this will
# result in incorrect tokens.
! dbxBuffer += "From spambayes@spambayes.org %s\n%s" \
! % (strftime("%a %b %d %H:%M:%S MET %Y",
! gmtime()), message.getText())
! content = dbxBuffer
dbxStream.close()
return content
--- 496,504 ----
# data from the message itself, as this will
# result in incorrect tokens.
! dbxBuffer.append("From spambayes@spambayes.org %s\n%s" \
! % (strftime("%a %b %d %H:%M:%S MET %Y",
! gmtime()),
! message.getText()))
! content = "".join(dbxBuffer)
dbxStream.close()
return content
***************
*** 479,491 ****
Tested with Outlook Express 6.0 with Windows XP."""
- if sys.platform != "win32":
- # AFAIK, there is only a Win32 OE, and a Mac OE.
- # The Mac OE should be easy enough, but I don't know
- # where the dbx files are stored (I presume they are in the
- # same format).
- raise NotImplementedError
if win32api is None:
# Delayed import error from top.
! raise ImportError("win32all not installed")
reg = win32api.RegOpenKeyEx(win32con.HKEY_USERS, "")
--- 508,514 ----
Tested with Outlook Express 6.0 with Windows XP."""
if win32api is None:
# Delayed import error from top.
! raise ImportError("pywin32 not installed")
reg = win32api.RegOpenKeyEx(win32con.HKEY_USERS, "")
***************
*** 527,544 ****
yield subkey
def OEStoreRoot():
"""Return the path to the Outlook Express Store Root.
Tested with Outlook Express 6.0 with Windows XP."""
! # Run through the identity keys, using the first that
! # works.
! raw = ""
! for identity in OEIdentityKeys():
! try:
! raw = win32api.RegQueryValueEx(identity, "Store Root")
! except win32api.error:
! pass
! else:
! break
# I can't find a shellcon to that is the same as %UserProfile%,
# so extract it from CSIDL_LOCAL_APPDATA
--- 550,572 ----
yield subkey
+ def OECurrentUserKey():
+ """Returns the root registry key for current user Outlook
+ Express settings."""
+ if win32api is None:
+ # Delayed import error from top.
+ raise ImportError("pywin32 not installed")
+ key = "Identities"
+ reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, key)
+ id = win32api.RegQueryValueEx(reg, "Default User ID")[0]
+ subKey = "%s\\%s\\Software\\Microsoft\\Outlook Express\\5.0" % (key, id)
+ return subKey
+
def OEStoreRoot():
"""Return the path to the Outlook Express Store Root.
Tested with Outlook Express 6.0 with Windows XP."""
! subKey = OECurrentUserKey()
! reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, subKey)
! path = win32api.RegQueryValueEx(reg, "Store Root")[0]
# I can't find a shellcon to that is the same as %UserProfile%,
# so extract it from CSIDL_LOCAL_APPDATA
***************
*** 547,552 ****
parts = UserDirectory.split(os.sep)
UserProfile = os.sep.join(parts[:-2])
! raw = raw[0].replace("%UserProfile%", UserProfile)
! return raw
def OEAccountKeys(permission = None):
--- 575,586 ----
parts = UserDirectory.split(os.sep)
UserProfile = os.sep.join(parts[:-2])
! return path.replace("%UserProfile%", UserProfile)
!
! def OEDBXFilesList():
! """Returns a list of DBX files for current user."""
! path = OEStoreRoot()
! dbx_re = re.compile('.+\.dbx')
! dbxs = [f for f in os.listdir(path) if dbx_re.search(f) != None]
! return dbxs
def OEAccountKeys(permission = None):
***************
*** 680,690 ****
print_message = True
! if args:
! MAILBOX_DIR = args[0]
! else:
! MAILBOX_DIR = OEStoreRoot()
!
! files = [os.path.join(MAILBOX_DIR, file) for file in \
! os.listdir(MAILBOX_DIR) if os.path.splitext(file)[1] == '.dbx']
for file in files:
--- 714,719 ----
print_message = True
! MAILBOX_DIR = OEStoreRoot()
! files = [os.path.join(MAILBOX_DIR, f) for f in OEDBXFilesList()]
for file in files:
***************
*** 724,731 ****
print message.getText()
except Exception, (strerror):
print strerror
- dbx.close()
if __name__ == '__main__':
--- 753,761 ----
print message.getText()
+ dbx.close()
+
except Exception, (strerror):
print strerror
if __name__ == '__main__':
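Note on the oe_mailbox.py change above: the new OEDBXFilesList() selects files with re.search on the pattern '.+\.dbx', which matches the text anywhere in the name, so a name such as "Inbox.dbx.bak" would also be accepted. The code it replaces compared os.path.splitext(file)[1] with '.dbx'. The sketch below shows both filters side by side on made-up file names. The function names are hypothetical, and the case-insensitive comparison is an assumption about what is wanted on Windows, not something the commit does.

```python
import os
import re

DBX_RE = re.compile(r'.+\.dbx')


def dbx_by_search(names):
    """Filter the way the committed code does: pattern found anywhere."""
    return [n for n in names if DBX_RE.search(n) is not None]


def dbx_by_extension(names):
    """Keep only names whose final extension is .dbx, ignoring case."""
    return [n for n in names if os.path.splitext(n)[1].lower() == ".dbx"]
```

Given the names "Inbox.dbx", "Inbox.dbx.bak", "Folders.DBX" and "notes.txt", the search-based filter keeps the first two, while the extension-based filter keeps "Inbox.dbx" and "Folders.DBX".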
From anadelonbrin at users.sourceforge.net Tue Nov 23 04:31:52 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 04:31:56 2004
Subject: [Spambayes-checkins] spambayes/spambayes Version.py, 1.31.4.3, 1.31.4.4
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11109/spambayes
Modified Files:
Tag: release_1_0-branch
Version.py
Log Message:
Bump some numbers for 1.0.1.
Index: Version.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Version.py,v
retrieving revision 1.31.4.3
retrieving revision 1.31.4.4
diff -C2 -d -r1.31.4.3 -r1.31.4.4
*** Version.py 9 Nov 2004 22:41:13 -0000 1.31.4.3
--- Version.py 23 Nov 2004 03:31:49 -0000 1.31.4.4
***************
*** 38,48 ****
# "description" strings below - they just need to increment
# so automated version checking works.
! "Version": 1.0,
! "BinaryVersion": 1.0,
"Description": "SpamBayes Outlook Addin",
! "Date": "July 2004",
! "Full Description": "%(Description)s Version 1.0 (%(Date)s)",
"Full Description Binary":
! "%(Description)s Binary Version 1.0 (%(Date)s)",
# Note this means we can change the download page later, and old
# versions will still go to the new page.
--- 38,48 ----
# "description" strings below - they just need to increment
# so automated version checking works.
! "Version": 1.0.1,
! "BinaryVersion": 1.0.1,
"Description": "SpamBayes Outlook Addin",
! "Date": "November 2004",
! "Full Description": "%(Description)s Version 1.0.1 (%(Date)s)",
"Full Description Binary":
! "%(Description)s Binary Version 1.0.1 (%(Date)s)",
# Note this means we can change the download page later, and old
# versions will still go to the new page.
***************
*** 53,63 ****
# Note these version numbers also currently don't appear in the
# "description" strings below - see above
! "Version": 1.0,
! "BinaryVersion": 1.0,
"Description": "SpamBayes POP3 Proxy",
! "Date": "July 2004",
! "Full Description": """%(Description)s Version 1.0 (%(Date)s)""",
"Full Description Binary":
! """%(Description)s Binary Version 1.0 (%(Date)s)""",
# Note this means we can change the download page later, and old
# versions will still go to the new page.
--- 53,63 ----
# Note these version numbers also currently don't appear in the
# "description" strings below - see above
! "Version": 1.0.1,
! "BinaryVersion": 1.0.1,
"Description": "SpamBayes POP3 Proxy",
! "Date": "November 2004",
! "Full Description": """%(Description)s Version 1.0.1 (%(Date)s)""",
"Full Description Binary":
! """%(Description)s Binary Version 1.0.1 (%(Date)s)""",
# Note this means we can change the download page later, and old
# versions will still go to the new page.
***************
*** 72,78 ****
},
"IMAP Filter" : {
! "Version": 0.4,
"Description": "SpamBayes IMAP Filter",
! "Date": "May 2004",
"Full Description": """%(Description)s Version %(Version)s (%(Date)s)""",
},
--- 72,78 ----
},
"IMAP Filter" : {
! "Version": 0.5,
"Description": "SpamBayes IMAP Filter",
! "Date": "November 2004",
"Full Description": """%(Description)s Version %(Version)s (%(Date)s)""",
},
From anadelonbrin at users.sourceforge.net Tue Nov 23 04:39:38 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 04:39:42 2004
Subject: [Spambayes-checkins] spambayes/spambayes Version.py, 1.31.4.4, 1.31.4.5
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12491/spambayes
Modified Files:
Tag: release_1_0-branch
Version.py
Log Message:
Oops. Those are floats, not version numbers :) Use 1.01 not 1.0.1.
Index: Version.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Version.py,v
retrieving revision 1.31.4.4
retrieving revision 1.31.4.5
diff -C2 -d -r1.31.4.4 -r1.31.4.5
*** Version.py 23 Nov 2004 03:31:49 -0000 1.31.4.4
--- Version.py 23 Nov 2004 03:39:35 -0000 1.31.4.5
***************
*** 38,43 ****
# "description" strings below - they just need to increment
# so automated version checking works.
! "Version": 1.0.1,
! "BinaryVersion": 1.0.1,
"Description": "SpamBayes Outlook Addin",
"Date": "November 2004",
--- 38,43 ----
# "description" strings below - they just need to increment
# so automated version checking works.
! "Version": 1.01,
! "BinaryVersion": 1.01,
"Description": "SpamBayes Outlook Addin",
"Date": "November 2004",
***************
*** 53,58 ****
# Note these version numbers also currently don't appear in the
# "description" strings below - see above
! "Version": 1.0.1,
! "BinaryVersion": 1.0.1,
"Description": "SpamBayes POP3 Proxy",
"Date": "November 2004",
--- 53,58 ----
# Note these version numbers also currently don't appear in the
# "description" strings below - see above
! "Version": 1.01,
! "BinaryVersion": 1.01,
"Description": "SpamBayes POP3 Proxy",
"Date": "November 2004",
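Note on the two Version.py changes above: the "Version" and "BinaryVersion" values are bare numbers, so 1.0.1 is not something Python can read as a float, which is why the follow-up commit switches to 1.01. The sketch below shows the difference and one alternative way to compare dotted versions by turning them into tuples. The tuple approach is an illustration only; the diff shows SpamBayes keeping floats here, and both function names are hypothetical.

```python
def is_float_text(text):
    """Report whether text can be read as a single floating-point number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def parse_version(text):
    """Turn a dotted version string such as '1.0.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))
```

Tuples compare element by element, so parse_version("1.0.1") sorts after parse_version("1.0"), and parse_version("1.10") sorts after parse_version("1.9"), whereas the floats 1.10 and 1.9 compare the other way round.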
From anadelonbrin at users.sourceforge.net Tue Nov 23 05:03:03 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 05:03:06 2004
Subject: [Spambayes-checkins] spambayes/windows spambayes.iss, 1.15.4.4,
1.15.4.5
Message-ID:
Update of /cvsroot/spambayes/spambayes/windows
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17103/windows
Modified Files:
Tag: release_1_0-branch
spambayes.iss
Log Message:
When backporting a fix I somehow cut off an "end;" as well. Fix that.
Index: spambayes.iss
===================================================================
RCS file: /cvsroot/spambayes/spambayes/windows/spambayes.iss,v
retrieving revision 1.15.4.4
retrieving revision 1.15.4.5
diff -C2 -d -r1.15.4.4 -r1.15.4.5
*** spambayes.iss 10 Nov 2004 22:15:37 -0000 1.15.4.4
--- spambayes.iss 23 Nov 2004 04:02:45 -0000 1.15.4.5
***************
*** 118,121 ****
--- 118,122 ----
'If this message persists, you may need to log off from Windows, and try again.'
Result := CheckNoAppMutex('InternetMailTransport', closeit);
+ end;
// And finally, the SpamBayes server
if Result then begin
From anadelonbrin at users.sourceforge.net Tue Nov 23 05:28:19 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 23 05:28:21 2004
Subject: [Spambayes-checkins] spambayes WHAT_IS_NEW.txt,1.35.4.2,1.35.4.3
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22188
Modified Files:
Tag: release_1_0-branch
WHAT_IS_NEW.txt
Log Message:
Update for 1.0.1
Index: WHAT_IS_NEW.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/WHAT_IS_NEW.txt,v
retrieving revision 1.35.4.2
retrieving revision 1.35.4.3
diff -C2 -d -r1.35.4.2 -r1.35.4.3
*** WHAT_IS_NEW.txt 19 Jul 2004 03:21:23 -0000 1.35.4.2
--- WHAT_IS_NEW.txt 23 Nov 2004 04:28:16 -0000 1.35.4.3
***************
*** 4,86 ****
)
! Changes are broken into sections, so that it's easier for you to find the
! changes that are relevant to you.
! Any actions necessary to move to this release from the previous release are
! noted in the "Transition" section. You should also read the "Incompatible
! changes" section.
! New in 1.0
! ==========
! There have been no changes made between 1.0rc2 and 1.0. If you are
! upgrading from an earlier version, you may wish to read the WHAT_IS_NEW
! files from the versions that you skipped, as well.
! Deprecated Options
! ==================
! Since 1.0a9, SpamBayes has had a method of noting options that are
! deprecated and which will not be available in future releases (it is
! likely that options will only be deprecated for one release before being
! removed). Deprecated options will not be offered in the graphical
! interfaces (Outlook plugin and web interface), and will be listed in
! the "What's New" file (this file) for each release.
! Deprecated options have the same name as previously, but now begin with
! "x-" (so "extract_dow" is now "x-extract_dow"). You can continue to use
! the original name (e.g. "extract_dow") in your configuration file, but will
! receive warnings in your log file or console window. We recommend that you
! examine this output every time you upgrade SpamBayes to ensure that you are
! not using any newly deprecated options.
! Discussion regarding the deprecation of any particular option can be found
! in the spambayes-dev archives (at
! ).
! No options have been deprecated in this release.
! The following options are still deprecated and will be removed in the near
! future, unless testing indicates otherwise:
o [Tokenizer] generate_time_buckets
o [Tokenizer] extract_dow
o [Classifier] experimental_ham_spam_imbalance_adjustment
Experimental Options
====================
! Since 1.0a9, SpamBayes has had a method of noting options that are
! experimental and which may be removed or made permanent in future releases
! (many experimental options will only be experimental for one release before
! being removed or fully integrated). Experimental options are not exposed
! by the Outlook plugin, and are listed on a separate
! "Experimental Configuration" page in the web interface. The options will
! be listed in the "What's New" file (this file) for each release.
!
! Experimental options begin with "x-" (as do deprecated options). If you
! start using an experimental option and it later becomes permanent you can
! continue to use the "x-" name in your configuration file, but will
! receive warnings in your log file or console window. We recommend that you
! examine this output every time you upgrade SpamBayes to ensure that you are
! using the correct name for all options.
!
! Discussion of why experimental options and results from using them can be
! found in the spambayes-dev archives (at
! ). Ideally, we would like
! users to test these options out on their mail and let us know the results.
! This can be as simple as turning on the option and emailing
! spambayes@python.org with anecdotal results after a period of time, or the
! full testtools scripts can be used. For details about using these, please
! read the "README-DEVEL.txt" file that comes with the SpamBayes source
! archive.
! Experimental options are always turned off by default.
! No experimental options have been added in this release.
! Experimental options that are currently available (which we invite you to
! try out and report back your results) include:
o [Tokenizer] x-search_for_habeas_headers
o [Tokenizer] x-reduce_habeas_headers
--- 4,101 ----
)
! This is a bugfix release, so there are no new features, and you do not need
! to do anything to migrate to the new release (other than install it). There
! are no incompatible changes.
! New in 1.0.1
! ============
! o A bug with the import/export script (sb_dbexpimp.py) where merging into
! an existing database in the dbm format might lose training data has been
! fixed. Another minor bug with the script that caused an error to be
! printed when importing into a pickle file (although the import was still
! successful) has also been fixed.
! o The binary installer failed to offer to install a startup items shortcut,
! which is convenient for sb_server binary users. The installer will now
! do this.
+ o sb_server users who wish to use non-standard strings for classification
+ (e.g. "spambayes-ham" instead of "ham") can now use the "Notate To" and
+ "Notate Subject" options. This is particularly useful for Outlook
+ Express users.
! o Users of Windows extensions that automatically expand zip files (such
! as ZipMagic) should now be able to successfully use the binary versions
! of sb_server and the Outlook plug-in.
! o Checking whether a new version is available should now work for users
! who have entered proxy details in their configuration file.
! o Source code users can now use Python 2.4 with SpamBayes, although some
! DeprecationWarnings may still be generated.
! o The '-u' command line option for sb_server (letting you specify which
! port the web interface is served on) was broken, but is now fixed.
! o The tte.py (Train to Exhaustion) script now works with Python 2.3.
! o Various other minor fixes.
!
!
! Reported Bugs Fixed
! ===================
! The following bugs tracked via the SourceForge system were fixed:
! 981970, 990700, 941639, 986353, 790757, 944109, 959937, 903905,
! 1051081, 1036601, 922063, 831864, 1022848, 715248
!
! A URL containing the details of these bugs can be made by appending the
! bug number to this URL:
! http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498103&aid=
!
! As this is a bugfix release, no feature requests or patches tracked via the
! SourceForge system were added.
!
!
! Deprecated Options
! ==================
!
! The following options are still deprecated and will be removed in the 1.1
! release:
o [Tokenizer] generate_time_buckets
o [Tokenizer] extract_dow
o [Classifier] experimental_ham_spam_imbalance_adjustment
+ We recommend that you cease using these options if you still are. If you
+ have any questions about the deprecated options, please email
+ spambayes@python.org and we will try and answer them.
+
Experimental Options
====================
! We would like to remind users about our set of experimental options. These
! are options which we believe may be of benefit to users, but have not been
! tested throughly enough to warrent full inclusion. We would greatly
! appreciate feedback from users willing to try these options out as to their
! perceived benefit. Both source code and binary users (including Outlook)
! can try these options out.
! To enable an experimental option, sb_server and sb_imapfilter users should
! click on the "Experimental Configuration" button on the main configuration
! page, and select the option(s) they wish to try.
! To enable an experimental option, Outlook plug-in users should open their
! "Data Directory" (via SpamBayes->SpamBayes Manager->Advanced->Show Data Folder)
! and open the "default_bayes_customize.ini" file in there (create one with
! Notepad if there isn't already one). In this file, add the options that
! you wish to try - for example, to enable searching for "Habeas" headers,
! add a line with "Tokenizer" and, below that, a line with
! "x-search_for_habeas_headers:True".
! If you have any queries about the experimental options, please email
! spambayes@python.org and we will try and answer them.
!
! Experimental options that are currently available include:
o [Tokenizer] x-search_for_habeas_headers
o [Tokenizer] x-reduce_habeas_headers
***************
*** 93,97 ****
and bigrams (pairs of words), but uses a 'tiling' scheme, where only
the set of unigrams and bigrams that have the strongest effect on
! the message are used.
o [URLRetriever] x-slurp_urls
--- 108,114 ----
and bigrams (pairs of words), but uses a 'tiling' scheme, where only
the set of unigrams and bigrams that have the strongest effect on
! the message are used. Note that this option will no longer be
! experimental (although still off by default) with 1.1 - we recommend
! that you try it out if you want higher accuracy.
o [URLRetriever] x-slurp_urls
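The Outlook instructions in the release notes above describe a two-line addition to "default_bayes_customize.ini". A rough sketch of what such a file might contain, and how it parses, is shown here. It assumes the section line is written in brackets as "[Tokenizer]" (the notes only say a line with "Tokenizer"), and it uses the standard-library configparser purely for illustration; SpamBayes reads its options with its own code, and `read_options` is a hypothetical helper.

```python
import configparser

# Assumed file contents: the release notes name the section and the option;
# the bracketed "[Tokenizer]" header is an assumption of this sketch.
SAMPLE_INI = """\
[Tokenizer]
x-search_for_habeas_headers:True
"""


def read_options(text):
    """Parse ini text into {section: {option: value}} with string values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser.items(name)) for name in parser.sections()}


options = read_options(SAMPLE_INI)
print(options)
```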
From anadelonbrin at users.sourceforge.net Wed Nov 24 00:37:20 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 24 00:37:24 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.28,1.29
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31243/scripts
Modified Files:
sb_server.py
Log Message:
Switch from using msg.asTokens to msg.tokenize.
Index: sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** sb_server.py 9 Nov 2004 02:37:40 -0000 1.28
--- sb_server.py 23 Nov 2004 23:37:15 -0000 1.29
***************
*** 474,478 ****
msg.setId(state.getNewMessageName())
# Now find the spam disposition and add the header.
! (prob, clues) = state.bayes.spamprob(msg.asTokens(),\
evidence=True)
--- 474,478 ----
msg.setId(state.getNewMessageName())
# Now find the spam disposition and add the header.
! (prob, clues) = state.bayes.spamprob(msg.tokenize(),\
evidence=True)
From anadelonbrin at users.sourceforge.net Wed Nov 24 00:44:42 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 24 00:44:46 2004
Subject: [Spambayes-checkins] spambayes/utilities cleanarch.py,NONE,1.1
Message-ID:
Update of /cvsroot/spambayes/spambayes/utilities
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv591/utilities
Added Files:
cleanarch.py
Log Message:
I'm not sure why this is still in the root directory when everything else was moved
out. It really belongs in contrib or utilities, I think, so moving it to there (there
is no CVS history to preserve). Also adding .py to the end of the filename, since
it is a Python script.
--- NEW FILE: cleanarch.py ---
#! /usr/bin/env python
# Copyright (C) 2001,2002 by the Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
"""Clean up an .mbox archive file.
The archiver looks for Unix-From lines separating messages in an mbox archive
file. For compatibility, it specifically looks for lines that start with
"From " -- i.e. the letters capital-F, lowercase-r, o, m, space, ignoring
everything else on the line.
Normally, any lines that start "From " in the body of a message should be
escaped such that a > character is actually the first on a line. It is
possible though that body lines are not actually escaped. This script
attempts to fix these by doing a stricter test of the Unix-From lines. Any
lines that start "From " but do not pass this stricter test are escaped with a
> character.
Usage: cleanarch [options] < inputfile > outputfile
Options:
-s n
--status=n
Print a # character every n lines processed
-q / --quiet
Don't print changed line information to standard error.
-n / --dry-run
Don't actually output anything.
-h / --help
Print this message and exit
"""
import sys
import re
import getopt
import mailbox
cre = re.compile(mailbox.UnixMailbox._fromlinepattern)
# From RFC 2822, a header field name must contain only characters from 33-126
# inclusive, excluding colon. I.e. from oct 41 to oct 176 less oct 072. Must
# use re.match() so that it's anchored at the beginning of the line.
fre = re.compile(r'[\041-\071\073-\0176]+')
def usage(code, msg=''):
    print >> sys.stderr, __doc__
    if msg:
        print >> sys.stderr, msg
    sys.exit(code)

def escape_line(line, lineno, quiet, output):
    if output:
        sys.stdout.write('>' + line)
    if not quiet:
        print >> sys.stderr, '[%d]' % lineno, line[:-1]

def main():
    try:
        opts, args = getopt.getopt(
            sys.argv[1:], 'hqns:',
            ['help', 'quiet', 'dry-run', 'status='])
    except getopt.error, msg:
        usage(1, msg)
    quiet = 0
    output = 1
    status = -1
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-q', '--quiet'):
            quiet = 1
        elif opt in ('-n', '--dry-run'):
            output = 0
        elif opt in ('-s', '--status'):
            try:
                status = int(arg)
            except ValueError:
                usage(1, 'Bad status number: %s' % arg)
    if args:
        usage(1)
    lineno = 0
    statuscnt = 0
    messages = 0
    while 1:
        lineno += 1
        line = sys.stdin.readline()
        if not line:
            break
        if line.startswith('From '):
            if cre.match(line):
                # This is a real Unix-From line. But it could be a message
                # /about/ Unix-From lines, so as a second order test, make
                # sure there's at least one RFC 2822 header following
                nextline = sys.stdin.readline()
                lineno += 1
                if not nextline:
                    # It was the last line of the mbox, so it couldn't have
                    # been a Unix-From
                    escape_line(line, lineno, quiet, output)
                    break
                fieldname = nextline.split(':', 1)
                if len(fieldname) < 2 or not fre.match(nextline):
                    # The following line was not a header, so this wasn't a
                    # valid Unix-From
                    escape_line(line, lineno, quiet, output)
                    if output:
                        sys.stdout.write(nextline)
                else:
                    # It's a valid Unix-From line
                    messages += 1
                    if output:
                        sys.stdout.write(line)
                        sys.stdout.write(nextline)
            else:
                # This is a bogus Unix-From line
                escape_line(line, lineno, quiet, output)
        elif output:
            # Any old line
            sys.stdout.write(line)
        if status > 0 and (lineno % status) == 0:
            sys.stderr.write('#')
            statuscnt += 1
            if statuscnt > 50:
                print >> sys.stderr
                statuscnt = 0
    print >> sys.stderr, messages, 'messages found'

if __name__ == '__main__':
    main()
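The two-stage test that the script applies (a strict Unix-From match, then a check that the following line looks like an RFC 2822 header) can be tried out on a list of lines without touching stdin. The following is a rough Python 3 sketch of that idea, not the script itself: `STRICT_FROM` is a stand-in pattern written for this example rather than the one taken from the mailbox module, `HEADER` requires the colon directly after the field name, and `escape_bogus_from_lines` is a hypothetical helper.

```python
import re

# Stand-in for the strict Unix-From test (an assumption of this sketch; the
# script itself takes its pattern from the mailbox module).
STRICT_FROM = re.compile(
    r"From \S+\s+\w{3}\s+\w{3}\s+\d{1,2}\s+\d{1,2}:\d{2}(:\d{2})?\s+\d{4}\s*$"
)
# RFC 2822 field name: printable ASCII 33-126 except the colon, then a colon.
HEADER = re.compile(r"[\x21-\x39\x3b-\x7e]+:")


def escape_bogus_from_lines(lines):
    """Return a copy of `lines` with non-separator 'From ' lines '>'-escaped."""
    cleaned = []
    for index, line in enumerate(lines):
        if line.startswith("From "):
            # A real separator must be followed by something header-like.
            following = lines[index + 1] if index + 1 < len(lines) else ""
            if not (STRICT_FROM.match(line) and HEADER.match(following)):
                line = ">" + line
        cleaned.append(line)
    return cleaned


sample = [
    "From alice@example.com Tue Nov 23 04:39:38 2004\n",
    "Subject: hello\n",
    "\n",
    "From here on, things get interesting.\n",
    "From bob@example.com Wed Nov 24 00:37:20 2004\n",
    "this is not a header\n",
]
for cleaned_line in escape_bogus_from_lines(sample):
    print(cleaned_line, end="")
```

In the sample, the first line is kept as a message separator, while the fourth (ordinary body text) and the fifth (well-formed but not followed by a header) are both escaped with a leading ">".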
From anadelonbrin at users.sourceforge.net Wed Nov 24 00:44:43 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 24 00:44:46 2004
Subject: [Spambayes-checkins] spambayes cleanarch,1.1,NONE
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv591
Removed Files:
cleanarch
Log Message:
I'm not sure why this is still in the root directory when everything else was moved
out. It really belongs in contrib or utilities, I think, so moving it to there (there
is no CVS history to preserve). Also adding .py to the end of the filename, since
it is a Python script.
--- cleanarch DELETED ---
From anadelonbrin at users.sourceforge.net Thu Nov 25 07:36:25 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Thu Nov 25 07:36:28 2004
Subject: [Spambayes-checkins] website applications.ht,1.30,1.31
Message-ID:
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10220
Modified Files:
applications.ht
Log Message:
Including the most recent version number in this file was fairly pointless, and made
putting out a release more work, so it's gone.
Index: applications.ht
===================================================================
RCS file: /cvsroot/spambayes/website/applications.ht,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** applications.ht 9 Jul 2004 00:35:46 -0000 1.30
--- applications.ht 25 Nov 2004 06:36:21 -0000 1.31
***************
*** 44,48 ****
Availability
! Download the 1.0 source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
--- 44,48 ----
Availability
! Download the source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
***************
*** 63,67 ****
it.
Alternatively, to run from source, download the
! 1.0 source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
--- 63,67 ----
it.
Alternatively, to run from source, download the
! source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
***************
*** 78,82 ****
Availability
! Download the 1.0 source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
--- 78,82 ----
Availability
! Download the source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
***************
*** 94,98 ****
Availability
! Download the 1.0 source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
--- 94,98 ----
Availability
! Download the source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
***************
*** 112,115 ****
Availability
! Download the 1.0 source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
--- 112,115 ----
Availability
! Download the source archive.
Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.
From anadelonbrin at users.sourceforge.net Thu Nov 25 07:38:19 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Thu Nov 25 07:38:22 2004
Subject: [Spambayes-checkins]
website download.ht, 1.28, 1.29 index.ht, 1.35,
1.36 reply.txt, 1.15, 1.16 windows.ht, 1.41, 1.42
Message-ID:
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10525
Modified Files:
download.ht index.ht reply.txt windows.ht
Log Message:
Update for 1.0.1
Index: download.ht
===================================================================
RCS file: /cvsroot/spambayes/website/download.ht,v
retrieving revision 1.28
retrieving revision 1.29
diff -C2 -d -r1.28 -r1.29
*** download.ht 28 Sep 2004 07:38:08 -0000 1.28
--- download.ht 25 Nov 2004 06:38:15 -0000 1.29
***************
*** 3,16 ****
Author: SpamBayes
! Version 1.0 of the SpamBayes project is now available.
!
This is the final 1.0 release of SpamBayes. We expect it to prove to be
! quite stable and usable by most people. As time permits, we will endeavour
! to fix any remaining bugs and eventually a 1.0.1 release will be made.
! However, work can now begin on a 1.1 release, which may include many new
(possibly even exciting!) features. Feedback to
spambayes@python.org.
!
You may like to view the release notes
! or the
files that make up this release.
--- 3,17 ----
Author: SpamBayes
!
Version 1.0.1 of the SpamBayes project is now available.
!
This is a bugfix release - it is funtionally identical to 1.0, but includes
! fixes for a number of bugs. We expect it to prove to be quite stable and
! usable by most people. As time permits, we will endeavour
! to fix any remaining bugs and eventually a 1.0.2 release will be made.
! However, work has now begin on a 1.1 release, which may include many new
(possibly even exciting!) features. Feedback to
spambayes@python.org.
!
You may like to view the release notes
! or the
files that make up this release.
Index: index.ht
===================================================================
RCS file: /cvsroot/spambayes/website/index.ht,v
retrieving revision 1.35
retrieving revision 1.36
diff -C2 -d -r1.35 -r1.36
*** index.ht 9 Jul 2004 00:40:50 -0000 1.35
--- index.ht 25 Nov 2004 06:38:15 -0000 1.36
***************
*** 5,9 ****
News
! SpamBayes 1.0 is now available! (This includes both the source
archives and a Windows binary installer).
See the download page for more.
--- 5,9 ----
News
! SpamBayes 1.0.1 is now available! (This includes both the source
archives and a Windows binary installer).
See the download page for more.
Index: reply.txt
===================================================================
RCS file: /cvsroot/spambayes/website/reply.txt,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** reply.txt 9 Jul 2004 00:42:53 -0000 1.15
--- reply.txt 25 Nov 2004 06:38:15 -0000 1.16
***************
*** 48,55 ****
-----------------------------------------------
! Please ensure that you have the latest version. As of 2004-07-09, this is
! 1.0 for both the source and for the binary installer (for the Outlook
! plug-in and sb_server). If you are still having trouble, try looking at the
! bug reports that are currently open:
http://sf.net/tracker/?group_id=61702&atid=498103
--- 48,55 ----
-----------------------------------------------
! Please ensure that you have the latest version. As of November 25, 2004,
! this is 1.0.1 for both the source and for the binary installer (for the
! Outlook plug-in and sb_server). If you are still having trouble, try
! looking at the bug reports that are currently open:
http://sf.net/tracker/?group_id=61702&atid=498103
Index: windows.ht
===================================================================
RCS file: /cvsroot/spambayes/website/windows.ht,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** windows.ht 28 Sep 2004 07:38:09 -0000 1.41
--- windows.ht 25 Nov 2004 06:38:15 -0000 1.42
***************
*** 11,17 ****
Latest Release
! The latest release is 1.0 - see the
! release notes
! or download the installation program.
--- 11,17 ----
Latest Release
! The latest release is 1.0.1 - see the
! release notes
! or download the installation program.
***************
*** 74,78 ****
Windows users using other mail clients and retrieving mail via POP3
can now download the same
!
installation program and use it to install a binary version of
sb_server, including a tray application.
--- 74,78 ----
Windows users using other mail clients and retrieving mail via POP3
can now download the same
!
installation program and use it to install a binary version of
sb_server, including a tray application.
From anadelonbrin at users.sourceforge.net Thu Nov 25 07:39:07 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Thu Nov 25 07:39:10 2004
Subject: [Spambayes-checkins] spambayes README-DEVEL.txt,1.14,1.15
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10661
Modified Files:
README-DEVEL.txt
Log Message:
Putting out a release got a tiny bit simpler.
Index: README-DEVEL.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/README-DEVEL.txt,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** README-DEVEL.txt 30 Sep 2004 02:01:26 -0000 1.14
--- README-DEVEL.txt 25 Nov 2004 06:39:04 -0000 1.15
***************
*** 505,509 ****
o Now commit spambayes/__init__.py and tag the whole checkout - see the
existing tag names for the tag name format.
! o Update the website News, Download, Windows and Application sections.
o Update reply.txt in the website repository as needed (it specifies the
latest version). Then let Tim, Barry, Tony, or Skip know that they need to
--- 505,509 ----
o Now commit spambayes/__init__.py and tag the whole checkout - see the
existing tag names for the tag name format.
! o Update the website News, Download and Windows sections.
o Update reply.txt in the website repository as needed (it specifies the
latest version). Then let Tim, Barry, Tony, or Skip know that they need to
From montanaro at users.sourceforge.net Thu Nov 25 16:12:06 2004
From: montanaro at users.sourceforge.net (Skip Montanaro)
Date: Thu Nov 25 16:12:08 2004
Subject: [Spambayes-checkins] spambayes/spambayes __init__.py,1.11,1.12
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17182
Modified Files:
__init__.py
Log Message:
uprev
Index: __init__.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/__init__.py,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** __init__.py 5 May 2004 00:38:22 -0000 1.11
--- __init__.py 25 Nov 2004 15:12:03 -0000 1.12
***************
*** 1,3 ****
# package marker.
! __version__ = '1.0rc1'
--- 1,3 ----
# package marker.
! __version__ = '1.0.1'
From anadelonbrin at users.sourceforge.net Fri Nov 26 00:19:07 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 26 00:19:10 2004
Subject: [Spambayes-checkins] spambayes/spambayes message.py,1.58,1.59
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22222/spambayes
Modified Files:
message.py
Log Message:
Use cPickle when possible.
Handle loading an Outlook messageinfo database.
Add a __len__ function to the messageinfo databases.
The messageinfo db now needs messages to have a GetDBKey function to determine the
key to store the message under. For our message classes, this is just the same as
getId().
Index: message.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v
retrieving revision 1.58
retrieving revision 1.59
diff -C2 -d -r1.58 -r1.59
*** message.py 22 Nov 2004 23:34:43 -0000 1.58
--- message.py 25 Nov 2004 23:19:04 -0000 1.59
***************
*** 81,84 ****
--- 81,85 ----
import os
+ import sys
import types
import math
***************
*** 86,90 ****
import errno
import shelve
! import pickle
import traceback
--- 87,94 ----
import errno
import shelve
! try:
! import cPickle as pickle
! except ImportError:
! import pickle
import traceback
***************
*** 110,125 ****
self.db_name = db_name
def load_msg(self, msg):
if self.db is not None:
try:
! attributes = self.db[msg.getId()]
except KeyError:
! pass
else:
if not isinstance(attributes, types.ListType):
! # Old-style message info db, which only
! # handles storing 'c' and 't'.
! (msg.c, msg.t) = attributes
! return
for att, val in attributes:
setattr(msg, att, val)
--- 114,152 ----
self.db_name = db_name
+ def __len__(self):
+ return len(self.db)
+
def load_msg(self, msg):
if self.db is not None:
try:
! try:
! attributes = self.db[msg.getDBKey()]
! except pickle.UnpicklingError:
! # The old-style Outlook message info db didn't use
! # shelve, so get it straight from the dbm.
! if hasattr(self, "dbm"):
! attributes = self.dbm[msg.getDBKey()]
! else:
! raise
except KeyError:
! # Set to None, as it's not there.
! for att in msg.stored_attributes:
! setattr(msg, att, None)
else:
if not isinstance(attributes, types.ListType):
! # Old-style message info db
! if isinstance(attributes, types.TupleType):
! # sb_server/sb_imapfilter, which only handled
! # storing 'c' and 't'.
! (msg.c, msg.t) = attributes
! return
! elif isinstance(attributes, types.StringTypes):
! # Outlook plug-in, which only handled storing 't',
! # and did it as a string.
! msg.t = {"0" : False, "1" : True}[attributes]
! return
! else:
! print >> sys.stderr, "Unknown message info type"
! sys.exit(1)
for att, val in attributes:
setattr(msg, att, val)
***************
*** 130,139 ****
for att in msg.stored_attributes:
attributes.append((att, getattr(msg, att)))
! self.db[msg.getId()] = attributes
self.store()
def remove_msg(self, msg):
if self.db is not None:
! del self.db[msg.getId()]
self.store()
--- 157,166 ----
for att in msg.stored_attributes:
attributes.append((att, getattr(msg, att)))
! self.db[msg.getDBKey()] = attributes
self.store()
def remove_msg(self, msg):
if self.db is not None:
! del self.db[msg.getDBKey()]
self.store()
***************
*** 241,244 ****
--- 268,272 ----
self.message_info_db = open_storage(nm, typ)
self.stored_attributes = ['c', 't',]
+ self.getDBKey = self.getId
self.id = None
self.c = None
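The load_msg change above has to cope with three stored record shapes: the current list of (attribute, value) pairs, the old sb_server/sb_imapfilter (c, t) tuple, and the old Outlook string "0" or "1". A minimal standalone sketch of that dispatch follows, written for Python 3 with the same cPickle-first import. The function name `decode_attributes` is hypothetical, it returns a dict instead of setting attributes on a message, and it raises TypeError for an unknown shape where the committed code prints an error and exits.

```python
try:
    import cPickle as pickle  # Python 2: faster C implementation
except ImportError:
    import pickle  # Python 3, or a build without cPickle


def decode_attributes(stored):
    """Normalise a stored messageinfo record to a dict of attributes."""
    if isinstance(stored, list):
        # Current format: a list of (attribute, value) pairs.
        return dict(stored)
    if isinstance(stored, tuple):
        # Old sb_server/sb_imapfilter format: a bare (c, t) tuple.
        c, t = stored
        return {"c": c, "t": t}
    if isinstance(stored, str):
        # Old Outlook format: only 't', stored as the string "0" or "1".
        return {"t": {"0": False, "1": True}[stored]}
    raise TypeError("Unknown message info type: %r" % type(stored))


# Round-trip a current-format record through pickle, as a shelve would.
record = pickle.loads(pickle.dumps([("c", "spam"), ("t", True)]))
print(decode_attributes(record))
print(decode_attributes(("ham", False)))
print(decode_attributes("1"))
```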
From anadelonbrin at users.sourceforge.net Fri Nov 26 00:27:02 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 26 00:27:05 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py, 1.138,
1.139 manager.py, 1.98, 1.99 msgstore.py, 1.88, 1.89 tester.py,
1.23, 1.24 train.py, 1.39, 1.40
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv23924/Outlook2000
Modified Files:
addin.py manager.py msgstore.py tester.py train.py
Log Message:
Stop using the deprecated access to the bayes database and use the manager.classifier_data
directly.
Switch to using a spambayes.message.MessageInfo database rather than an Outlook specific
one. This allows us to store more data than just the 'trained' status that we currently
store, and also reduces code duplication and simplifies the Outlook code a little
bit.
I have tested this as much as possible, and run it for a couple of days here and it
appears to work. The old database should still be usable (both old style and new
style data can be in the same database) and work, so it should be seemless. The
change is more-or-less the same as when the sb_server/sb_imapfilter database swapped
to storing more than just 'c' and 't', and there weren't problems there, so fingers
crossed...
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.138
retrieving revision 1.139
diff -C2 -d -r1.138 -r1.139
*** addin.py 17 Nov 2004 00:01:06 -0000 1.138
--- addin.py 25 Nov 2004 23:26:57 -0000 1.139
***************
*** 125,130 ****
# If the message has been trained on, we certainly have seen it before.
import train
! if train.been_trained_as_ham(msgstore_message, manager.classifier_data) or \
! train.been_trained_as_spam(msgstore_message, manager.classifier_data):
return True
# I considered checking if the "save spam score" option is enabled - but
--- 125,131 ----
# If the message has been trained on, we certainly have seen it before.
import train
! manager.classifier_data.message_db.load_msg(msgstore_message)
! if train.been_trained_as_ham(msgstore_message) or \
! train.been_trained_as_spam(msgstore_message):
return True
# I considered checking if the "save spam score" option is enabled - but
***************
*** 149,155 ****
else:
print "already was trained as good"
! assert train.been_trained_as_ham(msgstore_message, manager.classifier_data)
if save_db:
! manager.SaveBayesPostIncrementalTrain()
def TrainAsSpam(msgstore_message, manager, rescore = True, save_db = True):
--- 150,157 ----
else:
print "already was trained as good"
! manager.classifier_data.message_db.load_msg(msgstore_message)
! assert train.been_trained_as_ham(msgstore_message)
if save_db:
! manager.classifier_data.SavePostIncrementalTrain()
def TrainAsSpam(msgstore_message, manager, rescore = True, save_db = True):
***************
*** 167,174 ****
else:
print "already was trained as spam"
! assert train.been_trained_as_spam(msgstore_message, manager.classifier_data)
# And if the DB can save itself incrementally, do it now
if save_db:
! manager.SaveBayesPostIncrementalTrain()
# Function to filter a message - note it is a msgstore msg, not an
--- 169,177 ----
else:
print "already was trained as spam"
! manager.classifier_data.message_db.load_msg(msgstore_message)
! assert train.been_trained_as_spam(msgstore_message)
# And if the DB can save itself incrementally, do it now
if save_db:
! manager.classifier_data.SavePostIncrementalTrain()
# Function to filter a message - note it is a msgstore msg, not an
***************
*** 190,194 ****
if manager.config.training.train_recovered_spam:
import train
! if train.been_trained_as_spam(msgstore_message, manager.classifier_data):
need_train = True
else:
--- 193,198 ----
if manager.config.training.train_recovered_spam:
import train
! manager.classifier_data.message_db.load_msg(msgstore_message)
! if train.been_trained_as_spam(msgstore_message):
need_train = True
else:
***************
*** 200,204 ****
# 'Unsure', then this event is unlikely to be the user
# re-classifying (and in fact it may simply be the Outlook
! # rules moving the item.
need_train = manager.config.filter.unsure_threshold < prop * 100
--- 204,208 ----
# 'Unsure', then this event is unlikely to be the user
# re-classifying (and in fact it may simply be the Outlook
! # rules moving the item).
need_train = manager.config.filter.unsure_threshold < prop * 100
***************
*** 422,426 ****
# previously trained, try and optimize.
import train
! if train.been_trained_as_ham(msgstore_message, self.manager.classifier_data):
need_train = True
else:
--- 426,431 ----
# previously trained, try and optimize.
import train
! self.manager.classifier_data.message_db.load_msg(msgstore_message)
! if train.been_trained_as_ham(msgstore_message):
need_train = True
else:
***************
*** 441,444 ****
--- 446,450 ----
if msgstore_message is None:
return
+ mgr.classifier_data.message_db.load_msg(msgstore_message)
item = msgstore_message.GetOutlookItem()
***************
*** 479,486 ****
# Report whether this message has been trained or not.
push("
\n")
- trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
push("This message has %sbeen trained%s." % \
! {'0' : ("", " as ham"), '1' : ("", " as spam"), None : ("not ", "")}
! [trained_as])
# Format the clues.
push("%s Significant Tokens
\n" % len(clues))
--- 485,491 ----
# Report whether this message has been trained or not.
push("
\n")
push("This message has %sbeen trained%s." % \
! {False : ("", " as ham"), True : ("", " as spam"),
! None : ("not ", "")}[msgstore_message.t])
# Format the clues.
push("%s Significant Tokens
\n" % len(clues))
***************
*** 707,711 ****
# but we are smart enough to know we have already done it.
# And if the DB can save itself incrementally, do it now
! self.manager.SaveBayesPostIncrementalTrain()
SetWaitCursor(0)
--- 712,716 ----
# but we are smart enough to know we have already done it.
# And if the DB can save itself incrementally, do it now
! self.manager.classifier_data.SavePostIncrementalTrain()
SetWaitCursor(0)
***************
*** 774,778 ****
# but we are smart enough to know we have already done it.
# And if the DB can save itself incrementally, do it now
! self.manager.SaveBayesPostIncrementalTrain()
SetWaitCursor(0)
--- 779,783 ----
# but we are smart enough to know we have already done it.
# And if the DB can save itself incrementally, do it now
! self.manager.classifier_data.SavePostIncrementalTrain()
SetWaitCursor(0)
Index: manager.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/manager.py,v
retrieving revision 1.98
retrieving revision 1.99
diff -C2 -d -r1.98 -r1.99
*** manager.py 2 Nov 2004 21:33:46 -0000 1.98
--- manager.py 25 Nov 2004 23:26:58 -0000 1.99
***************
*** 118,122 ****
def import_core_spambayes_stuff(ini_filenames):
! global bayes_classifier, bayes_tokenize, bayes_storage, bayes_options
if "spambayes.Options" in sys.modules:
# The only thing we are worried about here is spambayes.Options
--- 118,123 ----
def import_core_spambayes_stuff(ini_filenames):
! global bayes_classifier, bayes_tokenize, bayes_storage, bayes_options, \
! bayes_message
if "spambayes.Options" in sys.modules:
# The only thing we are worried about here is spambayes.Options
***************
*** 144,150 ****
--- 145,153 ----
from spambayes.tokenizer import tokenize
from spambayes import storage
+ from spambayes import message
bayes_classifier = classifier
bayes_tokenize = tokenize
bayes_storage = storage
+ bayes_message = message
assert "spambayes.Options" in sys.modules, \
"Expected 'spambayes.Options' to be loaded here"
***************
*** 170,174 ****
# Base class for our "storage manager" - we choose between the pickle
# and DB versions at runtime. As our bayes uses spambayes.storage,
! # our base class can share common bayes loading code.
class BasicStorageManager:
db_extension = None # for pychecker - overwritten by subclass
--- 173,179 ----
# Base class for our "storage manager" - we choose between the pickle
# and DB versions at runtime. As our bayes uses spambayes.storage,
! # our base class can share common bayes loading code, and we use
! # spambayes.message, so the base class can share common message info
! # code, too.
class BasicStorageManager:
db_extension = None # for pychecker - overwritten by subclass
***************
*** 186,205 ****
bayes.store()
def open_bayes(self):
! raise NotImplementedError
def close_bayes(self, bayes):
bayes.close()
class PickleStorageManager(BasicStorageManager):
db_extension = ".pck"
! def open_bayes(self):
! return bayes_storage.PickledClassifier(self.bayes_filename)
! def open_mdb(self):
! return cPickle.load(open(self.mdb_filename, 'rb'))
def new_mdb(self):
return {}
- def store_mdb(self, mdb):
- SavePickle(mdb, self.mdb_filename)
- def close_mdb(self, mdb):
- pass
def is_incremental(self):
return False # False means we always save the entire DB
--- 191,209 ----
bayes.store()
def open_bayes(self):
! return bayes_storage.open_storage(self.bayes_filename, self.klass)
def close_bayes(self, bayes):
bayes.close()
+ def open_mdb(self):
+ return bayes_message.open_storage(self.mdb_filename, self.klass)
+ def store_mdb(self, mdb):
+ mdb.store()
+ def close_mdb(self, mdb):
+ mdb.close()
class PickleStorageManager(BasicStorageManager):
db_extension = ".pck"
! klass = "pickle"
def new_mdb(self):
return {}
def is_incremental(self):
return False # False means we always save the entire DB
***************
*** 207,217 ****
class DBStorageManager(BasicStorageManager):
db_extension = ".db"
! def open_bayes(self):
! # bsddb doesn't handle unicode filenames yet :(
! fname = self.bayes_filename.encode(filesystem_encoding)
! return bayes_storage.DBDictClassifier(fname)
! def open_mdb(self):
! fname = self.mdb_filename.encode(filesystem_encoding)
! return bsddb.hashopen(fname)
def new_mdb(self):
try:
--- 211,220 ----
class DBStorageManager(BasicStorageManager):
db_extension = ".db"
! klass = "dbm"
! def __init__(self, bayes_base_name, mdb_base_name):
! self.bayes_filename = bayes_base_name.encode(filesystem_encoding) + \
! self.db_extension
! self.mdb_filename = mdb_base_name.encode(filesystem_encoding) + \
! self.db_extension
def new_mdb(self):
try:
***************
*** 220,227 ****
if e.errno != errno.ENOENT: raise
return self.open_mdb()
- def store_mdb(self, mdb):
- mdb.sync()
- def close_mdb(self, mdb):
- mdb.close()
def is_incremental(self):
return True # True means only changed records get actually written
--- 223,226 ----
***************
*** 424,432 ****
db_manager = ManagerClass(bayes_base, mdb_base)
self.classifier_data = ClassifierData(db_manager, self)
- self.LoadBayes()
- self.stats = oastats.Stats(self.config, self.data_directory)
-
- # "old" bayes functions - new code should use "classifier_data" directly
- def LoadBayes(self):
try:
self.classifier_data.Load()
--- 423,426 ----
***************
*** 434,445 ****
self.ReportFatalStartupError("Failed to load bayes database")
self.classifier_data.InitNew()
! def InitNewBayes(self):
! self.classifier_data.InitNew()
! def SaveBayes(self):
! self.classifier_data.Save()
! def SaveBayesPostIncrementalTrain(self):
! self.classifier_data.SavePostIncrementalTrain()
! # Logging - this too should be somewhere else.
def LogDebug(self, level, *args):
if self.verbose >= level:
--- 428,434 ----
self.ReportFatalStartupError("Failed to load bayes database")
self.classifier_data.InitNew()
+ self.stats = oastats.Stats(self.config, self.data_directory)
! # Logging - this should be somewhere else.
def LogDebug(self, level, *args):
if self.verbose >= level:
Index: msgstore.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/msgstore.py,v
retrieving revision 1.88
retrieving revision 1.89
diff -C2 -d -r1.88 -r1.89
*** msgstore.py 2 Nov 2004 21:34:56 -0000 1.88
--- msgstore.py 25 Nov 2004 23:26:58 -0000 1.89
***************
*** 807,810 ****
--- 807,817 ----
self.dirty = False
+ # For use with the spambayes.message messageinfo database.
+ self.stored_attributes = ['t',]
+
+ def getDBKey(self):
+ # Long lived search key.
+ return self.searchkey
+
def __repr__(self):
if self.id is None:
Index: tester.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/tester.py,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** tester.py 24 Dec 2003 04:08:38 -0000 1.23
--- tester.py 25 Nov 2004 23:26:58 -0000 1.24
***************
*** 258,265 ****
# Now move the message back to the inbox - it should get trained.
store_msg = driver.manager.message_store.GetMessage(spam_msg)
import train
! if train.been_trained_as_ham(store_msg, driver.manager.classifier_data):
TestFailed("This new spam message should not have been trained as ham yet")
! if train.been_trained_as_spam(store_msg, driver.manager.classifier_data):
TestFailed("This new spam message should not have been trained as spam yet")
spam_msg.Move(folder_watch)
--- 258,266 ----
# Now move the message back to the inbox - it should get trained.
store_msg = driver.manager.message_store.GetMessage(spam_msg)
+ driver.manager.classifier_data.message_db.load_msg(store_msg)
import train
! if train.been_trained_as_ham(store_msg):
TestFailed("This new spam message should not have been trained as ham yet")
! if train.been_trained_as_spam(store_msg):
TestFailed("This new spam message should not have been trained as spam yet")
spam_msg.Move(folder_watch)
***************
*** 269,272 ****
--- 270,274 ----
TestFailed("The message appears to have been filtered out of the watch folder")
store_msg = driver.manager.message_store.GetMessage(spam_msg)
+ driver.manager.classifier_data.message_db.load_msg(store_msg)
need_untrain = True
try:
***************
*** 275,281 ****
if nham+1 != bayes.nham:
TestFailed("There was not one more ham messages after a re-train")
! if train.been_trained_as_spam(store_msg, driver.manager.classifier_data):
TestFailed("This new spam message should not have been trained as spam yet")
! if not train.been_trained_as_ham(store_msg, driver.manager.classifier_data):
TestFailed("This new spam message should have been trained as ham now")
# word infos should have one extra ham
--- 277,283 ----
if nham+1 != bayes.nham:
TestFailed("There was not one more ham messages after a re-train")
! if train.been_trained_as_spam(store_msg):
TestFailed("This new spam message should not have been trained as spam yet")
! if not train.been_trained_as_ham(store_msg):
TestFailed("This new spam message should have been trained as ham now")
# word infos should have one extra ham
***************
*** 289,299 ****
TestFailed("Could not find the message in the Spam folder")
store_msg = driver.manager.message_store.GetMessage(spam_msg)
if nspam +1 != bayes.nspam:
TestFailed("There should be one more spam now")
if nham != bayes.nham:
TestFailed("There should be the same number of hams again")
! if not train.been_trained_as_spam(store_msg, driver.manager.classifier_data):
TestFailed("This new spam message should have been trained as spam by now")
! if train.been_trained_as_ham(store_msg, driver.manager.classifier_data):
TestFailed("This new spam message should have been un-trained as ham")
# word infos should have one extra spam, no extra ham
--- 291,302 ----
TestFailed("Could not find the message in the Spam folder")
store_msg = driver.manager.message_store.GetMessage(spam_msg)
+ driver.manager.classifier_data.message_db.load_msg(store_msg)
if nspam +1 != bayes.nspam:
TestFailed("There should be one more spam now")
if nham != bayes.nham:
TestFailed("There should be the same number of hams again")
! if not train.been_trained_as_spam(store_msg):
TestFailed("This new spam message should have been trained as spam by now")
! if train.been_trained_as_ham(store_msg):
TestFailed("This new spam message should have been un-trained as ham")
# word infos should have one extra spam, no extra ham
***************
*** 308,312 ****
TestFailed("Could not find the message in the Unsure folder")
store_msg = driver.manager.message_store.GetMessage(spam_msg)
! if not train.been_trained_as_spam(store_msg, driver.manager.classifier_data):
TestFailed("Message was not identified as Spam after moving")
--- 311,316 ----
TestFailed("Could not find the message in the Unsure folder")
store_msg = driver.manager.message_store.GetMessage(spam_msg)
! driver.manager.classifier_data.message_db.load_msg(store_msg)
! if not train.been_trained_as_spam(store_msg):
TestFailed("Message was not identified as Spam after moving")
***************
*** 316,323 ****
# Now undo the damage we did.
was_spam = train.untrain_message(store_msg, driver.manager.classifier_data)
if not was_spam:
TestFailed("Untraining this message did not indicate it was spam")
! if train.been_trained_as_spam(store_msg, driver.manager.classifier_data) or \
! train.been_trained_as_ham(store_msg, driver.manager.classifier_data):
TestFailed("Untraining this message kept it has ham/spam")
need_untrain = False
--- 320,328 ----
# Now undo the damage we did.
was_spam = train.untrain_message(store_msg, driver.manager.classifier_data)
+ driver.manager.classifier_data.message_db.load_msg(store_msg)
if not was_spam:
TestFailed("Untraining this message did not indicate it was spam")
! if train.been_trained_as_spam(store_msg) or \
! train.been_trained_as_ham(store_msg):
TestFailed("Untraining this message kept it has ham/spam")
need_untrain = False
Index: train.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/train.py,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** train.py 2 Nov 2004 21:36:54 -0000 1.39
--- train.py 25 Nov 2004 23:26:58 -0000 1.40
***************
*** 5,8 ****
--- 5,9 ----
# Copyright PSF, license under the PSF license
+ import sys
import traceback
from win32com.mapi import mapi
***************
*** 17,29 ****
# Note our Message Database uses PR_SEARCH_KEY, *not* PR_ENTRYID, as the
# latter changes after a Move operation - see msgstore.py
! def been_trained_as_ham(msg, cdata):
! if not cdata.message_db.has_key(msg.searchkey):
return False
! return cdata.message_db[msg.searchkey]=='0'
! def been_trained_as_spam(msg, cdata):
! if not cdata.message_db.has_key(msg.searchkey):
return False
! return cdata.message_db[msg.searchkey]=='1'
def train_message(msg, is_spam, cdata):
--- 18,30 ----
# Note our Message Database uses PR_SEARCH_KEY, *not* PR_ENTRYID, as the
# latter changes after a Move operation - see msgstore.py
! def been_trained_as_ham(msg):
! if msg.t is None:
return False
! return msg.t == False
! def been_trained_as_spam(msg):
! if msg.t is None:
return False
! return msg.t == True
def train_message(msg, is_spam, cdata):
***************
*** 36,43 ****
from spambayes.tokenizer import tokenize
! if not cdata.message_db.has_key(msg.searchkey):
! was_spam = None
! else:
! was_spam = cdata.message_db[msg.searchkey]=='1'
if was_spam == is_spam:
return False # already correctly classified
--- 37,42 ----
from spambayes.tokenizer import tokenize
! cdata.message_db.load_msg(msg)
! was_spam = msg.t
if was_spam == is_spam:
return False # already correctly classified
***************
*** 51,55 ****
# Learn the correct classification.
cdata.bayes.learn(tokenize(stream), is_spam)
! cdata.message_db[msg.searchkey] = ['0', '1'][is_spam]
cdata.dirty = True
return True
--- 50,55 ----
# Learn the correct classification.
cdata.bayes.learn(tokenize(stream), is_spam)
! msg.t = is_spam
! cdata.message_db.store_msg(msg)
cdata.dirty = True
return True
***************
*** 62,75 ****
from spambayes.tokenizer import tokenize
stream = msg.GetEmailPackageObject()
! if been_trained_as_spam(msg, cdata):
! assert not been_trained_as_ham(msg, cdata), "Can't have been both!"
cdata.bayes.unlearn(tokenize(stream), True)
! del cdata.message_db[msg.searchkey]
cdata.dirty = True
return True
! if been_trained_as_ham(msg, cdata):
! assert not been_trained_as_spam(msg, cdata), "Can't have been both!"
cdata.bayes.unlearn(tokenize(stream), False)
! del cdata.message_db[msg.searchkey]
cdata.dirty = True
return False
--- 62,76 ----
from spambayes.tokenizer import tokenize
stream = msg.GetEmailPackageObject()
! cdata.message_db.load_msg(msg)
! if been_trained_as_spam(msg):
! assert not been_trained_as_ham(msg), "Can't have been both!"
cdata.bayes.unlearn(tokenize(stream), True)
! cdata.message_db.remove_msg(msg)
cdata.dirty = True
return True
! if been_trained_as_ham(msg):
! assert not been_trained_as_spam(msg), "Can't have been both!"
cdata.bayes.unlearn(tokenize(stream), False)
! cdata.message_db.remove_msg(msg)
cdata.dirty = True
return False
From anadelonbrin at users.sourceforge.net Fri Nov 26 04:06:47 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 26 04:06:49 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 README.txt,1.12,1.13
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv543/Outlook2000
Modified Files:
README.txt
Log Message:
Encourage people to mail the list, not Mark personally.
Index: README.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/README.txt,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** README.txt 3 Oct 2003 05:23:15 -0000 1.12
--- README.txt 26 Nov 2004 03:06:43 -0000 1.13
***************
*** 35,39 ****
labyrinth of Outlook preference dialogs.) If this happens and you have
the Python exception that caused the failure (via the tracing mentioned
! above) please send it to Mark.
To unregister the addin, execute "addin.py --unregister", then optionally
--- 35,39 ----
labyrinth of Outlook preference dialogs.) If this happens and you have
the Python exception that caused the failure (via the tracing mentioned
! above) please send it to spambayes@python.org.
To unregister the addin, execute "addin.py --unregister", then optionally
From anadelonbrin at users.sourceforge.net Fri Nov 26 04:11:46 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 26 04:11:49 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py, 1.139,
1.140 filter.py, 1.39, 1.40 msgstore.py, 1.89, 1.90
Message-ID:
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv1423/Outlook2000
Modified Files:
addin.py filter.py msgstore.py
Log Message:
Save the current folder when doing a "delete as spam", because the message may not
be in the folder it was when it was filtered, or it may not have been filtered, but
we do really want to recover it to wherever it was last.
Save the original folder data in the messageinfo database as well. This does mean
it'll end up being somewhat larger - if this is a problem, then it could be an option.
However, it does mean that recovery now goes to the right place, even with an IMAP
store. So closes [ 1071319 ] Outlook plug in for IMAP boxes
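The recovery fallback described above amounts to something like the following
sketch (Python 3). The function and its three parameters are hypothetical and only
illustrate the order of the lookups: the folder id saved on the message itself is
tried first, and the copy saved in the messageinfo database is used if that fails.

```python
def get_remembered_folder(read_saved_folder_id, original_folder, get_folder):
    """Return the folder to restore a message to, or None.

    read_saved_folder_id: hypothetical callable returning the folder id
        saved on the message; may raise if the properties are missing.
    original_folder: folder id loaded from the messageinfo database, or None.
    get_folder: hypothetical callable mapping a folder id to a folder.
    """
    try:
        return get_folder(read_saved_folder_id())
    except Exception:
        # Try to get it from the message info database, if possible.
        if original_folder:
            return get_folder(original_folder)
        return None


def failing_lookup():
    # Simulates a store that did not keep the saved properties.
    raise KeyError("SpamBayesOriginalFolderID")


def fake_get_folder(folder_id):
    return "folder:%s/%s" % folder_id


restored = get_remembered_folder(failing_lookup, ("store", "inbox"), fake_get_folder)
```

Note that the fallback only helps if the message was loaded from the messageinfo
database beforehand, which is what the extra load_msg() call in addin.py is for.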
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.139
retrieving revision 1.140
diff -C2 -d -r1.139 -r1.140
*** addin.py 25 Nov 2004 23:26:57 -0000 1.139
--- addin.py 26 Nov 2004 03:11:43 -0000 1.140
***************
*** 692,695 ****
--- 692,698 ----
self.manager.stats.RecordManualClassification(False,
self.manager.score(msgstore_message))
+ # Record the original folder, in case this message is not where
+ # it was after filtering, or has never been filtered.
+ msgstore_message.RememberMessageCurrentFolder()
# Must train before moving, else we lose the message!
subject = msgstore_message.GetSubject()
***************
*** 747,750 ****
--- 750,754 ----
try:
subject = msgstore_message.GetSubject()
+ self.manager.classifier_data.message_db.load_msg(msgstore_message)
restore_folder = msgstore_message.GetRememberedFolder()
if restore_folder is None or \
Index: filter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/filter.py,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** filter.py 2 Nov 2004 21:33:46 -0000 1.39
--- filter.py 26 Nov 2004 03:11:43 -0000 1.40
***************
*** 44,47 ****
--- 44,48 ----
if all_actions:
msg.RememberMessageCurrentFolder()
+ mgr.classifier_data.message_db.store_msg(msg)
msg.Save()
break
Index: msgstore.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/msgstore.py,v
retrieving revision 1.89
retrieving revision 1.90
diff -C2 -d -r1.89 -r1.90
*** msgstore.py 25 Nov 2004 23:26:58 -0000 1.89
--- msgstore.py 26 Nov 2004 03:11:43 -0000 1.90
***************
*** 808,812 ****
# For use with the spambayes.message messageinfo database.
! self.stored_attributes = ['t',]
def getDBKey(self):
--- 808,814 ----
# For use with the spambayes.message messageinfo database.
! self.stored_attributes = ['t', 'original_folder']
! self.t = None
! self.original_folder = None
def getDBKey(self):
***************
*** 1244,1247 ****
--- 1246,1252 ----
try:
folder = self.GetFolder()
+ # Also save this information in our messageinfo database, which
+ # means that restoring should work even with IMAP.
+ self.original_folder = folder.id[0], folder.id[1]
props = ( (mapi.PS_PUBLIC_STRINGS, "SpamBayesOriginalFolderStoreID"),
(mapi.PS_PUBLIC_STRINGS, "SpamBayesOriginalFolderID")
***************
*** 1274,1277 ****
--- 1279,1285 ----
return self.msgstore.GetFolder(folder_id)
except:
+ # Try to get it from the message info database, if possible
+ if self.original_folder:
+ return self.msgstore.GetFolder(self.original_folder)
print "Error locating origin of message", self
return None
From anadelonbrin at users.sourceforge.net Mon Nov 29 00:38:19 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 00:38:24 2004
Subject: [Spambayes-checkins] spambayes/spambayes ProxyUI.py,1.52,1.53
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29550/spambayes
Modified Files:
ProxyUI.py
Log Message:
Fix error reported by Hatuka*nezumi:
Subject lines are not cgi.escape()d in the web interface, which might cause errors.
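As a rough illustration of the problem (Python 3, where cgi.escape() has been
removed and html.escape() is the replacement), a header value containing markup
has to be escaped before it is placed in the page. The render_header_value() helper
is hypothetical; quote=False is used here to mirror the old cgi.escape() default of
leaving quotes alone.

```python
import html


def render_header_value(text):
    # Escape &, < and > so the header text is shown literally rather than
    # being interpreted as markup by the browser.
    return html.escape(text, quote=False)


escaped = render_header_value("<b>Free & easy</b>")
```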
Index: ProxyUI.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/ProxyUI.py,v
retrieving revision 1.52
retrieving revision 1.53
diff -C2 -d -r1.52 -r1.53
*** ProxyUI.py 9 Nov 2004 02:37:41 -0000 1.52
--- ProxyUI.py 28 Nov 2004 23:38:17 -0000 1.53
***************
*** 340,344 ****
else:
h = self.html.reviewRow.headerValue.clone()
! h.text = text
row.optionalHeadersValues += h
--- 340,344 ----
else:
h = self.html.reviewRow.headerValue.clone()
! h.text = cgi.escape(text)
row.optionalHeadersValues += h
From anadelonbrin at users.sourceforge.net Mon Nov 29 01:11:50 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 01:11:53 2004
Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_server.py, 1.2, 1.3
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6041/spambayes/test
Modified Files:
test_sb_server.py
Log Message:
Fix error reported by Hatuka*nezumi:
Messages that did not have the required \r?\n\r?\n separator would just pass through
spambayes unproxied. Change this so that they are proxied (everything will be a header,
though, so it may not proxy well, and might generate an exception header), but that's
better than just letting it through.
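The test server side of this can be sketched as below (Python 3). The top_response()
function is a hypothetical, self-contained reduction of the TOP handling in the diff
that follows: splitting on a blank line raises ValueError when there is no separator,
and in that case the whole message is sent back as-is. The exact shape of the normal
response is an assumption of this sketch.

```python
def top_response(message, max_lines):
    """Build a POP3 TOP-style response for a message held as one string."""
    try:
        headers, body = message.split('\n\n', 1)
    except ValueError:
        # No header/body separator: return everything unchanged.
        return "+OK\r\n%s\r\n.\r\n" % message
    body_lines = body.split('\n')[:max_lines]
    trimmed = headers + '\r\n\r\n' + '\n'.join(body_lines)
    return "+OK\r\n%s\r\n.\r\n" % trimmed


malformed = "From: ta-meyer@ihug.co.nz\nSubject: No body, and no separator"
malformed_reply = top_response(malformed, 1)
normal_reply = top_response("Subject: hi\n\nline one\nline two", 1)
```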
Index: test_sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_server.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** test_sb_server.py 9 Nov 2004 02:37:41 -0000 1.2
--- test_sb_server.py 29 Nov 2004 00:11:47 -0000 1.3
***************
*** 77,80 ****
--- 77,83 ----
"""
+ malformed1 = """From: ta-meyer@ihug.co.nz
+ Subject: No body, and no separator"""
+
import asyncore
import socket
***************
*** 123,127 ****
Dibbler.BrighterAsyncChat.__init__(self, map=socketMap)
Dibbler.BrighterAsyncChat.set_socket(self, clientSocket, socketMap)
! self.maildrop = [spam1, good1]
self.set_terminator('\r\n')
self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP', 'SLOW',
--- 126,130 ----
Dibbler.BrighterAsyncChat.__init__(self, map=socketMap)
Dibbler.BrighterAsyncChat.set_socket(self, clientSocket, socketMap)
! self.maildrop = [spam1, good1, malformed1]
self.set_terminator('\r\n')
self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP', 'SLOW',
***************
*** 219,223 ****
if 0 < number <= len(self.maildrop):
message = self.maildrop[number-1]
! headers, body = message.split('\n\n', 1)
bodyLines = body.split('\n')[:maxLines]
message = headers + '\r\n\r\n' + '\n'.join(bodyLines)
--- 222,229 ----
if 0 < number <= len(self.maildrop):
message = self.maildrop[number-1]
! try:
! headers, body = message.split('\n\n', 1)
! except ValueError:
! return "+OK\r\n%s\r\n.\r\n" % message
bodyLines = body.split('\n')[:maxLines]
message = headers + '\r\n\r\n' + '\n'.join(bodyLines)
***************
*** 314,318 ****
response = proxy.recv(100)
count, totalSize = map(int, response.split()[1:3])
! assert count == 2
# Loop through the messages ensuring that they have judgement
--- 320,324 ----
response = proxy.recv(100)
count, totalSize = map(int, response.split()[1:3])
! assert count == 3
# Loop through the messages ensuring that they have judgement
From anadelonbrin at users.sourceforge.net Mon Nov 29 01:11:50 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 01:11:54 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.29,1.30
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6041/scripts
Modified Files:
sb_server.py
Log Message:
Fix error reported by Hatuka*nezumi:
Messages that did not have the required \r?\n\r?\n separator would just pass through
spambayes unproxied. Change this so that they are proxied (everything will be a header,
though, so it may not proxy well, and might generate an exception header), but that's
better than just letting it through.
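On the proxy side, the new decision is made on the status line rather than on the
presence of a blank line. A minimal sketch of that check (Python 3) follows;
split_pop3_response() is a hypothetical helper, not a function in sb_server.py, and
it uses str.partition() so that a response with no newline at all does not raise.

```python
def split_pop3_response(response):
    """Return (message_text, terminating_dot_present), or None on error.

    A response is treated as a message only if its first line is '+OK'.
    """
    # Remove the trailing .\r\n before looking at the message text.
    terminating_dot_present = response[-4:] == '\n.\r\n'
    if terminating_dot_present:
        response = response[:-3]
    # Break off the first line, which should be '+OK'.
    ok, _, message_text = response.partition('\n')
    if ok.strip().upper() != "+OK":
        # Must be an error response; the caller passes it through unproxied.
        return None
    return message_text, terminating_dot_present


good = split_pop3_response("+OK\r\nFrom: a\r\nSubject: b\r\n.\r\n")
bad = split_pop3_response("-ERR no such message\r\n")
```

One thing to keep in mind with an exact comparison like this is that a status line
carrying extra text after +OK would be treated as an error response and passed
through unfiltered.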
Index: sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** sb_server.py 23 Nov 2004 23:37:15 -0000 1.29
--- sb_server.py 29 Nov 2004 00:11:46 -0000 1.30
***************
*** 457,566 ****
"""Adds the judgement header based on the raw headers and body
of the message."""
! # Use '\n\r?\n' to detect the end of the headers in case of
! # broken emails that don't use the proper line separators.
! if re.search(r'\n\r?\n', response):
! # Remove the trailing .\r\n before passing to the email parser.
! # Thanks to Scott Schlesier for this fix.
! terminatingDotPresent = (response[-4:] == '\n.\r\n')
! if terminatingDotPresent:
! response = response[:-3]
! # Break off the first line, which will be '+OK'.
! ok, messageText = response.split('\n', 1)
! try:
! msg = email.message_from_string(messageText,
! _class=spambayes.message.SBHeaderMessage)
! msg.setId(state.getNewMessageName())
! # Now find the spam disposition and add the header.
! (prob, clues) = state.bayes.spamprob(msg.tokenize(),\
! evidence=True)
! msg.addSBHeaders(prob, clues)
! # Check for "RETR" or "TOP N 99999999" - fetchmail without
! # the 'fetchall' option uses the latter to retrieve messages.
! if (command == 'RETR' or
! (command == 'TOP' and
! len(args) == 2 and args[1] == '99999999')):
! cls = msg.GetClassification()
! if cls == options["Headers", "header_ham_string"]:
! state.numHams += 1
! elif cls == options["Headers", "header_spam_string"]:
! state.numSpams += 1
! else:
! state.numUnsure += 1
! # Suppress caching of "Precedence: bulk" or
! # "Precedence: list" ham if the options say so.
! isSuppressedBulkHam = \
! (cls == options["Headers", "header_ham_string"] and
! options["Storage", "no_cache_bulk_ham"] and
! msg.get('precedence') in ['bulk', 'list'])
! # Suppress large messages if the options say so.
! size_limit = options["Storage",
! "no_cache_large_messages"]
! isTooBig = size_limit > 0 and \
! len(messageText) > size_limit
! # Cache the message. Don't pollute the cache with test
! # messages or suppressed bulk ham.
! if (not state.isTest and
! options["Storage", "cache_messages"] and
! not isSuppressedBulkHam and not isTooBig):
! # Write the message into the Unknown cache.
! makeMessage = state.unknownCorpus.makeMessage
! message = makeMessage(msg.getId(), msg.as_string())
! state.unknownCorpus.addMessage(message)
! # We'll return the message with the headers added. We take
! # all the headers from the SBHeaderMessage, but take the body
! # directly from the POP3 conversation, because the
! # SBHeaderMessage might have "fixed" a partial message by
! # appending a closing boundary separator. Remember we can
! # be dealing with partial message here because of the timeout
! # code in onServerLine.
! headers = []
! for name, value in msg.items():
! header = "%s: %s" % (name, value)
! headers.append(re.sub(r'\r?\n', '\r\n', header))
body = re.split(r'\n\r?\n', messageText, 1)[1]
messageText = "\r\n".join(headers) + "\r\n\r\n" + body
! except:
! # Something nasty happened while parsing or classifying -
! # report the exception in a hand-appended header and recover.
! # This is one case where an unqualified 'except' is OK, 'cos
! # anything's better than destroying people's email...
! stream = cStringIO.StringIO()
! traceback.print_exc(None, stream)
! details = stream.getvalue()
!
! # Build the header. This will strip leading whitespace from
! # the lines, so we add a leading dot to maintain indentation.
! detailLines = details.strip().split('\n')
! dottedDetails = '\n.'.join(detailLines)
! headerName = 'X-Spambayes-Exception'
! header = Header(dottedDetails, header_name=headerName)
! # Insert the header, converting email.Header's '\n' line
! # breaks to POP3's '\r\n'.
! headers, body = re.split(r'\n\r?\n', messageText, 1)
! header = re.sub(r'\r?\n', '\r\n', str(header))
! headers += "\n%s: %s\r\n\r\n" % (headerName, header)
! messageText = headers + body
! # Print the exception and a traceback.
! print >>sys.stderr, details
! # Restore the +OK and the POP3 .\r\n terminator if there was one.
! retval = ok + "\n" + messageText
! if terminatingDotPresent:
! retval += '.\r\n'
! return retval
! else:
! # Must be an error response.
! return response
def onTop(self, command, args, response):
--- 457,578 ----
"""Adds the judgement header based on the raw headers and body
of the message."""
! # Previously, we used '\n\r?\n' to detect the end of the headers in
! # case of broken emails that don't use the proper line separators,
! # and if we couldn't find it, then we assumed that the response was
! # an error response and passed it unfiltered. However, if the
! # message doesn't contain the separator (malformed mail), then this
! # would mean the message was passed straight through the proxy.
! # Since all the content is then in the headers, this probably
! # doesn't do a spammer much good, but, just in case, we now just
! # check for "+OK" and assume that a response starting with it is
! # not an error response (which seems reasonable).
! # Remove the trailing .\r\n before passing to the email parser.
! # Thanks to Scott Schlesier for this fix.
! terminatingDotPresent = (response[-4:] == '\n.\r\n')
! if terminatingDotPresent:
! response = response[:-3]
! # Break off the first line, which will be '+OK'.
! ok, messageText = response.split('\n', 1)
! if ok.strip().upper() != "+OK":
! # Must be an error response. Return unproxied.
! return response
! try:
! msg = email.message_from_string(messageText,
! _class=spambayes.message.SBHeaderMessage)
! msg.setId(state.getNewMessageName())
! # Now find the spam disposition and add the header.
! (prob, clues) = state.bayes.spamprob(msg.tokenize(),\
! evidence=True)
! msg.addSBHeaders(prob, clues)
! # Check for "RETR" or "TOP N 99999999" - fetchmail without
! # the 'fetchall' option uses the latter to retrieve messages.
! if (command == 'RETR' or
! (command == 'TOP' and
! len(args) == 2 and args[1] == '99999999')):
! cls = msg.GetClassification()
! if cls == options["Headers", "header_ham_string"]:
! state.numHams += 1
! elif cls == options["Headers", "header_spam_string"]:
! state.numSpams += 1
! else:
! state.numUnsure += 1
! # Suppress caching of "Precedence: bulk" or
! # "Precedence: list" ham if the options say so.
! isSuppressedBulkHam = \
! (cls == options["Headers", "header_ham_string"] and
! options["Storage", "no_cache_bulk_ham"] and
! msg.get('precedence') in ['bulk', 'list'])
! # Suppress large messages if the options say so.
! size_limit = options["Storage",
! "no_cache_large_messages"]
! isTooBig = size_limit > 0 and \
! len(messageText) > size_limit
! # Cache the message. Don't pollute the cache with test
! # messages or suppressed bulk ham.
! if (not state.isTest and
! options["Storage", "cache_messages"] and
! not isSuppressedBulkHam and not isTooBig):
! # Write the message into the Unknown cache.
! makeMessage = state.unknownCorpus.makeMessage
! message = makeMessage(msg.getId(), msg.as_string())
! state.unknownCorpus.addMessage(message)
! # We'll return the message with the headers added. We take
! # all the headers from the SBHeaderMessage, but take the body
! # directly from the POP3 conversation, because the
! # SBHeaderMessage might have "fixed" a partial message by
! # appending a closing boundary separator. Remember we can
! # be dealing with partial message here because of the timeout
! # code in onServerLine.
! headers = []
! for name, value in msg.items():
! header = "%s: %s" % (name, value)
! headers.append(re.sub(r'\r?\n', '\r\n', header))
! try:
body = re.split(r'\n\r?\n', messageText, 1)[1]
+ except IndexError:
+ # No separator, so no body. Bad message, but proxy it
+ # through anyway (adding the missing separator).
+ messageText = "\r\n".join(headers) + "\r\n\r\n"
+ else:
messageText = "\r\n".join(headers) + "\r\n\r\n" + body
! except:
! # Something nasty happened while parsing or classifying -
! # report the exception in a hand-appended header and recover.
! # This is one case where an unqualified 'except' is OK, 'cos
! # anything's better than destroying people's email...
! stream = cStringIO.StringIO()
! traceback.print_exc(None, stream)
! details = stream.getvalue()
! # Build the header. This will strip leading whitespace from
! # the lines, so we add a leading dot to maintain indentation.
! detailLines = details.strip().split('\n')
! dottedDetails = '\n.'.join(detailLines)
! headerName = 'X-Spambayes-Exception'
! header = Header(dottedDetails, header_name=headerName)
! # Insert the header, converting email.Header's '\n' line
! # breaks to POP3's '\r\n'.
! headers, body = re.split(r'\n\r?\n', messageText, 1)
! header = re.sub(r'\r?\n', '\r\n', str(header))
! headers += "\n%s: %s\r\n\r\n" % (headerName, header)
! messageText = headers + body
! # Print the exception and a traceback.
! print >>sys.stderr, details
! # Restore the +OK and the POP3 .\r\n terminator if there was one.
! retval = ok + "\n" + messageText
! if terminatingDotPresent:
! retval += '.\r\n'
! return retval
def onTop(self, command, args, response):
***************
*** 656,659 ****
--- 668,672 ----
if options["globals", "verbose"]:
self.logFile = open('_pop3proxy.log', 'wb', 0)
+
self.servers = []
self.proxyPorts = []
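The separator handling added in this revision can be tried in isolation. The following is a minimal sketch, not the code from sb_server.py: the function name rebuild_message is hypothetical, and it is written for current Python rather than the Python 2 of the checkin. It shows how a message with no blank line between headers and body is still proxied, with the missing separator added.

```python
import re


def rebuild_message(headers, message_text):
    """Join rewritten header lines with the body taken from the raw text.

    `headers` is a list of "Name: value" strings and `message_text` is the
    raw message as received. (Hypothetical helper, for illustration only.)
    """
    try:
        # Everything after the first blank line is the body.
        body = re.split(r'\n\r?\n', message_text, maxsplit=1)[1]
    except IndexError:
        # No separator, so no body. Pass the headers through anyway,
        # adding the missing separator.
        return "\r\n".join(headers) + "\r\n\r\n"
    else:
        return "\r\n".join(headers) + "\r\n\r\n" + body
```

Indexing the split result with [1] is what raises IndexError for a malformed message, which is why that is the exception the new code catches.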
From anadelonbrin at users.sourceforge.net Mon Nov 29 01:18:02 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 01:18:04 2004
Subject: [Spambayes-checkins] spambayes/spambayes message.py,1.59,1.60
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7566/spambayes
Modified Files:
message.py
Log Message:
Handle message not having a proper separator in insert_exception_header.
Change sb_server to use the centralised insert_exception_header code.
Index: message.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v
retrieving revision 1.59
retrieving revision 1.60
diff -C2 -d -r1.59 -r1.60
*** message.py 25 Nov 2004 23:19:04 -0000 1.59
--- message.py 29 Nov 2004 00:17:59 -0000 1.60
***************
*** 539,546 ****
# otherwise we might keep doing this message over and over again.
# We also ensure that the line endings are /r/n as RFC822 requires.
! headers, body = re.split(r'\n\r?\n', string_msg, 1)
header = re.sub(r'\r?\n', '\r\n', str(header))
! headers += "\n%s: %s\r\n" % \
! (headerName, header)
if msg_id:
headers += "%s: %s\r\n" % \
--- 539,550 ----
# otherwise we might keep doing this message over and over again.
# We also ensure that the line endings are /r/n as RFC822 requires.
! try:
! headers, body = re.split(r'\n\r?\n', string_msg, 1)
! except ValueError:
! # No body - this is a bad message!
! headers = string_msg
! body = ""
header = re.sub(r'\r?\n', '\r\n', str(header))
! headers += "\n%s: %s\r\n" % (headerName, header)
if msg_id:
headers += "%s: %s\r\n" % \
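Here the split result is unpacked into two names, so a message without a separator raises ValueError rather than IndexError. A minimal sketch of the same idea follows, assuming a hypothetical helper append_header (this is not the real insert_exception_header, whose full body is not shown in the diff). Unlike the checked-in code, the sketch also strips any trailing line ending from the headers before appending, which is an assumption made here to keep the output tidy.

```python
import re


def append_header(string_msg, header_name, header_value):
    """Append one header to a raw message, tolerating a missing body.

    Hypothetical helper for illustration; written for current Python.
    """
    try:
        headers, body = re.split(r'\n\r?\n', string_msg, maxsplit=1)
    except ValueError:
        # Only one piece came back: there is no separator and no body.
        headers = string_msg
        body = ""
    # Normalise line breaks inside the header value to CRLF.
    value = re.sub(r'\r?\n', '\r\n', header_value)
    # Assumption of this sketch: drop trailing CR/LF before appending.
    headers = headers.rstrip("\r\n") + "\r\n%s: %s\r\n" % (header_name, value)
    return headers + "\r\n" + body
```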
From anadelonbrin at users.sourceforge.net Mon Nov 29 01:18:02 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 01:18:05 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.30,1.31
Message-ID:
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7566/scripts
Modified Files:
sb_server.py
Log Message:
Handle message not having a proper separator in insert_exception_header.
Change sb_server to use the centralised insert_exception_header code.
Index: sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** sb_server.py 29 Nov 2004 00:11:46 -0000 1.30
--- sb_server.py 29 Nov 2004 00:17:58 -0000 1.31
***************
*** 549,569 ****
# This is one case where an unqualified 'except' is OK, 'cos
# anything's better than destroying people's email...
! stream = cStringIO.StringIO()
! traceback.print_exc(None, stream)
! details = stream.getvalue()
!
! # Build the header. This will strip leading whitespace from
! # the lines, so we add a leading dot to maintain indentation.
! detailLines = details.strip().split('\n')
! dottedDetails = '\n.'.join(detailLines)
! headerName = 'X-Spambayes-Exception'
! header = Header(dottedDetails, header_name=headerName)
!
! # Insert the header, converting email.Header's '\n' line
! # breaks to POP3's '\r\n'.
! headers, body = re.split(r'\n\r?\n', messageText, 1)
! header = re.sub(r'\r?\n', '\r\n', str(header))
! headers += "\n%s: %s\r\n\r\n" % (headerName, header)
! messageText = headers + body
# Print the exception and a traceback.
--- 549,554 ----
# This is one case where an unqualified 'except' is OK, 'cos
# anything's better than destroying people's email...
! messageText, details = spambayes.message.\
! insert_exception_header(messageText)
# Print the exception and a traceback.
From anadelonbrin at users.sourceforge.net Tue Nov 30 07:00:42 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:00:45 2004
Subject: [Spambayes-checkins] website/sigs - New directory
Message-ID:
Update of /cvsroot/spambayes/website/sigs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17161/sigs
Log Message:
Directory /cvsroot/spambayes/website/sigs added to the repository
From anadelonbrin at users.sourceforge.net Tue Nov 30 07:02:30 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:02:32 2004
Subject: [Spambayes-checkins] website/sigs Makefile, NONE,
1.1 spambayes-1.0.1.exe.asc, NONE,
1.1 spambayes-1.0.1.tar.gz.asc, NONE,
1.1 spambayes-1.0.1.zip.asc, NONE, 1.1
Message-ID:
Update of /cvsroot/spambayes/website/sigs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17435/sigs
Added Files:
Makefile spambayes-1.0.1.exe.asc spambayes-1.0.1.tar.gz.asc
spambayes-1.0.1.zip.asc
Log Message:
Put OpenPGP sigs for released files on the website in this directory.
I think that by default the file permissions for new files will be wrong, so they
have to be manually corrected - maybe this should be built into the makefile?
--- NEW FILE: Makefile ---
include ../scripts/make.rules
ROOT_DIR = ..
ROOT_OFFSET = sigs
--- NEW FILE: spambayes-1.0.1.exe.asc ---
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (MingW32)
iD8DBQBBrAMJVcUzCvI/cyoRArlUAKDyvXLO8fs2eTdO/bpLOFKIfmhaxACfaiWJ
47p7KL2Ov6kKnrCUbpfwXdM=
=Hb0X
-----END PGP SIGNATURE-----
--- NEW FILE: spambayes-1.0.1.tar.gz.asc ---
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (MingW32)
iD8DBQBBrAMRVcUzCvI/cyoRAowgAKC34ZWloV85YEnLZIwdik9HHBO2ogCgsSBE
nGX0sYG6yd7R9Lni+2r+5tc=
=E//e
-----END PGP SIGNATURE-----
--- NEW FILE: spambayes-1.0.1.zip.asc ---
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (MingW32)
iD8DBQBBrAMZVcUzCvI/cyoRAsX8AJ9duCjCjYUjJGYaJvB8JjXRzWcwhQCdFvUK
qU2vENJocIzrNoFW0ZFLQck=
=I45v
-----END PGP SIGNATURE-----
From anadelonbrin at users.sourceforge.net Tue Nov 30 07:03:45 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:03:48 2004
Subject: [Spambayes-checkins] website download.ht,1.29,1.30
Message-ID:
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17643
Modified Files:
download.ht
Log Message:
Update to include OpenPGP sigs, MD5 checksums and file sizes.
(Heavily based on the Python download pages).
Index: download.ht
===================================================================
RCS file: /cvsroot/spambayes/website/download.ht,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** download.ht 25 Nov 2004 06:38:15 -0000 1.29
--- download.ht 30 Nov 2004 06:03:35 -0000 1.30
***************
*** 42,45 ****
--- 42,91 ----
+ Files, MD5 checksums, signatures and sizes
+ The signatures below were generated with
+ GnuPG using release manager
+ Tony Meyer's
+ public key, which has a key id of F23F732A.
+
+
+
+
+ You can import the release manager public keys by either downloading the
+ public key file and then running
+
+ % gpg --import TonyMeyer.asc
+
+ or by grabbing the key directly from the keyserver network by running
+ this command:
+
+ % gpg --recv-keys F23F732A
+
+ To verify the authenticity of the download, grab both the file(s) and the
+ signature(s) (above) and then run this command:
+
+ % gpg --verify spambayes-1.0.1.exe.asc
+
+ Note that you must use the name of the signature file, and you should
+ use the one that's appropriate to the download you're verifying.
+
+ These instructions are geared to GnuPG and command-line weenies.
+ Suggestions are welcome for other OpenPGP applications.
+
CVS Access
The code is currently available from sourceforge's CVS server -
From anadelonbrin at users.sourceforge.net Tue Nov 30 07:04:27 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:04:29 2004
Subject: [Spambayes-checkins] website Makefile,1.18,1.19
Message-ID:
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17817
Modified Files:
Makefile
Log Message:
Also sync the sigs directory.
Index: Makefile
===================================================================
RCS file: /cvsroot/spambayes/website/Makefile,v
retrieving revision 1.18
retrieving revision 1.19
diff -C2 -d -r1.18 -r1.19
*** Makefile 14 Feb 2004 00:44:38 -0000 1.18
--- Makefile 30 Nov 2004 06:04:24 -0000 1.19
***************
*** 39,46 ****
-cd apps; $(MAKE) install
-cd download ; $(MAKE) install
subdirs:
cd apps; $(MAKE)
! cd download ; $(MAKE)
!
--- 39,47 ----
-cd apps; $(MAKE) install
-cd download ; $(MAKE) install
+ -cd sigs; $(MAKE) install
subdirs:
cd apps; $(MAKE)
! cd download ; $(MAKE)
! cd sigs ; $(MAKE)
From anadelonbrin at users.sourceforge.net Tue Nov 30 07:05:48 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:05:51 2004
Subject: [Spambayes-checkins] spambayes README-DEVEL.txt,1.15,1.16
Message-ID:
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18034
Modified Files:
README-DEVEL.txt
Log Message:
Add instructions for generating OpenPGP sigs, MD5 checksums and file sizes.
Add instructions for modifying __init__.py's __version__ after the release, based
on the discussion on spambayes@python.org.
Index: README-DEVEL.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/README-DEVEL.txt,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** README-DEVEL.txt 25 Nov 2004 06:39:04 -0000 1.15
--- README-DEVEL.txt 30 Nov 2004 06:05:45 -0000 1.16
***************
*** 505,513 ****
o Now commit spambayes/__init__.py and tag the whole checkout - see the
existing tag names for the tag name format.
o Update the website News, Download and Windows sections.
o Update reply.txt in the website repository as needed (it specifies the
! latest version). Then let Tim, Barry, Tony, or Skip know that they need to
! update the autoresponder.
!
Then announce the release on the mailing lists and watch the bug reports
roll in. 8-)
--- 505,536 ----
o Now commit spambayes/__init__.py and tag the whole checkout - see the
existing tag names for the tag name format.
+ o Create MD5 checksums for the files, and update download.ht with these.
+ Tony uses wxChecksums (http://wxchecksums.sourceforge.net) for this,
+ but you could just do
+ >>> import md5
+ >>> print md5.md5(file("spambayes-1.0.1.exe", "rb").read()).hexdigest()
+ o Calculate the sizes of the files, and update download.ht with these.
+ o Create OpenPGP/PGP signatures for the files. Using GnuPG:
+ % gpg -sab spambayes-1.0.1.zip
+ % gpg -sab spambayes-1.0.1.tar.gz
+ % gpg -sab spambayes-1.0.1.exe
+ Put the created *.asc files in the "sigs" directory of the website.
+ o If your public key isn't already linked to on the Download page, put
+ it there.
o Update the website News, Download and Windows sections.
o Update reply.txt in the website repository as needed (it specifies the
! latest version). Then let Tim, Barry, Tony, or Skip know that they need
! to update the autoresponder.
! o Run "make install version" in the website directory to push the new
! version file, so that "Check for new version" works.
! o Add '+' to the end of spambayes/__init__.py's __version__, to
! differentiate CVS users, and check this change in. After a number of
! changes have been checked in, this can be incremented and have "a0"
! added to the end. For example, with a 1.1 release:
! [before the release process] '1.1rc1'
! [during the release process] '1.1'
! [after the release process] '1.1+'
! [later] '1.2a0'
!
Then announce the release on the mailing lists and watch the bug reports
roll in. 8-)
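The checksum and file size steps above could also be done together. This is a minimal sketch for current Python, where the hashlib module replaces the old md5 module used in the README example; the function name md5_and_size and the chunk size are assumptions made for illustration.

```python
import hashlib
import os


def md5_and_size(path, chunk_size=65536):
    """Return (MD5 hex digest, size in bytes) for the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so that large installers are not loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)
```

For example, md5_and_size("spambayes-1.0.1.exe") would give the two values to paste into download.ht, assuming that file is in the current directory.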
From anadelonbrin at users.sourceforge.net Tue Nov 30 22:45:06 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 22:45:09 2004
Subject: [Spambayes-checkins] spambayes/spambayes __init__.py, 1.11.4.3,
1.11.4.4
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11647/spambayes
Modified Files:
Tag: release_1_0-branch
__init__.py
Log Message:
Update to match the new versioning scheme.
Index: __init__.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/__init__.py,v
retrieving revision 1.11.4.3
retrieving revision 1.11.4.4
diff -C2 -d -r1.11.4.3 -r1.11.4.4
*** __init__.py 22 Nov 2004 23:39:12 -0000 1.11.4.3
--- __init__.py 30 Nov 2004 21:44:55 -0000 1.11.4.4
***************
*** 1,3 ****
# package marker.
! __version__ = '1.0.1'
--- 1,3 ----
# package marker.
! __version__ = '1.0.1+'
From anadelonbrin at users.sourceforge.net Tue Nov 30 22:49:22 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 22:49:26 2004
Subject: [Spambayes-checkins] spambayes/spambayes __init__.py,1.12,1.13
Message-ID:
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12772/spambayes
Modified Files:
__init__.py
Log Message:
Update to reflect where things really are at the moment.
Index: __init__.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/__init__.py,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** __init__.py 25 Nov 2004 15:12:03 -0000 1.12
--- __init__.py 30 Nov 2004 21:49:19 -0000 1.13
***************
*** 1,3 ****
# package marker.
! __version__ = '1.0.1'
--- 1,3 ----
# package marker.
! __version__ = '1.1a0'
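The two checkins above follow the versioning scheme described in README-DEVEL.txt: the release branch gets a trailing '+', and the trunk moves on to the next alpha. A minimal sketch of that scheme follows. The function names are hypothetical, and reading "incremented" as "bump the second component and drop the rest" is an assumption, although it matches both the '1.1' to '1.2a0' example and the '1.0.1' to '1.1a0' change here. The helpers only accept plain release versions such as '1.1' or '1.0.1', not '1.1rc1'.

```python
def post_release(version):
    """Mark a just-released version as 'CVS after the release'."""
    return version + "+"


def next_alpha(version):
    """Return the first alpha of the next minor version.

    Assumption: the second component is bumped and anything after it
    is dropped.
    """
    parts = version.rstrip("+").split(".")
    major, minor = int(parts[0]), int(parts[1])
    return "%d.%da0" % (major, minor + 1)
```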