From anadelonbrin at users.sourceforge.net Tue Nov 2 07:33:26 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 2 07:33:29 2004
Subject: [Spambayes-checkins] spambayes/spambayes Stats.py,1.6,1.7
Message-ID:

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28614/spambayes

Modified Files:
	Stats.py
Log Message:
Improve the web interface statistics.

This is the format that was devised by Mark Moraes and Kenny Pitt on
spambayes-dev quite some time ago (but was never checked in - maybe we
were feature frozen then?). This is my own code, though, not the patch
that Mark submitted, which added unnecessary counters.

At some point I'll copy across the code that Outlook has that lets the
number of decimal places for the percentages be specified. The Outlook
stats could be changed to look more like this (or the damn code could be
centralised), too, maybe, except that there isn't much room in the dialog
for a lot of text. Maybe Kenny has a patch for that? (A spambayes-dev
message indicated that he might).
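The Outlook code that lets the number of decimal places be specified is not part of this check-in, so the following is only a rough sketch of how such an option might work. The names `format_percentage` and `decimal_places` are hypothetical and do not come from Stats.py or the Outlook add-in. The zero-total branch mirrors the "Not correct, really!" fallback used in the diff.

```python
def format_percentage(count, total, decimal_places=1):
    """Return count as a percentage of total, formatted as text.

    Hypothetical helper: neither the name nor the signature is taken
    from SpamBayes.
    """
    if total:
        percentage = 100.0 * count / total
    else:
        # Nothing classified yet: report zero rather than divide by zero.
        percentage = 0.0
    # '%.*f' reads the precision from the argument tuple, so the number
    # of decimal places can be chosen at run time.
    return "%.*f%%" % (decimal_places, percentage)
```

With the sample figures from this message, `format_percentage(827, 1223)` gives the "67.6%" shown for ham, and passing `decimal_places=0` would round it to a whole number.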
The new stats should look something like this:

SpamBayes has classified a total of 1223 messages:
    827 ham (67.6% of total)
    333 spam (27.2% of total)
    63 unsure (5.2% of total)

1125 messages were classified correctly (92.0% of total)
35 messages were classified incorrectly (2.9% of total)
    0 false positives (0.0% of total)
    35 false negatives (2.9% of total)

6 unsures trained as ham (9.5% of unsures)
56 unsures trained as spam (88.9% of unsures)
1 unsure was not trained (1.6% of unsures)

A total of 760 messages have been trained:
    346 ham (98.3% ham, 1.7% unsure, 0.0% false positives)
    414 spam (78.0% spam, 13.5% unsure, 8.5% false negatives)

Index: Stats.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Stats.py,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** Stats.py 15 Feb 2004 02:15:51 -0000 1.6
--- Stats.py 2 Nov 2004 06:33:23 -0000 1.7
***************
*** 25,34 ****
  """

! # This module is part of the spambayes project, which is Copyright 2002-3
  # The Python Software Foundation and is covered by the Python Software
  # Foundation license.

  __author__ = "Tony Meyer "
! __credits__ = "Mark Hammond, all the spambayes folk."

  from spambayes.message import msginfoDB
--- 25,42 ----
  """

! # This module is part of the spambayes project, which is Copyright 2002-4
  # The Python Software Foundation and is covered by the Python Software
  # Foundation license.

  __author__ = "Tony Meyer "
! __credits__ = "Kenny Pitt, Mark Hammond, all the spambayes folk."
!
! try:
!     True, False
! except NameError:
!     # Maintain compatibility with Python 2.2
!     True, False = 1, 0
!
! import types

  from spambayes.message import msginfoDB
***************
*** 62,81 ****
              msginfoDB._getState(m)
              if m.c == 's':
                  self.cls_spam += 1
!                 if m.t == 0:
                      self.fp += 1
              elif m.c == 'h':
                  self.cls_ham += 1
!                 if m.t == 1:
                      self.fn += 1
              elif m.c == 'u':
                  self.cls_unsure += 1
!                 if m.t == 0:
                      self.trn_unsure_ham += 1
!
elif m.t == 1: self.trn_unsure_spam += 1 ! if m.t == 1: self.trn_spam += 1 ! elif m.t == 0: self.trn_ham += 1 --- 70,94 ---- msginfoDB._getState(m) if m.c == 's': + # Classified as spam. self.cls_spam += 1 ! if m.t == False: ! # False positive (classified as spam, trained as ham) self.fp += 1 elif m.c == 'h': + # Classified as ham. self.cls_ham += 1 ! if m.t == True: ! # False negative (classified as ham, trained as spam) self.fn += 1 elif m.c == 'u': + # Classified as unsure. self.cls_unsure += 1 ! if m.t == False: self.trn_unsure_ham += 1 ! elif m.t == True: self.trn_unsure_spam += 1 ! if m.t == True: self.trn_spam += 1 ! elif m.t == False: self.trn_ham += 1 *************** *** 85,128 **** chunks = [] push = chunks.append ! perc_ham = 100.0 * self.cls_ham / self.total ! perc_spam = 100.0 * self.cls_spam / self.total ! perc_unsure = 100.0 * self.cls_unsure / self.total format_dict = { ! 'perc_spam': perc_spam, ! 'perc_ham': perc_ham, ! 'perc_unsure': perc_unsure, ! 'num_seen': self.total } format_dict.update(self.__dict__) # Figure out plurals ! for num, key in [(self.total, "sp1"), (self.trn_ham, "sp2"), ! (self.trn_spam, "sp3"), ! (self.trn_unsure_ham, "sp4"), ! (self.fp, "sp5"), (self.fn, "sp6")]: ! if num == 1: format_dict[key] = '' else: format_dict[key] = 's' ! for num, key in [(self.fp, "wp1"), (self.fn, "wp2")]: ! if num == 1: ! format_dict[key] = 'was a' else: format_dict[key] = 'were' ! push("SpamBayes has processed %(num_seen)d message%(sp1)s - " \ ! "%(cls_ham)d (%(perc_ham).0f%%) good, " \ ! "%(cls_spam)d (%(perc_spam).0f%%) spam " \ ! "and %(cls_unsure)d (%(perc_unsure)d%%) unsure." % format_dict) ! push("%(trn_ham)d message%(sp2)s were manually " \ ! "classified as good (%(fp)d %(wp1)s false positive%(sp5)s)." \ ! % format_dict) ! push("%(trn_spam)d message%(sp3)s were manually " \ ! "classified as spam (%(fn)d %(wp2)s false negative%(sp6)s)." \ ! % format_dict) ! push("%(trn_unsure_ham)d unsure message%(sp4)s were manually " \ ! 
"identified as good, and %(trn_unsure_spam)d as spam." \ ! % format_dict) return chunks if __name__=='__main__': s = Stats() --- 98,238 ---- chunks = [] push = chunks.append ! not_trn_unsure = self.cls_unsure - self.trn_unsure_ham - \ ! self.trn_unsure_spam ! if self.cls_unsure: ! unsure_ham_perc = 100.0 * self.trn_unsure_ham / self.cls_unsure ! unsure_spam_perc = 100.0 * self.trn_unsure_spam / self.cls_unsure ! unsure_not_perc = 100.0 * not_trn_unsure / self.cls_unsure ! else: ! unsure_ham_perc = 0.0 # Not correct, really! ! unsure_spam_perc = 0.0 # Not correct, really! ! unsure_not_perc = 0.0 # Not correct, really! ! if self.trn_ham: ! trn_perc_unsure_ham = 100.0 * self.trn_unsure_ham / \ ! self.trn_ham ! trn_perc_fp = 100.0 * self.fp / self.trn_ham ! trn_perc_ham = 100.0 - (trn_perc_unsure_ham + trn_perc_fp) ! else: ! trn_perc_ham = 0.0 # Not correct, really! ! trn_perc_unsure_ham = 0.0 # Not correct, really! ! trn_perc_fp = 0.0 # Not correct, really! ! if self.trn_spam: ! trn_perc_unsure_spam = 100.0 * self.trn_unsure_spam / \ ! self.trn_spam ! trn_perc_fn = 100.0 * self.fn / self.trn_spam ! trn_perc_spam = 100.0 - (trn_perc_unsure_spam + trn_perc_fn) ! else: ! trn_perc_spam = 0.0 # Not correct, really! ! trn_perc_unsure_spam = 0.0 # Not correct, really! ! trn_perc_fn = 0.0 # Not correct, really! format_dict = { ! 'num_seen' : self.total, ! 'correct' : self.total - (self.cls_unsure + self.fp + self.fn), ! 'incorrect' : self.cls_unsure + self.fp + self.fn, ! 'unsure_ham_perc' : unsure_ham_perc, ! 'unsure_spam_perc' : unsure_spam_perc, ! 'unsure_not_perc' : unsure_not_perc, ! 'not_trn_unsure' : not_trn_unsure, ! 'trn_total' : (self.trn_ham + self.trn_spam + \ ! self.trn_unsure_ham + self.trn_unsure_spam), ! 'trn_perc_ham' : trn_perc_ham, ! 'trn_perc_unsure_ham' : trn_perc_unsure_ham, ! 'trn_perc_fp' : trn_perc_fp, ! 'trn_perc_spam' : trn_perc_spam, ! 'trn_perc_unsure_spam' : trn_perc_unsure_spam, ! 
'trn_perc_fn' : trn_perc_fn, } format_dict.update(self.__dict__) + + # Add percentages of everything. + for key, val in format_dict.items(): + perc_key = "perc_" + key + if self.total and isinstance(val, types.IntType): + format_dict[perc_key] = 100.0 * val / self.total + else: + format_dict[perc_key] = 0.0 # Not correct, really! + # Figure out plurals ! for num, key in [("num_seen", "sp1"), ! ("correct", "sp2"), ! ("incorrect", "sp3"), ! ("fp", "sp4"), ! ("fn", "sp5"), ! ("trn_unsure_ham", "sp6"), ! ("trn_unsure_spam", "sp7"), ! ("not_trn_unsure", "sp8"), ! ("trn_total", "sp9"), ! ]: ! if format_dict[num] == 1: format_dict[key] = '' else: format_dict[key] = 's' ! for num, key in [("correct", "wp1"), ! ("incorrect", "wp2"), ! ("not_trn_unsure", "wp3"), ! ]: ! if format_dict[num] == 1: ! format_dict[key] = 'was' else: format_dict[key] = 'were' ! ## Our result should look something like this: ! ## (devised by Mark Moraes and Kenny Pitt) ! ## ! ## SpamBayes has classified a total of 1223 messages: ! ## 827 ham (67.6% of total) ! ## 333 spam (27.2% of total) ! ## 63 unsure (5.2% of total) ! ## ! ## 1125 messages were classified correctly (92.0% of total) ! ## 35 messages were classified incorrectly (2.9% of total) ! ## 0 false positives (0.0% of total) ! ## 35 false negatives (2.9% of total) ! ## ! ## 6 unsures trained as ham (9.5% of unsures) ! ## 56 unsures trained as spam (88.9% of unsures) ! ## 1 unsure was not trained (1.6% of unsures) ! ## ! ## A total of 760 messages have been trained: ! ## 346 ham (98.3% ham, 1.7% unsure, 0.0% false positives) ! ## 414 spam (78.0% spam, 13.5% unsure, 8.5% false negatives) ! ! push("SpamBayes has classified a total of " \ ! "%(num_seen)d message%(sp1)s:" \ ! "
    %(cls_ham)d " \ ! "(%(perc_cls_ham).0f%% of total) good" \ ! "
    %(cls_spam)d " \ ! "(%(perc_cls_spam).0f%% of total) spam" \ ! "
    %(cls_unsure)d " \ ! "(%(perc_cls_unsure).0f%% of total) unsure." % \ ! format_dict) ! push("%(correct)d message%(sp2)s %(wp1)s classified correctly " \ ! "(%(perc_correct).0f%% of total)" \ ! "
%(incorrect)d message%(sp3)s %(wp2)s classified " \ ! "incorrectly " \ ! "(%(perc_incorrect).0f%% of total)" \ ! "
    %(fp)d false positive%(sp4)s " \ ! "(%(perc_fp).0f%% of total)" \ ! "
    %(fn)d false negative%(sp5)s " \ ! "(%(perc_fn).0f%% of total)" % \ ! format_dict) ! push("%(trn_unsure_ham)d unsure%(sp6)s trained as good " \ ! "(%(unsure_ham_perc).0f%% of unsures)" \ ! "
%(trn_unsure_spam)d unsure%(sp7)s trained as spam " \ ! "(%(unsure_spam_perc).0f%% of unsures)" \ ! "
%(not_trn_unsure)d unsure%(sp8)s %(wp3)s not trained " \ ! "(%(unsure_not_perc).0f%% of unsures)" % \ ! format_dict) ! push("A total of %(trn_total)d message%(sp9)s have been trained:" \ ! "
    %(trn_ham)d good " \ ! "(%(trn_perc_ham)0.f%% good, %(trn_perc_unsure_ham)0.f%% " \ ! "unsure, %(trn_perc_fp).0f%% false positives)" \ ! "
    %(trn_spam)d spam " \ ! "(%(trn_perc_spam)0.f%% spam, %(trn_perc_unsure_spam)0.f%% " \ ! "unsure, %(trn_perc_fn).0f%% false negatives)" % \ ! format_dict) return chunks + if __name__=='__main__': s = Stats() From anadelonbrin at users.sourceforge.net Tue Nov 2 22:27:46 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 2 22:27:50 2004 Subject: [Spambayes-checkins] spambayes/spambayes i18n.py, NONE, 1.1 Options.py, 1.114, 1.115 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5690/spambayes Modified Files: Options.py Added Files: i18n.py Log Message: Add [ 1052816 ] I18N This is basically the patch created by Hernan Martinez Foffani, but modified a tad by me. --- NEW FILE: i18n.py --- """Internationalisation Classes: LanguageManager - Interface class for languages. Abstract: Manages the internationalisation (i18n) aspects of SpamBayes. """ # This module is part of the spambayes project, which is Copyright 2002-4 # The Python Software Foundation and is covered by the Python Software # Foundation license. __author__ = "Hernan Martinez Foffani " __credits__ = "Tony Meyer, All the SpamBayes folk." try: True, False except NameError: # Maintain compatibility with Python 2.2 True, False = 1, 0 import os import sys from locale import getdefaultlocale from gettext import translation, NullTranslations # Note, we must not import spambayes.Options, or Outlook will not be happy. ## Set language environment for gettext and for dynamic load of dialogs. ## ## Our directory layout is: ## spambayes ## spambayes ## i18n.py <--- this file ## languages <--- the directory for lang packs ## es <-- generic language data ## DIALOGS ## LC_MESSAGES ## es_ES <-- specific language/country data. ## DIALOGS <-- resource dialogs ## LC_MESSAGES <-- gettext messages files ## zn ## zn_TW ## Outlook2000 ## utilities ## ..etc.. 
class LanguageManager:
    def __init__(self, directory=os.path.dirname(__file__)):
        """Initialisation.

        'directory' is the parent directory of the 'languages'
        directory. It defaults to the directory of this module."""
        self.current_langs_codes = []
        self.local_dir = os.path.join(directory, "..", "languages")
        self._sys_path_modifications = []

    def set_language(self, lang_code=None):
        """Set a language as the current one."""
        if not lang_code:
            return
        self.current_langs_codes = [ lang_code ]
        self._rebuild_syspath_for_dialogs()
        self._install_gettext()

    def locale_default_lang(self):
        """Get the default language for the locale."""
        # Note that this may return None.
        return getdefaultlocale()[0]

    def add_language(self, lang_code=None):
        """Add a language to the current languages list.

        The list acts as a fallback mechanism, where the first language of
        the list is used if possible, and if not the second one, and so on.
        """
        if not lang_code:
            return
        self.current_langs_codes.insert(0, lang_code)
        self._rebuild_syspath_for_dialogs()
        self._install_gettext()

    def clear_language(self):
        """Clear the current language(s) and set SpamBayes to use
        the default."""
        self.current_langs_codes = []
        self._clear_syspath()
        lang = NullTranslations()
        lang.install()

    def _install_gettext(self):
        """Set the gettext specific environment."""
        lang = translation("outlook_addin", self.local_dir,
                           self.current_langs_codes, fallback=True)
        lang.install()

    def _rebuild_syspath_for_dialogs(self):
        """Add to sys.path the directories of the translated dialogs.
        For each language of the current list, we add two directories,
        one for language code and country and the other for the language
        code only, so we can simulate the fallback procedures."""
        self._clear_syspath()
        for lcode in self.current_langs_codes:
            code_and_country = os.path.join(self.local_dir, lcode,
                                            'DIALOGS')
            code_only = os.path.join(self.local_dir, lcode.split("_")[0],
                                     'DIALOGS')
            if code_and_country not in sys.path:
                sys.path.append(code_and_country)
                self._sys_path_modifications.append(code_and_country)
            if code_only not in sys.path:
                sys.path.append(code_only)
                self._sys_path_modifications.append(code_only)

    def _clear_syspath(self):
        """Clean sys.path of the stuff that we put in it."""
        for path in self._sys_path_modifications:
            sys.path.remove(path)
        self._sys_path_modifications = []


def test():
    lm = LanguageManager()
    print "INIT: len(sys.path): ", len(sys.path)

    print "TEST default lang"
    lm.set_language(lm.locale_default_lang())
    print "\tCurrent Languages: ", lm.current_langs_codes
    print "\tlen(sys.path): ", len(sys.path)
    print "\t", _("Help")

    print "TEST clear_language"
    lm.clear_language()
    print "\tCurrent Languages: ", lm.current_langs_codes
    print "\tlen(sys.path): ", len(sys.path)
    print "\t", _("Help")

    print "TEST set_language"
    for langcode in ["kk_KK", "z", "", "es", None, "es_AR"]:
        print "lang: ", langcode
        lm.set_language(langcode)
        print "\tCurrent Languages: ", lm.current_langs_codes
        print "\tlen(sys.path): ", len(sys.path)
        print "\t", _("Help")

    lm.clear_language()

    print "TEST add_language"
    for langcode in ["kk_KK", "z", "", "es", None, "es_AR"]:
        print "lang: ", langcode
        lm.add_language(langcode)
        print "\tCurrent Languages: ", lm.current_langs_codes
        print "\tlen(sys.path): ", len(sys.path)
        print "\t", _("Help")

if __name__ == '__main__':
    test()

Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.114
retrieving revision 1.115
diff -C2 -d -r1.114
-r1.115 *** Options.py 30 Sep 2004 05:16:30 -0000 1.114 --- Options.py 2 Nov 2004 21:27:42 -0000 1.115 *************** *** 1126,1129 **** --- 1126,1134 ---- entered with the server:port form.""", SERVER, DO_NOT_RESTORE), + + ("language", "User Interface Language", ("en_US",), + """If possible, the user interface should use a language from this + list (in order of preference).""", + r"\w\w(?:_\w\w)?", RESTORE), ), } From anadelonbrin at users.sourceforge.net Tue Nov 2 22:29:42 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 2 22:29:46 2004 Subject: [Spambayes-checkins] spambayes/Outlook2000/dialogs __init__.py, 1.12, 1.13 Message-ID: Update of /cvsroot/spambayes/spambayes/Outlook2000/dialogs In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6166/Outlook2000/dialogs Modified Files: __init__.py Log Message: Add [ 1052816 ] I18N This is basically the patch created by Hernan Martinez Foffani, but modified a tad by me. Index: __init__.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/__init__.py,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** __init__.py 16 Dec 2003 05:06:33 -0000 1.12 --- __init__.py 2 Nov 2004 21:29:36 -0000 1.13 *************** *** 6,15 **** base_name = os.path.splitext(rc_name)[0] mod_name = "dialogs.resources." + base_name ! mod = None # If we are running from source code, check the .py file is up to date # wrt the .rc file passed in. # If we are running from binaries, the rc name is not used at all - we # assume someone running from source previously generated the .py! ! if not hasattr(sys, "frozen"): from resources import rc2py rc_path = os.path.dirname( rc2py.__file__ ) --- 6,23 ---- base_name = os.path.splitext(rc_name)[0] mod_name = "dialogs.resources." + base_name ! ! # I18N ! # Loads a foreign language dialogs.py file, assuming that sys.path ! 
# already points to one with the foreign language resources. ! try: ! mod = __import__("i18n_" + base_name) ! except ImportError: ! mod = None ! # If we are running from source code, check the .py file is up to date # wrt the .rc file passed in. # If we are running from binaries, the rc name is not used at all - we # assume someone running from source previously generated the .py! ! if not hasattr(sys, "frozen") and not mod: from resources import rc2py rc_path = os.path.dirname( rc2py.__file__ ) From anadelonbrin at users.sourceforge.net Tue Nov 2 22:29:42 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 2 22:29:46 2004 Subject: [Spambayes-checkins] spambayes/Outlook2000/dialogs/resources rc2py.py, 1.6, 1.7 rcparser.py, 1.11, 1.12 Message-ID: Update of /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6166/Outlook2000/dialogs/resources Modified Files: rc2py.py rcparser.py Log Message: Add [ 1052816 ] I18N This is basically the patch created by Hernan Martinez Foffani, but modified a tad by me. Index: rc2py.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources/rc2py.py,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** rc2py.py 26 Aug 2003 10:57:44 -0000 1.6 --- rc2py.py 2 Nov 2004 21:29:36 -0000 1.7 *************** *** 11,15 **** import rcparser ! def convert(inputFilename = None, outputFilename = None): """See the module doc string""" if inputFilename is None: --- 11,16 ---- import rcparser ! def convert(inputFilename = None, outputFilename = None, ! enableGettext = True): """See the module doc string""" if inputFilename is None: *************** *** 17,21 **** if outputFilename is None: outputFilename = "test.py" ! rcp = rcparser.ParseDialogs(inputFilename) in_stat = os.stat(inputFilename) --- 18,22 ---- if outputFilename is None: outputFilename = "test.py" ! 
rcp = rcparser.ParseDialogs(inputFilename, enableGettext) in_stat = os.stat(inputFilename) *************** *** 34,39 **** if __name__=="__main__": ! if len(sys.argv)>1: ! convert(sys.argv[1], sys.argv[2]) else: convert() --- 35,42 ---- if __name__=="__main__": ! if len(sys.argv) > 3: ! convert(sys.argv[1], sys.argv[2], bool(sys.argv[3])) ! elif len(sys.argv) > 2: ! convert(sys.argv[1], sys.argv[2], True) else: convert() Index: rcparser.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources/rcparser.py,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 *** rcparser.py 16 Dec 2003 05:06:33 -0000 1.11 --- rcparser.py 2 Nov 2004 21:29:37 -0000 1.12 *************** *** 6,9 **** --- 6,15 ---- __author__="Adam Walker" + try: + True, False + except NameError: + # Maintain compatibility with Python 2.2 + True, False = 1, 0 + import sys, os, shlex import win32con *************** *** 92,95 **** --- 98,112 ---- + class gt_str(str): + """Change a string to a gettext version of itself.""" + def __repr__(self): + if len(self) > 0: + # timeit indicates that addition is faster than interpolation + # here + return "_(" + super(gt_str, self).__repr__() + ")" + else: + return super(gt_str, self).__repr__() + + class RCParser: next_id = 1001 *************** *** 103,106 **** --- 120,124 ---- self.names = {1:"IDOK", 2:"IDCANCEL", -1:"IDC_STATIC"} self.bitmaps = {} + self.gettexted = False def debug(self, *args): *************** *** 293,297 **** self.token = self.token[1:-1] self.debug("Caption is:",self.token) ! dlg.caption = self.token self.getToken() def dialogFont(self, dlg): --- 311,319 ---- self.token = self.token[1:-1] self.debug("Caption is:",self.token) ! if self.gettexted: ! # gettext captions ! dlg.caption = gt_str(self.token) ! else: ! 
dlg.caption = self.token self.getToken() def dialogFont(self, dlg): *************** *** 313,317 **** self.getToken() if self.token[0:1]=='"': ! control.label = self.token[1:-1] self.getCommaToken() self.getToken() --- 335,343 ---- self.getToken() if self.token[0:1]=='"': ! if self.gettexted: ! # gettext labels ! control.label = gt_str(self.token[1:-1]) ! else: ! control.label = self.token[1:-1] self.getCommaToken() self.getToken() *************** *** 352,357 **** #print control.toString() dlg.controls.append(control) ! def ParseDialogs(rc_file): rcp = RCParser() try: rcp.loadDialogs(rc_file) --- 378,385 ---- #print control.toString() dlg.controls.append(control) ! ! def ParseDialogs(rc_file, gettexted=False): rcp = RCParser() + rcp.gettexted = gettexted try: rcp.loadDialogs(rc_file) From anadelonbrin at users.sourceforge.net Tue Nov 2 22:33:49 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 2 22:33:53 2004 Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py, 1.133, 1.134 config.py, 1.31, 1.32 config_wizard.py, 1.9, 1.10 filter.py, 1.38, 1.39 manager.py, 1.97, 1.98 oastats.py, 1.7, 1.8 Message-ID: Update of /cvsroot/spambayes/spambayes/Outlook2000 In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7107/Outlook2000 Modified Files: addin.py config.py config_wizard.py filter.py manager.py oastats.py Log Message: Wrap strings to translate in _(). Leave all log messages untranslated, because logs are more use to the spambayes@python.org people than they are to users, really, and I don't want to have to translate logs into English to try and help people. Not sure if the config.py stuff is right or not - will check. 
Index: addin.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v retrieving revision 1.133 retrieving revision 1.134 diff -C2 -d -r1.133 -r1.134 *** addin.py 16 Oct 2004 22:37:10 -0000 1.133 --- addin.py 2 Nov 2004 21:33:46 -0000 1.134 *************** *** 539,554 **** try: if spam_folder.GetItemCount() > 0: ! message = "Are you sure you want to permanently delete all items " \ ! "in the \"%s\" folder?" % spam_folder.name if mgr.AskQuestion(message): ! mgr.LogDebug(2, "Emptying spam from folder '%s'" % spam_folder.GetFQName()) import manager spam_folder.EmptyFolder(manager._GetParent()) else: ! mgr.LogDebug(2, "Spam folder '%s' was already empty" % spam_folder.GetFQName()) ! message = "The \"%s\" folder is already empty." % spam_folder.name mgr.ReportInformation(message) except: ! mgr.LogDebug(0, "Error emptying spam folder '%s'!" % spam_folder.GetFQName()) traceback.print_exc() --- 539,559 ---- try: if spam_folder.GetItemCount() > 0: ! message = _("Are you sure you want to permanently delete " \ ! "all items in the \"%s\" folder?") \ ! % spam_folder.name if mgr.AskQuestion(message): ! mgr.LogDebug(2, "Emptying spam from folder '%s'" % \ ! spam_folder.GetFQName()) import manager spam_folder.EmptyFolder(manager._GetParent()) else: ! mgr.LogDebug(2, "Spam folder '%s' was already empty" % \ ! spam_folder.GetFQName()) ! message = _("The \"%s\" folder is already empty.") % \ ! spam_folder.name mgr.ReportInformation(message) except: ! mgr.LogDebug(0, "Error emptying spam folder '%s'!" % \ ! spam_folder.GetFQName()) traceback.print_exc() *************** *** 578,584 **** traceback.print_exc() manager.ReportError( ! "There was an error checking for the latest version\r\n" ! "For specific details on the error, please see the SpamBayes log" ! "\r\n\r\nPlease check your internet connection, or try again later" ) return --- 583,589 ---- traceback.print_exc() manager.ReportError( ! 
_("There was an error checking for the latest version\r\n" ! "For specific details on the error, please see the SpamBayes log" ! "\r\n\r\nPlease check your internet connection, or try again later") ) return *************** *** 587,600 **** if latest_ver_num > cur_ver_num: url = get_version_string(app_name, "Download Page", version_dict=latest) ! msg = "You are running %s\r\n\r\nThe latest available version is %s" \ ! "\r\n\r\nThe download page for the latest version is\r\n%s" \ ! "\r\n\r\nWould you like to visit this page now?" \ ! % (cur_ver_string, latest_ver_string, url) if manager.AskQuestion(msg): print "Opening browser page", url os.startfile(url) else: ! msg = "The latest available version is %s\r\n\r\n" \ ! "No later version is available." % latest_ver_string manager.ReportInformation(msg) --- 592,605 ---- if latest_ver_num > cur_ver_num: url = get_version_string(app_name, "Download Page", version_dict=latest) ! msg = _("You are running %s\r\n\r\nThe latest available version is %s" \ ! "\r\n\r\nThe download page for the latest version is\r\n%s" \ ! "\r\n\r\nWould you like to visit this page now?") \ ! % (cur_ver_string, latest_ver_string, url) if manager.AskQuestion(msg): print "Opening browser page", url os.startfile(url) else: ! msg = _("The latest available version is %s\r\n\r\n" \ ! "No later version is available.") % latest_ver_string manager.ReportInformation(msg) *************** *** 635,640 **** if not self.manager.config.filter.enabled: self.manager.ReportError( ! "You must configure and enable SpamBayes before you can " \ ! "mark messages as spam") return SetWaitCursor(1) --- 640,645 ---- if not self.manager.config.filter.enabled: self.manager.ReportError( ! _("You must configure and enable SpamBayes before you " \ ! "can mark messages as spam")) return SetWaitCursor(1) *************** *** 650,655 **** pass if spam_folder is None: ! self.manager.ReportError("You must configure the Spam folder", ! 
"Invalid Configuration") return import train --- 655,660 ---- pass if spam_folder is None: ! self.manager.ReportError(_("You must configure the Spam folder"), ! _("Invalid Configuration")) return import train *************** *** 694,699 **** if not self.manager.config.filter.enabled: self.manager.ReportError( ! "You must configure and enable SpamBayes before you can " \ ! "mark messages as not spam") return SetWaitCursor(1) --- 699,704 ---- if not self.manager.config.filter.enabled: self.manager.ReportError( ! _("You must configure and enable SpamBayes before you " \ ! "can mark messages as not spam")) return SetWaitCursor(1) *************** *** 793,803 **** # Add our "Spam" and "Not Spam" buttons ! tt_text = "Move the selected message to the Spam folder,\n" \ ! "and train the system that this is Spam." self.but_delete_as = self._AddControl( None, constants.msoControlButton, ButtonDeleteAsSpamEvent, (self.manager, self), ! Caption="Spam", TooltipText = tt_text, BeginGroup = False, --- 798,808 ---- # Add our "Spam" and "Not Spam" buttons ! tt_text = _("Move the selected message to the Spam folder,\n" \ ! "and train the system that this is Spam.") self.but_delete_as = self._AddControl( None, constants.msoControlButton, ButtonDeleteAsSpamEvent, (self.manager, self), ! Caption=_("Spam"), TooltipText = tt_text, BeginGroup = False, *************** *** 805,818 **** image = "delete_as_spam.bmp") # And again for "Not Spam" ! tt_text = \ ! "Recovers the selected item back to the folder\n" \ ! "it was filtered from (or to the Inbox if this\n" \ ! "folder is not known), and trains the system that\n" \ ! "this is a good message\n" self.but_recover_as = self._AddControl( None, constants.msoControlButton, ButtonRecoverFromSpamEvent, (self.manager, self), ! Caption="Not Spam", TooltipText = tt_text, Tag = "SpamBayesCommand.RecoverFromSpam", --- 810,823 ---- image = "delete_as_spam.bmp") # And again for "Not Spam" ! tt_text = _(\ ! 
"Recovers the selected item back to the folder\n" \ ! "it was filtered from (or to the Inbox if this\n" \ ! "folder is not known), and trains the system that\n" \ ! "this is a good message\n") self.but_recover_as = self._AddControl( None, constants.msoControlButton, ButtonRecoverFromSpamEvent, (self.manager, self), ! Caption=_("Not Spam"), TooltipText = tt_text, Tag = "SpamBayesCommand.RecoverFromSpam", *************** *** 828,833 **** constants.msoControlPopup, None, None, ! Caption="SpamBayes", ! TooltipText = "SpamBayes anti-spam filters and functions", Enabled = True, Tag = "SpamBayesCommand.Popup") --- 833,838 ---- constants.msoControlPopup, None, None, ! Caption=_("SpamBayes"), ! TooltipText = _("SpamBayes anti-spam filters and functions"), Enabled = True, Tag = "SpamBayesCommand.Popup") *************** *** 846,851 **** constants.msoControlButton, ButtonEvent, (manager.ShowManager,), ! Caption="SpamBayes Manager...", ! TooltipText = "Show the SpamBayes manager dialog.", Enabled = True, Visible=True, --- 851,856 ---- constants.msoControlButton, ButtonEvent, (manager.ShowManager,), ! Caption=_("SpamBayes Manager..."), ! TooltipText = _("Show the SpamBayes manager dialog."), Enabled = True, Visible=True, *************** *** 874,878 **** constants.msoControlButton, ButtonEvent, (ShowClues, self.manager, self), ! Caption="Show spam clues for current message", Enabled=True, Visible=True, --- 879,883 ---- constants.msoControlButton, ButtonEvent, (ShowClues, self.manager, self), ! Caption=_("Show spam clues for current message"), Enabled=True, Visible=True, *************** *** 881,885 **** constants.msoControlButton, ButtonEvent, (manager.ShowFilterNow,), ! Caption="Filter messages...", Enabled=True, Visible=True, --- 886,890 ---- constants.msoControlButton, ButtonEvent, (manager.ShowFilterNow,), ! Caption=_("Filter messages..."), Enabled=True, Visible=True, *************** *** 888,892 **** constants.msoControlButton, ButtonEvent, (EmptySpamFolder, self.manager), ! 
Caption="Empty Spam Folder", Enabled=True, Visible=True, --- 893,897 ---- constants.msoControlButton, ButtonEvent, (EmptySpamFolder, self.manager), ! Caption=_("Empty Spam Folder"), Enabled=True, Visible=True, *************** *** 896,900 **** constants.msoControlButton, ButtonEvent, (CheckLatestVersion, self.manager,), ! Caption="Check for new version", Enabled=True, Visible=True, --- 901,905 ---- constants.msoControlButton, ButtonEvent, (CheckLatestVersion, self.manager,), ! Caption=_("Check for new version"), Enabled=True, Visible=True, *************** *** 905,910 **** constants.msoControlPopup, None, None, ! Caption="Help", ! TooltipText = "SpamBayes help documents", Enabled = True, Tag = "SpamBayesCommand.HelpPopup") --- 910,915 ---- constants.msoControlPopup, None, None, ! Caption=_("Help"), ! TooltipText = _("SpamBayes help documents"), Enabled = True, Tag = "SpamBayesCommand.HelpPopup") *************** *** 912,932 **** helpPopup = CastTo(helpPopup, "CommandBarPopup") self._AddHelpControl(helpPopup, ! "About SpamBayes", "about.html", "SpamBayesCommand.Help.ShowAbout") self._AddHelpControl(helpPopup, ! "Troubleshooting Guide", "docs/troubleshooting.html", "SpamBayesCommand.Help.ShowTroubleshooting") self._AddHelpControl(helpPopup, ! "SpamBayes Website", "http://spambayes.sourceforge.net/", "SpamBayesCommand.Help.ShowSpamBayes Website") self._AddHelpControl(helpPopup, ! "Frequently Asked Questions", "http://spambayes.sourceforge.net/faq.html", "SpamBayesCommand.Help.ShowFAQ") self._AddHelpControl(helpPopup, ! "SpamBayes Bug Tracker", "http://sourceforge.net/tracker/?group_id=61702&atid=498103", "SpamBayesCommand.Help.BugTacker") --- 917,937 ---- helpPopup = CastTo(helpPopup, "CommandBarPopup") self._AddHelpControl(helpPopup, ! _("About SpamBayes"), "about.html", "SpamBayesCommand.Help.ShowAbout") self._AddHelpControl(helpPopup, ! 
_("Troubleshooting Guide"), "docs/troubleshooting.html", "SpamBayesCommand.Help.ShowTroubleshooting") self._AddHelpControl(helpPopup, ! _("SpamBayes Website"), "http://spambayes.sourceforge.net/", "SpamBayesCommand.Help.ShowSpamBayes Website") self._AddHelpControl(helpPopup, ! _("Frequently Asked Questions"), "http://spambayes.sourceforge.net/faq.html", "SpamBayesCommand.Help.ShowFAQ") self._AddHelpControl(helpPopup, ! _("SpamBayes Bug Tracker"), "http://sourceforge.net/tracker/?group_id=61702&atid=498103", "SpamBayesCommand.Help.BugTacker") *************** *** 937,941 **** constants.msoControlButton, ButtonEvent, (Tester, self.manager), ! Caption="Execute test suite", Enabled=True, Visible=True, --- 942,946 ---- constants.msoControlButton, ButtonEvent, (Tester, self.manager), ! Caption=_("Execute test suite"), Enabled=True, Visible=True, *************** *** 1051,1055 **** sel = explorer.Selection if sel.Count > 1 and not allow_multi: ! self.manager.ReportError("Please select a single item", "Large selection") return None --- 1056,1061 ---- sel = explorer.Selection if sel.Count > 1 and not allow_multi: ! self.manager.ReportError(_("Please select a single item"), ! _("Large selection")) return None *************** *** 1070,1074 **** if len(ret) == 0: ! self.manager.ReportError("No filterable mail items are selected", "No selection") return None if allow_multi: --- 1076,1081 ---- if len(ret) == 0: ! self.manager.ReportError(_("No filterable mail items are selected"), ! _("No selection")) return None if allow_multi: *************** *** 1140,1149 **** print "Error finding the MAPI folders for a folder switch event" # As this happens once per move, we should only display it once. ! self.manager.ReportErrorOnce( "There appears to be a problem with the SpamBayes" " configuration\r\n\r\nPlease select the SpamBayes" " manager, and run the\r\nConfiguration Wizard to" ! " reconfigure the filter.", ! 
"Invalid SpamBayes Configuration") traceback.print_exc() if self.but_recover_as is not None: --- 1147,1156 ---- print "Error finding the MAPI folders for a folder switch event" # As this happens once per move, we should only display it once. ! self.manager.ReportErrorOnce(_( "There appears to be a problem with the SpamBayes" " configuration\r\n\r\nPlease select the SpamBayes" " manager, and run the\r\nConfiguration Wizard to" ! " reconfigure the filter."), ! _("Invalid SpamBayes Configuration")) traceback.print_exc() if self.but_recover_as is not None: *************** *** 1255,1258 **** --- 1262,1267 ---- print "Error connecting to Outlook!" traceback.print_exc() + # We can't translate this string, as we haven't managed to load + # the translation tools. manager.ReportError( "There was an error initializing the SpamBayes addin\r\n\r\n" *************** *** 1278,1283 **** if not self.manager.config.filter.spam_folder_id or \ not self.manager.config.filter.watch_folder_ids: ! msg = "It appears there was an error loading your configuration\r\n\r\n" \ ! "Please re-configure SpamBayes via the SpamBayes dropdown" self.manager.ReportError(msg) # But continue on regardless. --- 1287,1292 ---- if not self.manager.config.filter.spam_folder_id or \ not self.manager.config.filter.watch_folder_ids: ! msg = _("It appears there was an error loading your configuration\r\n\r\n" \ ! "Please re-configure SpamBayes via the SpamBayes dropdown") self.manager.ReportError(msg) # But continue on regardless. *************** *** 1293,1298 **** # being enabled. The new Wizard should help, but things can # still screw up. ! self.manager.LogDebug(0, "*** SpamBayes is NOT enabled, so will " \ ! "not filter incoming mail. ***") # Toolbar and other UI stuff must be setup once startup is complete. explorers = self.application.Explorers --- 1302,1307 ---- # being enabled. The new Wizard should help, but things can # still screw up. ! self.manager.LogDebug(0, _("*** SpamBayes is NOT enabled, so " \ ! 
"will not filter incoming mail. ***")) # Toolbar and other UI stuff must be setup once startup is complete. explorers = self.application.Explorers Index: config.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/config.py,v retrieving revision 1.31 retrieving revision 1.32 diff -C2 -d -r1.31 -r1.32 *** config.py 1 Oct 2004 14:31:34 -0000 1.31 --- config.py 2 Nov 2004 21:33:46 -0000 1.32 *************** *** 12,15 **** --- 12,16 ---- import sys, types + def _(text): return text try: *************** *** 22,27 **** FOLDER_ID = r"\(\'[a-fA-F0-9]+\', \'[a-fA-F0-9]+\'\)" FIELD_NAME = r"[a-zA-Z0-9 ]+" ! FILTER_ACTION = "Untouched", "Moved", "Copied" ! MSG_READ_STATE = "None", "Read", "Unread" from spambayes.OptionsClass import OptionsClass, Option --- 23,28 ---- FOLDER_ID = r"\(\'[a-fA-F0-9]+\', \'[a-fA-F0-9]+\'\)" FIELD_NAME = r"[a-zA-Z0-9 ]+" ! FILTER_ACTION = _("Untouched"), _("Moved"), _("Copied") ! MSG_READ_STATE = _("None"), _("Read"), _("Unread") from spambayes.OptionsClass import OptionsClass, Option *************** *** 89,109 **** defaults = { "General" : ( ! ("field_score_name", "The name of the field used to store the spam score", "Spam", ! """SpamBayes stores the spam score for each message in a custom field. ! This option specifies the name of the field""", FIELD_NAME, RESTORE), ! ("data_directory", "The directory to store the data files.", "", ! """""", PATH, DO_NOT_RESTORE), ! ("delete_as_spam_message_state", "How the 'read' flag on a message is modified", "None", ! """When the 'Spam' function is used, the message 'read' flag can ! also be set.""", MSG_READ_STATE, RESTORE), ! ("recover_from_spam_message_state", "How the 'read' flag on a message is modified", "None", ! """When the 'Not Spam' function is used, the message 'read' flag can ! also be set.""", MSG_READ_STATE, RESTORE), ! ("verbose", "Changes the verbosity of the debug output from the program", 0, ! 
"""Indicates how much information is written to the SpamBayes log file.""", INTEGER, RESTORE), ), --- 90,110 ---- defaults = { "General" : ( ! ("field_score_name", _("The name of the field used to store the spam score"), _("Spam"), ! _("""SpamBayes stores the spam score for each message in a custom field. ! This option specifies the name of the field"""), FIELD_NAME, RESTORE), ! ("data_directory", _("The directory to store the data files."), "", ! _(""""""), PATH, DO_NOT_RESTORE), ! ("delete_as_spam_message_state", _("How the 'read' flag on a message is modified"), "None", ! _("""When the 'Spam' function is used, the message 'read' flag can ! also be set."""), MSG_READ_STATE, RESTORE), ! ("recover_from_spam_message_state", _("How the 'read' flag on a message is modified"), "None", ! _("""When the 'Not Spam' function is used, the message 'read' flag can ! also be set."""), MSG_READ_STATE, RESTORE), ! ("verbose", _("Changes the verbosity of the debug output from the program"), 0, ! _("""Indicates how much information is written to the SpamBayes log file."""), INTEGER, RESTORE), ), *************** *** 118,162 **** ("timer_interval", "obsolete", 1000, "", INTEGER, RESTORE), ("timer_only_receive_folders", "obsolete", True, "", BOOLEAN, RESTORE), ), "Training" : ( (FolderIDOption, ! "ham_folder_ids", "Folders containing known good messages", [], ! """A list of folders known to contain good (ham) messages. When SpamBayes ! is trained, these messages will be used as examples of good messages.""", FOLDER_ID, DO_NOT_RESTORE), ! ("ham_include_sub", "Does the nominated ham folders include sub-folders?", False, ! """""", BOOLEAN, DO_NOT_RESTORE), (FolderIDOption, ! "spam_folder_ids", "Folders containing known bad or spam messages", [], ! """A list of folders known to contain bad (spam) messages. When SpamBayes ! is trained, these messages will be used as examples of messages to filter.""", FOLDER_ID, DO_NOT_RESTORE), ! 
("spam_include_sub", "Does the nominated spam folders include sub-folders?", False, ! """""", BOOLEAN, DO_NOT_RESTORE), ! ("train_recovered_spam", "Train as good as items are recovered?", True, ! """SpamBayes can detect when a message previously classified as spam (or unsure) is moved back to the folder from which it was filtered. If this option is enabled, SpamBayes will automatically train on ! such messages""", BOOLEAN, RESTORE), ! ("train_manual_spam", "Train as spam items are manually moved?", True, ! """SpamBayes can detect when a message previously classified as good (or unsure) is manually moved to the Spam folder. If this option is ! enabled, SpamBayes will automatically train on such messages""", BOOLEAN, RESTORE), ! ("rescore", "Rescore message after training?", True, ! """After the training has completed, should all the messages be scored for their Spam value. This is particularly useful after your initial training runs, so you can see how effective your ! sorting of spam and ham was.""", BOOLEAN, RESTORE), ! ("rebuild", "Rescore message after training?", True, ! """Should the entire database be rebuilt? If enabled, then all training information is reset, and a complete new database built from the existing messages in your folders. If disabled, then only new messages in the folders that have not previously been trained ! on will be processed""", BOOLEAN, RESTORE), ), --- 119,166 ---- ("timer_interval", "obsolete", 1000, "", INTEGER, RESTORE), ("timer_only_receive_folders", "obsolete", True, "", BOOLEAN, RESTORE), + # Rather than fpfnunsure, do tte. DeleteAs/RecoverFrom just move + # the message, and a tte update is done on close. + ("train_to_exhaustion", "Train to exhaustion", False, "", BOOLEAN, RESTORE), ), "Training" : ( (FolderIDOption, ! "ham_folder_ids", _("Folders containing known good messages"), [], ! _("""A list of folders known to contain good (ham) messages. When SpamBayes ! 
is trained, these messages will be used as examples of good messages."""), FOLDER_ID, DO_NOT_RESTORE), ! ("ham_include_sub", _("Does the nominated ham folders include sub-folders?"), False, ! _(""""""), BOOLEAN, DO_NOT_RESTORE), (FolderIDOption, ! "spam_folder_ids", _("Folders containing known bad or spam messages"), [], ! _("""A list of folders known to contain bad (spam) messages. When SpamBayes ! is trained, these messages will be used as examples of messages to filter."""), FOLDER_ID, DO_NOT_RESTORE), ! ("spam_include_sub", _("Does the nominated spam folders include sub-folders?"), False, ! _(""""""), BOOLEAN, DO_NOT_RESTORE), ! ("train_recovered_spam", _("Train as good as items are recovered?"), True, ! _("""SpamBayes can detect when a message previously classified as spam (or unsure) is moved back to the folder from which it was filtered. If this option is enabled, SpamBayes will automatically train on ! such messages"""), BOOLEAN, RESTORE), ! ("train_manual_spam", _("Train as spam items are manually moved?"), True, ! _("""SpamBayes can detect when a message previously classified as good (or unsure) is manually moved to the Spam folder. If this option is ! enabled, SpamBayes will automatically train on such messages"""), BOOLEAN, RESTORE), ! ("rescore", _("Rescore message after training?"), True, ! _("""After the training has completed, should all the messages be scored for their Spam value. This is particularly useful after your initial training runs, so you can see how effective your ! sorting of spam and ham was."""), BOOLEAN, RESTORE), ! ("rebuild", _("Rescore message after training?"), True, ! _("""Should the entire database be rebuilt? If enabled, then all training information is reset, and a complete new database built from the existing messages in your folders. If disabled, then only new messages in the folders that have not previously been trained ! 
on will be processed"""), BOOLEAN, RESTORE), ), *************** *** 164,269 **** # These options control how a message is categorized "Filter" : ( ! ("filter_now", "State of 'Filter Now' checkbox", False, ! """Something useful.""", BOOLEAN, RESTORE), ! ("save_spam_info", "Save spam score", True, ! """Should the spam score and other information be saved in each message ! as it is filtered or scored?""", BOOLEAN, RESTORE), (FolderIDOption, ! "watch_folder_ids", "Folders to watch for new messages", [], ! """The list of folders SpamBayes will watch for new messages, ! processing messages as defined by the filters.""", FOLDER_ID, DO_NOT_RESTORE), ! ("watch_include_sub", "Does the nominated watch folders include sub-folders?", False, ! """""", BOOLEAN, DO_NOT_RESTORE), (FolderIDOption, ! "spam_folder_id", "The folder used to track spam", None, ! """The folder SpamBayes moves or copies spam to.""", FOLDER_ID, DO_NOT_RESTORE), ! ("spam_threshold", "The score necessary to be considered 'certain' spam", 90.0, ! """Any message with a Spam score greater than or equal to this value ! will be considered spam, and processed accordingly.""", REAL, RESTORE), ! ("spam_action", "The action to take for new spam", "Moved", ! """The action that should be taken as Spam messages arrive.""", FILTER_ACTION, RESTORE), ! ("spam_mark_as_read", "Should filtered spam also be marked as 'read'", False, ! """Determines if spam messages are marked as 'Read' as they are filtered. This can be set to 'True' if the new-mail folder counts bother you when the only new items are spam. It can be set to 'False' if you use the 'read' state of these messages to determine which items you are yet to review. This option does not affect the ! new-mail icon in the system tray.""", BOOLEAN, RESTORE), (FolderIDOption, ! "unsure_folder_id", "The folder used to track uncertain messages", None, ! """The folder SpamBayes moves or copies uncertain messages to.""", FOLDER_ID, DO_NOT_RESTORE), ! 
("unsure_threshold", "The score necessary to be considered 'unsure'", 15.0, ! """Any message with a Spam score greater than or equal to this value (but less than the spam threshold) will be considered spam, and ! processed accordingly.""", REAL, RESTORE), ! ("unsure_action", "The action to take for new uncertain messages", "Moved", ! """The action that should be taken as unsure messages arrive.""", FILTER_ACTION, RESTORE), ! ("unsure_mark_as_read", "Should filtered uncertain message also be marked as 'read'", False, ! """Determines if unsure messages are marked as 'Read' as they are ! filtered. See 'spam_mark_as_read' for more details.""", BOOLEAN, RESTORE), ! ("enabled", "Is filtering enabled?", False, ! """""", BOOLEAN, RESTORE), # Options that allow the filtering to be done by a timer. ! ("timer_enabled", "Should items be filtered by a timer?", True, ! """Depending on a number of factors, SpamBayes may occasionally miss messages or conflict with builtin Outlook rules. If this option is set, SpamBayes will filter all messages in the background. This generally solves both of these problem, at the cost of having Spam stay ! in your inbox for a few extra seconds.""", BOOLEAN, RESTORE), ! ("timer_start_delay", "The interval (in seconds) before the timer starts.", 2.0, ! """Once a new item is received in the inbox, SpamBayes will begin processing messages after the given delay. If a new message arrives ! during this period, the timer will be reset and the delay will start again.""", REAL, RESTORE), ! ("timer_interval", "The interval between subsequent timer checks (in seconds)", 1.0, ! """Once the new message timer finds a new message, how long should SpamBayes wait before checking for another new message, assuming no other new messages arrive. Should a new message arrive during this process, the timer will reset, meaning that timer_start_delay will ! elapse before the process begins again.""", REAL, RESTORE), ("timer_only_receive_folders", ! 
"Should the timer only be used for 'Inbox' type folders?", True, ! """The point of using a timer is to prevent the SpamBayes filter getting in the way the builtin Outlook rules. Therefore, is it generally only necessary to use a timer for folders that have new items being delivered directly to them. Folders that are not inbox style folders generally are not subject to builtin filtering, so ! generally have no problems filtering messages in 'real time'.""", BOOLEAN, RESTORE), ), "Filter_Now": ( ! (FolderIDOption, "folder_ids", "Folders to filter in a 'Filter Now' operation", [], ! """The list of folders that will be filtered by this process.""", FOLDER_ID, DO_NOT_RESTORE), ! ("include_sub", "Does the nominated folders include sub-folders?", False, ! """""", BOOLEAN, DO_NOT_RESTORE), ! ("only_unread", "Only filter unread messages?", False, ! """When scoring messages, should only messages that are unread be ! considered?""", BOOLEAN, RESTORE), ! ("only_unseen", "Only filter previously unseen ?", False, ! """When scoring messages, should only messages that have never ! previously Spam scored be considered?""", BOOLEAN, RESTORE), ! ("action_all", "Perform all filter actions?", True, ! """When scoring the messages, should all items be performed (such as moving the items based on the score) or should the items only be scored, ! but otherwise untouched.""", BOOLEAN, RESTORE), ), --- 168,273 ---- # These options control how a message is categorized "Filter" : ( ! ("filter_now", _("State of 'Filter Now' checkbox"), False, ! _("""Something useful."""), BOOLEAN, RESTORE), ! ("save_spam_info", _("Save spam score"), True, ! _("""Should the spam score and other information be saved in each message ! as it is filtered or scored?"""), BOOLEAN, RESTORE), (FolderIDOption, ! "watch_folder_ids", _("Folders to watch for new messages"), [], ! _("""The list of folders SpamBayes will watch for new messages, ! processing messages as defined by the filters."""), FOLDER_ID, DO_NOT_RESTORE), ! 
("watch_include_sub", _("Does the nominated watch folders include sub-folders?"), False, ! _(""""""), BOOLEAN, DO_NOT_RESTORE), (FolderIDOption, ! "spam_folder_id", _("The folder used to track spam"), None, ! _("""The folder SpamBayes moves or copies spam to."""), FOLDER_ID, DO_NOT_RESTORE), ! ("spam_threshold", _("The score necessary to be considered 'certain' spam"), 90.0, ! _("""Any message with a Spam score greater than or equal to this value ! will be considered spam, and processed accordingly."""), REAL, RESTORE), ! ("spam_action", _("The action to take for new spam"), "Moved", ! _("""The action that should be taken as Spam messages arrive."""), FILTER_ACTION, RESTORE), ! ("spam_mark_as_read", _("Should filtered spam also be marked as 'read'"), False, ! _("""Determines if spam messages are marked as 'Read' as they are filtered. This can be set to 'True' if the new-mail folder counts bother you when the only new items are spam. It can be set to 'False' if you use the 'read' state of these messages to determine which items you are yet to review. This option does not affect the ! new-mail icon in the system tray."""), BOOLEAN, RESTORE), (FolderIDOption, ! "unsure_folder_id", _("The folder used to track uncertain messages"), None, ! _("""The folder SpamBayes moves or copies uncertain messages to."""), FOLDER_ID, DO_NOT_RESTORE), ! ("unsure_threshold", _("The score necessary to be considered 'unsure'"), 15.0, ! _("""Any message with a Spam score greater than or equal to this value (but less than the spam threshold) will be considered spam, and ! processed accordingly."""), REAL, RESTORE), ! ("unsure_action", _("The action to take for new uncertain messages"), FILTER_ACTION[1], ! _("""The action that should be taken as unsure messages arrive."""), FILTER_ACTION, RESTORE), ! ("unsure_mark_as_read", _("Should filtered uncertain message also be marked as 'read'"), False, ! _("""Determines if unsure messages are marked as 'Read' as they are ! filtered. 
See 'spam_mark_as_read' for more details."""), BOOLEAN, RESTORE), ! ("enabled", _("Is filtering enabled?"), False, ! _(""""""), BOOLEAN, RESTORE), # Options that allow the filtering to be done by a timer. ! ("timer_enabled", _("Should items be filtered by a timer?"), True, ! _("""Depending on a number of factors, SpamBayes may occasionally miss messages or conflict with builtin Outlook rules. If this option is set, SpamBayes will filter all messages in the background. This generally solves both of these problem, at the cost of having Spam stay ! in your inbox for a few extra seconds."""), BOOLEAN, RESTORE), ! ("timer_start_delay", _("The interval (in seconds) before the timer starts."), 2.0, ! _("""Once a new item is received in the inbox, SpamBayes will begin processing messages after the given delay. If a new message arrives ! during this period, the timer will be reset and the delay will start again."""), REAL, RESTORE), ! ("timer_interval", _("The interval between subsequent timer checks (in seconds)"), 1.0, ! _("""Once the new message timer finds a new message, how long should SpamBayes wait before checking for another new message, assuming no other new messages arrive. Should a new message arrive during this process, the timer will reset, meaning that timer_start_delay will ! elapse before the process begins again."""), REAL, RESTORE), ("timer_only_receive_folders", ! _("Should the timer only be used for 'Inbox' type folders?"), True, ! _("""The point of using a timer is to prevent the SpamBayes filter getting in the way the builtin Outlook rules. Therefore, is it generally only necessary to use a timer for folders that have new items being delivered directly to them. Folders that are not inbox style folders generally are not subject to builtin filtering, so ! generally have no problems filtering messages in 'real time'."""), BOOLEAN, RESTORE), ), "Filter_Now": ( ! (FolderIDOption, "folder_ids", _("Folders to filter in a 'Filter Now' operation"), [], ! 
_("""The list of folders that will be filtered by this process."""), FOLDER_ID, DO_NOT_RESTORE), ! ("include_sub", _("Does the nominated folders include sub-folders?"), False, ! _(""""""), BOOLEAN, DO_NOT_RESTORE), ! ("only_unread", _("Only filter unread messages?"), False, ! _("""When scoring messages, should only messages that are unread be ! considered?"""), BOOLEAN, RESTORE), ! ("only_unseen", _("Only filter previously unseen ?"), False, ! _("""When scoring messages, should only messages that have never ! previously Spam scored be considered?"""), BOOLEAN, RESTORE), ! ("action_all", _("Perform all filter actions?"), True, ! _("""When scoring the messages, should all items be performed (such as moving the items based on the score) or should the items only be scored, ! but otherwise untouched."""), BOOLEAN, RESTORE), ), *************** *** 306,326 **** # Migrate some "old" options to "new" options. Can be deleted in # a few versions :) ! # Binary007 last with experimental timer values. ! delay = options.get("Experimental", "timer_start_delay") ! interval = options.get("Experimental", "timer_interval") ! if delay and interval: ! options.set("Filter", "timer_enabled", True) ! options.set("Filter", "timer_start_delay", float(delay / 1000)) ! options.set("Filter", "timer_interval", float(interval / 1000)) ! # and reset the old options so they are not written to the new file ! # (actually, resetting isn't enough - must hack and clobber) ! del options._options["Experimental", "timer_start_delay"] ! del options._options["Experimental", "timer_interval"] ! ! torf = options.get("Experimental", "timer_only_receive_folders") ! if not torf: ! options.set("Filter", "timer_only_receive_folders", False) ! # and reset old ! del options._options["Experimental", "timer_only_receive_folders"] # Old code when we used a pickle. Still needed so old pickles can be --- 310,314 ---- # Migrate some "old" options to "new" options. Can be deleted in # a few versions :) ! 
pass # Old code when we used a pickle. Still needed so old pickles can be *************** *** 347,350 **** --- 335,340 ---- # End of old pickle code. + del _ + if __name__=='__main__': options = CreateConfig() Index: config_wizard.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/config_wizard.py,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** config_wizard.py 16 Dec 2003 05:06:33 -0000 1.9 --- config_wizard.py 2 Nov 2004 21:33:46 -0000 1.10 *************** *** 83,88 **** return new_folder except: ! msg = "There was an error creating the folder named '%s'\r\n" \ ! "Please restart Outlook and try again" % name manager.ReportError(msg) return None --- 83,88 ---- return new_folder except: ! msg = _("There was an error creating the folder named '%s'\r\n" \ ! "Please restart Outlook and try again") % name manager.ReportError(msg) return None Index: filter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/filter.py,v retrieving revision 1.38 retrieving revision 1.39 diff -C2 -d -r1.38 -r1.39 *** filter.py 17 Mar 2004 14:11:22 -0000 1.38 --- filter.py 2 Nov 2004 21:33:46 -0000 1.39 *************** *** 141,148 **** config = config.filter_now if not config.folder_ids: ! progress.error("You must specify at least one folder") return ! progress.set_status("Counting messages") num_msgs = 0 for f in mgr.message_store.GetFolderGenerator(config.folder_ids, config.include_sub): --- 141,148 ---- config = config.filter_now if not config.folder_ids: ! progress.error(_("You must specify at least one folder")) return ! progress.set_status(_("Counting messages")) num_msgs = 0 for f in mgr.message_store.GetFolderGenerator(config.folder_ids, config.include_sub): *************** *** 151,155 **** dispositions = {} for f in mgr.message_store.GetFolderGenerator(config.folder_ids, config.include_sub): ! 
progress.set_status("Filtering folder '%s'" % (f.name)) this_dispositions = filter_folder(f, mgr, config, progress) for key, val in this_dispositions.items(): --- 151,155 ---- dispositions = {} for f in mgr.message_store.GetFolderGenerator(config.folder_ids, config.include_sub): ! progress.set_status(_("Filtering folder '%s'") % (f.name)) this_dispositions = filter_folder(f, mgr, config, progress) for key, val in this_dispositions.items(): *************** *** 160,167 **** err_text = "" if dispositions.has_key("Error"): ! err_text = " (%d errors)" % dispositions["Error"] dget = dispositions.get ! text = "Found %d spam, %d unsure and %d good messages%s" % \ ! (dget("Yes",0), dget("Unsure",0), dget("No",0), err_text) progress.set_status(text) --- 160,167 ---- err_text = "" if dispositions.has_key("Error"): ! err_text = _(" (%d errors)") % dispositions["Error"] dget = dispositions.get ! text = _("Found %d spam, %d unsure and %d good messages%s") % \ ! (dget("Yes",0), dget("Unsure",0), dget("No",0), err_text) progress.set_status(text) Index: manager.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/manager.py,v retrieving revision 1.97 retrieving revision 1.98 diff -C2 -d -r1.97 -r1.98 *** manager.py 14 Oct 2004 23:36:12 -0000 1.97 --- manager.py 2 Nov 2004 21:33:46 -0000 1.98 *************** *** 107,110 **** --- 107,111 ---- # stuff, which can include spambayes.Options, and assume sys.path in place. def import_early_core_spambayes_stuff(): + global bayes_i18n try: from spambayes import OptionsClass *************** *** 113,119 **** "..")) sys.path.insert(0, parent) def import_core_spambayes_stuff(ini_filenames): ! 
global bayes_classifier, bayes_tokenize, bayes_storage if "spambayes.Options" in sys.modules: # The only thing we are worried about here is spambayes.Options --- 114,122 ---- "..")) sys.path.insert(0, parent) + from spambayes import i18n + bayes_i18n = i18n def import_core_spambayes_stuff(ini_filenames): ! global bayes_classifier, bayes_tokenize, bayes_storage, bayes_options if "spambayes.Options" in sys.modules: # The only thing we are worried about here is spambayes.Options *************** *** 146,149 **** --- 149,154 ---- assert "spambayes.Options" in sys.modules, \ "Expected 'spambayes.Options' to be loaded here" + from spambayes.Options import options + bayes_options = options # Function to "safely" save a pickle, only overwriting *************** *** 340,343 **** --- 345,354 ---- self.application_directory = os.path.dirname(this_filename) + + # Load the environment for translation. + lang_manager = bayes_i18n.LanguageManager(self.application_directory) + # Set the system user default language. + lang_manager.set_language(lang_manager.locale_default_lang()) + # where windows would like our data stored (and where # we do, unless overwritten via a config file) *************** *** 397,400 **** --- 408,421 ---- import_core_spambayes_stuff(bayes_option_filenames) + # Set interface to use the user language in configuration file. + for language in bayes_options["globals", "language"][::-1]: + # We leave the default in there as the last option, to fall + # back on if necessary. + lang_manager.add_language(language) + self.LogDebug(1, "Asked to add languages: " + \ + ", ".join(bayes_options["globals", "language"])) + self.LogDebug(1, "Set language to " + \ + str(lang_manager.current_langs_codes)) + bayes_base = os.path.join(self.data_directory, "default_bayes_database") mdb_base = os.path.join(self.data_directory, "default_message_database") *************** *** 450,458 **** if not self.reported_startup_error: self.reported_startup_error = True ! 
full_message = \ "There was an error initializing the Spam plugin.\r\n\r\n" \ "Spam filtering has been disabled. Please re-configure\r\n" \ "and re-enable this plugin\r\n\r\n" \ ! "Error details:\r\n" + message # Disable the plugin if self.config is not None: --- 471,479 ---- if not self.reported_startup_error: self.reported_startup_error = True ! full_message = _(\ "There was an error initializing the Spam plugin.\r\n\r\n" \ "Spam filtering has been disabled. Please re-configure\r\n" \ "and re-enable this plugin\r\n\r\n" \ ! "Error details:\r\n") + message # Disable the plugin if self.config is not None: *************** *** 569,573 **** # Regarding the property type: # We originally wanted to use the "Integer" Outlook field, ! # but it seems this property type alone is not expose via the Object # model. So we resort to olPercent, and live with the % sign # (which really is OK!) --- 590,594 ---- # Regarding the property type: # We originally wanted to use the "Integer" Outlook field, ! # but it seems this property type alone is not exposed via the Object # model. So we resort to olPercent, and live with the % sign # (which really is OK!) *************** *** 640,646 **** self.options.merge_file(filename) except: ! msg = "The configuration file named below is invalid.\r\n" \ "Please either correct or remove this file\r\n\r\n" \ ! "Filename: " + filename self.ReportError(msg) --- 661,667 ---- self.options.merge_file(filename) except: ! msg = _("The configuration file named below is invalid.\r\n" \ "Please either correct or remove this file\r\n\r\n" \ ! "Filename: ") + filename self.ReportError(msg) *************** *** 710,717 **** print "FAILED to load old pickle" traceback.print_exc() ! msg = "There was an error loading your old\r\n" \ ! "SpamBayes configuration file.\r\n\r\n" \ ! "It is likely that you will need to re-configure\r\n" \ ! "SpamBayes before it will function correctly." 
self.ReportError(msg) # But we can't abort yet - we really should still try and --- 731,738 ---- print "FAILED to load old pickle" traceback.print_exc() ! msg = _("There was an error loading your old\r\n" \ ! "SpamBayes configuration file.\r\n\r\n" \ ! "It is likely that you will need to re-configure\r\n" \ ! "SpamBayes before it will function correctly.") self.ReportError(msg) # But we can't abort yet - we really should still try and *************** *** 739,746 **** os.remove(pickle_filename) except os.error: ! msg = "There was an error migrating and removing your old\r\n" \ ! "SpamBayes configuration file. Configuration changes\r\n" \ ! "you make are unlikely to be reflected next\r\n" \ ! "time you start Outlook. Please try rebooting." self.ReportError(msg) --- 760,767 ---- os.remove(pickle_filename) except os.error: ! msg = _("There was an error migrating and removing your old\r\n" \ ! "SpamBayes configuration file. Configuration changes\r\n" \ ! "you make are unlikely to be reflected next\r\n" \ ! "time you start Outlook. Please try rebooting.") self.ReportError(msg) *************** *** 804,810 **** # See bug 706520 assert fails in classifier # For now, just tell the user. ! msg = "It appears your SpamBayes training database is corrupt.\r\n\r\n" \ ! "We are working on solving this, but unfortunately you\r\n" \ ! "must re-train the system via the SpamBayes manager." self.ReportErrorOnce(msg) # and disable the addin, as we are hosed! --- 825,831 ---- # See bug 706520 assert fails in classifier # For now, just tell the user. ! msg = _("It appears your SpamBayes training database is corrupt.\r\n\r\n" \ ! "We are working on solving this, but unfortunately you\r\n" \ ! "must re-train the system via the SpamBayes manager.") self.ReportErrorOnce(msg) # and disable the addin, as we are hosed! *************** *** 819,829 **** ok_to_enable = operator.truth(config.watch_folder_ids) if not ok_to_enable: ! return "You must define folders to watch for new messages. " \ ! 
"Select the 'Filtering' tab to define these folders." ok_to_enable = operator.truth(config.spam_folder_id) if not ok_to_enable: ! return "You must define the folder to receive your certain spam. " \ ! "Select the 'Filtering' tab to define this folders." # Check that the user hasn't selected the same folder as both --- 840,850 ---- ok_to_enable = operator.truth(config.watch_folder_ids) if not ok_to_enable: ! return _("You must define folders to watch for new messages. " \ ! "Select the 'Filtering' tab to define these folders.") ok_to_enable = operator.truth(config.spam_folder_id) if not ok_to_enable: ! return _("You must define the folder to receive your certain spam. " \ ! "Select the 'Filtering' tab to define this folder.") # Check that the user hasn't selected the same folder as both *************** *** 835,843 **** unsure_folder = ms.GetFolder(config.unsure_folder_id) except ms.MsgStoreException, details: ! return "The unsure folder is invalid: %s" % (details,) try: spam_folder = ms.GetFolder(config.spam_folder_id) except ms.MsgStoreException, details: ! return "The spam folder is invalid: %s" % (details,) if ok_to_enable: for folder in ms.GetFolderGenerator(config.watch_folder_ids, --- 856,864 ---- unsure_folder = ms.GetFolder(config.unsure_folder_id) except ms.MsgStoreException, details: ! return _("The unsure folder is invalid: %s") % (details,) try: spam_folder = ms.GetFolder(config.spam_folder_id) except ms.MsgStoreException, details: ! return _("The spam folder is invalid: %s") % (details,) if ok_to_enable: for folder in ms.GetFolderGenerator(config.watch_folder_ids, *************** *** 845,857 **** bad_folder_type = None if unsure_folder is not None and unsure_folder == folder: ! bad_folder_type = "unsure" bad_folder_name = unsure_folder.GetFQName() if spam_folder == folder: ! bad_folder_type = "spam" bad_folder_name = spam_folder.GetFQName() if bad_folder_type is not None: ! return "You can not specify folder '%s' as both the " \ ! 
"%s folder, and as being watched." \ ! % (bad_folder_name, bad_folder_type) return None --- 866,878 ---- bad_folder_type = None if unsure_folder is not None and unsure_folder == folder: ! bad_folder_type = _("unsure") bad_folder_name = unsure_folder.GetFQName() if spam_folder == folder: ! bad_folder_type = _("spam") bad_folder_name = spam_folder.GetFQName() if bad_folder_type is not None: ! return _("You can not specify folder '%s' as both the " \ ! "%s folder, and as being watched.") \ ! % (bad_folder_name, bad_folder_type) return None Index: oastats.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/oastats.py,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** oastats.py 20 Oct 2004 00:03:47 -0000 1.7 --- oastats.py 2 Nov 2004 21:33:46 -0000 1.8 *************** *** 91,95 **** totals["num_unsure"]) if num_seen==0: ! return ["SpamBayes has processed zero messages"] chunks = [] push = chunks.append --- 91,95 ---- totals["num_unsure"]) if num_seen==0: ! return [_("SpamBayes has processed zero messages")] chunks = [] push = chunks.append *************** *** 130,151 **** % (decimal_points,) format_dict["perc"] = "%" ! push(("SpamBayes has processed %(num_seen)d messages - " \ "%(num_ham)d (%(perc_ham_s)s) good, " \ "%(num_spam)d (%(perc_spam_s)s) spam " \ ! "and %(num_unsure)d (%(perc_unsure_s)s) unsure" \ % format_dict) % format_dict) if num_recovered_good: ! push("%(num_recovered_good)d message(s) were manually " \ "classified as good (with %(num_recovered_good_fp)d " \ ! "being false positives)" % format_dict) else: ! push("No messages were manually classified as good") if num_deleted_spam: ! push("%(num_deleted_spam)d message(s) were manually " \ "classified as spam (with %(num_deleted_spam_fn)d " \ ! "being false negatives)" % format_dict) else: ! 
push("No messages were manually classified as spam") return chunks --- 130,151 ---- % (decimal_points,) format_dict["perc"] = "%" ! push((_("SpamBayes has processed %(num_seen)d messages - " \ "%(num_ham)d (%(perc_ham_s)s) good, " \ "%(num_spam)d (%(perc_spam_s)s) spam " \ ! "and %(num_unsure)d (%(perc_unsure_s)s) unsure") \ % format_dict) % format_dict) if num_recovered_good: ! push(_("%(num_recovered_good)d message(s) were manually " \ "classified as good (with %(num_recovered_good_fp)d " \ ! "being false positives)") % format_dict) else: ! push(_("No messages were manually classified as good")) if num_deleted_spam: ! push(_("%(num_deleted_spam)d message(s) were manually " \ "classified as spam (with %(num_deleted_spam_fn)d " \ ! "being false negatives)") % format_dict) else: ! push(_("No messages were manually classified as spam")) return chunks From anadelonbrin at users.sourceforge.net Tue Nov 2 22:34:59 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 2 22:35:01 2004 Subject: [Spambayes-checkins] spambayes/Outlook2000 msgstore.py,1.87,1.88 Message-ID: Update of /cvsroot/spambayes/spambayes/Outlook2000 In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7324/Outlook2000 Modified Files: msgstore.py Log Message: Wrap strings to translate in _(). Leave all log messages untranslated, because logs are more use to the spambayes@python.org people than they are to users, really, and I don't want to have to translate logs into English to try and help people. Also add "X-Exchange-Delivery-Time" to the faked up Exchange headers. 
Index: msgstore.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/msgstore.py,v retrieving revision 1.87 retrieving revision 1.88 diff -C2 -d -r1.87 -r1.88 *** msgstore.py 16 Jul 2004 15:23:10 -0000 1.87 --- msgstore.py 2 Nov 2004 21:34:56 -0000 1.88 *************** *** 149,163 **** hr, exc_msg, exc, arg_err = exc_val if hr == mapi.MAPI_E_TABLE_TOO_BIG: ! err_msg = what + " failed as one of your\r\n" \ "Outlook folders is full. Futher operations are\r\n" \ "likely to fail until you clean up this folder.\r\n\r\n" \ "This message will not be reported again until SpamBayes\r\n"\ ! "is restarted." else: ! err_msg = what + " failed due to an unexpected Outlook error.\r\n" \ ! + GetCOMExceptionString(exc_val) + "\r\n\r\n" \ ! "It is recommended you restart Outlook at the earliest opportunity\r\n\r\n" \ ! "This message will not be reported again until SpamBayes\r\n"\ ! "is restarted." manager.ReportErrorOnce(err_msg) --- 149,163 ---- hr, exc_msg, exc, arg_err = exc_val if hr == mapi.MAPI_E_TABLE_TOO_BIG: ! err_msg = what + _(" failed as one of your\r\n" \ "Outlook folders is full. Futher operations are\r\n" \ "likely to fail until you clean up this folder.\r\n\r\n" \ "This message will not be reported again until SpamBayes\r\n"\ ! "is restarted.") else: ! err_msg = what + _(" failed due to an unexpected Outlook error.\r\n") \ ! + GetCOMExceptionString(exc_val) + "\r\n\r\n" + \ ! _("It is recommended you restart Outlook at the earliest opportunity\r\n\r\n" \ ! "This message will not be reported again until SpamBayes\r\n"\ ! "is restarted.") manager.ReportErrorOnce(err_msg) *************** *** 976,980 **** # This is designed to fake up some SMTP headers for messages # on an exchange server that do not have such headers of their own ! 
prop_ids = PR_SUBJECT_A, PR_DISPLAY_NAME_A, PR_DISPLAY_TO_A, PR_DISPLAY_CC_A hr, data = self.mapi_object.GetProps(prop_ids,0) subject = self._GetPotentiallyLargeStringProp(prop_ids[0], data[0]) --- 976,981 ---- # This is designed to fake up some SMTP headers for messages # on an exchange server that do not have such headers of their own ! prop_ids = PR_SUBJECT_A, PR_DISPLAY_NAME_A, PR_DISPLAY_TO_A, \ ! PR_DISPLAY_CC_A, PR_MESSAGE_DELIVERY_TIME hr, data = self.mapi_object.GetProps(prop_ids,0) subject = self._GetPotentiallyLargeStringProp(prop_ids[0], data[0]) *************** *** 982,985 **** --- 983,987 ---- to = self._GetPotentiallyLargeStringProp(prop_ids[2], data[2]) cc = self._GetPotentiallyLargeStringProp(prop_ids[3], data[3]) + delivery_time = data[4][1] headers = ["X-Exchange-Message: true"] if subject: headers.append("Subject: "+subject) *************** *** 987,990 **** --- 989,997 ---- if to: headers.append("To: "+to) if cc: headers.append("CC: "+cc) + if delivery_time: + from time import timezone + from email.Utils import formatdate + headers.append("X-Exchange-Delivery-Time: "+\ + formatdate(int(delivery_time)-timezone, True)) return "\n".join(headers) + "\n" *************** *** 1211,1220 **** self.MoveTo(folder) except MsgStoreException, details: ! ReportMAPIError(manager, "Moving a message", details.mapi_exception) def CopyToReportingError(self, manager, folder): try: self.MoveTo(folder) except MsgStoreException, details: ! ReportMAPIError(manager, "Copying a message", details.mapi_exception) def GetFolder(self): --- 1218,1229 ---- self.MoveTo(folder) except MsgStoreException, details: ! ReportMAPIError(manager, _("Moving a message"), ! details.mapi_exception) def CopyToReportingError(self, manager, folder): try: self.MoveTo(folder) except MsgStoreException, details: ! ReportMAPIError(manager, _("Copying a message"), ! 
details.mapi_exception) def GetFolder(self): From anadelonbrin at users.sourceforge.net Tue Nov 2 22:36:57 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 2 22:37:01 2004 Subject: [Spambayes-checkins] spambayes/Outlook2000 train.py,1.38,1.39 Message-ID: Update of /cvsroot/spambayes/spambayes/Outlook2000 In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7668/Outlook2000 Modified Files: train.py Log Message: Wrap strings to translate in _(). Leave all log messages untranslated, because logs are more use to the spambayes@python.org people than they are to users, really, and I don't want to have to translate logs into English to try and help people. Index: train.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/train.py,v retrieving revision 1.38 retrieving revision 1.39 diff -C2 -d -r1.38 -r1.39 *** train.py 15 Oct 2004 02:04:55 -0000 1.38 --- train.py 2 Nov 2004 21:36:54 -0000 1.39 *************** *** 93,97 **** def real_trainer(classifier_data, config, message_store, progress): ! progress.set_status("Counting messages") num_msgs = 0 --- 93,97 ---- def real_trainer(classifier_data, config, message_store, progress): ! progress.set_status(_("Counting messages")) num_msgs = 0 *************** *** 104,108 **** for f in message_store.GetFolderGenerator(config.training.ham_folder_ids, config.training.ham_include_sub): ! progress.set_status("Processing good folder '%s'" % (f.name,)) train_folder(f, 0, classifier_data, progress) if progress.stop_requested(): --- 104,108 ---- for f in message_store.GetFolderGenerator(config.training.ham_folder_ids, config.training.ham_include_sub): ! progress.set_status(_("Processing good folder '%s'") % (f.name,)) train_folder(f, 0, classifier_data, progress) if progress.stop_requested(): *************** *** 110,114 **** for f in message_store.GetFolderGenerator(config.training.spam_folder_ids, config.training.spam_include_sub): ! 
progress.set_status("Processing spam folder '%s'" % (f.name,)) train_folder(f, 1, classifier_data, progress) if progress.stop_requested(): --- 110,114 ---- for f in message_store.GetFolderGenerator(config.training.spam_folder_ids, config.training.spam_include_sub): ! progress.set_status(_("Processing spam folder '%s'") % (f.name,)) train_folder(f, 1, classifier_data, progress) if progress.stop_requested(): *************** *** 121,125 **** # Setup the next "stage" in the progress dialog. progress.set_max_ticks(1) ! progress.set_status("Writing the database...") classifier_data.Save() --- 121,125 ---- # Setup the next "stage" in the progress dialog. progress.set_max_ticks(1) ! progress.set_status(_("Writing the database...")) classifier_data.Save() *************** *** 130,134 **** if not config.training.ham_folder_ids and not config.training.spam_folder_ids: ! progress.error("You must specify at least one spam or one good folder") return --- 130,134 ---- if not config.training.ham_folder_ids and not config.training.spam_folder_ids: ! progress.error(_("You must specify at least one spam or one good folder")) return *************** *** 155,161 **** # Saving is really slow sometimes, but we only have 1 tick for that anyway if rescore: ! stages = ("Training", .3), ("Saving", .1), ("Scoring", .6) else: ! stages = ("Training", .9), ("Saving", .1) progress.set_stages(stages) --- 155,161 ---- # Saving is really slow sometimes, but we only have 1 tick for that anyway if rescore: ! stages = (_("Training"), .3), (_("Saving"), .1), (_("Scoring"), .6) else: ! stages = (_("Training"), .9), (_("Saving"), .1) progress.set_stages(stages) *************** *** 190,194 **** bayes = classifier_data.bayes ! progress.set_status("Completed training with %d spam and %d good messages" % (bayes.nspam, bayes.nham)) def main(): --- 190,195 ---- bayes = classifier_data.bayes ! progress.set_status(_("Completed training with %d spam and %d good messages") % (bayes.nspam, bayes.nham)) ! 
def main(): From anadelonbrin at users.sourceforge.net Tue Nov 2 22:37:39 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 2 22:37:48 2004 Subject: [Spambayes-checkins] spambayes/Outlook2000 config.py,1.32,1.33 Message-ID: Update of /cvsroot/spambayes/spambayes/Outlook2000 In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7804/Outlook2000 Modified Files: config.py Log Message: Back out an experimental option I committed by mistake. Index: config.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/config.py,v retrieving revision 1.32 retrieving revision 1.33 diff -C2 -d -r1.32 -r1.33 *** config.py 2 Nov 2004 21:33:46 -0000 1.32 --- config.py 2 Nov 2004 21:37:37 -0000 1.33 *************** *** 119,125 **** ("timer_interval", "obsolete", 1000, "", INTEGER, RESTORE), ("timer_only_receive_folders", "obsolete", True, "", BOOLEAN, RESTORE), - # Rather than fpfnunsure, do tte. DeleteAs/RecoverFrom just move - # the message, and a tte update is done on close. - ("train_to_exhaustion", "Train to exhaustion", False, "", BOOLEAN, RESTORE), ), "Training" : ( --- 119,122 ---- From anadelonbrin at users.sourceforge.net Wed Nov 3 02:15:07 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Wed Nov 3 02:15:10 2004 Subject: [Spambayes-checkins] spambayes/spambayes classifier.py,1.28,1.29 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20461/spambayes Modified Files: classifier.py Log Message: Fix [ 922063 ] Intermittent sb_filter.py faliure with URL pickle This is still ugly experimental code, but it might as well be robust ugly experimental code . If something goes wrong loading the URL pickles, start with fresh ones (they are only caches, so that shouldn't hurt). When saving, save to a temp file first. 
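The two ideas in this log message (fall back to a fresh cache if loading fails, and write to a temp file before replacing the real one) can be sketched as follows. This is a standalone illustration in current Python, not the code from the check-in: `load_cache` and `save_cache` are hypothetical names, and `os.replace` stands in for the rename-with-fallback shown in the diff. Note the parenthesised tuple in the `except` clause; in Python 2 the unparenthesised form `except IOError, ValueError:` catches only `IOError` and binds the exception instance to the name `ValueError`.

```python
import os
import pickle


def load_cache(path, default):
    """Return the unpickled cache, or `default` if it is missing or unreadable."""
    try:
        with open(path, "rb") as cache_file:
            return pickle.load(cache_file)
    except (OSError, EOFError, ValueError, pickle.UnpicklingError):
        # The file is only a cache, so starting afresh loses nothing important.
        return default


def save_cache(path, data):
    """Write `data` to a temp file first, then move it over `path`."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as cache_file:
        pickle.dump(data, cache_file)
    # os.replace overwrites an existing destination on both POSIX and Windows,
    # so no separate remove-then-rename step is needed.
    os.replace(tmp_path, path)
```

If the process dies while pickling, only the `.tmp` file is left incomplete and the previous cache remains loadable.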
Index: classifier.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v retrieving revision 1.28 retrieving revision 1.29 diff -C2 -d -r1.28 -r1.29 *** classifier.py 29 Oct 2004 00:14:42 -0000 1.28 --- classifier.py 3 Nov 2004 01:15:04 -0000 1.29 *************** *** 617,621 **** if os.path.exists(self.bad_url_cache_name): b_file = file(self.bad_url_cache_name, "r") ! self.bad_urls = pickle.load(b_file) b_file.close() else: --- 617,630 ---- if os.path.exists(self.bad_url_cache_name): b_file = file(self.bad_url_cache_name, "r") ! try: ! self.bad_urls = pickle.load(b_file) ! except IOError, ValueError: ! # Something went wrong loading it (bad pickle, ! # probably). Start afresh. ! if options["globals", "verbose"]: ! print >>sys.stderr, "Bad URL pickle, using new." ! self.bad_urls = {"url:non_resolving": (), ! "url:non_html": (), ! "url:unknown_error": ()} b_file.close() else: *************** *** 627,631 **** if os.path.exists(self.http_error_cache_name): h_file = file(self.http_error_cache_name, "r") ! self.http_error_urls = pickle.load(h_file) h_file.close() else: --- 636,647 ---- if os.path.exists(self.http_error_cache_name): h_file = file(self.http_error_cache_name, "r") ! try: ! self.http_error_urls = pickle.load(h_file) ! except IOError, ValueError: ! # Something went wrong loading it (bad pickle, ! # probably). Start afresh. ! if options["globals", "verbose"]: ! print >>sys.stderr, "Bad HHTP error pickle, using new." ! self.http_error_urls = {} h_file.close() else: *************** *** 636,645 **** # XXX be a good thing long-term (if a previously invalid URL # XXX becomes valid, for example). ! b_file = file(self.bad_url_cache_name, "w") ! pickle.dump(self.bad_urls, b_file) ! b_file.close() ! h_file = file(self.http_error_cache_name, "w") ! pickle.dump(self.http_error_urls, h_file) ! 
h_file.close() def slurp(self, proto, url): --- 652,668 ---- # XXX be a good thing long-term (if a previously invalid URL # XXX becomes valid, for example). ! for name, data in [(self.bad_url_cache_name, self.bad_urls), ! (self.http_error_cache_name, self.http_error_urls),]: ! # Save to a temp file first, in case something goes wrong. ! cache = open(name + ".tmp", "w") ! pickle.dump(data, cache) ! cache.close() ! try: ! os.rename(name + ".tmp", name) ! except OSError: ! # Atomic replace isn't possible with win32, so just ! # remove and rename. ! os.remove(name) ! os.rename(name + ".tmp", name) def slurp(self, proto, url): From anadelonbrin at users.sourceforge.net Wed Nov 3 03:05:52 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Wed Nov 3 03:05:54 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_mboxtrain.py,1.13,1.14 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29252/scripts Modified Files: sb_mboxtrain.py Log Message: Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf This appears (from the Python documentation) to be a harmless fix, and shouldn't cause any problems. Tested on Mandrake 9.1, cygwin and some sort of Redhat system, although in a very limited way. Index: sb_mboxtrain.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_mboxtrain.py,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** sb_mboxtrain.py 23 Jul 2004 05:00:06 -0000 1.13 --- sb_mboxtrain.py 3 Nov 2004 02:05:49 -0000 1.14 *************** *** 210,214 **** raise ! fcntl.lockf(f, fcntl.LOCK_UN) f.close() if loud: --- 210,214 ---- raise ! 
fcntl.flock(f, fcntl.LOCK_UN) f.close() if loud: From anadelonbrin at users.sourceforge.net Wed Nov 3 03:49:33 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Wed Nov 3 03:49:37 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py,1.14,1.15 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3646/scripts Modified Files: sb_dbexpimp.py Log Message: Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file One little bit of the code (which just printed out the number of words in the db) assumed that useDBM would be True/False, when it may be other things these days. Fix that. Index: sb_dbexpimp.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** sb_dbexpimp.py 30 May 2004 17:01:38 -0000 1.14 --- sb_dbexpimp.py 3 Nov 2004 02:49:30 -0000 1.15 *************** *** 230,234 **** print "Finished storing database" ! if useDBM: words = bayes.db.keys() words.remove(bayes.statekey) --- 230,234 ---- print "Finished storing database" ! if useDBM == "dbm" or useDBM == True: words = bayes.db.keys() words.remove(bayes.statekey) *************** *** 250,254 **** sys.exit() ! useDBM = False newDBM = True dbFN = None --- 250,254 ---- sys.exit() ! 
useDBM = "pickle" newDBM = True dbFN = None From anadelonbrin at users.sourceforge.net Fri Nov 5 03:34:31 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Fri Nov 5 03:34:35 2004 Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_server.py, NONE, 1.1 test_sb-server.py, 1.7, NONE Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes/test In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19184/spambayes/test Added Files: test_sb_server.py Removed Files: test_sb-server.py Log Message: I knew this day would come :) I want to import some of the stuff in test_sb-server, but can't (because there is a '-' in the name). I could factor it all out, but this file doesn't have much CVS history, so might as well rename it to match the actual script it's testing. --- NEW FILE: test_sb_server.py --- #! /usr/bin/env python """Test the POP3 proxy is working correctly. When using the -z command line option, carries out a test that the POP3 proxy can be connected to, that incoming mail is classified, that pipelining is removed from the CAPA[bility] query, and that the web ui is present. The -t option runs a fake POP3 server on port 8110. This is the same server that the -z option uses, and may be separately run for other testing purposes. Usage: test_sb-server.py [options] options: -z : Runs a self-test and exits. -t : Runs a fake POP3 server on port 8110 (for testing). -h : Displays this help message. """ # This module is part of the spambayes project, which is Copyright 2002 # The Python Software Foundation and is covered by the Python Software # Foundation license. __author__ = "Richie Hindle " __credits__ = "All the Spambayes folk." try: True, False except NameError: # Maintain compatibility with Python 2.2 True, False = 1, 0 # This code originally formed a part of pop3proxy.py. If you are examining # the history of this file, you may need to go back to there. todo = """ Web training interface: o Functional tests. 
""" # One example of spam and one of ham - both are used to train, and are # then classified. Not a good test of the classifier, but a perfectly # good test of the POP3 proxy. The bodies of these came from the # spambayes project, and Richie added the headers because the # originals had no headers. spam1 = """From: friend@public.com Subject: Make money fast Hello tim_chandler , Want to save money ? Now is a good time to consider refinancing. Rates are low so you can cut your current payments and save money. http://64.251.22.101/interest/index%38%30%300%2E%68t%6D Take off list on site [s5] """ good1 = """From: chris@example.com Subject: ZPT and DTML Jean Jordaan wrote: > 'Fraid so ;> It contains a vintage dtml-calendar tag. > http://www.zope.org/Members/teyc/CalendarTag > > Hmm I think I see what you mean: one needn't manually pass on the > namespace to a ZPT? Yeah, Page Templates are a bit more clever, sadly, DTML methods aren't :-( Chris """ import asyncore import socket import operator import re import getopt import sys, os import sb_test_support sb_test_support.fix_sys_path() from spambayes import Dibbler from spambayes import tokenizer from spambayes.UserInterface import UserInterfaceServer from spambayes.ProxyUI import ProxyUserInterface from sb_server import BayesProxyListener from sb_server import state, _recreateState from spambayes.Options import options # HEADER_EXAMPLE is the longest possible header - the length of this one # is added to the size of each message. HEADER_EXAMPLE = '%s: xxxxxxxxxxxxxxxxxxxx\r\n' % \ options["Headers", "classification_header_name"] class TestListener(Dibbler.Listener): """Listener for TestPOP3Server. Works on port 8110, to co-exist with real POP3 servers.""" def __init__(self, socketMap=asyncore.socket_map): Dibbler.Listener.__init__(self, 8110, TestPOP3Server, (socketMap,), socketMap=socketMap) class TestPOP3Server(Dibbler.BrighterAsyncChat): """Minimal POP3 server, for testing purposes. Doesn't support UIDL. 
USER, PASS, APOP, DELE and RSET simply return "+OK" without doing anything. Also understands the 'KILL' command, to kill it. The mail content is the example messages above. """ def __init__(self, clientSocket, socketMap): # Grumble: asynchat.__init__ doesn't take a 'map' argument, # hence the two-stage construction. Dibbler.BrighterAsyncChat.__init__(self, map=socketMap) Dibbler.BrighterAsyncChat.set_socket(self, clientSocket, socketMap) self.maildrop = [spam1, good1] self.set_terminator('\r\n') self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP', 'DELE', 'RSET', 'QUIT', 'KILL'] self.handlers = {'CAPA': self.onCapa, 'STAT': self.onStat, 'LIST': self.onList, 'RETR': self.onRetr, 'TOP': self.onTop} self.push("+OK ready\r\n") self.request = '' def collect_incoming_data(self, data): """Asynchat override.""" self.request = self.request + data def found_terminator(self): """Asynchat override.""" if ' ' in self.request: command, args = self.request.split(None, 1) else: command, args = self.request, '' command = command.upper() if command in self.okCommands: self.push("+OK (we hope)\r\n") if command == 'QUIT': self.close_when_done() if command == 'KILL': self.socket.shutdown(2) self.close() raise SystemExit else: handler = self.handlers.get(command, self.onUnknown) self.push(handler(command, args)) # Or push_slowly for testing self.request = '' def push_slowly(self, response): """Useful for testing.""" for c in response: self.push(c) time.sleep(0.02) def onCapa(self, command, args): """POP3 CAPA command. This lies about supporting pipelining for test purposes - the POP3 proxy *doesn't* support pipelining, and we test that it correctly filters out that capability from the proxied capability list. 
Ditto for STLS.""" lines = ["+OK Capability list follows", "PIPELINING", "STLS", "TOP", ".", ""] return '\r\n'.join(lines) def onStat(self, command, args): """POP3 STAT command.""" maildropSize = reduce(operator.add, map(len, self.maildrop)) maildropSize += len(self.maildrop) * len(HEADER_EXAMPLE) return "+OK %d %d\r\n" % (len(self.maildrop), maildropSize) def onList(self, command, args): """POP3 LIST command, with optional message number argument.""" if args: try: number = int(args) except ValueError: number = -1 if 0 < number <= len(self.maildrop): return "+OK %d\r\n" % len(self.maildrop[number-1]) else: return "-ERR no such message\r\n" else: returnLines = ["+OK"] for messageIndex in range(len(self.maildrop)): size = len(self.maildrop[messageIndex]) returnLines.append("%d %d" % (messageIndex + 1, size)) returnLines.append(".") return '\r\n'.join(returnLines) + '\r\n' def _getMessage(self, number, maxLines): """Implements the POP3 RETR and TOP commands.""" if 0 < number <= len(self.maildrop): message = self.maildrop[number-1] headers, body = message.split('\n\n', 1) bodyLines = body.split('\n')[:maxLines] message = headers + '\r\n\r\n' + '\n'.join(bodyLines) return "+OK\r\n%s\r\n.\r\n" % message else: return "-ERR no such message\r\n" def onRetr(self, command, args): """POP3 RETR command.""" try: number = int(args) except ValueError: number = -1 return self._getMessage(number, 12345) def onTop(self, command, args): """POP3 RETR command.""" try: number, lines = map(int, args.split()) except ValueError: number, lines = -1, -1 return self._getMessage(number, lines) def onUnknown(self, command, args): """Unknown POP3 command.""" return "-ERR Unknown command: %s\r\n" % repr(command) def test(): """Runs a self-test using TestPOP3Server, a minimal POP3 server that serves the example emails above. """ # Run a proxy and a test server in separate threads with separate # asyncore environments. 
import threading state.isTest = True testServerReady = threading.Event() def runTestServer(): testSocketMap = {} TestListener(socketMap=testSocketMap) testServerReady.set() asyncore.loop(map=testSocketMap) proxyReady = threading.Event() def runUIAndProxy(): httpServer = UserInterfaceServer(8881) proxyUI = ProxyUserInterface(state, _recreateState) httpServer.register(proxyUI) BayesProxyListener('localhost', 8110, ('', 8111)) state.bayes.learn(tokenizer.tokenize(spam1), True) state.bayes.learn(tokenizer.tokenize(good1), False) proxyReady.set() Dibbler.run() threading.Thread(target=runTestServer).start() testServerReady.wait() threading.Thread(target=runUIAndProxy).start() proxyReady.wait() # Connect to the proxy and the test server. proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM) proxy.connect(('localhost', 8111)) response = proxy.recv(100) assert response == "+OK ready\r\n" pop3Server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) pop3Server.connect(('localhost', 8110)) response = pop3Server.recv(100) assert response == "+OK ready\r\n" # Verify that the test server claims to support pipelining. pop3Server.send("capa\r\n") response = pop3Server.recv(1000) assert response.find("PIPELINING") >= 0 # Ask for the capabilities via the proxy, and verify that the proxy # is filtering out the PIPELINING capability. proxy.send("capa\r\n") response = proxy.recv(1000) assert response.find("PIPELINING") == -1 # Verify that the test server claims to support STLS. pop3Server.send("capa\r\n") response = pop3Server.recv(1000) assert response.find("STLS") >= 0 # Ask for the capabilities via the proxy, and verify that the proxy # is filtering out the PIPELINING capability. proxy.send("capa\r\n") response = proxy.recv(1000) assert response.find("STLS") == -1 # Stat the mailbox to get the number of messages. 
proxy.send("stat\r\n") response = proxy.recv(100) count, totalSize = map(int, response.split()[1:3]) assert count == 2 # Loop through the messages ensuring that they have judgement # headers. for i in range(1, count+1): response = "" proxy.send("retr %d\r\n" % i) while response.find('\n.\r\n') == -1: response = response + proxy.recv(1000) assert response.find(options["Headers", "classification_header_name"]) >= 0 # Smoke-test the HTML UI. httpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM) httpServer.connect(('localhost', 8881)) httpServer.sendall("get / HTTP/1.0\r\n\r\n") response = '' while 1: packet = httpServer.recv(1000) if not packet: break response += packet assert re.search(r"(?s).*SpamBayes proxy.*", response) # Kill the proxy and the test server. proxy.sendall("kill\r\n") proxy.recv(100) pop3Server.sendall("kill\r\n") pop3Server.recv(100) def run(): # Read the arguments. try: opts, args = getopt.getopt(sys.argv[1:], 'htz') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ sys.exit() runSelfTest = False for opt, arg in opts: if opt == '-h': print >>sys.stderr, __doc__ sys.exit() elif opt == '-t': state.isTest = True state.runTestServer = True elif opt == '-z': state.isTest = True runSelfTest = True state.createWorkers() if runSelfTest: print "\nRunning self-test...\n" state.buildServerStrings() test() print "Self-test passed." # ...else it would have asserted. elif state.runTestServer: print "Running a test POP3 server on port 8110..." 
TestListener() asyncore.loop() else: print >>sys.stderr, __doc__ if __name__ == '__main__': run() --- test_sb-server.py DELETED --- From anadelonbrin at users.sourceforge.net Fri Nov 5 03:36:28 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Fri Nov 5 03:36:32 2004 Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_imapfilter.py, 1.4, 1.5 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes/test In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19506/spambayes/test Modified Files: test_sb_imapfilter.py Log Message: Fix typo in import. The test failed with Python 2.2 - not because sb_imapfilter did, but because there were lots of "if substr in string" instances in the test script. Replace those with find() != -1 so that the test can be run with 2.2. Index: test_sb_imapfilter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_imapfilter.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** test_sb_imapfilter.py 14 Oct 2004 04:01:10 -0000 1.4 --- test_sb_imapfilter.py 5 Nov 2004 02:36:25 -0000 1.5 *************** *** 15,19 **** from spambayes import Dibbler from spambayes.Options import options ! from spambayes.classifier import Classifer from sb_imapfilter import BadIMAPResponseError from spambayes.message import message_from_string --- 15,19 ---- from spambayes import Dibbler from spambayes.Options import options ! from spambayes.classifier import Classifier from sb_imapfilter import BadIMAPResponseError from spambayes.message import message_from_string *************** *** 89,94 **** class TestListener(Dibbler.Listener): ! """Listener for TestIMAP4Server. Works on port 8143, to co-exist ! with real IMAP4 servers.""" def __init__(self, socketMap=asyncore.socket_map): Dibbler.Listener.__init__(self, IMAP_PORT, TestIMAP4Server, --- 89,93 ---- class TestListener(Dibbler.Listener): ! 
"""Listener for TestIMAP4Server.""" def __init__(self, socketMap=asyncore.socket_map): Dibbler.Listener.__init__(self, IMAP_PORT, TestIMAP4Server, *************** *** 239,243 **** args = args.upper() results = () ! if "UNDELETED" in args: for msg_id in UNDELETED_IDS: if uid: --- 238,242 ---- args = args.upper() results = () ! if args.find("UNDELETED") != -1: for msg_id in UNDELETED_IDS: if uid: *************** *** 259,263 **** for msg in msg_nums: response[msg] = [] ! if "UID" in msg_parts: if uid: for msg in msg_nums: --- 258,262 ---- for msg in msg_nums: response[msg] = [] ! if msg_parts.find("UID") != -1: if uid: for msg in msg_nums: *************** *** 267,271 **** response[msg].append("FETCH (UID %s)" % (IMAP_UIDS[int(msg)])) ! if "BODY.PEEK[]" in msg_parts: for msg in msg_nums: if uid: --- 266,270 ---- response[msg].append("FETCH (UID %s)" % (IMAP_UIDS[int(msg)])) ! if msg_parts.find("BODY.PEEK[]") != -1: for msg in msg_nums: if uid: *************** *** 276,280 **** (len(IMAP_MESSAGES[msg_uid])), IMAP_MESSAGES[msg_uid])) ! if "RFC822.HEADER" in msg_parts: for msg in msg_nums: if uid: --- 275,279 ---- (len(IMAP_MESSAGES[msg_uid])), IMAP_MESSAGES[msg_uid])) ! if msg_parts.find("RFC822.HEADER") != -1: for msg in msg_nums: if uid: *************** *** 286,290 **** response[msg].append(("FETCH (RFC822.HEADER {%s}" % (len(headers),), headers)) ! if "FLAGS INTERNALDATE" in msg_parts: # We make up flags & dates. for msg in msg_nums: --- 285,289 ---- response[msg].append(("FETCH (RFC822.HEADER {%s}" % (len(headers),), headers)) ! if msg_parts.find("FLAGS INTERNALDATE") != -1: # We make up flags & dates. for msg in msg_nums: *************** *** 514,518 **** # message 103 is replaced with one that does, this will fail with # Python 2.4/email 3.0. ! 
has_header = "X-Spambayes-Exception: " in new_msg.as_string() has_defect = hasattr(new_msg, "defects") and len(new_msg.defects) > 0 self.assert_(has_header or has_defect) --- 513,517 ---- # message 103 is replaced with one that does, this will fail with # Python 2.4/email 3.0. ! has_header = new_msg.as_string().find("X-Spambayes-Exception: ") != -1 has_defect = hasattr(new_msg, "defects") and len(new_msg.defects) > 0 self.assert_(has_header or has_defect) From anadelonbrin at users.sourceforge.net Fri Nov 5 03:37:36 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Fri Nov 5 03:37:39 2004 Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_pop3dnd.py, NONE, 1.1 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes/test In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19708/spambayes/test Added Files: test_sb_pop3dnd.py Log Message: Initial unittests for sb_pop3dnd. --- NEW FILE: test_sb_pop3dnd.py --- # Test sb_pop3dnd script. import sys import email import time import thread import imaplib import unittest import sb_test_support sb_test_support.fix_sys_path() from spambayes import Dibbler from spambayes.Options import options from spambayes.classifier import Classifier from spambayes.message import message_from_string from sb_pop3dnd import IMAPMessage, DynamicIMAPMessage, IMAPFileMessage from sb_pop3dnd import IMAPFileMessageFactory # We borrow the dummy POP3 server that test_sb_server uses. # And also the test messages. 
from test_sb_server import TestListener, good1, spam1 POP_PORT = 8110 class IMAPMessageTest(unittest.TestCase): def testIMAPMessage(self): msg = IMAPMessage() self.assertEqual(msg.date, None) msg = IMAPMessage("fake date") self.assertEqual(msg.date, "fake date") for att in ["date", "deleted", "flagged", "seen", "draft", "recent", "answered"]: self.assert_(att in msg.stored_attributes) for flag in ["deleted", "answered", "flagged", "seen", "draft", "recent"]: self.assertEqual(getattr(msg, flag), False) def testGetAllHeaders(self): msg = email.message_from_string(good1, _class=IMAPMessage) correct_msg = email.message_from_string(good1) # Without passing a list, we should get them all. # We get them in lowercase, because this is a twisted # requirement. headers = msg.getHeaders(False) for k, v in correct_msg.items(): self.assertEqual(headers[k.lower()], v) # Should work the same with negate headers = msg.getHeaders(True) for k, v in correct_msg.items(): self.assertEqual(headers[k.lower()], v) def testGetIndividualHeaders(self): msg = email.message_from_string(good1, _class=IMAPMessage) correct_msg = email.message_from_string(good1) # We get them in lowercase, because this is a twisted # requirement. We pass them in uppercard, because this # is a twisted requirement. It's not called twisted for # nothing! headers = msg.getHeaders(False, "SUBJECT") self.assertEqual(headers["subject"], correct_msg["Subject"]) # Negate should get all the other headers. 
headers = msg.getHeaders(True, "SUBJECT") self.assert_("subject" not in headers) for k, v in correct_msg.items(): if k == "Subject": continue self.assertEqual(headers[k.lower()], v) def testGetFlags(self): msg = IMAPMessage() all_flags = ["deleted", "answered", "flagged", "seen", "draft", "recent"] for flag in all_flags: setattr(msg, flag, True) flags = list(msg.getFlags()) for flag in all_flags: self.assert_("\\%s" % (flag.upper(),) in flags) for flag in all_flags: setattr(msg, flag, False) flags = list(msg.getFlags()) self.assertEqual(flags, []) def testGetInternalDate(self): msg = IMAPMessage() self.assertRaises(AssertionError, msg.getInternalDate) msg = IMAPMessage("fake date") self.assertEqual(msg.getInternalDate(), "fake date") def testGetBodyFile(self): msg = email.message_from_string(spam1, _class=IMAPMessage) correct_msg = email.message_from_string(spam1) body = msg.getBodyFile() # Our messages are designed for transmittal, so have # \r\n rather than \n as end-of-line. self.assertEqual(body.read().replace('\r\n', '\n'), correct_msg.get_payload()) def testGetSize(self): msg = email.message_from_string(spam1, _class=IMAPMessage) correct_msg = email.message_from_string(spam1) # Our messages are designed for transmittal, so have # \r\n rather than \n as end-of-line. 
self.assertEqual(msg.getSize(), len(correct_msg.as_string().replace('\n', '\r\n'))) def testGetUID(self): msg = IMAPMessage() msg.id = "fake id" # Heh self.assertEqual(msg.getUID(), "fake id") def testIsMultipart(self): msg = IMAPMessage() self.assertEqual(msg.isMultipart(), False) def testGetSubPart(self): msg = IMAPMessage() self.assertRaises(NotImplementedError, msg.getSubPart, None) def testClearFlags(self): msg = IMAPMessage() all_flags = ["deleted", "answered", "flagged", "seen", "draft", "recent"] for flag in all_flags: setattr(msg, flag, True) msg.clear_flags() for flag in all_flags: self.assertEqual(getattr(msg, flag), False) def testFlags(self): msg = IMAPMessage() all_flags = ["deleted", "answered", "flagged", "seen", "draft", "recent"] for flag in all_flags: setattr(msg, flag, True) flags = list(msg.flags()) for flag in all_flags: self.assert_("\\%s" % (flag.upper(),) in flags) for flag in all_flags: setattr(msg, flag, False) flags = list(msg.flags()) self.assertEqual(flags, []) def testTrain(self): # XXX To do pass def testStructure(self): # XXX To do pass def testBody(self): msg = email.message_from_string(good1, _class=IMAPMessage) correct_msg = email.message_from_string(good1) body = msg.body() # Our messages are designed for transmittal, so have # \r\n rather than \n as end-of-line. 
self.assertEqual(body.replace('\r\n', '\n'), correct_msg.get_payload()) def testHeaders(self): msg = email.message_from_string(good1, _class=IMAPMessage) correct_msg = email.message_from_string(good1) headers = msg.headers() correct_headers = "\r\b".join(["%s: %s" % (k, v) \ for k, v in correct_msg.items()]) class DynamicIMAPMessageTest(unittest.TestCase): def setUp(self): def fakemsg(body=False, headers=False): msg = [] if headers: msg.append("Header: Fake") if body: msg.append("\r\n") if body: msg.append("Fake Body") return "\r\n".join(msg) self.msg = DynamicIMAPMessage(fakemsg) def testDate(self): date = imaplib.Time2Internaldate(time.time())[1:-1] self.assertEqual(self.msg.date, date) def testLoad(self): self.assertEqual(self.msg.as_string(), "Header: Fake\r\n\r\nFake Body") class IMAPFileMessageTest(unittest.TestCase): def setUp(self): self.msg = IMAPFileMessage("filename", "directory") def testID(self): self.assertEqual(self.msg.id, "filename") def testDate(self): date = imaplib.Time2Internaldate(time.time())[1:-1] self.assertEqual(self.msg.date, date) class IMAPFileMessageFactoryTest(unittest.TestCase): def testCreateNoContent(self): factory = IMAPFileMessageFactory() msg = factory.create("key", "directory") self.assertEqual(msg.id, key) self.assert_(isinstance(msg, type(IMAPFileMessage()))) def suite(): suite = unittest.TestSuite() for cls in (IMAPMessageTest, DynamicIMAPMessageTest, IMAPFileMessageTest, ): suite.addTest(unittest.makeSuite(cls)) return suite if __name__=='__main__': def runTestServer(): import asyncore asyncore.loop() TestListener() thread.start_new_thread(runTestServer, ()) sb_test_support.unittest_main(argv=sys.argv + ['suite']) From anadelonbrin at users.sourceforge.net Fri Nov 5 04:00:00 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Fri Nov 5 04:00:04 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_upload.py,1.4,1.5 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory 
sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27894/scripts Modified Files: sb_upload.py Log Message: Correct docstring. Index: sb_upload.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_upload.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** sb_upload.py 7 Oct 2004 06:09:57 -0000 1.4 --- sb_upload.py 5 Nov 2004 02:59:51 -0000 1.5 *************** *** 8,16 **** interface, which will save the message in the 'unknown' cache, ready for you to classify it. It does not do any training, just saves it ! ready for you to classify. usage: %(progname)s [-h] [-n] [-s server] [-p port] [-r N] [-o section:option:value] ! [-t (ham|spam)] [-o section:option:value] Options: --- 8,16 ---- interface, which will save the message in the 'unknown' cache, ready for you to classify it. It does not do any training, just saves it ! ready for you to classify (unless you use the -t switch). usage: %(progname)s [-h] [-n] [-s server] [-p port] [-r N] [-o section:option:value] ! [-t (ham|spam)] Options: From anadelonbrin at users.sourceforge.net Fri Nov 5 04:03:15 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Fri Nov 5 04:03:19 2004 Subject: [Spambayes-checkins] spambayes/spambayes Stats.py, 1.7, 1.8 mboxutils.py, 1.8, 1.9 message.py, 1.55, 1.56 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28691/spambayes Modified Files: Stats.py mboxutils.py message.py Log Message: Make message.insert_exception_header optionally put in the id header, for compatibility with sb_server and sb_pop3dnd. Make use of HTML in Stats.GetStats optional.
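As a rough illustration of the "optional HTML" change described in the log message above, the sketch below picks the line-break and indent markers once and then substitutes them into a single template with mapping-style % formatting. It is only a minimal sketch in modern Python 3, not the code from Stats.py: the function name format_stats and the dictionary keys are hypothetical, and the exact HTML markup (<br/> and four &nbsp; entities) is an assumption.

```python
def format_stats(counts, use_html=True):
    """Render a short classification summary as HTML or plain text.

    `counts` is a hypothetical dict with integer values for the keys
    "total", "ham", "spam" and "unsure".
    """
    fmt = dict(counts)  # copy, so the caller's dict is not modified
    # Choose the break/indent markers for the output medium.
    if use_html:
        fmt["br"] = "<br/>"          # assumed HTML line break
        fmt["tab"] = "&nbsp;" * 4    # assumed HTML indent
    else:
        fmt["br"] = "\r\n"
        fmt["tab"] = "\t"
    # One template serves both media; only the markers differ.
    return ("SpamBayes has classified a total of %(total)d messages:"
            "%(br)s%(tab)s%(ham)d good"
            "%(br)s%(tab)s%(spam)d spam"
            "%(br)s%(tab)s%(unsure)d unsure." % fmt)


counts = {"total": 1223, "ham": 827, "spam": 333, "unsure": 63}
plain = format_stats(counts, use_html=False)
html = format_stats(counts)
```

Keeping the markers in the format dictionary means a plain-text consumer (such as a message generated by an IMAP server) and the web interface can share the same template strings.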
Index: Stats.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Stats.py,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** Stats.py 2 Nov 2004 06:33:23 -0000 1.7 --- Stats.py 5 Nov 2004 03:03:00 -0000 1.8 *************** *** 93,97 **** self.trn_ham += 1 ! def GetStats(self): if self.total == 0: return ["SpamBayes has processed zero messages"] --- 93,97 ---- self.trn_ham += 1 ! def GetStats(self, use_html=True): if self.total == 0: return ["SpamBayes has processed zero messages"] *************** *** 176,179 **** --- 176,186 ---- else: format_dict[key] = 'were' + # Possibly use HTML for breaks/tabs. + if use_html: + format_dict["br"] = "<br/>" + format_dict["tab"] = "&nbsp;&nbsp;&nbsp;&nbsp;" + else: + format_dict["br"] = "\r\n" + format_dict["tab"] = "\t" ## Our result should look something like this:
*************** *** 200,232 **** push("SpamBayes has classified a total of " \ "%(num_seen)d message%(sp1)s:" \ ! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(cls_ham)d " \ "(%(perc_cls_ham).0f%% of total) good" \ ! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(cls_spam)d " \ "(%(perc_cls_spam).0f%% of total) spam" \ ! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(cls_unsure)d " \ "(%(perc_cls_unsure).0f%% of total) unsure." % \ format_dict) push("%(correct)d message%(sp2)s %(wp1)s classified correctly " \ "(%(perc_correct).0f%% of total)" \ ! "<br/>%(incorrect)d message%(sp3)s %(wp2)s classified " \ "incorrectly " \ "(%(perc_incorrect).0f%% of total)" \ ! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(fp)d false positive%(sp4)s " \ "(%(perc_fp).0f%% of total)" \ ! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(fn)d false negative%(sp5)s " \ "(%(perc_fn).0f%% of total)" % \ format_dict) push("%(trn_unsure_ham)d unsure%(sp6)s trained as good " \ "(%(unsure_ham_perc).0f%% of unsures)" \ ! "<br/>%(trn_unsure_spam)d unsure%(sp7)s trained as spam " \ "(%(unsure_spam_perc).0f%% of unsures)" \ ! "<br/>%(not_trn_unsure)d unsure%(sp8)s %(wp3)s not trained " \ "(%(unsure_not_perc).0f%% of unsures)" % \ format_dict) push("A total of %(trn_total)d message%(sp9)s have been trained:" \ ! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(trn_ham)d good " \ "(%(trn_perc_ham)0.f%% good, %(trn_perc_unsure_ham)0.f%% " \ "unsure, %(trn_perc_fp).0f%% false positives)" \ ! "<br/>&nbsp;&nbsp;&nbsp;&nbsp;%(trn_spam)d spam " \ "(%(trn_perc_spam)0.f%% spam, %(trn_perc_unsure_spam)0.f%% " \ "unsure, %(trn_perc_fn).0f%% false negatives)" % \ --- 207,239 ---- push("SpamBayes has classified a total of " \ "%(num_seen)d message%(sp1)s:" \ ! "%(br)s%(tab)s%(cls_ham)d " \ "(%(perc_cls_ham).0f%% of total) good" \ ! "%(br)s%(tab)s%(cls_spam)d " \ "(%(perc_cls_spam).0f%% of total) spam" \ ! "%(br)s%(tab)s%(cls_unsure)d " \ "(%(perc_cls_unsure).0f%% of total) unsure." % \ format_dict) push("%(correct)d message%(sp2)s %(wp1)s classified correctly " \ "(%(perc_correct).0f%% of total)" \ ! 
"%(br)s%(incorrect)d message%(sp3)s %(wp2)s classified " \ "incorrectly " \ "(%(perc_incorrect).0f%% of total)" \ ! "%(br)s%(tab)s%(fp)d false positive%(sp4)s " \ "(%(perc_fp).0f%% of total)" \ ! "%(br)s%(tab)s%(fn)d false negative%(sp5)s " \ "(%(perc_fn).0f%% of total)" % \ format_dict) push("%(trn_unsure_ham)d unsure%(sp6)s trained as good " \ "(%(unsure_ham_perc).0f%% of unsures)" \ ! "%(br)s%(trn_unsure_spam)d unsure%(sp7)s trained as spam " \ "(%(unsure_spam_perc).0f%% of unsures)" \ ! "%(br)s%(not_trn_unsure)d unsure%(sp8)s %(wp3)s not trained " \ "(%(unsure_not_perc).0f%% of unsures)" % \ format_dict) push("A total of %(trn_total)d message%(sp9)s have been trained:" \ ! "%(br)s%(tab)s%(trn_ham)d good " \ "(%(trn_perc_ham)0.f%% good, %(trn_perc_unsure_ham)0.f%% " \ "unsure, %(trn_perc_fp).0f%% false positives)" \ !
"%(br)s%(tab)s%(trn_spam)d spam " \ "(%(trn_perc_spam)0.f%% spam, %(trn_perc_unsure_spam)0.f%% " \ "unsure, %(trn_perc_fn).0f%% false negatives)" % \ Index: mboxutils.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/mboxutils.py,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** mboxutils.py 25 May 2004 23:16:40 -0000 1.8 --- mboxutils.py 5 Nov 2004 03:03:00 -0000 1.9 *************** *** 117,121 **** function is imported by tokenizer, and our message class imports tokenizer, so we get a circular import problem. In any case, this ! function does need anything that our message class offers, so that shouldn't matter. """ --- 117,121 ---- function is imported by tokenizer, and our message class imports tokenizer, so we get a circular import problem. In any case, this ! function does not need anything that our message class offers, so that shouldn't matter. """ Index: message.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v retrieving revision 1.55 retrieving revision 1.56 diff -C2 -d -r1.55 -r1.56 *** message.py 1 Oct 2004 00:03:19 -0000 1.55 --- message.py 5 Nov 2004 03:03:00 -0000 1.56 *************** *** 495,499 **** # This is used by both sb_server and sb_imapfilter, so it's handy to have # it available separately. ! def insert_exception_header(string_msg, msg_id): """Insert an exception header into the given RFC822 message (as text). --- 495,499 ---- # This is used by both sb_server and sb_imapfilter, so it's handy to have # it available separately. ! def insert_exception_header(string_msg, msg_id=None): """Insert an exception header into the given RFC822 message (as text). *************** *** 510,520 **** header = email.Header.Header(dottedDetails, header_name=headerName) ! 
# Insert the exception header, and also insert the id header, # otherwise we might keep doing this message over and over again. # We also ensure that the line endings are /r/n as RFC822 requires. headers, body = re.split(r'\n\r?\n', string_msg, 1) header = re.sub(r'\r?\n', '\r\n', str(header)) ! headers += "\n%s: %s\r\n%s: %s\r\n\r\n" % \ ! (headerName, header, ! options["Headers", "mailid_header_name"], msg_id) return (headers + body, details) --- 510,522 ---- header = email.Header.Header(dottedDetails, header_name=headerName) ! # Insert the exception header, and optionally also insert the id header, # otherwise we might keep doing this message over and over again. # We also ensure that the line endings are /r/n as RFC822 requires. headers, body = re.split(r'\n\r?\n', string_msg, 1) header = re.sub(r'\r?\n', '\r\n', str(header)) ! headers += "\n%s: %s\r\n" % \ ! (headerName, header) ! if msg_id: ! headers += "%s: %s\r\n" % \ ! (options["Headers", "mailid_header_name"], msg_id) return (headers + body, details) From anadelonbrin at users.sourceforge.net Fri Nov 5 04:10:06 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Fri Nov 5 04:10:11 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_pop3dnd.py,1.10,1.11 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29921/scripts Modified Files: sb_pop3dnd.py Log Message: Fix docstring. Stop using the web interface. This is against the point of the module, and everything except configuration is now provided via the IMAP server itself. Fix bug in getHeaders where negation wouldn't work correctly. Fix use of assert. Remove code duplication in flags(). Fix loading of dynamic messages to correctly generate the headers, so that envelope works. Change Factory and FileMessage to fit the new style. Change the fake email addresses to the same format as the notate_to option (i.e.
@spambayes.invalid) Improve the "about" message to include the docstring. Add a dynamic stats message. Improve the dynamic status message to include everything that would normally be on the web interface. Add a "train as spam" folder, to separate out training and classifying as spam. Use the message.insert_exception_header utility function. Use twisted.Application in the new style to avoid deprecation warnings. Index: sb_pop3dnd.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_pop3dnd.py,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** sb_pop3dnd.py 14 Jul 2004 07:16:59 -0000 1.10 --- sb_pop3dnd.py 5 Nov 2004 03:10:04 -0000 1.11 *************** *** 1,6 **** #!/usr/bin/env python - from __future__ import generators - """POP3DND - provides drag'n'drop training ability for POP3 clients. --- 1,4 ---- *************** *** 9,15 **** other POP3 proxy). While messages classified as ham are simply passed through the proxy, messages that are classified as spam or unsure are ! intercepted and passed to the IMAP server. The IMAP server offers three folders - one where messages classified as spam end up, one for messages ! it is unsure about, and one for training ham. In other words, to use this application, setup your mail client to connect --- 7,13 ---- other POP3 proxy). While messages classified as ham are simply passed through the proxy, messages that are classified as spam or unsure are ! intercepted and passed to the IMAP server. The IMAP server offers four folders - one where messages classified as spam end up, one for messages ! it is unsure about, one for training ham, and one for training spam. In other words, to use this application, setup your mail client to connect *************** *** 20,64 **** spam and one for unsure messages. ! To train SpamBayes, use the spam folder, and the 'train_as_ham' folder. ! 
Any messages in these folders will be trained appropriately. This means ! that all messages that SpamBayes classifies as spam will also be trained ! as such. If you receive any 'false positives' (ham classified as spam), ! you *must* copy the message into the 'train_as_ham' folder to correct the ! training. You may also place any saved spam messages you have into this ! folder. ! ! So that SpamBayes knows about ham as well as spam, you will also need to ! move or copy mail into the 'train_as_ham' folder. These may come from ! the unsure folder, or from any other mail you have saved. It is a good ! idea to leave messages in the 'train_as_ham' and 'spam' folders, so that ! you can retrain from scratch if required. (However, you should always ! clear out your unsure folder, preferably moving or copying the messages ! into the appropriate training folder). This SpamBayes application is designed to work with Outlook Express, and provide the same sort of ease of use as the Outlook plugin. Although the ! majority of development and testing has been done with Outlook Express and ! Eudora, any mail client that supports both IMAP and POP3 should be able to ! use this application - if the client enables the user to work with an IMAP ! account and POP3 account side-by-side (and move messages between them), ! then it should work equally as well. ! ! This module includes the following classes: ! o IMAPMessage ! o DynamicIMAPMessage ! o IMAPFileMessage ! o IMAPFileMessageFactory ! o IMAPMailbox ! o SpambayesMailbox ! o SpambayesInbox ! o Trainer ! o SpambayesAccount ! o SpambayesIMAPServer ! o OneParameterFactory ! o MyBayesProxy ! o MyBayesProxyListener ! o IMAPState """ todo = """ o The RECENT flag should be unset at some point, but when? The --- 18,35 ---- spam and one for unsure messages. ! To train SpamBayes, use the 'train_as_spam' and 'train_as_ham' folders. ! Any messages in these folders will be trained appropriately. 
This SpamBayes application is designed to work with Outlook Express, and provide the same sort of ease of use as the Outlook plugin. Although the ! majority of development and testing has been done with Outlook Express, ! Eudora and Thunderbird, any mail client that supports both IMAP and POP3 ! should be able to use this application - if the client enables the user to ! work with an IMAP account and POP3 account side-by-side (and move messages ! between them), then it should work equally as well. """ + from __future__ import generators + todo = """ o The RECENT flag should be unset at some point, but when? The *************** *** 75,88 **** (with the <> operands), or get a part of a MIME message (by prepending a number). This should be added! - o If the user clicks the 'save and shutdown' button on the web - interface, this will only kill the POP3 proxy and web interface - threads, and not the IMAP server. We need to monitor the thread - that we kick off, and if it dies, we should die too. Need to figure - out how to do this in twisted. - o Apparently, twisted.internet.app is deprecated, and we should - use twisted.application instead. Need to figure out what that means! - o We could have a distinction between messages classified as spam - and messages to train as spam. At the moment we force users into - the 'incremental training' system available with the Outlook plug-in. o Suggestions? """ --- 46,49 ---- *************** *** 108,122 **** import errno import types import thread import getopt import imaplib import operator - import StringIO import email.Utils from twisted import cred from twisted.internet import defer from twisted.internet import reactor ! 
from twisted.internet.app import Application from twisted.internet.defer import maybeDeferred from twisted.internet.protocol import ServerFactory --- 69,89 ---- import errno import types + import email import thread import getopt import imaplib import operator import email.Utils + try: + import cStringIO as StringIO + except NameError: + import StringIO + from twisted import cred + import twisted.application.app from twisted.internet import defer from twisted.internet import reactor ! from twisted.internet import win32eventreactor from twisted.internet.defer import maybeDeferred from twisted.internet.protocol import ServerFactory *************** *** 129,138 **** from spambayes import message from spambayes.Options import options from spambayes.tokenizer import tokenize from spambayes import FileCorpus, Dibbler from spambayes.Version import get_version_string - from spambayes.ServerUI import ServerUserInterface - from spambayes.UserInterface import UserInterfaceServer from sb_server import POP3ProxyBase, State, _addressPortStr, _recreateState --- 96,104 ---- from spambayes import message + from spambayes.Stats import Stats from spambayes.Options import options from spambayes.tokenizer import tokenize from spambayes import FileCorpus, Dibbler from spambayes.Version import get_version_string from sb_server import POP3ProxyBase, State, _addressPortStr, _recreateState *************** *** 168,172 **** headers = {} for header, value in self.items(): ! if (header.lower() in names and not negate) or names == (): headers[header.lower()] = value return headers --- 134,139 ---- headers = {} for header, value in self.items(): ! if (header.upper() in names and not negate) or \ ! (header.upper() not in names and negate) or names == (): headers[header.lower()] = value return headers *************** *** 192,202 **** def getInternalDate(self): """Retrieve the date internally associated with this message.""" ! assert(self.date is not None, ! 
"Must set date to use IMAPMessage instance.") return self.date def getBodyFile(self): """Retrieve a file object containing the body of this message.""" ! # Note only body, not headers! s = StringIO.StringIO() s.write(self.body()) --- 159,169 ---- def getInternalDate(self): """Retrieve the date internally associated with this message.""" ! assert self.date is not None, \ ! "Must set date to use IMAPMessage instance." return self.date def getBodyFile(self): """Retrieve a file object containing the body of this message.""" ! # Note: only body, not headers! s = StringIO.StringIO() s.write(self.body()) *************** *** 256,273 **** def flags(self): """Return the message flags.""" ! all_flags = [] ! if self.deleted: ! all_flags.append("\\DELETED") ! if self.answered: ! all_flags.append("\\ANSWERED") ! if self.flagged: ! all_flags.append("\\FLAGGED") ! if self.seen: ! all_flags.append("\\SEEN") ! if self.draft: ! all_flags.append("\\DRAFT") ! if self.draft: ! all_flags.append("\\RECENT") ! return all_flags def train(self, classifier, isSpam): --- 223,227 ---- def flags(self): """Return the message flags.""" ! return list(self._flags_iter()) def train(self, classifier, isSpam): *************** *** 340,344 **** self.load() def load(self): ! self.set_payload(self.func(body=True, headers=True)) --- 294,303 ---- self.load() def load(self): ! # This only works for simple messages (non multi-part). ! self.set_payload(self.func(body=True)) ! # This only works for simple headers (no continuations). ! for headerstr in self.func(headers=True).split('\r\n'): ! header, value = headerstr.split(':') ! self[header] = value.strip() *************** *** 346,350 **** '''IMAP Message that persists as a file system artifact.''' ! def __init__(self, file_name, directory): """Constructor(message file name, corpus directory name).""" date = imaplib.Time2Internaldate(time.time())[1:-1] --- 305,309 ---- '''IMAP Message that persists as a file system artifact.''' ! 
def __init__(self, file_name=None, directory=None): """Constructor(message file name, corpus directory name).""" date = imaplib.Time2Internaldate(time.time())[1:-1] *************** *** 352,363 **** FileCorpus.FileMessage.__init__(self, file_name, directory) self.id = file_name - self.directory = directory class IMAPFileMessageFactory(FileCorpus.FileMessageFactory): '''MessageFactory for IMAPFileMessage objects''' ! def create(self, key, directory): '''Create a message object from a filename in a directory''' ! return IMAPFileMessage(key, directory) --- 311,328 ---- FileCorpus.FileMessage.__init__(self, file_name, directory) self.id = file_name class IMAPFileMessageFactory(FileCorpus.FileMessageFactory): '''MessageFactory for IMAPFileMessage objects''' ! def create(self, key, directory, content=None): '''Create a message object from a filename in a directory''' ! if content is None: ! return IMAPFileMessage(key, directory) ! msg = email.message_from_string(content, _class=IMAPFileMessage, ! strict=False) ! msg.id = key ! msg.file_name = key ! msg.directory = directory ! return msg *************** *** 396,401 **** self.nextUID = long(self.storage.keys()[-1]) + 1 # Calculate initial recent and unseen counts - # XXX Note that this will always end up with zero counts - # XXX until the flags are persisted. self.unseen_count = 0 self.recent_count = 0 --- 361,364 ---- *************** *** 416,420 **** def getUID(self, msg): """Return the UID of a message in the mailbox.""" ! # Note that IMAP messages are 1-based, our messages are 0-based d = self.storage return long(d.keys()[msg - 1]) --- 379,383 ---- def getUID(self, msg): """Return the UID of a message in the mailbox.""" ! # Note that IMAP messages are 1-based, our messages are 0-based. 
d = self.storage return long(d.keys()[msg - 1]) *************** *** 528,531 **** --- 491,496 ---- def _messagesIter(self, messages, uid): if uid: + if not self.storage.keys(): + return messages.last = long(self.storage.keys()[-1]) else: *************** *** 591,601 **** msg = [] if headers: ! msg.append("Subject:SpamBayes Status") ! msg.append('From:"SpamBayes" ') if body: msg.append('\r\n') if body: state.buildStatusStrings() ! msg.append(state.warning or "SpamBayes operating correctly.") return "\r\n".join(msg) --- 556,602 ---- msg = [] if headers: ! msg.append("Subject: SpamBayes Status") ! msg.append('From: "SpamBayes" ') if body: msg.append('\r\n') if body: state.buildStatusStrings() ! msg.append("POP3 proxy running on %s, proxying to %s." % \ ! (state.proxyPortsString, state.serversString)) ! msg.append("Active POP3 conversations: %s." % \ ! (state.activeSessions,)) ! msg.append("POP3 conversations this session: %s." % \ ! (state.totalSessions,)) ! msg.append("IMAP server running on %s." % \ ! (state.serverPortString,)) ! msg.append("Active IMAP4 conversations: %s." % \ ! (state.activeIMAPSessions,)) ! msg.append("IMAP4 conversations this session: %s." % \ ! (state.totalIMAPSessions,)) ! msg.append("Emails classified this session: %s spam, %s ham, " ! "%s unsure." % (state.numSpams, state.numHams, ! state.numUnsure)) ! msg.append("Total emails trained: Spam: %s Ham: %s" % \ ! (state.bayes.nspam, state.bayes.nham)) ! msg.append(state.warning or "SpamBayes is operating correctly.\r\n") ! return "\r\n".join(msg) ! ! def buildStatisticsMessage(self, body=False, headers=False): ! """Build a mesasge containing the current statistics. ! ! If body is True, then return the body; if headers is True ! return the headers. If both are true, then return both ! (and insert a newline between them). ! """ ! msg = [] ! if headers: ! msg.append("Subject: SpamBayes Statistics") ! msg.append('From: "SpamBayes" \r\n\r\n' \ ! 
'See .\r\n' date = imaplib.Time2Internaldate(time.time())[1:-1] msg = email.message_from_string(about, _class=IMAPMessage, --- 604,611 ---- """Create the special messages that live in this mailbox.""" state.buildStatusStrings() ! state.buildServerStrings() ! about = 'Subject: About SpamBayes / POP3DND\r\n' \ ! 'From: "SpamBayes" \r\n\r\n' \ ! '%s\r\nSee .\r\n' % (__doc__,) date = imaplib.Time2Internaldate(time.time())[1:-1] msg = email.message_from_string(about, _class=IMAPMessage, *************** *** 614,621 **** msg = DynamicIMAPMessage(self.buildStatusMessage) self.addMessage(msg) # XXX Add other messages here, for example - # XXX statistics - # XXX information from sb_server homepage about number - # XXX of messages classified etc. # XXX one with a link to the configuration page # XXX (or maybe even the configuration page itself, --- 615,621 ---- msg = DynamicIMAPMessage(self.buildStatusMessage) self.addMessage(msg) + msg = DynamicIMAPMessage(self.buildStatisticsMessage) + self.addMessage(msg) # XXX Add other messages here, for example # XXX one with a link to the configuration page # XXX (or maybe even the configuration page itself, *************** *** 679,687 **** """Account for Spambayes server.""" ! def __init__(self, id, ham, spam, unsure, inbox): MemoryAccount.__init__(self, id) self.mailboxes = {"SPAM" : spam, "UNSURE" : unsure, "TRAIN_AS_HAM" : ham, "INBOX" : inbox} --- 679,688 ---- """Account for Spambayes server.""" ! def __init__(self, id, ham, spam, unsure, train_spam, inbox): MemoryAccount.__init__(self, id) self.mailboxes = {"SPAM" : spam, "UNSURE" : unsure, "TRAIN_AS_HAM" : ham, + "TRAIN_AS_SPAM" : train_spam, "INBOX" : inbox} *************** *** 745,749 **** ! class MyBayesProxy(POP3ProxyBase): """Proxies between an email client and a POP3 server, redirecting mail to the imap server as necessary. It acts on the following --- 746,750 ---- ! 
class RedirectingBayesProxy(POP3ProxyBase): """Proxies between an email client and a POP3 server, redirecting mail to the imap server as necessary. It acts on the following *************** *** 759,763 **** # information about who the message was from, or what the subject # was, if people thought that would be a good idea. ! intercept_message = 'From: "Spambayes" \r\n' \ 'Subject: Spambayes Intercept\r\n\r\nA message ' \ 'was intercepted by Spambayes (it scored %s).\r\n' \ --- 760,764 ---- # information about who the message was from, or what the subject # was, if people thought that would be a good idea. ! intercept_message = 'From: "Spambayes" \r\n' \ 'Subject: Spambayes Intercept\r\n\r\nA message ' \ 'was intercepted by Spambayes (it scored %s).\r\n' \ *************** *** 831,834 **** --- 832,839 ---- evidence=True) + # Note that the X-SpamBayes-MailID header will be worthless + # because we don't know the message id at this point. It's + # not necessary for anything anyway, so just don't set the + # [Headers] add_unique_id option. msg.addSBHeaders(prob, clues) *************** *** 864,878 **** messageText = self.intercept_message % (prob,) except: ! stream = cStringIO.StringIO() ! traceback.print_exc(None, stream) ! details = stream.getvalue() ! detailLines = details.strip().split('\n') ! dottedDetails = '\n.'.join(detailLines) ! headerName = 'X-Spambayes-Exception' ! header = Header(dottedDetails, header_name=headerName) ! headers, body = re.split(r'\n\r?\n', messageText, 1) ! header = re.sub(r'\r?\n', '\r\n', str(header)) ! headers += "\n%s: %s\r\n\r\n" % (headerName, header) ! messageText = headers + body print >>sys.stderr, details retval = ok + "\n" + messageText --- 869,876 ---- messageText = self.intercept_message % (prob,) except: ! messageText, details = \ ! message.insert_exception_header(messageText) ! ! # Print the exception and a traceback. print >>sys.stderr, details retval = ok + "\n" + messageText *************** *** 889,900 **** ! 
class MyBayesProxyListener(Dibbler.Listener): """Listens for incoming email client connections and spins off ! MyBayesProxy objects to serve them. """ - def __init__(self, serverName, serverPort, proxyPort, spam, unsure): proxyArgs = (serverName, serverPort, spam, unsure) ! Dibbler.Listener.__init__(self, proxyPort, MyBayesProxy, proxyArgs) print 'Listener on port %s is proxying %s:%d' % \ (_addressPortStr(proxyPort), serverName, serverPort) --- 887,898 ---- ! class RedirectingBayesProxyListener(Dibbler.Listener): """Listens for incoming email client connections and spins off ! RedirectingBayesProxy objects to serve them. """ def __init__(self, serverName, serverPort, proxyPort, spam, unsure): proxyArgs = (serverName, serverPort, spam, unsure) ! Dibbler.Listener.__init__(self, proxyPort, RedirectingBayesProxy, ! proxyArgs) print 'Listener on port %s is proxying %s:%d' % \ (_addressPortStr(proxyPort), serverName, serverPort) *************** *** 923,985 **** def setup(): ! # Setup state, app, boxes, trainers and account state.createWorkers() proxyListeners = [] - app = Application("SpambayesIMAPServer") ! spam_box = SpambayesMailbox("Spam", 0, options["Storage", ! "spam_cache"]) ! unsure_box = SpambayesMailbox("Unsure", 1, options["Storage", ! "unknown_cache"]) ham_train_box = SpambayesMailbox("TrainAsHam", 2, options["Storage", "ham_cache"]) ! inbox = SpambayesInbox(3) ! spam_trainer = Trainer(spam_box, True) ham_trainer = Trainer(ham_train_box, False) ! spam_box.addListener(spam_trainer) ham_train_box.addListener(ham_trainer) user_account = SpambayesAccount(options["imapserver", "username"], ham_train_box, spam_box, unsure_box, ! inbox) ! # add IMAP4 server f = OneParameterFactory() f.protocol = SpambayesIMAPServer f.parameter = user_account ! state.imap_port = options["imapserver", "port"] ! app.listenTCP(state.imap_port, f) ! # add POP3 proxy for (server, serverPort), proxyPort in zip(state.servers, state.proxyPorts): ! 
listener = MyBayesProxyListener(server, serverPort, proxyPort, ! spam_box, unsure_box) proxyListeners.append(listener) state.buildServerStrings() - # add web interface - httpServer = UserInterfaceServer(state.uiPort) - serverUI = ServerUserInterface(state, _recreateState) - httpServer.register(serverUI) - - return app - def run(): # Read the arguments. try: ! opts, args = getopt.getopt(sys.argv[1:], 'hbd:D:u:o:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ sys.exit() - launchUI = False for opt, arg in opts: if opt == '-h': print >>sys.stderr, __doc__ sys.exit() - elif opt == '-b': - launchUI = True elif opt == '-o': options.set_from_cmdline(arg, sys.stderr) --- 921,977 ---- def setup(): ! # Setup state, server, boxes, trainers and account. ! state.imap_port = options["imapserver", "port"] state.createWorkers() proxyListeners = [] ! spam_box = SpambayesMailbox("Spam", 0, ! options["Storage", "spam_cache"]) ! unsure_box = SpambayesMailbox("Unsure", 1, ! options["Storage", "unknown_cache"]) ham_train_box = SpambayesMailbox("TrainAsHam", 2, options["Storage", "ham_cache"]) ! # We don't have a third cache location in the directory, so make one up. ! spam_train_cache = os.path.join(options["Storage", "ham_cache"], "..", ! "spam_to_train") ! spam_train_box = SpambayesMailbox("TrainAsSpam", 3, spam_train_cache) ! inbox = SpambayesInbox(4) ! spam_trainer = Trainer(spam_train_box, True) ham_trainer = Trainer(ham_train_box, False) ! spam_train_box.addListener(spam_trainer) ham_train_box.addListener(ham_trainer) user_account = SpambayesAccount(options["imapserver", "username"], ham_train_box, spam_box, unsure_box, ! spam_train_box, inbox) ! # Add IMAP4 server. f = OneParameterFactory() f.protocol = SpambayesIMAPServer f.parameter = user_account ! reactor.listenTCP(state.imap_port, f) ! # Add POP3 proxy. for (server, serverPort), proxyPort in zip(state.servers, state.proxyPorts): ! listener = RedirectingBayesProxyListener(server, serverPort, ! 
proxyPort, spam_box, ! unsure_box) proxyListeners.append(listener) state.buildServerStrings() def run(): # Read the arguments. try: ! opts, args = getopt.getopt(sys.argv[1:], 'ho:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ sys.exit() for opt, arg in opts: if opt == '-h': print >>sys.stderr, __doc__ sys.exit() elif opt == '-o': options.set_from_cmdline(arg, sys.stderr) *************** *** 988,1001 **** print get_version_string("IMAP Server") print get_version_string("POP3 Proxy") ! print "and engine %s," % (get_version_string(),) from twisted.copyright import version as twisted_version ! print "with twisted version %s.\n" % (twisted_version,) ! # setup everything ! app = setup() ! # kick things off ! thread.start_new_thread(Dibbler.run, (launchUI,)) ! app.run(save=False) if __name__ == "__main__": --- 980,994 ---- print get_version_string("IMAP Server") print get_version_string("POP3 Proxy") ! print get_version_string() from twisted.copyright import version as twisted_version ! print "Twisted version %s.\n" % (twisted_version,) ! # Setup everything. ! setup() ! # Kick things off. The asyncore stuff doesn't play nicely ! # with twisted (or vice-versa), so put them in separate threads. ! thread.start_new_thread(Dibbler.run, ()) ! reactor.run() if __name__ == "__main__": From anadelonbrin at users.sourceforge.net Mon Nov 8 02:20:39 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Nov 8 02:20:42 2004 Subject: [Spambayes-checkins] website quotes.ht,1.10,1.11 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15058 Modified Files: quotes.ht Log Message: Add another quote. 
Index: quotes.ht =================================================================== RCS file: /cvsroot/spambayes/website/quotes.ht,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** quotes.ht 9 Aug 2004 06:18:41 -0000 1.10 --- quotes.ht 8 Nov 2004 01:20:35 -0000 1.11 *************** *** 90,93 **** --- 90,100 ----

+

+ If you use Outlook, drop everything and get SpamBayes.
+ Scott Spanbauer with sage advice in a + PCWorld + article. +

+

Spamotomy users have a bit to say, too! From anadelonbrin at users.sourceforge.net Mon Nov 8 02:23:02 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Nov 8 02:23:12 2004 Subject: [Spambayes-checkins] website faq.txt,1.81,1.82 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15516 Modified Files: faq.txt Log Message: Update FAQ in various ways: 1. Get rid of things specific to certain alpha versions. People shouldn't be using those any more, and you can't download them anywhere. 2. Update training material to point to the wiki, and stop encouraging people to do lots of pretraining. 3. Correct a few entries that refer to ways things were done in old versions that have since changed. Index: faq.txt =================================================================== RCS file: /cvsroot/spambayes/website/faq.txt,v retrieving revision 1.81 retrieving revision 1.82 diff -C2 -d -r1.81 -r1.82 *** faq.txt 11 Aug 2004 04:50:42 -0000 1.81 --- faq.txt 8 Nov 2004 01:22:59 -0000 1.82 *************** *** 33,48 **** train it on representative samples of email you receive. After it's been trained, you use SpamBayes to classify new mail according to its spamminess ! and hamminess qualities. ! ! To train SpamBayes (which you don't need to do if you're going to be using ! the POP3 proxy to classify messages, but you'll get better results from the ! outset if you do) you need to save your incoming email for awhile, ! segregating it into two piles, known spam and known ham (ham is our nickname ! for good mail). It's best to train on recent email, because your interests ! and the nature of what spam looks like change over time. Once you've ! collected a fair portion of each, you can tell SpamBayes, "Here's my ! ham and my spam". It will then process that mail and save information about ! different patterns which appear in ham and spam. That information is then ! used during the filtering stage. 
When SpamBayes filters your email, it compares each unclassified message --- 33,38 ---- train it on representative samples of email you receive. After it's been trained, you use SpamBayes to classify new mail according to its spamminess ! and hamminess qualities. It's best to train on recent email, because your ! interests and the nature of what spam looks like change over time. When SpamBayes filters your email, it compares each unclassified message *************** *** 163,168 **** give it messages, tell it whether those messages are ham or spam, and it adjusts its probabilities accordingly. How to train it is covered below. ! By default it lives in a file called "hammie.db", "statistics_database.db" ! or (for the Outlook plugin) "default_bayes_database". 2. The tokenizer/classifier. This is the core engine of the system. The --- 153,158 ---- give it messages, tell it whether those messages are ham or spam, and it adjusts its probabilities accordingly. How to train it is covered below. ! By default it lives in a file called "hammie.db", or (for the Outlook ! plugin) "default_bayes_database". 2. The tokenizer/classifier. This is the core engine of the system. The *************** *** 547,552 **** ! Will SpamBayes work with Outlook 2000 connecting to an Exchange 2000 server? ! ---------------------------------------------------------------------------- Yes. --- 537,542 ---- ! Will SpamBayes work with Outlook connecting to an Exchange server? ! ------------------------------------------------------------------ Yes. *************** *** 555,561 **** -------------------------------------------------------- ! Yes, in version 008 and above of the plugin. You can find this on the ! filtering tab of the SpamBayes manager dialog. However, you should also ! see the `envelope icon question`_. .. _`envelope icon question`: #how-can-I-get-rid-of-the-envelope-tray-icon-for-spam --- 545,550 ---- -------------------------------------------------------- ! Yes. 
You can find this on the filtering tab of the SpamBayes manager ! dialog. However, you should also see the `envelope icon question`_. .. _`envelope icon question`: #how-can-I-get-rid-of-the-envelope-tray-icon-for-spam *************** *** 575,583 **** back in to recover from a corrupted database, or for any other reason. ! This directory is located in the "Application Data" directory. If you have ! version 008 of the plug-in, or higher, you can locate this directory by ! using the `Show Data Folder` button on the `Advanced` tab of the main ! `SpamBayes` manager dialog. If you need to locate it by hand, on Windows ! 2000/XP, it will probably be: C:\\Documents and Settings\\[username]\\Application Data\\Spambayes --- 564,571 ---- back in to recover from a corrupted database, or for any other reason. ! This directory is located in the "Application Data" directory. You can ! locate this directory by using the `Show Data Folder` button on the ! `Advanced` tab of the main `SpamBayes` manager dialog. If you need to ! locate it by hand, on Windows 2000/XP, it will probably be: C:\\Documents and Settings\\[username]\\Application Data\\Spambayes *************** *** 612,624 **** you need to have done these things to enable the button: ! 1. Trained at least 5 ham and 5 spam ! ! 2. Set at least one folder to watch ! ! 3. Set folders to move spam to, and to move unsures to ! 4. Changed the action to "copy" or "move", rather than "untouched" ! 5. Ticked the "enable SpamBayes" checkbox on the first tab of the dialog. --- 600,608 ---- you need to have done these things to enable the button: ! 1. Set at least one folder (not your unsure or spam folder) to watch ! 2. Set folders to move spam to, and to move unsures to ! 3. Ticked the "enable SpamBayes" checkbox on the first tab of the dialog. *************** *** 764,770 **** Basically, you need to create a file "default_configuration.ini", and ! put it either in the directory that SpamBayes was installed into, or in ! 
the default data directory (the `backup question`_ has instructions for ! finding this directory). Inside this file, you need to have a section "General", and an option --- 748,754 ---- Basically, you need to create a file "default_configuration.ini", and ! put it either in the bin directory in the directory that SpamBayes was ! installed into, or in the default data directory (the `backup question`_ ! has instructions for finding this directory). Inside this file, you need to have a section "General", and an option *************** *** 831,840 **** select the SpamBayes toolbar and click "Delete". - With the 008.1 and earlier versions of the plug-in, some entries may be left - in the registry. These should be harmless, but if they bother you (and you - are confident mucking about with the registry, which we do *not* recommend), - then you can remove those keys yourself. Newer versions of the installer - correctly remove these entries. - .. _`backup question`: #can-i-back-up-the-outlook-database-should-i-do-this .. _`a bug with the plug-in`: http://sourceforge.net/tracker/index.php?func=detail&aid=675811&group_id=61702&atid=498103 --- 815,818 ---- *************** *** 894,907 **** Follow the "Review messages" link and you'll see a list of the emails that the system has seen so far. Check the appropriate boxes and hit Train. The ! messages disappear (eventually you'll be able to get back to them, for ! instance to correct any training mistakes) and if you go back to the home ! page you'll see that the "Total emails trained" has increased. Once you've done this on a few spams and a few hams, you'll find that the X-Spambayes-Classification header is getting it right most of the time. The ! more you train it the more accurate it gets. There's no need to train it on ! every message you receive, but you should train on a few spams and a few ! hams on a regular basis. You should also try to train it on about the same ! number of spams as hams. 
You can train it on lots of messages in one go by either using the sb_filter --- 872,883 ---- Follow the "Review messages" link and you'll see a list of the emails that the system has seen so far. Check the appropriate boxes and hit Train. The ! messages disappear and if you go back to the home page you'll see that the ! "Total emails trained" has increased. Once you've done this on a few spams and a few hams, you'll find that the X-Spambayes-Classification header is getting it right most of the time. The ! more you train it the more accurate it gets, but not that you should try to ! train it on about the same number of spams as hams. The `SpamBayes wiki`_ ! has some `information about training`_ that you may wish to read. You can train it on lots of messages in one go by either using the sb_filter *************** *** 911,914 **** --- 887,893 ---- using Outlook Express dbx files. + .. _`SpamBayes wiki`: http://entrian.com/sbwiki + .. _`information about training`: http://entrian.com/sbwiki/TrainingIdeas + How do I train SpamBayes (forward/bounce method)? *************** *** 936,940 **** containing nothing but ham, you can train SpamBayes using a command like:: ! sb_mboxtrain.py -g ~/tmp/newham -s ~/tmp/newspam The above command is OS-centric (e.g., UNIX, or Windows command prompt). --- 915,919 ---- containing nothing but ham, you can train SpamBayes using a command like:: ! python sb_mboxtrain.py -g ~/tmp/newham -s ~/tmp/newspam The above command is OS-centric (e.g., UNIX, or Windows command prompt). *************** *** 1016,1020 **** 2. It is quite important that you have trained on roughly equal numbers of ! ham and spam (don't go above a 2::1 ratio, for example). 3. Have you trained on a reasonable number of hams and spams? You should --- 995,999 ---- 2. It is quite important that you have trained on roughly equal numbers of ! ham and spam (don't go above a 4::1 ratio, for example). 3. Have you trained on a reasonable number of hams and spams? 
You should *************** *** 1170,1174 **** Sadly, not much is done in the way of testing these days. Hopefully this ! will change, though, and if you're interested it's definately an option. Check out the README-DEVEL for information about how to get started. This is the way to go if you have a new idea, too - even if you convince someone else to --- 1149,1153 ---- Sadly, not much is done in the way of testing these days. Hopefully this ! will change, though, and if you're interested it's definitely an option. Check out the README-DEVEL for information about how to get started. This is the way to go if you have a new idea, too - even if you convince someone else to *************** *** 1213,1218 **** couple other tools, `POPFile `_ and `CRM114 `_. A demonstration script which performs ! n-way classification was also recently added to the ``contrib`` directory of ! the SpamBayes CVS repository. --- 1192,1197 ---- couple other tools, `POPFile `_ and `CRM114 `_. A demonstration script which performs ! n-way classification in also in the ``contrib`` directory of the SpamBayes ! source. *************** *** 1228,1246 **** To use a pickle, set the option "persistent_use_database" to False in your `configuration file <#how-do-i-configure-spambayes>`_, ! in the section "Storage" (if you have been using SpamBayes for a while, ! check that you don't have an old version of this option elsewhere in your ! configuration file, in the pop3proxy or hammiefilter sections). You may ! also wish to change the name of the storage file (to end with "pck", for ! example), but this is not necessary - to do so, change the ! "persistent_storage_file" option (also in the "Storage" section). If you specify your database on the command line ("sb_server.py -d hammie.db", ! for example), then you should use the "-D" switch instead. Note, however, ! that it is likely that these switches will change in a future release, and ! using the configuration file is a much safer option. 
Note that if you have an existing database, which is not a pickle, you can not keep using it - this will cause errors. You need to either retrain ! from scratch, or use the dbExpImp script to convert it to a pickle. --- 1207,1221 ---- To use a pickle, set the option "persistent_use_database" to False in your `configuration file <#how-do-i-configure-spambayes>`_, ! in the section "Storage". You may also wish to change the name of the ! storage file (to end with "pck", for example), but this is not necessary ! - to do so, change the "persistent_storage_file" option (also in the ! "Storage" section). If you specify your database on the command line ("sb_server.py -d hammie.db", ! for example), then you should use the "-p" switch instead. Note that if you have an existing database, which is not a pickle, you can not keep using it - this will cause errors. You need to either retrain ! from scratch, or use the sb_dbexpimp.py script to convert it to a pickle. *************** *** 1255,1271 **** for example. ! In releases up to and including 1.0a4, you need to edit the ! bayescustomize.ini script (the configuration page on the website tells you ! where this is located). In the ``[html_ui]`` section (create one if there ! isn't one already), add the line: ``allow_remote_connections:True``. If ! you can, you might want to firewall outside access to port 8880, to stop ! unauthorised users from messing with the web interface. ! ! In versions after 1.0a4, you can specify IP addresses or ranges that you ! want to be allowed access (two or three machines, for example). You can ! also do this via the web configuration, without having to alter the ! configuration file manually. The option you are after is called ! ``Allowed remote connections``. In versions after 1.0a5, you can also ! set the interface to use HTTP-AUTH, either Basic or Digest. --- 1230,1238 ---- for example. ! You can specify IP addresses or ranges that you ! 
want to be allowed access (two or three machines, for example), via the web ! configuration. The option you are after is called ! ``Allowed remote connections``. You can also set the interface to use ! HTTP-AUTH, either Basic or Digest. *************** *** 1284,1324 **** ============================ - Pop3proxy doesn't work with fetchmail. - -------------------------------------- - - This is a known problem in releases up to and including 1.0a4, fixed in CVS - on 28th July 2003. To work around it, use fetchmail's ``fetchall`` option. - My database keeps getting corrupted. ------------------------------------ ! You may be using the 'dumbdbm' system for your database. ! 'dumbdbm' is the default database system - the one that gets fallen back on ! when nothing else is available. It is not usually a good choice, and in ! SpamBayes' case, always the wrong one. Some versions of dumbdbm have a bug ! that will cause database corruption, but you shouldn't be using it anyway, ! as it is very inefficient. Instead, either ! `use a pickle <#how-do-i-use-a-pickle-for-storage>`_ or install `pybsddb`_ ! (bsddb3) and use that instead. If you are not sure which database systems you have available, and/or which one you are currently using, there is a script in the utilities folder called `which_database.py`_ that will display ! this information (Windows users should run it from a command prompt). This ! file is only included in releases after 1.0a4 - if you are using an earlier ! version, you can download it from cvs, or just go by the name(s) of the ! database file(s). If you have a single file, probably called ``hammie.db``, ! then you are probably not using dumbdbm. If you have three files (probably ! called ``hammie.db.dir``, ``hammie.db.dat`` and ``hammie.db.bak``), then you ! most likely are using dumbdbm, and should stop. Note that users of the ! pop3proxy_service can not currently use which_database.py. - Support for dumbdbm has been dropped since release 1.0a6. 
- - Note that none of this applies to the Outlook plug-in, which avoids it - on your behalf. .. _which_database.py: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/utilities/which_database.py?rev=HEAD&content-type=text/plain ! .. _pybsddb: http://pybsddb.sourceforge.net/ --- 1251,1281 ---- ============================ My database keeps getting corrupted. ------------------------------------ ! Despite the efforts of the developers, there are still occasional problems ! with database corruption. Known potential causes include: ! ! 1. Accessing the database files from more than one process concurrently. ! ! 2. Interupting SpamBayes in the midst of training (through a program or ! machine crash, for example). ! ! If you experience consisent corruption, or can provide a set of steps that ! will consisently cause the database to be corrupted, please email ! the `mailing list`_, describing your situation. ! ! Otherwise, you should simply retrain from scratch. You may wish to change ! to an alternative database system to try and avoid these problems. If you are not sure which database systems you have available, and/or which one you are currently using, there is a script in the utilities folder called `which_database.py`_ that will display ! this information (Windows users should run it from a command prompt). Note ! that users of the pop3proxy_service can not currently use which_database.py. .. _which_database.py: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/utilities/which_database.py?rev=HEAD&content-type=text/plain ! .. _mailing list: mailto:spambayes@python.org *************** *** 1328,1333 **** If you get a message that looks like: DBRunRecoveryError: (-30982, 'DB_RUNRECOVERY: Fatal error, run database ! recovery -- fatal region error detected; run recovery') ! This, sadly, means that your training database is corrupted, and you have no choice but to delete it and train again from scratch. 
We don't know what causes this to happen, but we are trying to fix it. If you find it happens --- 1285,1290 ---- If you get a message that looks like: DBRunRecoveryError: (-30982, 'DB_RUNRECOVERY: Fatal error, run database ! recovery -- fatal region error detected; run recovery'), ! this, sadly, means that your training database is corrupted, and you have no choice but to delete it and train again from scratch. We don't know what causes this to happen, but we are trying to fix it. If you find it happens *************** *** 1337,1340 **** --- 1294,1301 ---- reproduce the problem, so tracking it down is proving very difficult. + Note that the "database recovery" that you are told to run does not apply. + This is a message provided by the underlying bsddb database system, and + cannot be used in this case. + If you don't want to risk it happening again, switch to using the pickle storage (web interface: Configuration / Advanced Configuration / *************** *** 1359,1424 **** - The readme says that I can delete the files after doing "setup.py install", but then I can't find pop3proxy_service.py or pop3proxy_tray.py. - -------------------------------------------------------------------------------------------------------------------------------------------- - - This is a mistake in either the readme or setup.py in the 1.0a6 release. - It's fixed in the 1.0a7 release, so that pop3proxy_service.py and - pop3proxy_tray.py will also be installed to the Python scripts directory - (if you are running Windows). - - - I can't train via the web interface in 1.0a6! - --------------------------------------------- - - There is a known problem with the 1.0a6 release, which is fixed in 1.0a7. - Download the newer release from the download page. - - To workaround the problem if you're stuck on 1.0a6: you can't use the - database after making any changes via the web interface configuration pages. 
- To work around this, either restart SpamBayes after using the configuration - pages, or upgrade to 1.0a7. - - The '500' error you receive will end with "Object does not support item - assignment". It may also show up on other pages than the review messages - one, such as looking up a word in the database. - - - sb_imapfilter prints out "Skipping unparseable message", but the message vanishes! - ---------------------------------------------------------------------------------- - - This is a known problem with the 1.0a9 (0.9) release, and will be fixed in - the next release. Unless you have something set to expunge/purge the IMAP - folder, the original message will still be there, marked as deleted, so you - can get it back, although malformed messages are most likely to be spam, - anyway. - - If you need a fix for this before the next release, you can get sb_imapfilter.py - from CVS (revision 1.26), and use it instead of the one included with 1.0a9 (0.9). - You should also get message.py (revision 1.46), and replace the message.py - in your Python Lib/site-packages/spambayes folder with it. - - Note that in addition to the message disappearing, you'll find a new message - (almost certainly unsure) which is blank, apart from the SpamBayes headers. - You may safely delete these messages. If you are training and come across - one of these messages, you'll also have the ham/spam count in your database - increase, without any tokens increasing their count, but that shouldn't have - any effect, as long as it doesn't happen regularly. - - - The 1.0a9 (0.9) installer is missing the pop3proxy_service file. - ---------------------------------------------------------------- - - There are two bugs here - one is that the readme_proxy.html file installed - by the 1.0a9 (0.9) installer talks about a directory that doesn't exist, - namely {Program Files}/SpamBayes/Proxy. 
This should be {Program Files}/SpamBayes/bin, - but that won't help you, because the executable that you need to install - the service isn't installed. - - This will be fixed in the next release (in the 'bin' directory will be a - file called "sb_server.exe"). Until then, if you want to install sb_server - as a service, you will need to do this from source. You can, of course, - run sb_server.exe or sb_tray.exe, without having the service installed. - - Why does the spambayes@python.org mailing list get spam? -------------------------------------------------------- --- 1320,1323 ---- *************** *** 1535,1539 **** results than a more general approach that just generates tokens and throws them at the classifier. See also the file NEWTRICKS.txt in the source ! distribution - we're filing neat ideas here. If you're interested in trying out other people's cool ideas, as well as your --- 1434,1438 ---- results than a more general approach that just generates tokens and throws them at the classifier. See also the file NEWTRICKS.txt in the source ! distribution - we're filing neat ideas here, and also check out the `wiki`_. If you're interested in trying out other people's cool ideas, as well as your *************** *** 1542,1545 **** --- 1441,1446 ---- and give us some feedback about how they work for you. + .. _wiki: http://entrian.com/sbwiki + Are there plans to develop a server-side SpamBayes solution? From anadelonbrin at users.sourceforge.net Mon Nov 8 03:01:19 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Nov 8 03:01:21 2004 Subject: [Spambayes-checkins] spambayes/spambayes Options.py,1.115,1.116 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv23992/spambayes Modified Files: Options.py Log Message: Clarify the help text for "Hammie":"train_on_filter" to make it clearer that it doesn't apply to the POP3 proxy or IMAP filter. 
(It is exposed via the sb_server web interface as you can use that to configure sb_filter if you want - particularly if you're using sb_upload as well). Index: Options.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v retrieving revision 1.115 retrieving revision 1.116 diff -C2 -d -r1.115 -r1.116 *** Options.py 2 Nov 2004 21:27:42 -0000 1.115 --- Options.py 8 Nov 2004 02:01:14 -0000 1.116 *************** *** 494,498 **** with a procmail-based solution. If you do enable this, please make sure to retrain any mistakes. Otherwise, your word database will ! slowly become useless.""", BOOLEAN, RESTORE), ), --- 494,500 ---- with a procmail-based solution. If you do enable this, please make sure to retrain any mistakes. Otherwise, your word database will ! slowly become useless. Note that this option is only used by ! sb_filter, and will have no effect on sb_server's POP3 proxy, or ! the IMAP filter.""", BOOLEAN, RESTORE), ), From anadelonbrin at users.sourceforge.net Mon Nov 8 05:57:41 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Nov 8 05:57:45 2004 Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py,1.134,1.135 Message-ID: Update of /cvsroot/spambayes/spambayes/Outlook2000 In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28846/Outlook2000 Modified Files: addin.py Log Message: Add two extra items to the "spam clues" for the message: 1. Score/class when the message was last filtered. This is useful if you don't have the spam field displayed, and will be useful for copies received on the mailing list. 2. Whether or not the message has been trained (and if so, as what). This is possibly useful for the user, but could definitely be useful for copies received on the mailing list. 
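As a rough illustration of the two additions described in the log message above, here is a minimal, self-contained sketch. It is not the plug-in's code: describe_message, its parameters, and the default thresholds are hypothetical names and values chosen for the example, and it assumes the score and both thresholds share a 0-1 scale. It checks for a missing score before comparing, since Python 3 does not allow ordering comparisons between None and a number.

```python
def describe_message(score, trained_as,
                     spam_threshold=0.9, unsure_threshold=0.15):
    """Return the two extra "spam clues" lines for one message.

    score      -- probability in [0, 1] from the last filtering run,
                  or None if the message was never filtered.
    trained_as -- True (spam), False (ham) or None (not trained).

    All names and default thresholds here are assumptions made for
    this sketch, not values taken from the Outlook plug-in.
    """
    if score is None:
        filtered = "This message has not been filtered."
    else:
        # Highest threshold first: spam, then unsure, otherwise good.
        if score >= spam_threshold:
            original_class = "spam"
        elif score >= unsure_threshold:
            original_class = "unsure"
        else:
            original_class = "good"
        filtered = ("When this message was last filtered, it was "
                    "classified as %s (it scored %d%%)."
                    % (original_class, round(score * 100)))

    if trained_as is None:
        trained = "This message has not been trained."
    elif trained_as:
        trained = "This message has been trained as spam."
    else:
        trained = "This message has been trained as ham."

    return filtered, trained


if __name__ == "__main__":
    for line in describe_message(0.95, True):
        print(line)
```

For example, describe_message(0.95, True) yields a "classified as spam" line and a "trained as spam" line, while describe_message(None, None) reports that the message has been neither filtered nor trained.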
Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.134
retrieving revision 1.135
diff -C2 -d -r1.134 -r1.135
*** addin.py	2 Nov 2004 21:33:46 -0000	1.134
--- addin.py	8 Nov 2004 04:57:39 -0000	1.135
***************
*** 460,463 ****
--- 460,485 ----
      push("# ham trained on: %d
\n" % c.nham)
      push("# spam trained on: %d
\n" % c.nspam)
+     # Score when the message was classified - this will hopefully help
+     # people realise that it may not necessarily be the same, and will
+     # help diagnosing any 'wrong' scoring reported.
+     original_score = msgstore_message.GetField(mgr.config.general.field_score_name)
+     if original_score >= mgr.config.filter.spam_threshold:
+         original_class = "spam"
+     elif original_score >= mgr.config.filter.unsure_threshold:
+         original_class = "unsure"
+     else:
+         original_class = "good"
+     push("
\n")
+     if original_score is None:
+         push("This message has not been filtered.")
+     else:
+         push("When this message was last filtered, it was classified " \
+              "as %s (it scored %d%%)." % (original_class, original_score*100))
+     # Report whether this message has been trained or not.
+     push("
\n")
+     trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
+     push("This message has %sbeen trained%s." % \
+          {0 : ("", "as ham"), 1 : ("", "as spam"), None : ("not ", "")}
+          [trained_as])
      # Format the clues.
      push("

%s Significant Tokens

\n
" % len(clues))
***************
*** 666,671 ****
              # Must train before moving, else we lose the message!
              subject = msgstore_message.GetSubject()
!             print "Moving and spam training message '%s' - " % (subject,),
!             TrainAsSpam(msgstore_message, self.manager, save_db = False)
              # Do the new message state if necessary.
              try:
--- 688,693 ----
              # Must train before moving, else we lose the message!
              subject = msgstore_message.GetSubject()
!                 print "Moving and spam training message '%s' - " % (subject,),
!                 TrainAsSpam(msgstore_message, self.manager, save_db = False)
              # Do the new message state if necessary.
              try:
***************
*** 729,734 ****
                                          self.manager.score(msgstore_message))
                  # Must train before moving, else we lose the message!
!                 print "Recovering to folder '%s' and ham training message '%s' - " % (restore_folder.name, subject),
!                 TrainAsHam(msgstore_message, self.manager, save_db = False)
                  # Do the new message state if necessary.
                  try:
--- 751,756 ----
                                          self.manager.score(msgstore_message))
                  # Must train before moving, else we lose the message!
!                     print "Recovering to folder '%s' and ham training message '%s' - " % (restore_folder.name, subject),
!                     TrainAsHam(msgstore_message, self.manager, save_db = False)
                  # Do the new message state if necessary.
                  try:

From anadelonbrin at users.sourceforge.net  Mon Nov  8 06:02:12 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov  8 06:02:14 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py,1.135,1.136
Message-ID: 

Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29733/Outlook2000

Modified Files:
	addin.py 
Log Message:
Add a missing space to the last checkin.
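
A minimal sketch of why the space matters (an illustration only, not the checked-in code): the sentence is assembled from a prefix/suffix pair looked up by training state, so each non-empty suffix has to carry its own leading space. The names TRAINED_PHRASES and trained_sentence are made up for this example; the dictionary values are copied from the diff below.

```python
# Hypothetical helper; only the dictionary values come from addin.py.
TRAINED_PHRASES = {
    0: ("", " as ham"),      # trained as ham
    1: ("", " as spam"),     # trained as spam
    None: ("not ", ""),      # never trained
}


def trained_sentence(trained_as):
    """Build the 'has (not) been trained' sentence for one message."""
    return "This message has %sbeen trained%s." % TRAINED_PHRASES[trained_as]


if __name__ == "__main__":
    for state in (0, 1, None):
        print(trained_sentence(state))
```

Without the leading space in the suffix, the first two cases would run the words together ("trainedas ham").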

Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.135
retrieving revision 1.136
diff -C2 -d -r1.135 -r1.136
*** addin.py	8 Nov 2004 04:57:39 -0000	1.135
--- addin.py	8 Nov 2004 05:02:09 -0000	1.136
***************
*** 480,484 ****
      trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
      push("This message has %sbeen trained%s." % \
!          {0 : ("", "as ham"), 1 : ("", "as spam"), None : ("not ", "")}
           [trained_as])
      # Format the clues.
--- 480,484 ----
      trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
      push("This message has %sbeen trained%s." % \
!          {0 : ("", " as ham"), 1 : ("", " as spam"), None : ("not ", "")}
           [trained_as])
      # Format the clues.

From anadelonbrin at users.sourceforge.net  Tue Nov  9 01:46:14 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 01:46:21 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py,1.41,1.42
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14302/scripts

Modified Files:
	sb_imapfilter.py 
Log Message:
Update some comments.

Improve the order of an if statement condition.

Implement [ 940547 ] imapfilter interface available when using -l switch
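
As a rough sketch of the rule this change introduces (assuming I have read the diff below correctly): the web interface is served when -l is given or when neither classifying nor training was requested, and it is pushed into a background thread only when -l is given. The function plan_run and its return tuple are hypothetical names for illustration; they do not exist in sb_imapfilter.py.

```python
def plan_run(sleep_time, do_classify, do_train):
    """Return (serve_ui, ui_in_thread, run_filter) for a set of options.

    sleep_time  -- minutes given with -l (0 or None if not given)
    do_classify -- True if -c was given
    do_train    -- True if -t was given
    """
    run_filter = bool(do_classify or do_train)
    # Served for -l runs, and for runs that do no classifying/training.
    serve_ui = bool(sleep_time) or not run_filter
    # With -l there is more work to do, so the UI cannot block.
    ui_in_thread = serve_ui and bool(sleep_time)
    return serve_ui, ui_in_thread, run_filter


if __name__ == "__main__":
    print(plan_run(0, True, False))    # one-off classify: no interface
    print(plan_run(10, True, False))   # -l 10 -c: interface in a thread
    print(plan_run(0, False, False))   # no -c/-t: interface only
```

A one-off classify or train still skips the interface, which matches the reasoning in the new code comment.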

Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.41
retrieving revision 1.42
diff -C2 -d -r1.41 -r1.42
*** sb_imapfilter.py	13 Oct 2004 02:42:04 -0000	1.41
--- sb_imapfilter.py	9 Nov 2004 00:46:12 -0000	1.42
***************
*** 27,33 ****
              -l minutes  : period of time between filtering operations
              -b          : Launch a web browser showing the user interface.
-                           (If not specified, and neither the -c or -t
-                           options are used, then this will default to the
-                           value in your configuration file).
              -o section:option:value :
                            set [section, option] in the options database
--- 27,30 ----
***************
*** 58,69 ****
  
  todo = """
-     o IMAPMessage and IMAPFolder currently carry out very simple checks
-       of responses received from IMAP commands, but if the response is not
-       "OK", then the filter terminates.  Handling of these errors could be
-       much nicer.
-     o Develop a test script, like spambayes/test/test_pop3proxy.py that
-       runs through some tests (perhaps with a *real* imap server, rather
-       than a dummy one).  This would make it easier to carry out the tests
-       against each server whenever a change is made.
      o IMAP supports authentication via other methods than the plain-text
        password method that we are using at the moment.  Neither of the
--- 55,58 ----
***************
*** 76,85 ****
  """
  
! # This module is part of the spambayes project, which is Copyright 2002-4
  # The Python Software Foundation and is covered by the Python Software
  # Foundation license.
  
  __author__ = "Tony Meyer , Tim Stone"
! __credits__ = "All the Spambayes folk."
  
  from __future__ import generators
--- 65,74 ----
  """
  
! # This module is part of the SpamBayes project, which is Copyright 2002-4
  # The Python Software Foundation and is covered by the Python Software
  # Foundation license.
  
  __author__ = "Tony Meyer , Tim Stone"
! __credits__ = "All the SpamBayes folk."
  
  from __future__ import generators
***************
*** 98,101 ****
--- 87,91 ----
  import getopt
  import types
+ import thread
  import traceback
  import email
***************
*** 174,178 ****
          SelectFolder, rather than here, for purposes of speed."""
          # We may never have logged in, in which case we do nothing.
!         if self.do_expunge and self.logged_in:
              # Expunge messages from the ham, spam and unsure folders.
              for fol in ["spam_folder",
--- 164,168 ----
          SelectFolder, rather than here, for purposes of speed."""
          # We may never have logged in, in which case we do nothing.
!         if self.connected and self.logged_in and self.do_expunge:
              # Expunge messages from the ham, spam and unsure folders.
              for fol in ["spam_folder",
***************
*** 940,949 ****
      print "and engine %s.\n" % (get_version_string(),)
  
-     if (launchUI and (doClassify or doTrain)):
-         print """-b option is exclusive with -c and -t options.
- The user interface will be launched, but no classification
- or training will be performed.
- """
- 
      if options["globals", "verbose"]:
          print "Loading database %s..." % (bdbname),
--- 930,933 ----
***************
*** 988,993 ****
      imap_filter = IMAPFilter(classifier)
  
!     # Web interface
!     if not (doClassify or doTrain):
          if server == "":
              imap = None
--- 972,988 ----
      imap_filter = IMAPFilter(classifier)
  
!     # Web interface.  We have changed the rules about this many times.
!     # With 1.0.x, the rule is that the interface is served if we are
!     # not classifying or training.  However, this runs into the problem
!     # that if we run with -l, we might still want to edit the options,
!     # and we don't want to start a separate instance, because then the
!     # database is accessed from two processes.
!     # With 1.1.x, the rule is that the interface is also served if the
!     # -l option is used, which means it is only not served if we are
!     # doing a one-off classification/train.  In that case, there would
!     # probably not be enough time to get to the interface and interact
!     # with it (and we don't want it to die halfway through!), and we
!     # don't want to slow classification/training down, either.
!     if sleepTime or not (doClassify or doTrain):
          if server == "":
              imap = None
***************
*** 997,1003 ****
          httpServer.register(IMAPUserInterface(classifier, imap, pwd,
                                                IMAPSession))
!         Dibbler.run(launchBrowser=launchUI or options["html_ui",
!                                                       "launch_browser"])
!     else:
          while True:
              imap = IMAPSession(server, port, imapDebug, doExpunge)
--- 992,1003 ----
          httpServer.register(IMAPUserInterface(classifier, imap, pwd,
                                                IMAPSession))
!         launchBrowser=launchUI or options["html_ui", "launch_browser"]
!         if sleepTime:
!             # Run in a separate thread, as we have more work to do.
!             thread.start_new_thread(Dibbler.run, (),
!                                     {"launchBrowser":launchBrowser})
!         else:
!             Dibbler.run(launchBrowser=launchBrowser)
!     if doClassify or doTrain:
          while True:
              imap = IMAPSession(server, port, imapDebug, doExpunge)

From anadelonbrin at users.sourceforge.net  Tue Nov  9 03:30:36 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 03:30:40 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py, 1.42,
	1.43 sb_pop3dnd.py, 1.11, 1.12
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6343/scripts

Modified Files:
	sb_imapfilter.py sb_pop3dnd.py 
Log Message:
Use email.message_from_string(text, _class) rather than our wrapper functions, to
 avoid the Python 2.4 DeprecationWarnings about the strict argument.
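
For anyone unfamiliar with the _class argument, here is a small self-contained sketch of the pattern. TaggedMessage is a hypothetical stand-in for our SBHeaderMessage, and the sample text is invented; only email.message_from_string and its _class keyword are the real standard-library API.

```python
import email
from email.message import Message


class TaggedMessage(Message):
    """Hypothetical Message subclass, standing in for SBHeaderMessage."""

    def subject_length(self):
        # Any extra behaviour can live on the subclass.
        return len(self.get("Subject", ""))


text = "Subject: hello\n\nbody text\n"

# The parser builds the message tree out of TaggedMessage instances,
# so no wrapper function (and no 'strict' argument) is needed.
msg = email.message_from_string(text, _class=TaggedMessage)

if __name__ == "__main__":
    print(type(msg).__name__, msg.subject_length())
```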

Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.42
retrieving revision 1.43
diff -C2 -d -r1.42 -r1.43
*** sb_imapfilter.py	9 Nov 2004 00:46:12 -0000	1.42
--- sb_imapfilter.py	9 Nov 2004 02:30:33 -0000	1.43
***************
*** 604,611 ****
          self.uid = new_id
  
- # This performs a similar function to email.message_from_string()
- def imapmessage_from_string(s, _class=IMAPMessage, strict=False):
-     return email.message_from_string(s, _class, strict)
- 
  
  class IMAPFolder(object):
--- 604,607 ----

Index: sb_pop3dnd.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_pop3dnd.py,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** sb_pop3dnd.py	5 Nov 2004 03:10:04 -0000	1.11
--- sb_pop3dnd.py	9 Nov 2004 02:30:33 -0000	1.12
***************
*** 827,831 ****
  
              try:
!                 msg = message.sbheadermessage_from_string(messageText)
                  # Now find the spam disposition and add the header.
                  (prob, clues) = state.bayes.spamprob(msg.asTokens(),\
--- 827,832 ----
  
              try:
!                 msg = email.message_from_string(messageText,
!                                                 _class=message.SBHeaderMessage)
                  # Now find the spam disposition and add the header.
                  (prob, clues) = state.bayes.spamprob(msg.asTokens(),\

From anadelonbrin at users.sourceforge.net  Tue Nov  9 03:30:36 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 03:30:40 2004
Subject: [Spambayes-checkins] spambayes/spambayes message.py, 1.56,
	1.57 smtpproxy.py, 1.7, 1.8
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6343/spambayes

Modified Files:
	message.py smtpproxy.py 
Log Message:
Use email.message_from_string(text, _class) rather than our wrapper functions, to
 avoid the Python 2.4 DeprecationWarnings about the strict argument.

Index: message.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v
retrieving revision 1.56
retrieving revision 1.57
diff -C2 -d -r1.56 -r1.57
*** message.py	5 Nov 2004 03:03:00 -0000	1.56
--- message.py	9 Nov 2004 02:30:33 -0000	1.57
***************
*** 237,247 ****
          # non-persistent state includes all of email.Message.Message state
  
!     # This function (and it's hackishness) can be avoided by using the
!     # message_from_string and sbheadermessage_from_string functions
!     # at the end of the module.  i.e. instead of doing this:
      #   >>> msg = spambayes.message.SBHeaderMessage()
      #   >>> msg.setPayload(substance)
      # you do this:
!     #   >>> msg = sbheadermessage_from_string(substance)
      # imapfilter has an example of this in action
      def setPayload(self, payload):
--- 237,247 ----
          # non-persistent state includes all of email.Message.Message state
  
!     # This function (and it's hackishness) can be avoided by using
!     # email.message_from_string(text, _class=SBHeaderMessage)
!     # i.e. instead of doing this:
      #   >>> msg = spambayes.message.SBHeaderMessage()
      #   >>> msg.setPayload(substance)
      # you do this:
!     #   >>> msg = email.message_from_string(substance, _class=SBHeaderMessage)
      # imapfilter has an example of this in action
      def setPayload(self, payload):
***************
*** 485,495 ****
          del self[options['Headers','trained_header_name']]
  
- # These perform similar functions to email.message_from_string()
- def message_from_string(s, _class=Message, strict=False):
-     return email.message_from_string(s, _class, strict)
- 
- def sbheadermessage_from_string(s, _class=SBHeaderMessage, strict=False):
-     return email.message_from_string(s, _class, strict)
- 
  # Utility function to insert an exception header into the given RFC822 text.
  # This is used by both sb_server and sb_imapfilter, so it's handy to have
--- 485,488 ----

Index: smtpproxy.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/smtpproxy.py,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** smtpproxy.py	16 Mar 2004 05:08:31 -0000	1.7
--- smtpproxy.py	9 Nov 2004 02:30:33 -0000	1.8
***************
*** 128,135 ****
  import sys
  import os
  
  from spambayes import Dibbler
  from spambayes import storage
! from spambayes.message import sbheadermessage_from_string
  from spambayes.tokenizer import textparts
  from spambayes.tokenizer import try_to_repair_damaged_base64
--- 128,136 ----
  import sys
  import os
+ import email
  
  from spambayes import Dibbler
  from spambayes import storage
! from spambayes import message
  from spambayes.tokenizer import textparts
  from spambayes.tokenizer import try_to_repair_damaged_base64
***************
*** 385,389 ****
  
      def extractSpambayesID(self, data):
!         msg = sbheadermessage_from_string(data)
  
          # The nicest MUA is one that forwards the header intact.
--- 386,390 ----
  
      def extractSpambayesID(self, data):
!         msg = email.message_from_string(data, _class=message.SBHeaderMessage)
  
          # The nicest MUA is one that forwards the header intact.
***************
*** 436,440 ****
              self.train_cached_message(id, isSpam)
          # Otherwise, train on the forwarded/bounced message.
!         msg = sbheadermessage_from_string(msg)
          id = msg.setIdFromPayload()
          msg.delSBHeaders()
--- 437,441 ----
              self.train_cached_message(id, isSpam)
          # Otherwise, train on the forwarded/bounced message.
!         msg = email.message_from_string(msg, _class=message.SBHeaderMessage)
          id = msg.setIdFromPayload()
          msg.delSBHeaders()

From anadelonbrin at users.sourceforge.net  Tue Nov  9 03:37:43 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 03:37:46 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.27,1.28
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7591/scripts

Modified Files:
	sb_server.py 
Log Message:
Implement [ 870524 ] Make the message-proxy timeout configurable

Also add a test for it in test_sb_server (this does vastly increase the time that
 that test script takes to run, because it has to wait for the timeout).

Use email.message_from_string(text, _class) rather than our wrapper functions, to
 avoid the Python 2.4 DeprecationWarnings about the strict argument.
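
A sketch of the timeout test, pulled out as a pure function so it is easy to reason about. The function name and signature are assumptions for illustration; in the diff below the check is written inline and the timeout comes from options["pop3proxy", "retrieval_timeout"].

```python
def should_stop_buffering(command, seen_all_headers, start_time, now,
                          retrieval_timeout=30.0):
    """True if the proxy should give up collecting the message body.

    Only message-retrieval commands time out, and only once all the
    headers have arrived; the rest of the message then proxies
    straight through.
    """
    return (command in ("TOP", "RETR")
            and seen_all_headers
            and now > start_time + retrieval_timeout)


if __name__ == "__main__":
    # 31 seconds into a RETR with headers complete: time out.
    print(should_stop_buffering("RETR", True, 100.0, 131.0))
    # Same elapsed time, but a longer configured timeout: keep going.
    print(should_stop_buffering("RETR", True, 100.0, 131.0, 60.0))
```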

Index: sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v
retrieving revision 1.27
retrieving revision 1.28
diff -C2 -d -r1.27 -r1.28
*** sb_server.py	10 Aug 2004 06:48:09 -0000	1.27
--- sb_server.py	9 Nov 2004 02:37:40 -0000	1.28
***************
*** 65,69 ****
   o Deployment: Windows executable?  atlaxwin and ctypes?  Or just
     webbrowser?
-  o Save the stats (num classified, etc.) between sessions.
   o "Reload database" button.
  
--- 65,68 ----
***************
*** 98,102 ****
  """
  
! import os, sys, re, errno, getopt, time, traceback, socket, cStringIO
  from thread import start_new_thread
  from email.Header import Header
--- 97,101 ----
  """
  
! import os, sys, re, errno, getopt, time, traceback, socket, cStringIO, email
  from thread import start_new_thread
  from email.Header import Header
***************
*** 240,248 ****
              self.response = ''
  
!         # Time out after 30 seconds for message-retrieval commands if
!         # all the headers are down.  The rest of the message will proxy
!         # straight through.
          if self.command in ['TOP', 'RETR'] and \
!            self.seenAllHeaders and time.time() > self.startTime + 30:
              self.onResponse()
              self.response = ''
--- 239,249 ----
              self.response = ''
  
!         # Time out after some seconds (30 by default) for message-retrieval
!         # commands if all the headers are down.  The rest of the message
!         # will proxy straight through.
!         # See also [ 870524 ] Make the message-proxy timeout configurable
          if self.command in ['TOP', 'RETR'] and \
!            self.seenAllHeaders and time.time() > \
!            self.startTime + options["pop3proxy", "retrieval_timeout"]:
              self.onResponse()
              self.response = ''
***************
*** 469,473 ****
  
              try:
!                 msg = spambayes.message.sbheadermessage_from_string(messageText)
                  msg.setId(state.getNewMessageName())
                  # Now find the spam disposition and add the header.
--- 470,475 ----
  
              try:
!                 msg = email.message_from_string(messageText,
!                           _class=spambayes.message.SBHeaderMessage)
                  msg.setId(state.getNewMessageName())
                  # Now find the spam disposition and add the header.

From anadelonbrin at users.sourceforge.net  Tue Nov  9 03:37:44 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 03:37:46 2004
Subject: [Spambayes-checkins] spambayes/spambayes Options.py, 1.116,
	1.117 ProxyUI.py, 1.51, 1.52
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7591/spambayes

Modified Files:
	Options.py ProxyUI.py 
Log Message:
Implement [ 870524 ] Make the message-proxy timeout configurable

Also add a test for it in test_sb_server (this does vastly increase the time that
 that test script takes to run, because it has to wait for the timeout).

Use email.message_from_string(text, _class) rather than our wrapper functions, to
 avoid the Python 2.4 DeprecationWarnings about the strict argument.

Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.116
retrieving revision 1.117
diff -C2 -d -r1.116 -r1.117
*** Options.py	8 Nov 2004 02:01:14 -0000	1.116
--- Options.py	9 Nov 2004 02:37:41 -0000	1.117
***************
*** 771,774 ****
--- 771,783 ----
       field to trust this only address.""",
       IP_LIST, RESTORE),
+ 
+     ("retrieval_timeout", "Retrieval timeout", 30,
+      """When proxying mesasges, time out after this length of time if
+      all the headers have been received.  The rest of the mesasge will
+      proxy straight through.  Some clients have a short timeout period,
+      and will give up on waiting for the message if this is too long.
+      Note that the shorter this is, the less of long messages will be
+      used for classifications (i.e. results may be effected).""",
+      REAL, RESTORE),
    ),
  

Index: ProxyUI.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/ProxyUI.py,v
retrieving revision 1.51
retrieving revision 1.52
diff -C2 -d -r1.51 -r1.52
*** ProxyUI.py	29 Oct 2004 00:14:42 -0000	1.51
--- ProxyUI.py	9 Nov 2004 02:37:41 -0000	1.52
***************
*** 154,157 ****
--- 154,159 ----
      ('pop3proxy',           'allow_remote_connections'),
      ('smtpproxy',           'allow_remote_connections'),
+     ('POP3 Proxy Options',  None),
+     ('pop3proxy',           'retrieval_timeout'),
  )
  

From anadelonbrin at users.sourceforge.net  Tue Nov  9 03:37:44 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 03:37:49 2004
Subject: [Spambayes-checkins] 
	spambayes/spambayes/test test_sb_server.py, 1.1, 1.2
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7591/spambayes/test

Modified Files:
	test_sb_server.py 
Log Message:
Implement [ 870524 ] Make the message-proxy timeout configurable

Also add a test for it in test_sb_server (this does vastly increase the time that
 that test script takes to run, because it has to wait for the timeout).

Use email.message_from_string(text, _class) rather than our wrapper functions, to
 avoid the Python 2.4 DeprecationWarnings about the strict argument.

Index: test_sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_server.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** test_sb_server.py	5 Nov 2004 02:34:28 -0000	1.1
--- test_sb_server.py	9 Nov 2004 02:37:41 -0000	1.2
***************
*** 81,84 ****
--- 81,85 ----
  import operator
  import re
+ import time
  import getopt
  import sys, os
***************
*** 113,117 ****
      UIDL.  USER, PASS, APOP, DELE and RSET simply return "+OK"
      without doing anything.  Also understands the 'KILL' command, to
!     kill it.  The mail content is the example messages above.
      """
  
--- 114,119 ----
      UIDL.  USER, PASS, APOP, DELE and RSET simply return "+OK"
      without doing anything.  Also understands the 'KILL' command, to
!     kill it, and a 'SLOW' command, to change to really slow retrieval.
!     The mail content is the example messages above.
      """
  
***************
*** 123,127 ****
          self.maildrop = [spam1, good1]
          self.set_terminator('\r\n')
!         self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP',
                             'DELE', 'RSET', 'QUIT', 'KILL']
          self.handlers = {'CAPA': self.onCapa,
--- 125,129 ----
          self.maildrop = [spam1, good1]
          self.set_terminator('\r\n')
!         self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP', 'SLOW',
                             'DELE', 'RSET', 'QUIT', 'KILL']
          self.handlers = {'CAPA': self.onCapa,
***************
*** 132,135 ****
--- 134,138 ----
          self.push("+OK ready\r\n")
          self.request = ''
+         self.push_delay = 0.0 # 0.02 is a useful value for testing.
  
      def collect_incoming_data(self, data):
***************
*** 148,165 ****
              if command == 'QUIT':
                  self.close_when_done()
!             if command == 'KILL':
                  self.socket.shutdown(2)
                  self.close()
                  raise SystemExit
          else:
              handler = self.handlers.get(command, self.onUnknown)
!             self.push(handler(command, args))   # Or push_slowly for testing
          self.request = ''
  
      def push_slowly(self, response):
!         """Useful for testing."""
!         for c in response:
!             self.push(c)
!             time.sleep(0.02)
  
      def onCapa(self, command, args):
--- 151,179 ----
              if command == 'QUIT':
                  self.close_when_done()
!             elif command == 'KILL':
                  self.socket.shutdown(2)
                  self.close()
                  raise SystemExit
+             elif command == 'SLOW':
+                 self.push_delay = 1.0
          else:
              handler = self.handlers.get(command, self.onUnknown)
!             self.push_slowly(handler(command, args))
          self.request = ''
  
      def push_slowly(self, response):
!         """Sometimes we push out the response slowly to try and generate
!         timeouts.  If the delay is 0, this just does a regular push."""
!         if self.push_delay:
!             for c in response.split('\n'):
!                 if c and c[-1] == '\r':
!                     self.push(c + '\n')
!                 else:
!                     # We want to trigger onServerLine, so need the '\r',
!                     # so modify the message just a wee bit.
!                     self.push(c + '\r\n')
!                 time.sleep(self.push_delay * len(c))
!         else:
!             self.push(response)
  
      def onCapa(self, command, args):
***************
*** 291,295 ****
  
      # Ask for the capabilities via the proxy, and verify that the proxy
!     # is filtering out the PIPELINING capability.
      proxy.send("capa\r\n")
      response = proxy.recv(1000)
--- 305,309 ----
  
      # Ask for the capabilities via the proxy, and verify that the proxy
!     # is filtering out the STLS capability.
      proxy.send("capa\r\n")
      response = proxy.recv(1000)
***************
*** 311,314 ****
--- 325,341 ----
          assert response.find(options["Headers", "classification_header_name"]) >= 0
  
+     # Check that the proxy times out when it should.
+     options["pop3proxy", "retrieval_timeout"] = 30
+     options["Headers", "include_evidence"] = False
+     assert spam1.find('\n\n') > options["pop3proxy", "retrieval_timeout"]
+     print "This test is rather slow..."
+     proxy.send("slow\r\n")
+     response = proxy.recv(100)
+     assert response.find("OK") != -1
+     proxy.send("retr 1\r\n")
+     response = proxy.recv(1000)
+     assert len(response) < len(spam1)
+     print "Slow test done.  Thanks for waiting!"
+ 
      # Smoke-test the HTML UI.
      httpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

From anadelonbrin at users.sourceforge.net  Tue Nov  9 04:13:03 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 04:13:07 2004
Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.47,1.48
Message-ID: 

Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14948

Modified Files:
	CHANGELOG.txt 
Log Message:
Bring up-to-date.

Index: CHANGELOG.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v
retrieving revision 1.47
retrieving revision 1.48
diff -C2 -d -r1.47 -r1.48
*** CHANGELOG.txt	12 Oct 2004 23:53:24 -0000	1.47
--- CHANGELOG.txt	9 Nov 2004 03:13:00 -0000	1.48
***************
*** 3,6 ****
--- 3,36 ----
  Release 1.1a1
  =============
+ Tony Meyer        09/11/2004  Implement [ 870524 ] Make the message-proxy timeout configurable
+ Tony Meyer        09/11/2004  Use email.message_from_string(text, _class) rather than our wrapper functions.
+ Tony Meyer        09/11/2004  Implement [ 940547 ] imapfilter interface available when using -l switch
+ Tony Meyer        08/11/2004  Outlook: Add two extra items to the "spam clues" for the message: last filtered score/class and if it has been trained.
+ Tony Meyer        05/11/2004  Add unittests for sb_pop3dnd.py
+ Tony Meyer        05/11/2004  sb_pop3dnd: remove use of the web interface
+ Tony Meyer        05/11/2004  sb_pop3dnd: fix bug in getHeaders where negation wouldn't work correctly
+ Tony Meyer        05/11/2004  sb_pop3dnd: fix loading of dynamic messages to correctly generate the headers, so that envelope works.
+ Tony Meyer        05/11/2004  sb_pop3dnd: change the fake email addresses to the same format as the notate_to option (i.e. @spambayes.invalid)
+ Tony Meyer        05/11/2004  sb_pop3dnd: improve the "about" message to include the docstring.
+ Tony Meyer        05/11/2004  sb_pop3dnd: add a dynamic stats message.
+ Tony Meyer        05/11/2004  sb_pop3dnd: improve the dynamic status message to include everything that would normally be on the web interface.
+ Tony Meyer        05/11/2004  sb_pop3dnd: add a "train as spam" folder, to separate out training and classifying as spam.
+ Tony Meyer        05/11/2004  sb_pop3dnd: use twisted.Application in the new style to avoid deprecation warnings.
+ Tony Meyer        03/11/2004  Add [ 1052816 ] I18N - mostly the patch from Hernan Martinez Foffani
+ Tony Meyer        03/11/2004  Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
+ Tony Meyer        03/11/2004  Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf
+ Tony Meyer        03/11/2004  Fix [ 922063 ] Intermittent sb_filter.py faliure with URL pickle
+ Tony Meyer        03/11/2004  Outlook: Also add an "X-Exchange-Delivery-Time" header to the faked up Exchange headers.
+ Tony Meyer        02/11/2004  Improve the web interface statistics
+ Tony Meyer        29/10/2004  If possible, use the builtin (faster, C-implemented) set class, falling back to sets.Set, then back to our compatsets.Set
+ Tony Meyer        28/10/2004  Add [ 715248 ] Pickle classifier should save to a temp file first
+ Tony Meyer        28/10/2004  Add [ 938992 ] Allow longer background filtering delays
+ Tony Meyer        27/10/2004  Add a variety of improvements to sb_culler.py contributed by Andrew Dalke
+ Tony Meyer        27/10/2004  Update sb_culler.py to match current open_storage() usage
+ Tony Meyer        21/10/2004  Fix [ 1051081 ] uncaught socket timeoutexception slurping URLs
+ Tony Meyer        20/10/2004  Outlook: Let the statistics have a variable number of decimal places for the percentages (1 by default).
+ Tony Meyer        18/10/2004  Make msgs.Msg objects pickleable
+ Tony Meyer        18/10/2004  Copy Skip's -o command line option (available in all the regular scripts) to timcv.py.
+ Tony Meyer        18/10/2004  TestDriver: If show_histograms was False, then the global ham/spam histogram never had the stats computed, but this gets used later, so the script would die with an AttributeError. Fix that.
  Tony Meyer        13/10/2004  Add Classifier.use_bigrams option to the Advanced options page for sb_server and imapfilter.
  Tony Meyer        13/10/2004  Fix mySQL storage option for the case where the server does not support rollbacks.

From anadelonbrin at users.sourceforge.net  Tue Nov  9 22:47:01 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 22:47:05 2004
Subject: [Spambayes-checkins] spambayes/windows .cvsignore,1.3,1.3.4.1
Message-ID: 

Update of /cvsroot/spambayes/spambayes/windows
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14158/windows

Modified Files:
      Tag: release_1_0-branch
	.cvsignore 
Log Message:
Backport ignoring *.pyc

Index: .cvsignore
===================================================================
RCS file: /cvsroot/spambayes/spambayes/windows/.cvsignore,v
retrieving revision 1.3
retrieving revision 1.3.4.1
diff -C2 -d -r1.3 -r1.3.4.1
*** .cvsignore	12 Feb 2004 21:11:26 -0000	1.3
--- .cvsignore	9 Nov 2004 21:46:53 -0000	1.3.4.1
***************
*** 1,2 ****
--- 1,3 ----
  SpamBayes-Setup.exe
  spambayes-*.exe
+ *.pyc

From anadelonbrin at users.sourceforge.net  Tue Nov  9 22:48:21 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 22:48:25 2004
Subject: [Spambayes-checkins] 
	spambayes/windows/docs/images .cvsignore, NONE, 1.1.2.1
Message-ID: 

Update of /cvsroot/spambayes/spambayes/windows/docs/images
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14373/windows/docs/images

Added Files:
      Tag: release_1_0-branch
	.cvsignore 
Log Message:
Ignore Windows thumbs.db file.

--- NEW FILE: .cvsignore ---
Thumbs.db

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:03:31 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:03:35 2004
Subject: [Spambayes-checkins] spambayes/spambayes classifier.py, 1.23,
	1.23.4.1
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv19075/spambayes

Modified Files:
      Tag: release_1_0-branch
	classifier.py 
Log Message:
Backport:

Fix [ 922063 ] Intermittent sb_filter.py failure with URL pickle
Fix [ 1051081 ] uncaught socket timeoutexception slurping URLs

Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v
retrieving revision 1.23
retrieving revision 1.23.4.1
diff -C2 -d -r1.23 -r1.23.4.1
*** classifier.py	6 Feb 2004 21:43:00 -0000	1.23
--- classifier.py	9 Nov 2004 22:03:27 -0000	1.23.4.1
***************
*** 527,533 ****
          'synthetic' tokens get bigram'ed, too.
  
!         The bigram token is simply "unigram1 unigram2" - a space should
          be sufficient as a separator, since spaces aren't in any other
!         tokens, apart from 'synthetic' ones.
  
          If the experimental "Classifier":"x-use_bigrams" option is
--- 527,536 ----
          'synthetic' tokens get bigram'ed, too.
  
!         The bigram token is simply "bi:unigram1 unigram2" - a space should
          be sufficient as a separator, since spaces aren't in any other
!         tokens, apart from 'synthetic' ones.  The "bi:" prefix is added
!         to avoid conflict with tokens we generate (like "subject: word",
!         which could be "word" in a subject, or a bigram of "subject:" and
!         "word").
  
          If the experimental "Classifier":"x-use_bigrams" option is
***************
*** 607,611 ****
          if os.path.exists(self.bad_url_cache_name):
              b_file = file(self.bad_url_cache_name, "r")
!             self.bad_urls = pickle.load(b_file)
              b_file.close()
          else:
--- 610,623 ----
          if os.path.exists(self.bad_url_cache_name):
              b_file = file(self.bad_url_cache_name, "r")
!             try:
!                 self.bad_urls = pickle.load(b_file)
!             except (IOError, ValueError):
!                 # Something went wrong loading it (bad pickle,
!                 # probably).  Start afresh.
!                 if options["globals", "verbose"]:
!                     print >>sys.stderr, "Bad URL pickle, using new."
!                 self.bad_urls = {"url:non_resolving": (),
!                                  "url:non_html": (),
!                                  "url:unknown_error": ()}
              b_file.close()
          else:
***************
*** 617,621 ****
          if os.path.exists(self.http_error_cache_name):
              h_file = file(self.http_error_cache_name, "r")
!             self.http_error_urls = pickle.load(h_file)
              h_file.close()
          else:
--- 629,640 ----
          if os.path.exists(self.http_error_cache_name):
              h_file = file(self.http_error_cache_name, "r")
!             try:
!                 self.http_error_urls = pickle.load(h_file)
!             except (IOError, ValueError):
!                 # Something went wrong loading it (bad pickle,
!                 # probably).  Start afresh.
!                 if options["globals", "verbose"]:
!                     print >>sys.stderr, "Bad HTTP error pickle, using new."
!                 self.http_error_urls = {}
              h_file.close()
          else:
***************
*** 626,635 ****
          # XXX be a good thing long-term (if a previously invalid URL
          # XXX becomes valid, for example).
!         b_file = file(self.bad_url_cache_name, "w")
!         pickle.dump(self.bad_urls, b_file)
!         b_file.close()
!         h_file = file(self.http_error_cache_name, "w")
!         pickle.dump(self.http_error_urls, h_file)
!         h_file.close()
  
      def slurp(self, proto, url):
--- 645,661 ----
          # XXX be a good thing long-term (if a previously invalid URL
          # XXX becomes valid, for example).
!         for name, data in [(self.bad_url_cache_name, self.bad_urls),
!                            (self.http_error_cache_name, self.http_error_urls),]:
!             # Save to a temp file first, in case something goes wrong.
!             cache = open(name + ".tmp", "w")
!             pickle.dump(data, cache)
!             cache.close()
!             try:
!                 os.rename(name + ".tmp", name)
!             except OSError:
!                 # Atomic replace isn't possible with win32, so just
!                 # remove and rename.
!                 os.remove(name)
!                 os.rename(name + ".tmp", name)
  
      def slurp(self, proto, url):
***************
*** 698,711 ****
                  return ["url:unknown_error"]
  
!             # Anything that isn't text/html is ignored
!             content_type = f.info().get('content-type')
!             if content_type is None or \
!                not content_type.startswith("text/html"):
!                 self.bad_urls["url:non_html"] += (url,)
!                 return ["url:non_html"]
  
!             page = f.read()
!             headers = str(f.info())
!             f.close()
              fake_message_string = headers + "\r\n" + page
  
--- 724,743 ----
                  return ["url:unknown_error"]
  
!             try:
!                 # Anything that isn't text/html is ignored
!                 content_type = f.info().get('content-type')
!                 if content_type is None or \
!                    not content_type.startswith("text/html"):
!                     self.bad_urls["url:non_html"] += (url,)
!                     return ["url:non_html"]
  
!                 page = f.read()
!                 headers = str(f.info())
!                 f.close()
!             except socket.error:
!                 # This is probably a temporary error, like a timeout.
!                 # For now, just bail out.
!                 return []
!             
              fake_message_string = headers + "\r\n" + page
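
The defensive cache load in this backport can be sketched as follows. This is a minimal illustration written for Python 3, not the Python 2 of the diff, and the helper name `load_cache` and its `default` argument are hypothetical rather than part of SpamBayes. Note that catching more than one exception type needs a tuple; in Python 2 the form `except A, B:` catches only `A` and binds the exception instance to the name `B`.

```python
import os
import pickle
import tempfile


def load_cache(path, default):
    """Return the unpickled object stored at path, or default if unusable.

    Hypothetical helper for illustration only.
    """
    if not os.path.exists(path):
        return default
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (OSError, ValueError, EOFError, pickle.UnpicklingError):
        # Truncated or damaged pickle: start afresh with the default.
        return default


# Usage: an empty (truncated) cache file falls back to the default.
cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, "bad_urls.pik")
open(cache_path, "wb").close()
bad_urls = load_cache(cache_path, {"url:non_html": ()})
```

Unpickling can raise other exception types as well, so the tuple above is a reasonable starting set rather than an exhaustive one.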
  

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:07:32 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:07:35 2004
Subject: [Spambayes-checkins] spambayes/spambayes storage.py,1.41,1.41.4.1
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20118/spambayes

Modified Files:
      Tag: release_1_0-branch
	storage.py 
Log Message:
Backport:

[ 715248 ] Pickle classifier should save to a temp file first
Fix mySQL storage option for the case where the server does not support rollbacks.

Index: storage.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/storage.py,v
retrieving revision 1.41
retrieving revision 1.41.4.1
diff -C2 -d -r1.41 -r1.41.4.1
*** storage.py	2 Apr 2004 18:10:52 -0000	1.41
--- storage.py	9 Nov 2004 22:07:29 -0000	1.41.4.1
***************
*** 63,66 ****
--- 63,67 ----
          return not not val
  
+ import os
  import sys
  import types
***************
*** 138,144 ****
              print >> sys.stderr, 'Persisting',self.db_name,'as a pickle'
  
!         fp = open(self.db_name, 'wb')
!         pickle.dump(self, fp, PICKLE_TYPE)
!         fp.close()
  
      def close(self):
--- 139,167 ----
              print >> sys.stderr, 'Persisting',self.db_name,'as a pickle'
  
!         # Be as defensive as possible; keep always a safe copy.
!         tmp = self.db_name + '.tmp'
!         try: 
!             fp = open(tmp, 'wb') 
!             pickle.dump(self, fp, PICKLE_TYPE) 
!             fp.close() 
!         except IOError, e: 
!             if options["globals", "verbose"]: 
!                 print 'Failed update: ' + str(e)
!             if fp is not None: 
!                 os.remove(tmp) 
!             raise
!         try:
!             # With *nix we can just rename, and (as long as permissions
!             # are correct) the old file will vanish.  With win32, this
!             # won't work - the Python help says that there may not be
!             # a way to do an atomic replace, so we rename the old one,
!             # put the new one there, and then delete the old one.  If
!             # something goes wrong, there is at least a copy of the old
!             # one.
!             os.rename(tmp, self.db_name)
!         except OSError:
!             os.rename(self.db_name, self.db_name + '.bak')
!             os.rename(tmp, self.db_name)
!             os.remove(self.db_name + '.bak')
  
      def close(self):
***************
*** 535,539 ****
              c.execute("select count(*) from bayes")
          except MySQLdb.ProgrammingError:
!             self.db.rollback()
              self.create_bayes()
  
--- 558,568 ----
              c.execute("select count(*) from bayes")
          except MySQLdb.ProgrammingError:
!             try:
!                 self.db.rollback()
!             except MySQLdb.NotSupportedError:
!                 # Server doesn't support rollback, so just assume that
!                 # we can keep going and create the db.  This should only
!                 # happen once, anyway.
!                 pass
              self.create_bayes()
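
The save-to-a-temp-file idea from the pickle hunk above can be sketched like this. It is an assumption-laden illustration for Python 3: `save_atomically` is a hypothetical name, and it relies on `os.replace` (Python 3.3 and later), which the 2004 code could not use and which removes the need for the rename-to-`.bak` fallback.

```python
import os
import pickle
import tempfile


def save_atomically(obj, path):
    """Pickle obj to path via a temporary file.

    Hypothetical helper: a failed write leaves the existing file untouched.
    """
    tmp = path + ".tmp"
    try:
        with open(tmp, "wb") as fp:
            pickle.dump(obj, fp, pickle.HIGHEST_PROTOCOL)
    except Exception:
        # Clean up the partial write, then let the caller see the error.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    # os.replace overwrites an existing destination on POSIX and Windows.
    os.replace(tmp, path)


# Usage: the second save replaces the first copy in one step.
db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "hammie.pik")
save_atomically({"nham": 1}, db_path)
save_atomically({"nham": 2}, db_path)
with open(db_path, "rb") as fp:
    load_back = pickle.load(fp)
```

Writing the temporary file next to the destination keeps both on the same filesystem, which is what allows the final rename to be a single operation.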
  

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:09:51 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:09:54 2004
Subject: [Spambayes-checkins] spambayes/spambayes TestDriver.py,1.4,1.4.6.1
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20570/spambayes

Modified Files:
      Tag: release_1_0-branch
	TestDriver.py 
Log Message:
Backport:

TestDriver: If show_histograms was False, then the global ham/spam histogram never had the stats computed, but this gets used later, so the script would die with an AttributeError. Fix that.

Index: TestDriver.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/TestDriver.py,v
retrieving revision 1.4
retrieving revision 1.4.6.1
diff -C2 -d -r1.4 -r1.4.6.1
*** TestDriver.py	5 Sep 2003 01:15:28 -0000	1.4
--- TestDriver.py	9 Nov 2004 22:09:48 -0000	1.4.6.1
***************
*** 206,209 ****
--- 206,211 ----
              besthamcut = options["Categorization", "ham_cutoff"]
              bestspamcut = options["Categorization", "spam_cutoff"]
+             self.global_ham_hist.compute_stats()
+             self.global_spam_hist.compute_stats()
          nham = self.global_ham_hist.n
          nspam = self.global_spam_hist.n

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:27:27 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:27:29 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py, 1.30,
	1.30.4.1
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24210/scripts

Modified Files:
      Tag: release_1_0-branch
	sb_imapfilter.py 
Log Message:
Backport:

Fix [ 959937 ] "Invalid server" message not always correct

Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.30
retrieving revision 1.30.4.1
diff -C2 -d -r1.30 -r1.30.4.1
*** sb_imapfilter.py	3 May 2004 02:12:32 -0000	1.30
--- sb_imapfilter.py	9 Nov 2004 22:27:23 -0000	1.30.4.1
***************
*** 203,212 ****
          try:
              BaseIMAP.__init__(self, server, port)
!         except:
!             # A more specific except would be good here, but I get
!             # (in Python 2.2) a generic 'error' and a 'gaierror'
!             # if I pass a valid domain that isn't an IMAP server
!             # or invalid domain (respectively)
!             print "Invalid server or port, please check these settings."
              sys.exit(-1)
          self.debug = debug
--- 203,208 ----
          try:
              BaseIMAP.__init__(self, server, port)
!         except (BaseIMAP.error, socket.gaierror, socket.error):
!             print "Cannot connect to server %s on port %s" % (server, port)
              sys.exit(-1)
          self.debug = debug

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:38:03 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:38:06 2004
Subject: [Spambayes-checkins] 
	spambayes/windows/py2exe setup_all.py, 1.17.4.2, 1.17.4.3
Message-ID: 

Update of /cvsroot/spambayes/spambayes/windows/py2exe
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26721/windows/py2exe

Modified Files:
      Tag: release_1_0-branch
	setup_all.py 
Log Message:
Backport:

Fix [941639] and [986353].  Use a non-standard extension for our py2exe created zip to get around Windows extensions that automatically expand zip files.

Index: setup_all.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/windows/py2exe/setup_all.py,v
retrieving revision 1.17.4.2
retrieving revision 1.17.4.3
diff -C2 -d -r1.17.4.2 -r1.17.4.3
*** setup_all.py	26 Jun 2004 03:38:41 -0000	1.17.4.2
--- setup_all.py	9 Nov 2004 22:37:47 -0000	1.17.4.3
***************
*** 161,164 ****
        data_files = outlook_data_files + proxy_data_files + common_data_files,
        options = {"py2exe" : py2exe_options},
!       zipfile = "lib/spambayes.zip",
  )
--- 161,164 ----
        data_files = outlook_data_files + proxy_data_files + common_data_files,
        options = {"py2exe" : py2exe_options},
!       zipfile = "lib/spambayes.modules",
  )

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:41:16 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:41:18 2004
Subject: [Spambayes-checkins] spambayes/spambayes Version.py, 1.31.4.2,
	1.31.4.3
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27449/spambayes

Modified Files:
      Tag: release_1_0-branch
	Version.py 
Log Message:
Backport:

For proxy handler for version checking, the proxy port needs to be an integer, not a string.

Index: Version.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Version.py,v
retrieving revision 1.31.4.2
retrieving revision 1.31.4.3
diff -C2 -d -r1.31.4.2 -r1.31.4.3
*** Version.py	8 Jul 2004 23:51:24 -0000	1.31.4.2
--- Version.py	9 Nov 2004 22:41:13 -0000	1.31.4.3
***************
*** 134,144 ****
          if ':' in server:
              server, port = server.split(':', 1)
          else:
              port = 8080
!         username = options["globals", "proxy_username"]
!         password = options["globals", "proxy_password"]
          proxy_support = urllib2.ProxyHandler({"http" :
!                                               "http://%s:%s@%s:%d" % \
!                                               (username, password, server,
                                                 port)})
          opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
--- 134,149 ----
          if ':' in server:
              server, port = server.split(':', 1)
+             port = int(port)
          else:
              port = 8080
!         if options["globals", "proxy_username"]:
!             user_pass_string = "%s:%s" % \
!                                (options["globals", "proxy_username"],
!                                 options["globals", "proxy_password"])
!         else:
!             user_pass_string = ""
          proxy_support = urllib2.ProxyHandler({"http" :
!                                               "http://%s@%s:%d" % \
!                                               (user_pass_string, server,
                                                 port)})
          opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
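
The port handling in this change can be sketched as a standalone function. This is a hedged illustration, not the code in Version.py: `proxy_url` and its parameters are hypothetical names, and unlike the diff this variant leaves out the "@" entirely when no username is configured.

```python
def proxy_url(server, username="", password="", default_port=8080):
    """Build an http proxy URL from a "host" or "host:port" string.

    Hypothetical helper for illustration only.
    """
    if ":" in server:
        server, port = server.split(":", 1)
        port = int(port)  # "%d" formatting needs an integer, not a string
    else:
        port = default_port
    if username:
        credentials = "%s:%s@" % (username, password)
    else:
        credentials = ""
    return "http://%s%s:%d" % (credentials, server, port)


# Usage with made-up host names and credentials.
plain = proxy_url("proxy.example.com:3128")
with_auth = proxy_url("proxy.example.com", "user", "secret")
```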

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:49:00 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:49:04 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py, 1.30.4.1,
	1.30.4.2
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29194/scripts

Modified Files:
      Tag: release_1_0-branch
	sb_imapfilter.py 
Log Message:
Backport:

imapfilter: Quote the search string that tries to find the message again that was just saved.

Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.30.4.1
retrieving revision 1.30.4.2
diff -C2 -d -r1.30.4.1 -r1.30.4.2
*** sb_imapfilter.py	9 Nov 2004 22:27:23 -0000	1.30.4.1
--- sb_imapfilter.py	9 Nov 2004 22:48:41 -0000	1.30.4.2
***************
*** 520,526 ****
          # have to use it for IMAP operations.
          imap.SelectFolder(self.folder.name)
!         response = imap.uid("SEARCH", "(UNDELETED HEADER " + \
!                             options["Headers", "mailid_header_name"] + \
!                             " " + self.id + ")")
          self._check(response, 'search')
          new_id = response[1][0]
--- 520,526 ----
          # have to use it for IMAP operations.
          imap.SelectFolder(self.folder.name)
!         response = imap.uid("SEARCH", "(UNDELETED HEADER %s \"%s\")" % \
!                             (options["Headers", "mailid_header_name"],
!                              self.id.replace('\\',r'\\').replace('"',r'\"')))
          self._check(response, 'search')
          new_id = response[1][0]
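
The quoting applied to the search string can be sketched on its own. The function names here are hypothetical and the header name in the usage line is only an illustrative value; the escaping order (backslashes first, then double quotes) follows the diff above.

```python
def quote_imap_string(value):
    """Wrap value in double quotes, escaping backslashes and quotes."""
    # Backslashes must be escaped first, or the escapes added for the
    # double quotes would themselves be doubled.
    return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')


def build_search(header_name, message_id):
    """Build the SEARCH criteria string for one header value."""
    return "(UNDELETED HEADER %s %s)" % (header_name,
                                         quote_imap_string(message_id))


# Usage: an id containing a space stays a single search argument.
criteria = build_search("X-Spambayes-MailId", "1099 1")
```

Without the quotes, an id containing a space would be read by the server as two separate arguments.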

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:51:09 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:51:11 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_mboxtrain.py, 1.11.4.2,
	1.11.4.3
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29812/scripts

Modified Files:
      Tag: release_1_0-branch
	sb_mboxtrain.py 
Log Message:
Backport:

Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf

Index: sb_mboxtrain.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_mboxtrain.py,v
retrieving revision 1.11.4.2
retrieving revision 1.11.4.3
diff -C2 -d -r1.11.4.2 -r1.11.4.3
*** sb_mboxtrain.py	15 Oct 2004 05:45:41 -0000	1.11.4.2
--- sb_mboxtrain.py	9 Nov 2004 22:51:06 -0000	1.11.4.3
***************
*** 210,214 ****
              raise
  
!     fcntl.lockf(f, fcntl.LOCK_UN)
      f.close()
      if loud:
--- 210,214 ----
              raise
  
!     fcntl.flock(f, fcntl.LOCK_UN)
      f.close()
      if loud:
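
A minimal sketch of keeping the lock and unlock calls paired, assuming a Unix platform where the `fcntl` module is available. On some systems `flock` and `lockf` locks are independent of each other, so releasing with a different call than the one that took the lock may not release it. The helper name `with_flock` is hypothetical.

```python
import fcntl
import tempfile


def with_flock(path, action):
    """Run action(f) while holding an exclusive flock on path.

    Hypothetical helper: the lock is taken and released with the same
    call family (flock), and released even if action raises.
    """
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            return action(f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


# Usage: append one line to a scratch file while holding the lock.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    mbox_path = tmp.name
written = with_flock(mbox_path, lambda f: f.write("From nobody\n"))
```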

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:53:04 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:53:07 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py, 1.12.4.1,
	1.12.4.2
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30099/scripts

Modified Files:
      Tag: release_1_0-branch
	sb_dbexpimp.py 
Log Message:
Backport:

Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file

Index: sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v
retrieving revision 1.12.4.1
retrieving revision 1.12.4.2
diff -C2 -d -r1.12.4.1 -r1.12.4.2
*** sb_dbexpimp.py	10 Jun 2004 05:17:12 -0000	1.12.4.1
--- sb_dbexpimp.py	9 Nov 2004 22:53:02 -0000	1.12.4.2
***************
*** 230,234 ****
      print "Finished storing database"
  
!     if useDBM:
          words = bayes.db.keys()
          words.remove(bayes.statekey)
--- 230,234 ----
      print "Finished storing database"
  
!     if useDBM == "dbm" or useDBM == True:
          words = bayes.db.keys()
          words.remove(bayes.statekey)
***************
*** 250,254 ****
          sys.exit()
  
!     useDBM = False
      newDBM = True
      dbFN = None
--- 250,254 ----
          sys.exit()
  
!     useDBM = "pickle"
      newDBM = True
      dbFN = None
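
One plausible reading of this fix, offered as an assumption rather than the author's stated reasoning: once the option holds a string, a bare truth test passes for any non-empty value, including "pickle". The function name below is hypothetical.

```python
def uses_dbm(use_dbm):
    """Mirror the fixed test: only "dbm" or True selects the dbm path."""
    return use_dbm in ("dbm", True)


# A non-empty string is always truthy, so a bare "if use_dbm:" cannot
# tell "pickle" apart from "dbm".
pickle_is_truthy = bool("pickle")
pickle_selected = uses_dbm("pickle")
dbm_selected = uses_dbm("dbm")
```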

From anadelonbrin at users.sourceforge.net  Tue Nov  9 23:53:58 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov  9 23:54:02 2004
Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.44.4.2,1.44.4.3
Message-ID: 

Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30267

Modified Files:
      Tag: release_1_0-branch
	CHANGELOG.txt 
Log Message:
Bring up-to-date.

Index: CHANGELOG.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v
retrieving revision 1.44.4.2
retrieving revision 1.44.4.3
diff -C2 -d -r1.44.4.2 -r1.44.4.3
*** CHANGELOG.txt	19 Jul 2004 03:21:45 -0000	1.44.4.2
--- CHANGELOG.txt	9 Nov 2004 22:53:55 -0000	1.44.4.3
***************
*** 1,4 ****
--- 1,26 ----
  [Note that all dates are in English, not American format - i.e. day/month/year]
  
+ Release 1.0.1
+ =============
+ Tony Meyer        03/11/2004  Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file
+ Tony Meyer        03/11/2004  Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf
+ Tony Meyer        03/11/2004  Fix [ 922063 ] Intermittent sb_filter.py failure with URL pickle
+ Tony Meyer        29/09/2004  Fix [ 1036601 ] typo on advanced config web page
+ Tony Meyer        28/10/2004  Add [ 715248 ] Pickle classifier should save to a temp file first
+ Tony Meyer        21/10/2004  Fix [ 1051081 ] uncaught socket timeoutexception slurping URLs
+ Tony Meyer        18/10/2004  TestDriver: If show_histograms was False, then the global ham/spam histogram never had the stats computed, but this gets used later, so the script would die with an AttributeError. Fix that.
+ Tony Meyer        13/10/2004  Fix mySQL storage option for the case where the server does not support rollbacks.
+ Sjoerd Mullender  02/10/2004  imapfilter: Quote the search string that tries to find the message again that was just saved.
+ Tony Meyer        30/09/2004  Fix [ 903905 ] IMAP Configuration Error
+ Tony Meyer        15/09/2004  sb_upload: Clarify docstring so that it's more clear what this script does. The -n / --null command line option didn't actually do anything; change it so that it does.
+ Tony Meyer        23/07/2004  For proxy handler for version checking, the proxy port needs to be an integer, not a string.
+ Tony Meyer        19/07/2004  Fix [ 990700 ] Changes to asyncore in Python 2.4 break ServerLineReader
+ Kenny Pitt        17/07/2004  Fix [941639] and [986353].  Use a non-standard extension for our py2exe created zip to get around Windows extensions that automatically expand zip files.
+ Tony Meyer        14/07/2004  Fix [ 790757 ] signal handler created with wrong # of args
+ Tony Meyer        14/07/2004  Fix [ 944109 ] notate_to/subject option valid values should be dynamic
+ Tony Meyer        14/07/2004  Fix [ 959937 ] "Invalid server" message not always correct
+ Skip Montanaro    10/07/2004  tte.py: 2.3 compatibility: add reversed() function
+ Tony Meyer        09/07/2004  Using -u with sb_server had been broken.  Fix this.
+ 
  1.0 Final
  =========

From anadelonbrin at users.sourceforge.net  Wed Nov 10 23:08:47 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 10 23:08:52 2004
Subject: [Spambayes-checkins] spambayes/windows spambayes.iss,1.17,1.18
Message-ID: 

Update of /cvsroot/spambayes/spambayes/windows
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12448/windows

Modified Files:
	spambayes.iss 
Log Message:
I emailed spambayes-dev about this on Oct 22, but then clean forgot about
checking the change in!  Thankfully a spambayes@python.org message reminded
me, or this bug would have made it through to 1.0.1...

It appears that our installers aren't offering to create a startup icon for
sb_server users as they should.

The problem is the "Check: InstallingProxy" line in the Inno script.
Although the selection code has been run by the time the tasks are offered,
it still has the default (False) value.  I've played around with the code,
but can't figure a way around this (although my Pascal is extremely rusty).  Maybe
it's an Inno bug or something.

We can fix it by removing that Check.  Outlook users don't get that page
anyway, so they won't see the option (this is what happens with the desktop
icon).  However, since we want it checked by default, it will still be listed
in the text box of additional tasks for Outlook users, even though the task
isn't actually performed for them.  I say this doesn't matter, and the vast
quiet on spambayes-dev tells me people either agree or don't care.

Index: spambayes.iss
===================================================================
RCS file: /cvsroot/spambayes/spambayes/windows/spambayes.iss,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** spambayes.iss	10 Jun 2004 04:38:26 -0000	1.17
--- spambayes.iss	10 Nov 2004 22:08:44 -0000	1.18
***************
*** 53,57 ****
  
  [Tasks]
! Name: startup; Description: "Execute SpamBayes each time Windows starts"; Check: InstallingProxy
  Name: desktop; Description: "Add an icon to the desktop"; Flags: unchecked;
  
--- 53,57 ----
  
  [Tasks]
! Name: startup; Description: "Execute SpamBayes each time Windows starts";
  Name: desktop; Description: "Add an icon to the desktop"; Flags: unchecked;
  

From anadelonbrin at users.sourceforge.net  Wed Nov 10 23:15:39 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 10 23:15:43 2004
Subject: [Spambayes-checkins] spambayes/windows spambayes.iss, 1.15.4.3,
	1.15.4.4
Message-ID: 

Update of /cvsroot/spambayes/spambayes/windows
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13954/windows

Modified Files:
      Tag: release_1_0-branch
	spambayes.iss 
Log Message:
Backport fix for installation of startup icon.

Index: spambayes.iss
===================================================================
RCS file: /cvsroot/spambayes/spambayes/windows/spambayes.iss,v
retrieving revision 1.15.4.3
retrieving revision 1.15.4.4
diff -C2 -d -r1.15.4.3 -r1.15.4.4
*** spambayes.iss	21 Sep 2004 08:04:39 -0000	1.15.4.3
--- spambayes.iss	10 Nov 2004 22:15:37 -0000	1.15.4.4
***************
*** 5,11 ****
  [Setup]
  ; Version specific constants
! AppVerName=SpamBayes 1.0
! AppVersion=1.0
! OutputBaseFilename=spambayes-1.0
  ; Normal constants.  Be careful about changing 'AppName'
  AppName=SpamBayes
--- 5,11 ----
  [Setup]
  ; Version specific constants
! AppVerName=SpamBayes 1.0.1
! AppVersion=1.0.1
! OutputBaseFilename=spambayes-1.0.1
  ; Normal constants.  Be careful about changing 'AppName'
  AppName=SpamBayes
***************
*** 53,57 ****
  
  [Tasks]
! Name: startup; Description: "Execute SpamBayes each time Windows starts"; Check: InstallingProxy
  Name: desktop; Description: "Add an icon to the desktop"; Flags: unchecked;
  
--- 53,57 ----
  
  [Tasks]
! Name: startup; Description: "Execute SpamBayes each time Windows starts";
  Name: desktop; Description: "Add an icon to the desktop"; Flags: unchecked;
  
***************
*** 118,122 ****
                   'If this message persists, you may need to log off from Windows, and try again.'
        Result := CheckNoAppMutex('InternetMailTransport', closeit);
-     end;
      // And finally, the SpamBayes server
      if Result then begin
--- 118,121 ----
***************
*** 149,153 ****
    Prompts, Values: array of String;
  begin
- 
      // First open the custom wizard page
      ScriptDlgPageOpen();
--- 148,151 ----

From anadelonbrin at users.sourceforge.net  Thu Nov 11 02:47:03 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Thu Nov 11 02:47:06 2004
Subject: [Spambayes-checkins] spambayes/src sb_bnfilter.c,1.1,1.2
Message-ID: 

Update of /cvsroot/spambayes/spambayes/src
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27946/src

Added Files:
	sb_bnfilter.c 
Log Message:
Merge Toby's bnfilter_in_c branch, since I can't see any reason why this
can't be in 1.1.


From anadelonbrin at users.sourceforge.net  Thu Nov 11 02:48:45 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Thu Nov 11 02:48:47 2004
Subject: [Spambayes-checkins] spambayes MANIFEST.in,1.9,1.10
Message-ID: 

Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28238

Modified Files:
	MANIFEST.in 
Log Message:
Add the new src directory (for c files) to the manifest.

Index: MANIFEST.in
===================================================================
RCS file: /cvsroot/spambayes/spambayes/MANIFEST.in,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** MANIFEST.in	5 Nov 2003 12:45:23 -0000	1.9
--- MANIFEST.in	11 Nov 2004 01:48:43 -0000	1.10
***************
*** 1,3 ****
--- 1,4 ----
  recursive-include spambayes/resources *.html *.psp *.gif
+ recursive-include spambayes/src *.c
  recursive-include spambayes *.py *.txt
  recursive-include pspam *.py *.txt *.ini *.sh

From anadelonbrin at users.sourceforge.net  Thu Nov 11 22:21:59 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Thu Nov 11 22:22:03 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py,1.136,1.137
Message-ID: 

Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17984/Outlook2000

Modified Files:
	addin.py 
Log Message:
Update timer checks to match what the dialog currently allows.

Correct addition to 'show spam clues' to give the right classification.

Fix indentation error that I introduced recently - sorry!
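
A note on the 'show spam clues' fix: the diff suggests the stored score is a
fraction between 0 and 1 while the thresholds are percentages, so the score
has to be scaled before it is compared. A minimal sketch of that idea,
assuming hypothetical threshold defaults and a helper name
(classify_stored_score) that does not appear in addin.py:

```python
def classify_stored_score(raw_score, spam_threshold=90.0, unsure_threshold=15.0):
    """Classify a score stored as a 0..1 fraction against percent thresholds.

    The threshold defaults are illustrative assumptions, not values taken
    from the SpamBayes configuration.
    """
    percent = 100 * raw_score  # scale once, then compare and display
    if percent >= spam_threshold:
        return "spam", percent
    if percent >= unsure_threshold:
        return "unsure", percent
    return "good", percent


name, percent = classify_stored_score(0.97)
print("it was classified as %s (it scored %d%%)" % (name, percent))
```

Scaling once, up front, keeps the comparison and the "%d%%" display in step,
which appears to be the point of the change.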

Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.136
retrieving revision 1.137
diff -C2 -d -r1.136 -r1.137
*** addin.py	8 Nov 2004 05:02:09 -0000	1.136
--- addin.py	11 Nov 2004 21:21:55 -0000	1.137
***************
*** 288,292 ****
              elif start_delay < 0.4 or interval < 0.4:
                  too = "too often"
!             elif start_delay > 30 or interval > 30:
                  too = "too infrequently"
              if too:
--- 288,292 ----
              elif start_delay < 0.4 or interval < 0.4:
                  too = "too often"
!             elif start_delay > 60 or interval > 60:
                  too = "too infrequently"
              if too:
***************
*** 463,467 ****
      # people realise that it may not necessarily be the same, and will
      # help diagnosing any 'wrong' scoring reported.
!     original_score = msgstore_message.GetField(mgr.config.general.field_score_name)
      if original_score >= mgr.config.filter.spam_threshold:
          original_class = "spam"
--- 463,468 ----
      # people realise that it may not necessarily be the same, and will
      # help diagnosing any 'wrong' scoring reported.
!     original_score = 100 * msgstore_message.GetField(\
!         mgr.config.general.field_score_name)
      if original_score >= mgr.config.filter.spam_threshold:
          original_class = "spam"
***************
*** 475,479 ****
      else:
          push("When this message was last filtered, it was classified " \
!              "as %s (it scored %d%%)." % (original_class, original_score*100))
      # Report whether this message has been trained or not.
      push("
\n")
--- 476,480 ----
      else:
          push("When this message was last filtered, it was classified " \
!              "as %s (it scored %d%%)." % (original_class, original_score))
      # Report whether this message has been trained or not.
      push("
\n")
***************
*** 688,693 ****
                  # Must train before moving, else we lose the message!
                  subject = msgstore_message.GetSubject()
!                 print "Moving and spam training message '%s' - " % (subject,),
!                 TrainAsSpam(msgstore_message, self.manager, save_db = False)
                  # Do the new message state if necessary.
                  try:
--- 689,694 ----
                  # Must train before moving, else we lose the message!
                  subject = msgstore_message.GetSubject()
!                 print "Moving and spam training message '%s' - " % (subject,),
!                 TrainAsSpam(msgstore_message, self.manager, save_db = False)
                  # Do the new message state if necessary.
                  try:
***************
*** 751,756 ****
                                  self.manager.score(msgstore_message))
                  # Must train before moving, else we lose the message!
!                 print "Recovering to folder '%s' and ham training message '%s' - " % (restore_folder.name, subject),
!                 TrainAsHam(msgstore_message, self.manager, save_db = False)
                  # Do the new message state if necessary.
                  try:
--- 752,757 ----
                                  self.manager.score(msgstore_message))
                  # Must train before moving, else we lose the message!
!                 print "Recovering to folder '%s' and ham training message '%s' - " % (restore_folder.name, subject),
!                 TrainAsHam(msgstore_message, self.manager, save_db = False)
                  # Do the new message state if necessary.
                  try:

From kpitt at users.sourceforge.net  Thu Nov 11 22:55:49 2004
From: kpitt at users.sourceforge.net (Kenny Pitt)
Date: Thu Nov 11 22:55:52 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000/dialogs dialog_map.py, 1.40, 1.41
Message-ID: 

Update of /cvsroot/spambayes/spambayes/Outlook2000/dialogs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25701/Outlook2000/dialogs

Modified Files:
	dialog_map.py 
Log Message:
Add a separate Statistics tab to make room for more detailed statistics.

Index: dialog_map.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/dialog_map.py,v
retrieving revision 1.40
retrieving revision 1.41
diff -C2 -d -r1.40 -r1.41
*** dialog_map.py	28 Oct 2004 04:29:00 -0000	1.40
--- dialog_map.py	11 Nov 2004 21:55:40 -0000	1.41
***************
*** 404,408 ****
          (TabProcessor,            "IDC_TAB",
                                    """IDD_GENERAL IDD_FILTER IDD_TRAINING
!                                   IDD_ADVANCED"""),
          (CommandButtonProcessor,  "IDC_ABOUT_BTN", ShowAbout, ()),
      ),
--- 404,408 ----
          (TabProcessor,            "IDC_TAB",
                                    """IDD_GENERAL IDD_FILTER IDD_TRAINING
!                                   IDD_STATISTICS IDD_ADVANCED"""),
          (CommandButtonProcessor,  "IDC_ABOUT_BTN", ShowAbout, ()),
      ),
***************
*** 473,476 ****
--- 473,479 ----
      ),
+     "IDD_STATISTICS" : (
+         (StatsProcessor,        "IDC_STATISTICS"),
+         ),
      "IDD_ADVANCED" : (
          (BoolButtonProcessor,   "IDC_BUT_TIMER_ENABLED",    "Filter.timer_enabled",
***************
*** 481,485 ****
          (EditNumberProcessor,   "IDC_DELAY2_TEXT IDC_DELAY2_SLIDER", "Filter.timer_interval", 0, 10, 20, 60),
          (BoolButtonProcessor,   "IDC_INBOX_TIMER_ONLY", "Filter.timer_only_receive_folders"),
-         (StatsProcessor,        "IDC_STATISTICS"),
          (CommandButtonProcessor, "IDC_SHOW_DATA_FOLDER", ShowDataFolder, ()),
          (DialogCommand,         "IDC_BUT_SHOW_DIAGNOSTICS", "IDD_DIAGNOSTIC"),
--- 484,487 ----

From kpitt at users.sourceforge.net  Thu Nov 11 22:55:49 2004
From: kpitt at users.sourceforge.net (Kenny Pitt)
Date: Thu Nov 11 22:55:52 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000/dialogs/resources dialogs.h, 1.21, 1.22 dialogs.rc, 1.47, 1.48
Message-ID: 

Update of /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25701/Outlook2000/dialogs/resources

Modified Files:
	dialogs.h dialogs.rc 
Log Message:
Add a separate Statistics tab to make room for more detailed statistics.

Index: dialogs.h
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources/dialogs.h,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** dialogs.h	29 Sep 2003 02:14:26 -0000	1.21
--- dialogs.h	11 Nov 2004 21:55:46 -0000	1.22
***************
*** 9,12 ****
--- 9,13 ----
  #define IDD_FOLDER_SELECTOR             105
  #define IDD_ADVANCED                    106
+ #define IDD_STATISTICS                  107
  #define IDD_GENERAL                     108
  #define IDD_FILTER_SPAM                 110

Index: dialogs.rc
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/resources/dialogs.rc,v
retrieving revision 1.47
retrieving revision 1.48
diff -C2 -d -r1.47 -r1.48
*** dialogs.rc	1 Oct 2004 14:37:37 -0000	1.47
--- dialogs.rc	11 Nov 2004 21:55:46 -0000	1.48
***************
*** 52,58 ****
                      "Button",BS_AUTOCHECKBOX | WS_TABSTOP,16,12,162,10
      PUSHBUTTON      "Diagnostics...",IDC_BUT_SHOW_DIAGNOSTICS,171,190,70,14
!     GROUPBOX        "Statistics",IDC_STATIC,7,125,234,58
      LTEXT           "some stats\nand some more\nline 3\nline 4\nline 5",
!                     IDC_STATISTICS,12,134,223,43,SS_SUNKEN
  END
  
--- 52,65 ----
                      "Button",BS_AUTOCHECKBOX | WS_TABSTOP,16,12,162,10
      PUSHBUTTON      "Diagnostics...",IDC_BUT_SHOW_DIAGNOSTICS,171,190,70,14
! END
! 
! IDD_STATISTICS DIALOGEX 0, 0, 248, 209
! STYLE DS_SETFONT | WS_CHILD
! CAPTION "Statistics"
! FONT 8, "Tahoma", 400, 0, 0x0
! BEGIN
!     GROUPBOX        "Statistics",IDC_STATIC,7,3,234,201
      LTEXT           "some stats\nand some more\nline 3\nline 4\nline 5",
!                     IDC_STATISTICS,12,12,223,186
  END
  

From anadelonbrin at users.sourceforge.net  Fri Nov 12 03:48:29 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 12 03:48:32 2004
Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_dbexpimp.py, NONE, 1.1
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30228/spambayes/test

Added Files:
	test_sb_dbexpimp.py 
Log Message:
Unit tests for the sb_dbexpimp.py script.

--- NEW FILE: test_sb_dbexpimp.py ---
# Test sb_dbexpimp script.

import os
import sys
import unittest

from spambayes.tokenizer import tokenize
from spambayes.storage import open_storage
from spambayes.storage import PickledClassifier, DBDictClassifier

import sb_test_support
sb_test_support.fix_sys_path()

import sb_dbexpimp

# We borrow the test messages that test_sb_server uses.
from test_sb_server import good1, spam1

# WARNING!
# If these files exist when running this test, they will be deleted.
TEMP_PICKLE_NAME = os.path.join(os.path.dirname(__file__), "temp.pik")
TEMP_CSV_NAME = os.path.join(os.path.dirname(__file__), "temp.csv")
TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm")

class dbexpimpTest(unittest.TestCase):
    def tearDown(self):
        try:
            os.remove(TEMP_PICKLE_NAME)
            os.remove(TEMP_CSV_NAME)
            os.remove(TEMP_DBM_NAME)
        except OSError:
            pass

    def test_csv_import(self):
        """Check that we don't import the old object craft csv module."""
        self.assert_(hasattr(sb_dbexpimp.csv, "reader"))

    def test_pickle_export(self):
        # Create a pickled classifier to export.
        bayes = PickledClassifier(TEMP_PICKLE_NAME)
        # Stuff some messages in it so it's not empty.
        bayes.learn(tokenize(spam1), True)
        bayes.learn(tokenize(good1), False)
        # Save.
        bayes.store()
        # Export.
        sb_dbexpimp.runExport(TEMP_PICKLE_NAME, "pickle", TEMP_CSV_NAME)
        # Verify that the CSV holds all the original data (and, by using
        # the CSV module to open it, that it is valid CSV data).
        fp = open(TEMP_CSV_NAME, "rb")
        reader = sb_dbexpimp.csv.reader(fp)
        (nham, nspam) = reader.next()
        self.assertEqual(int(nham), bayes.nham)
        self.assertEqual(int(nspam), bayes.nspam)
        for (word, hamcount, spamcount) in reader:
            word = sb_dbexpimp.uunquote(word)
            self.assert_(word in bayes._wordinfokeys())
            wi = bayes._wordinfoget(word)
            self.assertEqual(int(hamcount), wi.hamcount)
            self.assertEqual(int(spamcount), wi.spamcount)

    def test_dbm_export(self):
        # Create a dbm classifier to export.
        bayes = DBDictClassifier(TEMP_DBM_NAME)
        # Stuff some messages in it so it's not empty.
        bayes.learn(tokenize(spam1), True)
        bayes.learn(tokenize(good1), False)
        # Save & Close.
        bayes.store()
        bayes.close()
        # Export.
        sb_dbexpimp.runExport(TEMP_DBM_NAME, "dbm", TEMP_CSV_NAME)
        # Reopen the original.
        bayes = open_storage(TEMP_DBM_NAME, "dbm")
        # Verify that the CSV holds all the original data (and, by using
        # the CSV module to open it, that it is valid CSV data).
        fp = open(TEMP_CSV_NAME, "rb")
        reader = sb_dbexpimp.csv.reader(fp)
        (nham, nspam) = reader.next()
        self.assertEqual(int(nham), bayes.nham)
        self.assertEqual(int(nspam), bayes.nspam)
        for (word, hamcount, spamcount) in reader:
            word = sb_dbexpimp.uunquote(word)
            self.assert_(word in bayes._wordinfokeys())
            wi = bayes._wordinfoget(word)
            self.assertEqual(int(hamcount), wi.hamcount)
            self.assertEqual(int(spamcount), wi.spamcount)

    def test_import_to_pickle(self):
        # Create a CSV file to import.
        temp = open(TEMP_CSV_NAME, "wb")
        temp.write("3,4\n")
        csv_data = {"this":(2,1), "is":(0,1), "a":(3,4), 'test':(1,1),
                    "of":(1,0), "the":(1,2), "import":(3,1)}
        for word, (ham, spam) in csv_data.items():
            temp.write("%s,%s,%s\n" % (word, ham, spam))
        temp.close()
        sb_dbexpimp.runImport(TEMP_PICKLE_NAME, "pickle", True, TEMP_CSV_NAME)
        # Open the converted file and verify that it has all the data from
        # the CSV file (and by opening it, that it is a valid pickle).
        bayes = open_storage(TEMP_PICKLE_NAME, "pickle")
        self.assertEqual(bayes.nham, 3)
        self.assertEqual(bayes.nspam, 4)
        for word, (ham, spam) in csv_data.items():
            word = sb_dbexpimp.uquote(word)
            self.assert_(word in bayes._wordinfokeys())
            wi = bayes._wordinfoget(word)
            self.assertEqual(wi.hamcount, ham)
            self.assertEqual(wi.spamcount, spam)

    def test_import_to_dbm(self):
        # Create a CSV file to import.
        temp = open(TEMP_CSV_NAME, "wb")
        temp.write("3,4\n")
        csv_data = {"this":(2,1), "is":(0,1), "a":(3,4), 'test':(1,1),
                    "of":(1,0), "the":(1,2), "import":(3,1)}
        for word, (ham, spam) in csv_data.items():
            temp.write("%s,%s,%s\n" % (word, ham, spam))
        temp.close()
        sb_dbexpimp.runImport(TEMP_DBM_NAME, "dbm", True, TEMP_CSV_NAME)
        # Open the converted file and verify that it has all the data from
        # the CSV file (and by opening it, that it is a valid dbm file).
        bayes = open_storage(TEMP_DBM_NAME, "dbm")
        self.assertEqual(bayes.nham, 3)
        self.assertEqual(bayes.nspam, 4)
        for word, (ham, spam) in csv_data.items():
            word = sb_dbexpimp.uquote(word)
            self.assert_(word in bayes._wordinfokeys())
            wi = bayes._wordinfoget(word)
            self.assertEqual(wi.hamcount, ham)
            self.assertEqual(wi.spamcount, spam)


def suite():
    suite = unittest.TestSuite()
    for cls in (dbexpimpTest, ):
        suite.addTest(unittest.makeSuite(cls))
    return suite

if __name__=='__main__':
    sb_test_support.unittest_main(argv=sys.argv + ['suite'])

From anadelonbrin at users.sourceforge.net  Mon Nov 15 07:19:16 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 15 07:19:19 2004
Subject: [Spambayes-checkins] spambayes/spambayes/test test_sb_dbexpimp.py, 1.1, 1.2
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13734/spambayes/test

Modified Files:
	test_sb_dbexpimp.py 
Log Message:
Add tests for merging.

Rather than just a comment in the script, ensure that the temp testing files
don't exist before running the test script.

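
The guard described in this log message can be sketched on its own. This is
a rough illustration only: the helper name existing_files is hypothetical,
and the temporary directory stands in for the test directory.

```python
import os
import tempfile


def existing_files(paths):
    """Return the subset of paths that already exist on disk."""
    return [p for p in paths if os.path.exists(p)]


with tempfile.TemporaryDirectory() as test_dir:
    names = [os.path.join(test_dir, n)
             for n in ("temp.pik", "temp.csv", "temp.dbm")]
    open(names[1], "w").close()  # simulate a stray file left in the directory
    for fn in existing_files(names):
        # A real test script would stop here rather than delete the file.
        print(fn, "already exists; refusing to run so nothing is overwritten")
```

Checking before any test runs means tearDown can delete the temporary files
unconditionally, since they can only be ones the tests created.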
Index: test_sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_dbexpimp.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** test_sb_dbexpimp.py	12 Nov 2004 02:48:27 -0000	1.1
--- test_sb_dbexpimp.py	15 Nov 2004 06:19:14 -0000	1.2
***************
*** 15,25 ****
  
  # We borrow the test messages that test_sb_server uses.
  from test_sb_server import good1, spam1
  
- # WARNING!
- # If these files exist when running this test, they will be deleted.
  TEMP_PICKLE_NAME = os.path.join(os.path.dirname(__file__), "temp.pik")
  TEMP_CSV_NAME = os.path.join(os.path.dirname(__file__), "temp.csv")
  TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm")
  
  class dbexpimpTest(unittest.TestCase):
--- 15,38 ----
  
  # We borrow the test messages that test_sb_server uses.
+ # I doubt it really makes much difference, but if we wanted more than
+ # one message of each type (the tests should all handle this ok) then
+ # Richie's hammer.py script has code for generating any number of
+ # randomly composed email messages.
  from test_sb_server import good1, spam1
  
  TEMP_PICKLE_NAME = os.path.join(os.path.dirname(__file__), "temp.pik")
  TEMP_CSV_NAME = os.path.join(os.path.dirname(__file__), "temp.csv")
  TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm")
+ # The chances of anyone having files with these names in the test
+ # directory is minute, but we don't want to wipe anything, so make
+ # sure that they don't already exist.  Our tearDown code gets rid
+ # of our copies (whether the tests pass or fail) so they shouldn't
+ # be ours.
+ for fn in [TEMP_PICKLE_NAME, TEMP_CSV_NAME, TEMP_DBM_NAME]:
+     if os.path.exists(fn):
+         print fn, "already exists.  Please remove this file before " \
+               "running these tests (a file by that name will be " \
+               "created and destroyed as part of the tests)."
+         sys.exit(1)
  
  class dbexpimpTest(unittest.TestCase):
***************
*** 32,36 ****
              pass
  
!     def test_csv_import(self):
          """Check that we don't import the old object craft csv module."""
          self.assert_(hasattr(sb_dbexpimp.csv, "reader"))
--- 45,49 ----
              pass
  
!     def test_csv_module_import(self):
          """Check that we don't import the old object craft csv module."""
          self.assert_(hasattr(sb_dbexpimp.csv, "reader"))
***************
*** 132,135 ****
--- 145,232 ----
              self.assertEqual(wi.spamcount, spam)
  
+     def test_merge_to_pickle(self):
+         # Create a pickled classifier to merge with.
+         bayes = PickledClassifier(TEMP_PICKLE_NAME)
+         # Stuff some messages in it so it's not empty.
+         bayes.learn(tokenize(spam1), True)
+         bayes.learn(tokenize(good1), False)
+         # Save.
+         bayes.store()
+         # Create a CSV file to import.
+         nham, nspam = 3,4
+         temp = open(TEMP_CSV_NAME, "wb")
+         temp.write("%d,%d\n" % (nham, nspam))
+         csv_data = {"this":(2,1), "is":(0,1), "a":(3,4), 'test':(1,1),
+                     "of":(1,0), "the":(1,2), "import":(3,1)}
+         for word, (ham, spam) in csv_data.items():
+             temp.write("%s,%s,%s\n" % (word, ham, spam))
+         temp.close()
+         sb_dbexpimp.runImport(TEMP_PICKLE_NAME, "pickle", False,
+                               TEMP_CSV_NAME)
+         # Open the converted file and verify that it has all the data from
+         # the CSV file (and by opening it, that it is a valid pickle),
+         # and the data from the original pickle.
+         bayes2 = open_storage(TEMP_PICKLE_NAME, "pickle")
+         self.assertEqual(bayes2.nham, nham + bayes.nham)
+         self.assertEqual(bayes2.nspam, nspam + bayes.nspam)
+         words = bayes._wordinfokeys()
+         words.extend(csv_data.keys())
+         for word in words:
+             word = sb_dbexpimp.uquote(word)
+             self.assert_(word in bayes2._wordinfokeys())
+             h, s = csv_data.get(word, (0,0))
+             wi = bayes._wordinfoget(word)
+             if wi:
+                 h += wi.hamcount
+                 s += wi.spamcount
+             wi2 = bayes2._wordinfoget(word)
+             self.assertEqual(h, wi2.hamcount)
+             self.assertEqual(s, wi2.spamcount)
+ 
+     def test_merge_to_dbm(self):
+         # Create a dbm classifier to merge with.
+         bayes = DBDictClassifier(TEMP_DBM_NAME)
+         # Stuff some messages in it so it's not empty.
+         bayes.learn(tokenize(spam1), True)
+         bayes.learn(tokenize(good1), False)
+         # Save data to check against.
+         original_nham = bayes.nham
+         original_nspam = bayes.nspam
+         original_data = {}
+         for key in bayes._wordinfokeys():
+             original_data[key] = bayes._wordinfoget(key)
+         # Save & Close.
+         bayes.store()
+         bayes.close()
+         # Create a CSV file to import.
+         nham, nspam = 3,4
+         temp = open(TEMP_CSV_NAME, "wb")
+         temp.write("%d,%d\n" % (nham, nspam))
+         csv_data = {"this":(2,1), "is":(0,1), "a":(3,4), 'test':(1,1),
+                     "of":(1,0), "the":(1,2), "import":(3,1)}
+         for word, (ham, spam) in csv_data.items():
+             temp.write("%s,%s,%s\n" % (word, ham, spam))
+         temp.close()
+         sb_dbexpimp.runImport(TEMP_DBM_NAME, "dbm", False, TEMP_CSV_NAME)
+         # Open the converted file and verify that it has all the data from
+         # the CSV file (and by opening it, that it is a valid dbm file),
+         # and the data from the original dbm database.
+         bayes2 = open_storage(TEMP_DBM_NAME, "dbm")
+         self.assertEqual(bayes2.nham, nham + original_nham)
+         self.assertEqual(bayes2.nspam, nspam + original_nspam)
+         words = original_data.keys()[:]
+         words.extend(csv_data.keys())
+         for word in words:
+             word = sb_dbexpimp.uquote(word)
+             self.assert_(word in bayes2._wordinfokeys())
+             h, s = csv_data.get(word, (0,0))
+             wi = original_data.get(word, None)
+             if wi:
+                 h += wi.hamcount
+                 s += wi.spamcount
+             wi2 = bayes2._wordinfoget(word)
+             self.assertEqual(h, wi2.hamcount)
+             self.assertEqual(s, wi2.spamcount)
+ 
  
  def suite():

From anadelonbrin at users.sourceforge.net  Mon Nov 15 07:22:07 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 15 07:22:10 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py,1.15,1.16
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14184/scripts

Modified Files:
	sb_dbexpimp.py 
Log Message:
Unittest is paying for itself already!  (I really didn't expect to actually
find a bug!).

Because wordinfo might be just a cache with dbm classifiers, a merged import
would lose data for any 'singletons'.  Need to use _wordinfoget() instead.

Also fail if the csv file doesn't exist that we are trying to import from
rather than keeping going, which made no sense.

And while I'm here, also stop bothering to remove the .dat and .dir files
that dumbdbm create (long time since they were supported), and remove the
verbose flag, which doesn't actually do anything.

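
The failure mode described in this log message can be sketched in isolation.
The class below is a toy stand-in, not the real DBDictClassifier:
CachedStore, merge_via_cache and merge_via_getter are hypothetical names,
and only wordinfo and _wordinfoget are taken from the log message.

```python
class CachedStore:
    """Toy dbm-style store in which `wordinfo` is only an in-memory cache."""

    def __init__(self, on_disk):
        self._db = dict(on_disk)  # pretend this is the dbm file
        self.wordinfo = {}        # cache, empty until a word is looked up

    def _wordinfoget(self, word):
        # Consult the cache first, then fall back to the backing store.
        if word in self.wordinfo:
            return self.wordinfo[word]
        counts = self._db.get(word)
        if counts is not None:
            self.wordinfo[word] = counts
        return counts


def merge_via_cache(store, word, ham, spam):
    # Buggy pattern: a cache miss is mistaken for "word not in the database".
    old_ham, old_spam = store.wordinfo.get(word, (0, 0))
    return (old_ham + ham, old_spam + spam)


def merge_via_getter(store, word, ham, spam):
    # Fixed pattern: the accessor also looks in the backing store.
    old = store._wordinfoget(word)
    old_ham, old_spam = old if old is not None else (0, 0)
    return (old_ham + ham, old_spam + spam)


print(merge_via_cache(CachedStore({"offer": (0, 1)}), "offer", 2, 3))
print(merge_via_getter(CachedStore({"offer": (0, 1)}), "offer", 2, 3))
```

With a fresh store, the cache-based merge drops the stored spam count for
"offer" while the accessor-based merge keeps it, which matches the data loss
the merge tests caught.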
Index: sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** sb_dbexpimp.py	3 Nov 2004 02:49:30 -0000	1.15
--- sb_dbexpimp.py	15 Nov 2004 06:22:05 -0000	1.16
***************
*** 47,51 ****
      -e     : export
      -i     : import
-     -v     : verbose mode (some additional diagnostic messages)
      -f: FN : flat file to export to or import from
      -p: FN : name of pickled database file to use
--- 47,50 ----
***************
*** 177,198 ****
          pass
  
-     try:
-         os.unlink(dbFN+".dat")
-     except OSError:
-         pass
- 
-     try:
-         os.unlink(dbFN+".dir")
-     except OSError:
-         pass
- 
      bayes = spambayes.storage.open_storage(dbFN, useDBM)
  
!     try:
!         fp = open(inFN, 'rb')
!     except IOError, e:
!         if e.errno != errno.ENOENT:
!             raise
! 
      rdr = csv.reader(fp)
      (nham, nspam) = rdr.next()
--- 176,182 ----
          pass
  
      bayes = spambayes.storage.open_storage(dbFN, useDBM)
  
!     fp = open(inFN, 'rb')
      rdr = csv.reader(fp)
      (nham, nspam) = rdr.next()
***************
*** 215,221 ****
          word = uunquote(word)
  
!         try:
!             wi = bayes.wordinfo[word]
!         except KeyError:
              wi = bayes.WordInfoClass()
  
--- 199,206 ----
          word = uunquote(word)
  
!         # Can't use wordinfo[word] here, because wordinfo
!         # is only a cache with dbm!  Need to use _wordinfoget instead.
!         wi = bayes._wordinfoget(word)
!         if wi is None:
              wi = bayes.WordInfoClass()
  
***************
*** 269,274 ****
          elif opt == '-m':
              newDBM = False
-         elif opt == '-v':
-             options["globals", "verbose"] = True
          elif opt in ('-o', '--option'):
              options.set_from_cmdline(arg, sys.stderr)
--- 254,257 ----

From anadelonbrin at users.sourceforge.net  Wed Nov 17 01:01:23 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Nov 17 01:01:26 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py,1.137,1.138
Message-ID: 

Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11031/Outlook2000

Modified Files:
	addin.py 
Log Message:
Fix bug identified by 'DUI-DWI'.  Not sure how this got past the testing,
but the messageinfo database uses '0' and '1' as keys, not 0 and 1, so
showing clues for a trained message would fail.

Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.137
retrieving revision 1.138
diff -C2 -d -r1.137 -r1.138
*** addin.py	11 Nov 2004 21:21:55 -0000	1.137
--- addin.py	17 Nov 2004 00:01:06 -0000	1.138
***************
*** 481,485 ****
      trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
      push("This message has %sbeen trained%s." % \
!          {0 : ("", " as ham"), 1 : ("", " as spam"), None : ("not ", "")}
           [trained_as])
      # Format the clues.
--- 481,485 ----
      trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
      push("This message has %sbeen trained%s." % \
!          {'0' : ("", " as ham"), '1' : ("", " as spam"), None : ("not ", "")}
           [trained_as])
      # Format the clues.
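
The lookup fixed in this diff can be tried in isolation. This is a small
sketch: TRAINED_TEXT, describe_training and the dictionary standing in for
the message database are hypothetical names, while the '0', '1' and None keys
and the format string are taken from the diff.

```python
# Keys are the strings '0' and '1', plus None for "no record".
TRAINED_TEXT = {
    '0': ("", " as ham"),
    '1': ("", " as spam"),
    None: ("not ", ""),
}


def describe_training(trained_as):
    """Build the sentence shown alongside the spam clues."""
    return "This message has %sbeen trained%s." % TRAINED_TEXT[trained_as]


message_db = {"msg-1": '1'}  # hypothetical stand-in for the message database
print(describe_training(message_db.get("msg-1")))
print(describe_training(message_db.get("msg-2")))
```

Passing the integer 1 instead of the string '1' raises KeyError, because the
two are different dictionary keys; that is the failure the log message
describes.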

From anadelonbrin at users.sourceforge.net  Mon Nov 22 01:02:46 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:02:49 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py,1.43,1.44
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15368/scripts

Modified Files:
	sb_imapfilter.py 
Log Message:
Fix typo found by Thomas Heller.

Switch from using msg.asTokens to msg.tokenize.

Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.43
retrieving revision 1.44
diff -C2 -d -r1.43 -r1.44
*** sb_imapfilter.py	9 Nov 2004 02:30:33 -0000	1.43
--- sb_imapfilter.py	22 Nov 2004 00:02:28 -0000	1.44
***************
*** 520,524 ****
              command = "append %s %s %s %s" % (self.folder.name, flgs, tme,
                                                self.as_string)
!             raise BadIMAPReponseError(command)
  
          if self.previous_folder is None:
--- 520,524 ----
              command = "append %s %s %s %s" % (self.folder.name, flgs, tme,
                                                self.as_string)
!             raise BadIMAPResponseError(command)
  
          if self.previous_folder is None:
***************
*** 710,714 ****
                          continue
                      msg.delSBHeaders()
!                     classifier.unlearn(msg.asTokens(), not isSpam)
  
                      # Once the message has been untrained, it's training memory
--- 710,714 ----
                          continue
                      msg.delSBHeaders()
!                     classifier.unlearn(msg.tokenize(), not isSpam)
  
                      # Once the message has been untrained, it's training memory
***************
*** 723,727 ****
                      saved_headers = msg.currentSBHeaders()
                      msg.delSBHeaders()
!                     classifier.learn(msg.asTokens(), isSpam)
                      num_trained += 1
                      msg.RememberTrained(isSpam)
--- 723,727 ----
                      saved_headers = msg.currentSBHeaders()
                      msg.delSBHeaders()
!                     classifier.learn(msg.tokenize(), isSpam)
                      num_trained += 1
                      msg.RememberTrained(isSpam)
***************
*** 754,758 ****
                  # the errors and move it soon enough.
                  continue
!             (prob, clues) = classifier.spamprob(msg.asTokens(),
                                                  evidence=True)
              # Add headers and remember classification.
--- 754,758 ----
                  # the errors and move it soon enough.
                  continue
!             (prob, clues) = classifier.spamprob(msg.tokenize(),
                                                  evidence=True)
              # Add headers and remember classification.

From anadelonbrin at users.sourceforge.net  Mon Nov 22 01:10:19 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:10:21 2004
Subject: [Spambayes-checkins] spambayes/contrib README,1.2,1.3
Message-ID: 

Update of /cvsroot/spambayes/spambayes/contrib
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17293/contrib

Modified Files:
	README 
Log Message:
Bring a bit more up-to-date.

Index: README
===================================================================
RCS file: /cvsroot/spambayes/spambayes/contrib/README,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** README	25 Mar 2004 19:53:15 -0000	1.2
--- README	22 Nov 2004 00:10:15 -0000	1.3
***************
*** 18,30 ****
  mod_spambayes.py - Plugin for Amit Patel's proxy3 web proxy.
  
- mkzip.py - ???
- 
  spamcounts.py - print spam and ham counts and spam probability for a
  messages for for select tokens
  
! sb_bnfilter.py - alternative to sb_filter that avoids re-initialising
! spambayes for consecutive requests using a short-lived server process.
! This is intended to give the performance advantages of sb_xmlrpcserver,
! without the administrative complications.
  
! sb_bnserver.py - component of sb_bnfilter.py
--- 18,29 ----
  mod_spambayes.py - Plugin for Amit Patel's proxy3 web proxy.
  
  spamcounts.py - print spam and ham counts and spam probability for a
  messages for for select tokens
  
! findbest.py - Find the next "best" unsure message to train on.
  
! pycksum.py - A fuzzy checksum program designed for email messages.
! 
! sb_culler.py - Andrew Dalke's POP3 culler.
! 
! tte.py - A utility script for 'train to exhaustion'.
\ No newline at end of file

From anadelonbrin at users.sourceforge.net  Mon Nov 22 01:11:55 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:11:58 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py,1.16,1.17
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17695/scripts

Modified Files:
	sb_dbexpimp.py 
Log Message:
Update docstring.

Index: sb_dbexpimp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v
retrieving revision 1.16
retrieving revision 1.17
diff -C2 -d -r1.16 -r1.17
*** sb_dbexpimp.py	15 Nov 2004 06:22:05 -0000	1.16
--- sb_dbexpimp.py	22 Nov 2004 00:11:52 -0000	1.17
***************
*** 3,17 ****
  """sb_dbexpimp.py - Bayes database export/import
  
- Classes:
- 
- 
- Abstract:
- 
      This utility has the primary function of exporting and importing
!     a spambayes database into/from a flat file.  This is useful in a number
      of scenarios.
  
!     Platform portability of database - flat files can be exported and
!     imported across platforms (winduhs and linux, for example)
  
      Database implementation changes - databases can survive database
--- 3,12 ----
  """sb_dbexpimp.py - Bayes database export/import
  
      This utility has the primary function of exporting and importing
!     a spambayes database into/from a CSV file.  This is useful in a number
      of scenarios.
  
!     Platform portability of database - CSV files can be exported and
!     imported across platforms (Windows and Linux, for example).
  
      Database implementation changes - databases can survive database
***************
*** 21,25 ****
      Database reorganization - an export followed by an import reorgs an
      existing database, improving performance, at least in
!     some database implementations
  
      Database sharing - it is possible to distribute particular databases
--- 16,20 ----
      Database reorganization - an export followed by an import reorgs an
      existing database, improving performance, at least in
!     some database implementations.
  
      Database sharing - it is possible to distribute particular databases
***************
*** 29,43 ****
      Database merging - multiple databases can be merged into one quite easily
      by specifying -m on an import.  This will add the two database
!     nham and nspams together (assuming the two databases do not share
!     corpora) and for wordinfo conflicts, will add spamcount and hamcount
!     together.
! 
!     Spambayes software release migration - an export can be executed before
!     a release upgrade, as part of the installation script.  Then, after the
!     new software is installed, an import can be executed, which will
!     effectively preserve existing training.  This eliminates the need for
!     retraining every time a release is installed.
! 
!     Others?  I'm sure I haven't thought of everything...
  
  Usage:
--- 24,29 ----
      Database merging - multiple databases can be merged into one quite easily
      by specifying -m on an import.  This will add the two database
!     nham and nspams together and for wordinfo conflicts, will add spamcount
!     and hamcount together.
  
  Usage:
***************
*** 60,66 ****
      -h : help
  
  Examples:
  
!     Export pickled mybayes.db into mybayes.db.export as a csv flat file
  
          sb_dbexpimp -e -p mybayes.db -f mybayes.db.export
--- 46,56 ----
      -h : help
  
+ If neither -p nor -d is specified, then the values in your configuration
+ file (or failing that, the defaults) will be used.  In this way, you may
+ convert to and from storage formats other than pickle and dbm.
+ 
  Examples:
  
!     Export pickled mybayes.db into mybayes.db.export as a CSV file
  
          sb_dbexpimp -e -p mybayes.db -f mybayes.db.export
***************
*** 78,88 ****
          sb_dbexpimp -i -d newbayes.db -f abayes.export
          sb_dbexpimp -i -m -d newbayes.db -f bbayes.export
- 
- To Do:
-     o Suggestions?
- 
  """
  
! # This module is part of the spambayes project, which is Copyright 2002
  # The Python Software Foundation and is covered by the Python Software
  # Foundation license.
--- 68,74 ----
          sb_dbexpimp -i -d newbayes.db -f abayes.export
          sb_dbexpimp -i -m -d newbayes.db -f bbayes.export
  """
  
! # This module is part of the spambayes project, which is Copyright 2002-5
  # The Python Software Foundation and is covered by the Python Software
  # Foundation license.
***************
*** 225,230 ****
  
  
- 
- 
  if __name__ == '__main__':
  
--- 211,214 ----

From anadelonbrin at users.sourceforge.net  Mon Nov 22 01:13:46 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:13:50 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_filter.py,1.14,1.15
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18240/scripts

Modified Files:
	sb_filter.py 
Log Message:
Remove the "experimental" marking in the docstring for the training
functions.  Various people have used these for some time, and I can't see
anything in the code that is particularly worrying.  If I'm wrong and these
should still be experimental with 1.1, please let me know why and I'll try
and modify the tests to remove concern.

Index: sb_filter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_filter.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** sb_filter.py	4 May 2004 13:02:51 -0000	1.14
--- sb_filter.py	22 Nov 2004 00:13:43 -0000	1.15
***************
*** 31,46 ****
          filter (default if no processing options are given)
      * -g
!         [EXPERIMENTAL] (re)train as a good (ham) message
      * -s
!         [EXPERIMENTAL] (re)train as a bad (spam) message
      * -t
!         [EXPERIMENTAL] filter and train based on the result -- you must
          make sure to untrain all mistakes later.  Not recommended.
      * -G
!         [EXPERIMENTAL] untrain ham (only use if you've already trained
!         this message)
      * -S
!         [EXPERIMENTAL] untrain spam (only use if you've already trained
!         this message)
  
      -o section:option:value
--- 31,44 ----
          filter (default if no processing options are given)
      * -g
!         (re)train as a good (ham) message
      * -s
!         (re)train as a bad (spam) message
      * -t
!         filter and train based on the result -- you must
          make sure to untrain all mistakes later.  Not recommended.
      * -G
!         untrain ham (only use if you've already trained this message)
      * -S
!         untrain spam (only use if you've already trained this message)
  
      -o section:option:value

From anadelonbrin at users.sourceforge.net  Mon Nov 22 01:16:44 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 22 01:16:47 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_pop3dnd.py,1.12,1.13
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18873/scripts

Modified Files:
	sb_pop3dnd.py 
Log Message:
Switch from using msg.asTokens to msg.tokenize.

Play nicer with win32 gui (preparation for a taskbar app for this script).

Don't use the deprecated 'strict' kwarg for email messages.

Add appropriate state createworkers function & call.

Modify to have the prepare/start/stop API that sb_server has, to make a
taskbar app more straightforward.

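
Two of the changes listed in this log message, calling a tokenize() method
on the message and parsing without the 'strict' keyword, can be sketched
with the standard email package. This is an illustration under assumptions:
TokenizingMessage and its whitespace tokenizer are hypothetical stand-ins,
not the SpamBayes message classes or tokenizer.

```python
import email
from email.message import Message


class TokenizingMessage(Message):
    """Hypothetical Message subclass that exposes a tokenize() method."""

    def tokenize(self):
        # Toy tokenizer: whitespace-split the subject and the text payload.
        text = self.get("Subject", "") + " " + self.get_payload()
        return text.split()


raw = "Subject: hello world\n\ncheap pills\n"
# _class selects the class used for the parsed message; no 'strict' argument.
msg = email.message_from_string(raw, _class=TokenizingMessage)
print(type(msg).__name__, msg.tokenize())
```

Because the parser builds the subclass directly, callers such as a train()
method can ask the message for its own tokens.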
Index: sb_pop3dnd.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_pop3dnd.py,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** sb_pop3dnd.py 9 Nov 2004 02:30:33 -0000 1.12 --- sb_pop3dnd.py 22 Nov 2004 00:16:39 -0000 1.13 *************** *** 72,75 **** --- 72,76 ---- import thread import getopt + import socket import imaplib import operator *************** *** 85,89 **** from twisted.internet import defer from twisted.internet import reactor - from twisted.internet import win32eventreactor from twisted.internet.defer import maybeDeferred from twisted.internet.protocol import ServerFactory --- 86,89 ---- *************** *** 95,98 **** --- 95,99 ---- from twisted.protocols.imap4 import IMailboxListener, collapseNestedLists + from spambayes import storage from spambayes import message from spambayes.Stats import Stats *************** *** 227,234 **** def train(self, classifier, isSpam): if self.GetTrained() == (not isSpam): ! classifier.unlearn(self.asTokens(), not isSpam) self.RememberTrained(None) if self.GetTrained() is None: ! classifier.learn(self.asTokens(), isSpam) self.RememberTrained(isSpam) classifier.store() --- 228,235 ---- def train(self, classifier, isSpam): if self.GetTrained() == (not isSpam): ! classifier.unlearn(self.tokenize(), not isSpam) self.RememberTrained(None) if self.GetTrained() is None: ! classifier.learn(self.tokenize(), isSpam) self.RememberTrained(isSpam) classifier.store() *************** *** 319,324 **** if content is None: return IMAPFileMessage(key, directory) ! msg = email.message_from_string(content, _class=IMAPFileMessage, ! strict=False) msg.id = key msg.file_name = key --- 320,324 ---- if content is None: return IMAPFileMessage(key, directory) ! 
msg = email.message_from_string(content, _class=IMAPFileMessage) msg.id = key msg.file_name = key *************** *** 609,614 **** '%s\r\nSee .\r\n' % (__doc__,) date = imaplib.Time2Internaldate(time.time())[1:-1] ! msg = email.message_from_string(about, _class=IMAPMessage, ! strict=False) msg.date = date self.addMessage(msg) --- 609,613 ---- '%s\r\nSee .\r\n' % (__doc__,) date = imaplib.Time2Internaldate(time.time())[1:-1] ! msg = email.message_from_string(about, _class=IMAPMessage) msg.date = date self.addMessage(msg) *************** *** 618,624 **** self.addMessage(msg) # XXX Add other messages here, for example ! # XXX one with a link to the configuration page ! # XXX (or maybe even the configuration page itself, ! # XXX in html!) def isWriteable(self): --- 617,621 ---- self.addMessage(msg) # XXX Add other messages here, for example ! # XXX help and other documentation. def isWriteable(self): *************** *** 830,834 **** _class=message.SBHeaderMessage) # Now find the spam disposition and add the header. ! (prob, clues) = state.bayes.spamprob(msg.asTokens(),\ evidence=True) --- 827,831 ---- _class=message.SBHeaderMessage) # Now find the spam disposition and add the header. ! (prob, clues) = state.bayes.spamprob(msg.tokenize(),\ evidence=True) *************** *** 908,911 **** --- 905,917 ---- self.activeIMAPSessions = 0 + def createWorkers(self): + """There aren't many workers in an IMAP State - most of the + work is done elsewhere. We do need to load the classifier, + though, and build the status strings.""" + if not hasattr(self, "DBName"): + self.DBName, self.useDB = storage.database_type([]) + self.bayes = storage.open_storage(self.DBName, self.useDB) + self.buildStatusStrings() + def buildServerStrings(self): """After the server details have been set up, this creates string *************** *** 921,925 **** # =================================================================== ! def setup(): # Setup state, server, boxes, trainers and account. 
state.imap_port = options["imapserver", "port"] --- 927,931 ---- # =================================================================== ! def prepare(): # Setup state, server, boxes, trainers and account. state.imap_port = options["imapserver", "port"] *************** *** 961,965 **** unsure_box) proxyListeners.append(listener) ! state.buildServerStrings() def run(): --- 967,998 ---- unsure_box) proxyListeners.append(listener) ! state.prepare() ! ! def start(): ! assert state.prepared, "Must prepare before starting" ! # The asyncore stuff doesn't play nicely with twisted (or vice-versa), ! # so put them in separate threads. ! thread.start_new_thread(Dibbler.run, ()) ! reactor.run() ! ! def stop(): ! # Save the classifier, although that should not be necessary. ! state.bayes.store() ! # Explicitly closing the db is a good idea, though. ! state.bayes.close() ! ! # Stop the POP3 proxy. ! if state.proxyPorts: ! killer = socket.socket(socket.AF_INET, socket.SOCK_STREAM) ! try: ! killer.connect(('localhost', state.proxyPorts[0][1])) ! killer.send('KILL\r\n') ! killer.close() ! except socket.error: ! # Well, we did our best to shut down gracefully. Warn the user ! # and just die when the thread we are in does. ! print "Could not shut down POP3 proxy gracefully." ! # Stop the IMAP4 server. ! reactor.stop() def run(): *************** *** 986,995 **** # Setup everything. ! setup() - # Kick things off. The asyncore stuff doesn't play nicely - # with twisted (or vice-versa), so put them in separate threads. - thread.start_new_thread(Dibbler.run, ()) - reactor.run() if __name__ == "__main__": --- 1019,1027 ---- # Setup everything. ! prepare() ! ! # Kick things off. ! 
start() if __name__ == "__main__": From anadelonbrin at users.sourceforge.net Mon Nov 22 01:22:57 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Nov 22 01:23:00 2004 Subject: [Spambayes-checkins] spambayes/spambayes/test .cvsignore, NONE, 1.1 test_message.py, NONE, 1.1 test_sb_filter.py, NONE, 1.1 test_sb_dbexpimp.py, 1.2, 1.3 test_sb_imapfilter.py, 1.5, 1.6 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes/test In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20443/spambayes/test Modified Files: test_sb_dbexpimp.py test_sb_imapfilter.py Added Files: .cvsignore test_message.py test_sb_filter.py Log Message: Ignore .pyc and .pyo and the _pop3proxy.log file. Add tests for message.py and sb_filter.py Use email.message_from_string not our versions in imapfilter test. Add shells of SF bugs tests for imapfilter. Fix removing of temp files in test for sb_dbexpimp.py --- NEW FILE: .cvsignore --- *.py[co] _pop3proxy.log --- NEW FILE: test_message.py --- # Test spambayes.message module. import os import sys import math import email import unittest import sb_test_support sb_test_support.fix_sys_path() from spambayes.Options import options from spambayes.tokenizer import tokenize from spambayes.classifier import Classifier from spambayes.message import MessageInfoDB, insert_exception_header from spambayes.message import Message, SBHeaderMessage, MessageInfoPickle # We borrow the test messages that test_sb_server uses. # I doubt it really makes much difference, but if we wanted more than # one message of each type (the tests should all handle this ok) then # Richie's hammer.py script has code for generating any number of # randomly composed email messages. 
from test_sb_server import good1, spam1 TEMP_PICKLE_NAME = os.path.join(os.path.dirname(__file__), "temp.pik") TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm") # The chances of anyone having files with these names in the test # directory is minute, but we don't want to wipe anything, so make # sure that they don't already exist. Our tearDown code gets rid # of our copies (whether the tests pass or fail) so they shouldn't # be ours. for fn in [TEMP_PICKLE_NAME, TEMP_DBM_NAME]: if os.path.exists(fn): print fn, "already exists. Please remove this file before " \ "running these tests (a file by that name will be " \ "created and destroyed as part of the tests)." sys.exit(1) class MessageTest(unittest.TestCase): def setUp(self): self.msg = email.message_from_string(spam1, _class=Message) def test_persistent_state(self): self.assertEqual(self.msg.stored_attributes, ['c', 't']) def test_initialisation(self): self.assertEqual(self.msg.id, None) self.assertEqual(self.msg.c, None) self.assertEqual(self.msg.t, None) def test_setId(self): # Verify that you can't change the id. self.msg.id = "test" self.assertRaises(ValueError, self.msg.setId, "test2") # Verify that you can't set the id to None. self.msg.id = None self.assertRaises(ValueError, self.msg.setId, None) # Verify that id must be a string. self.assertRaises(TypeError, self.msg.setId, 1) self.assertRaises(TypeError, self.msg.setId, False) self.assertRaises(TypeError, self.msg.setId, []) id = "Test" self.msg.setId(id) self.assertEqual(self.msg.id, id) # Check info db load_msg is called. 
self.msg.id = None saved = self.msg.message_info_db.load_msg self.done = False try: self.msg.message_info_db.load_msg = self._fake_setState self.msg.setId(id) self.assertEqual(self.done, True) finally: self.msg.message_info_db.load_msg = saved def test_getId(self): self.assertEqual(self.msg.getId(), None) id = "test" self.msg.id = id self.assertEqual(self.msg.getId(), id) def test_tokenize(self): toks = self.msg.tokenize() self.assertEqual(tuple(tokenize(spam1)), tuple(toks)) def test_force_CRLF(self): self.assert_('\r' not in good1) lines = self.msg._force_CRLF(good1).split('\n') for line in lines: if line: self.assert_(line.endswith('\r')) def test_as_string_endings(self): self.assert_('\r' not in spam1) lines = self.msg.as_string().split('\n') for line in lines: if line: self.assert_(line.endswith('\r')) def _fake_setState(self, state): self.done = True def test_modified(self): saved = self.msg.message_info_db.store_msg try: self.msg.message_info_db.store_msg = self._fake_setState self.done = False self.msg.modified() self.assertEqual(self.done, False) self.msg.id = "Test" self.msg.modified() self.assertEqual(self.done, True) finally: self.msg.message_info_db.store_msg = saved def test_GetClassification(self): self.msg.c = 's' self.assertEqual(self.msg.GetClassification(), options['Headers','header_spam_string']) self.msg.c = 'h' self.assertEqual(self.msg.GetClassification(), options['Headers','header_ham_string']) self.msg.c = 'u' self.assertEqual(self.msg.GetClassification(), options['Headers','header_unsure_string']) self.msg.c = 'a' self.assertEqual(self.msg.GetClassification(), None) def test_RememberClassification(self): self.msg.RememberClassification(options['Headers', 'header_spam_string']) self.assertEqual(self.msg.c, 's') self.msg.RememberClassification(options['Headers', 'header_ham_string']) self.assertEqual(self.msg.c, 'h') self.msg.RememberClassification(options['Headers', 'header_unsure_string']) self.assertEqual(self.msg.c, 'u') 
self.assertRaises(ValueError, self.msg.RememberClassification, "a") # Check that self.msg.modified is called. saved = self.msg.modified self.done = False try: self.msg.modified = self._fake_modified self.msg.RememberClassification(options['Headers', 'header_unsure_string']) self.assertEqual(self.done, True) finally: self.msg.modified = saved def _fake_modified(self): self.done = True def test_GetAndRememberTrained(self): t = "test" saved = self.msg.modified self.done = False try: self.msg.modified = self._fake_modified self.msg.RememberTrained(t) self.assertEqual(self.done, True) finally: self.msg.modified = saved self.assertEqual(self.msg.GetTrained(), t) class SBHeaderMessageTest(unittest.TestCase): def setUp(self): self.msg = email.message_from_string(spam1, _class=SBHeaderMessage) # Get a prob and some clues. c = Classifier() self.u_prob, clues = c.spamprob(tokenize(good1), True) c.learn(tokenize(good1), False) self.g_prob, clues = c.spamprob(tokenize(good1), True) c.unlearn(tokenize(good1), False) c.learn(tokenize(spam1), True) self.s_prob, self.clues = c.spamprob(tokenize(spam1), True) self.ham = options['Headers','header_ham_string'] self.spam = options['Headers','header_spam_string'] self.unsure = options['Headers','header_unsure_string'] self.to = "tony.meyer@gmail.com;ta-meyer@ihug.co.nz" self.msg["to"] = self.to def test_setIdFromPayload(self): id = self.msg.setIdFromPayload() self.assertEqual(id, None) self.assertEqual(self.msg.id, None) msgid = "test" msg = "".join((options['Headers','mailid_header_name'], ": ", msgid, "\r\n", good1)) msg = email.message_from_string(msg, _class=SBHeaderMessage) id = msg.setIdFromPayload() self.assertEqual(id, msgid) self.assertEqual(msg.id, msgid) def test_disposition_header_ham(self): name = options['Headers','classification_header_name'] self.msg.addSBHeaders(self.g_prob, self.clues) self.assertEqual(self.msg[name], self.ham) self.assertEqual(self.msg.GetClassification(), self.ham) def 
test_disposition_header_spam(self): name = options['Headers','classification_header_name'] self.msg.addSBHeaders(self.s_prob, self.clues) self.assertEqual(self.msg[name], self.spam) self.assertEqual(self.msg.GetClassification(), self.spam) def test_disposition_header_unsure(self): name = options['Headers','classification_header_name'] self.msg.addSBHeaders(self.u_prob, self.clues) self.assertEqual(self.msg[name], self.unsure) self.assertEqual(self.msg.GetClassification(), self.unsure) def test_score_header_off(self): options['Headers','include_score'] = False self.msg.addSBHeaders(self.g_prob, self.clues) self.assertEqual(self.msg[options['Headers', 'score_header_name']], None) def test_score_header(self): options['Headers','include_score'] = True options["Headers", "header_score_digits"] = 21 options["Headers", "header_score_logarithm"] = False self.msg.addSBHeaders(self.g_prob, self.clues) self.assertEqual(self.msg[options['Headers', 'score_header_name']], "%.21f" % (self.g_prob,)) def test_score_header_log(self): options['Headers','include_score'] = True options["Headers", "header_score_digits"] = 21 options["Headers", "header_score_logarithm"] = True self.msg.addSBHeaders(self.s_prob, self.clues) self.assert_(self.msg[options['Headers', 'score_header_name']].\ startswith("%.21f" % (self.s_prob,))) self.assert_(self.msg[options['Headers', 'score_header_name']].\ endswith(" (%d)" % (-math.log10(1.0-self.s_prob),))) def test_thermostat_header_off(self): options['Headers','include_thermostat'] = False self.msg.addSBHeaders(self.u_prob, self.clues) self.assertEqual(self.msg[options['Headers', 'thermostat_header_name']], None) def test_thermostat_header_unsure(self): options['Headers','include_thermostat'] = True self.msg.addSBHeaders(self.u_prob, self.clues) self.assertEqual(self.msg[options['Headers', 'thermostat_header_name']], "*****") def test_thermostat_header_spam(self): options['Headers','include_thermostat'] = True self.msg.addSBHeaders(self.s_prob, 
self.clues) self.assertEqual(self.msg[options['Headers', 'thermostat_header_name']], "*********") def test_thermostat_header_ham(self): options['Headers','include_thermostat'] = True self.msg.addSBHeaders(self.g_prob, self.clues) self.assertEqual(self.msg[options['Headers', 'thermostat_header_name']], "") def test_evidence_header(self): options['Headers', 'include_evidence'] = True options['Headers', 'clue_mailheader_cutoff'] = 0.5 # all self.msg.addSBHeaders(self.g_prob, self.clues) header = self.msg[options['Headers', 'evidence_header_name']] header_clues = [s.split(':') for s in \ [s.strip() for s in header.split(';')]] header_clues = dict([(":".join(clue[:-1])[1:-1], float(clue[-1])) \ for clue in header_clues]) for word, score in self.clues: self.assert_(word in header_clues) self.assertEqual(round(score, 2), header_clues[word]) def test_evidence_header_partial(self): options['Headers', 'include_evidence'] = True options['Headers', 'clue_mailheader_cutoff'] = 0.1 self.msg.addSBHeaders(self.g_prob, self.clues) header = self.msg[options['Headers', 'evidence_header_name']] header_clues = [s.split(':') for s in \ [s.strip() for s in header.split(';')]] header_clues = dict([(":".join(clue[:-1])[1:-1], float(clue[-1])) \ for clue in header_clues]) for word, score in self.clues: if score <= 0.1 or score >= 0.9: self.assert_(word in header_clues) self.assertEqual(round(score, 2), header_clues[word]) else: self.assert_(word not in header_clues) def test_evidence_header_empty(self): options['Headers', 'include_evidence'] = True options['Headers', 'clue_mailheader_cutoff'] = 0.0 self.msg.addSBHeaders(self.g_prob, self.clues) header = self.msg[options['Headers','evidence_header_name']] header_clues = [s.split(':') for s in \ [s.strip() for s in header.split(';')]] header_clues = dict([(":".join(clue[:-1])[1:-1], float(clue[-1])) \ for clue in header_clues]) for word, score in self.clues: if word == "*H*" or word == "*S*": self.assert_(word in header_clues) 
self.assertEqual(round(score, 2), header_clues[word]) else: self.assert_(word not in header_clues) def test_evidence_header_off(self): options['Headers', 'include_evidence'] = False self.msg.addSBHeaders(self.g_prob, self.clues) self.assertEqual(self.msg[options['Headers', 'evidence_header_name']], None) def test_notate_to_off(self): options["Headers", "notate_to"] = () self.msg.addSBHeaders(self.g_prob, self.clues) self.msg.addSBHeaders(self.u_prob, self.clues) self.msg.addSBHeaders(self.s_prob, self.clues) self.assertEqual(self.msg["To"], self.to) def test_notate_to_ham(self): options["Headers", "notate_to"] = (self.ham,) self.msg.addSBHeaders(self.g_prob, self.clues) disp, orig = self.msg["To"].split(';', 1) self.assertEqual(orig, self.to) self.assertEqual(disp, "%s@spambayes.invalid" % (self.ham,)) def test_notate_to_unsure(self): options["Headers", "notate_to"] = (self.ham, self.unsure) self.msg.addSBHeaders(self.u_prob, self.clues) disp, orig = self.msg["To"].split(';', 1) self.assertEqual(orig, self.to) self.assertEqual(disp, "%s@spambayes.invalid" % (self.unsure,)) def test_notate_to_spam(self): options["Headers", "notate_to"] = (self.ham, self.spam, self.unsure) self.msg.addSBHeaders(self.s_prob, self.clues) disp, orig = self.msg["To"].split(';', 1) self.assertEqual(orig, self.to) self.assertEqual(disp, "%s@spambayes.invalid" % (self.spam,)) def test_notate_subject_off(self): subject = self.msg["Subject"] options["Headers", "notate_subject"] = () self.msg.addSBHeaders(self.g_prob, self.clues) self.msg.addSBHeaders(self.u_prob, self.clues) self.msg.addSBHeaders(self.s_prob, self.clues) self.assertEqual(self.msg["Subject"], subject) def test_notate_subject_ham(self): subject = self.msg["Subject"] options["Headers", "notate_subject"] = (self.ham,) self.msg.addSBHeaders(self.g_prob, self.clues) disp, orig = self.msg["Subject"].split(',', 1) self.assertEqual(orig, subject) self.assertEqual(disp, self.ham) def test_notate_subject_unsure(self): subject = 
self.msg["Subject"] options["Headers", "notate_subject"] = (self.ham, self.unsure) self.msg.addSBHeaders(self.u_prob, self.clues) disp, orig = self.msg["Subject"].split(',', 1) self.assertEqual(orig, subject) self.assertEqual(disp, self.unsure) def test_notate_subject_spam(self): subject = self.msg["Subject"] options["Headers", "notate_subject"] = (self.ham, self.spam, self.unsure) self.msg.addSBHeaders(self.s_prob, self.clues) disp, orig = self.msg["Subject"].split(',', 1) self.assertEqual(orig, subject) self.assertEqual(disp, self.spam) def test_notate_to_changed(self): saved_ham = options["Headers", "header_ham_string"] notate_to = options.get_option("Headers", "notate_to") saved_to = notate_to.allowed_values try: options["Headers", "header_ham_string"] = "bacon" header_strings = (options["Headers", "header_ham_string"], options["Headers", "header_spam_string"], options["Headers", "header_unsure_string"]) notate_to = options.get_option("Headers", "notate_to") notate_to.allowed_values = header_strings self.ham = options["Headers", "header_ham_string"] result = self.test_notate_to_ham() # Just be sure that it's using the new value. self.assertEqual(self.msg["To"].split(';', 1)[0], "bacon@spambayes.invalid") finally: # If we leave these changed, then lots of other tests will # fail. 
options["Headers", "header_ham_string"] = saved_ham self.ham = saved_ham notate_to.allowed_values = saved_to return result def test_id_header(self): options['Headers','add_unique_id'] = True id = "test" self.msg.id = id self.msg.addSBHeaders(self.g_prob, self.clues) self.assertEqual(self.msg[options['Headers', 'mailid_header_name']], id) def test_id_header_off(self): options['Headers','add_unique_id'] = False id = "test" self.msg.id = id self.msg.addSBHeaders(self.g_prob, self.clues) self.assertEqual(self.msg[options['Headers', 'mailid_header_name']], None) def test_currentSBHeaders(self): sbheaders = self.msg.currentSBHeaders() self.assertEqual({}, sbheaders) headers = {options['Headers', 'classification_header_name'] : '1', options['Headers', 'mailid_header_name'] : '2', options['Headers', 'classification_header_name'] + "-ID" : '3', options['Headers', 'thermostat_header_name'] : '4', options['Headers', 'evidence_header_name'] : '5', options['Headers', 'score_header_name'] : '6', options['Headers', 'trained_header_name'] : '7', } for name, val in headers.items(): self.msg[name] = val sbheaders = self.msg.currentSBHeaders() self.assertEqual(headers, sbheaders) def test_delSBHeaders(self): headers = (options['Headers', 'classification_header_name'], options['Headers', 'mailid_header_name'], options['Headers', 'classification_header_name'] + "-ID", options['Headers', 'thermostat_header_name'], options['Headers', 'evidence_header_name'], options['Headers', 'score_header_name'], options['Headers', 'trained_header_name'],) for header in headers: self.msg[header] = "test" for header in headers: self.assert_(header in self.msg.keys()) self.msg.delSBHeaders() for header in headers: self.assert_(header not in self.msg.keys()) class MessageInfoBaseTest(unittest.TestCase): def setUp(self, fn=TEMP_PICKLE_NAME): self.db = self.klass(fn, self.mode) def test_mode(self): self.assertEqual(self.mode, self.db.mode) def test_load_msg_missing(self): msg = 
email.message_from_string(good1, _class=Message) msg.id = "Test" dummy_values = "a", "b" msg.c, msg.t = dummy_values self.db.load_msg(msg) self.assertEqual((msg.c, msg.t), dummy_values) def test_load_msg_compat(self): msg = email.message_from_string(good1, _class=Message) msg.id = "Test" dummy_values = "a", "b" self.db.db[msg.id] = dummy_values self.db.load_msg(msg) self.assertEqual((msg.c, msg.t), dummy_values) def test_load_msg(self): msg = email.message_from_string(good1, _class=Message) msg.id = "Test" dummy_values = [('a', 1), ('b', 2)] self.db.db[msg.id] = dummy_values self.db.load_msg(msg) for att, val in dummy_values: self.assertEqual(getattr(msg, att), val) def test_store_msg(self): msg = email.message_from_string(good1, _class=Message) msg.id = "Test" saved = self.db.store self.done = False try: self.db.store = self._fake_store self.db.store_msg(msg) finally: self.db.store = saved self.assertEqual(self.done, True) correct = [(att, getattr(msg, att)) \ for att in msg.stored_attributes] self.assertEqual(self.db.db[msg.id], correct) def _fake_store(self): self.done = True def test_remove_msg(self): msg = email.message_from_string(good1, _class=Message) msg.id = "Test" self.db.db[msg.id] = "test" saved = self.db.store self.done = False try: self.db.store = self._fake_store self.db.remove_msg(msg) finally: self.db.store = saved self.assertEqual(self.done, True) self.assertRaises(KeyError, self.db.db.__getitem__, msg.id) def test_load(self): # Create a db to try and load. data = {"1" : ('a', 'b', 'c'), "2" : ('d', 'e', 'f'), "3" : "test"} for k, v in data.items(): self.db.db[k] = v self.db.store() fn = self.db.db_name self.db.close() db2 = self.klass(fn, self.mode) try: self.assertEqual(len(db2.db.keys()), len(data.keys())) for k, v in data.items(): self.assertEqual(db2.db[k], v) finally: db2.close() def test_load_new(self): # Load from a non-existing db (i.e. create new). 
self.assertEqual(self.db.db.keys(), []) class MessageInfoPickleTest(MessageInfoBaseTest): def setUp(self): self.mode = 1 self.klass = MessageInfoPickle MessageInfoBaseTest.setUp(self, TEMP_PICKLE_NAME) def tearDown(self): try: os.remove(TEMP_PICKLE_NAME) except OSError: pass def store(self): if self.db is not None: self.db.sync() class MessageInfoDBTest(MessageInfoBaseTest): def setUp(self): self.mode = 'c' self.klass = MessageInfoDB MessageInfoBaseTest.setUp(self, TEMP_DBM_NAME) def tearDown(self): self.db.close() try: os.remove(TEMP_DBM_NAME) except OSError: pass def store(self): if self.db is not None: self.db.sync() def _fake_close(self): self.done += 1 def test_close(self): saved_db = self.db.db.close saved_dbm = self.db.dbm.close try: self.done = 0 self.db.db.close = self._fake_close self.db.dbm.close = self._fake_close self.db.close() self.assertEqual(self.done, 2) finally: # If we don't put these back (whatever happens), then # the db isn't closed and can't be deleted in tearDown. self.db.db.close = saved_db self.db.dbm.close = saved_dbm class UtilitiesTest(unittest.TestCase): def _verify_details(self, details): loc = details.find(__file__) self.assertNotEqual(loc, -1) loc = details.find("Exception: Test") self.assertNotEqual(loc, -1) def _verify_exception_header(self, msg, details): msg = email.message_from_string(msg) details = "\r\n.".join(details.strip().split('\n')) headerName = 'X-Spambayes-Exception' header = email.Header.Header(details, header_name=headerName) self.assertEqual(msg[headerName].replace('\n', '\r\n'), str(header).replace('\n', '\r\n')) def test_insert_exception_header(self): # Cause an exception to insert. try: raise Exception("Test") except Exception: pass msg, details = insert_exception_header(good1) self._verify_details(details) self._verify_exception_header(msg, details) def test_insert_exception_header_and_id(self): # Cause an exception to insert. 
try: raise Exception("Test") except Exception: pass id = "Message ID" msg, details = insert_exception_header(good1, id) self._verify_details(details) self._verify_exception_header(msg, details) # Check that ID header is inserted. msg = email.message_from_string(msg) headerName = options["Headers", "mailid_header_name"] header = email.Header.Header(id, header_name=headerName) self.assertEqual(msg[headerName], str(header).replace('\n', '\r\n')) def suite(): suite = unittest.TestSuite() for cls in (MessageTest, SBHeaderMessageTest, MessageInfoPickleTest, MessageInfoDBTest, UtilitiesTest, ): suite.addTest(unittest.makeSuite(cls)) return suite if __name__=='__main__': sb_test_support.unittest_main(argv=sys.argv + ['suite']) --- NEW FILE: test_sb_filter.py --- # Test sb_filter script. import os import sys import email import unittest import sb_test_support sb_test_support.fix_sys_path() from spambayes.Options import options from spambayes.tokenizer import tokenize from spambayes.storage import open_storage import sb_filter # We borrow the test messages that test_sb_server uses. # I doubt it really makes much difference, but if we wanted more than # one message of each type (the tests should all handle this ok) then # Richie's hammer.py script has code for generating any number of # randomly composed email messages. from test_sb_server import good1, spam1 good1 = email.message_from_string(good1) spam1 = email.message_from_string(spam1) TEMP_DBM_NAME = os.path.join(os.path.dirname(__file__), "temp.dbm") # The chances of anyone having a file with this name in the test # directory is minute, but we don't want to wipe anything, so make # sure that it doesn't already exist. Our tearDown code gets rid # of our copy (whether the tests pass or fail) so it shouldn't # be ours. if os.path.exists(TEMP_DBM_NAME): print TEMP_DBM_NAME, "already exists. Please remove this file " \ "before running these tests (a file by that name will be " \ "created and destroyed as part of the tests)." 
sys.exit(1) class HammieFilterTest(unittest.TestCase): def setUp(self): self.h = sb_filter.HammieFilter() self.h.dbname = TEMP_DBM_NAME self.h.usedb = "dbm" def tearDown(self): if self.h.h: self.h.close() try: os.remove(TEMP_DBM_NAME) except OSError: pass def _fake_store(self): self.done = True def test_open(self): mode = 'c' self.h.open(mode) self.assertEqual(self.h.mode, mode) # Check the underlying classifier exists. self.assert_(self.h.h is not None) # This can also be called when there is an # existing classifier, but we want to change # mode. Verify that we store the old database # first if we were not in readonly mode. self.done = False self.h.h.store = self._fake_store mode = 'r' self.h.open(mode) self.assertEqual(self.h.mode, mode) self.assert_(self.done) def test_close_readonly(self): # Must open with 'c' first, because otherwise it doesn't exist. self.h.open('c') self.h.open('r') self.done = False self.h.h.store = self._fake_store # Verify that the classifier is not stored if we are # in readonly mode. self.h.close() self.assert_(not self.done) self.assertEqual(self.h.h, None) def test_close(self): self.h.open('c') self.done = False self.h.h.store = self._fake_store # Verify that the classifier is stored if we are # not in readonly mode. self.h.close() self.assert_(self.done) self.assertEqual(self.h.h, None) def test_newdb(self): # Create an existing classifier. b = open_storage(TEMP_DBM_NAME, "dbm") b.learn(tokenize(spam1), True) b.learn(tokenize(good1), False) b.store() b.close() # Create the fresh classifier. self.h.newdb() # Verify that the classifier isn't open. self.assertEqual(self.h.h, None) # Verify that any existing classifier with the same name # is overwritten. b = open_storage(TEMP_DBM_NAME, "dbm") self.assertEqual(b.nham, 0) self.assertEqual(b.nspam, 0) b.close() def test_filter(self): # Verify that the msg has the classification header added. 
self.h.open('c') self.h.h.bayes.learn(tokenize(good1), False) self.h.h.bayes.learn(tokenize(spam1), True) self.h.h.store() result = email.message_from_string(self.h.filter(spam1)) self.assert_(result[options["Headers", "classification_header_name"]].\ startswith(options["Headers", "header_spam_string"])) result = email.message_from_string(self.h.filter(good1)) self.assert_(result[options["Headers", "classification_header_name"]].\ startswith(options["Headers", "header_ham_string"])) def test_filter_train(self): # Verify that the msg has the classification header # added, and that it was correctly trained. self.h.open('c') self.h.h.bayes.learn(tokenize(good1), False) self.h.h.bayes.learn(tokenize(spam1), True) self.h.h.store() result = email.message_from_string(self.h.filter_train(spam1)) self.assert_(result[options["Headers", "classification_header_name"]].\ startswith(options["Headers", "header_spam_string"])) self.assertEqual(self.h.h.bayes.nspam, 2) result = email.message_from_string(self.h.filter_train(good1)) self.assert_(result[options["Headers", "classification_header_name"]].\ startswith(options["Headers", "header_ham_string"])) self.assertEqual(self.h.h.bayes.nham, 2) def test_train_ham(self): # Verify that the classifier gets trained with the message. self.h.open('c') self.h.train_ham(good1) self.assertEqual(self.h.h.bayes.nham, 1) self.assertEqual(self.h.h.bayes.nspam, 0) for token in tokenize(good1): wi = self.h.h.bayes._wordinfoget(token) self.assertEqual(wi.hamcount, 1) self.assertEqual(wi.spamcount, 0) def test_train_spam(self): # Verify that the classifier gets trained with the message. self.h.open('c') self.h.train_spam(spam1) self.assertEqual(self.h.h.bayes.nham, 0) self.assertEqual(self.h.h.bayes.nspam, 1) for token in tokenize(spam1): wi = self.h.h.bayes._wordinfoget(token) self.assertEqual(wi.hamcount, 0) self.assertEqual(wi.spamcount, 1) def test_untrain_ham(self): self.h.open('c') # Put a message in the classifier to be removed. 
self.h.h.bayes.learn(tokenize(good1), False) # Verify that the classifier gets untrained with the message. self.h.untrain_ham(good1) self.assertEqual(self.h.h.bayes.nham, 0) self.assertEqual(self.h.h.bayes.nspam, 0) for token in tokenize(spam1): wi = self.h.h.bayes._wordinfoget(token) self.assertEqual(wi, None) def test_untrain_spam(self): self.h.open('c') # Put a message in the classifier to be removed. self.h.h.bayes.learn(tokenize(spam1), True) # Verify that the classifier gets untrained with the message. self.h.untrain_spam(spam1) self.assertEqual(self.h.h.bayes.nham, 0) self.assertEqual(self.h.h.bayes.nspam, 0) for token in tokenize(spam1): wi = self.h.h.bayes._wordinfoget(token) self.assertEqual(wi, None) def suite(): suite = unittest.TestSuite() for cls in (HammieFilterTest, ): suite.addTest(unittest.makeSuite(cls)) return suite if __name__=='__main__': sb_test_support.unittest_main(argv=sys.argv + ['suite']) Index: test_sb_dbexpimp.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_dbexpimp.py,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** test_sb_dbexpimp.py 15 Nov 2004 06:19:14 -0000 1.2 --- test_sb_dbexpimp.py 22 Nov 2004 00:22:54 -0000 1.3 *************** *** 40,44 **** --- 40,50 ---- try: os.remove(TEMP_PICKLE_NAME) + except OSError: + pass + try: os.remove(TEMP_CSV_NAME) + except OSError: + pass + try: os.remove(TEMP_DBM_NAME) except OSError: Index: test_sb_imapfilter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_imapfilter.py,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** test_sb_imapfilter.py 5 Nov 2004 02:36:25 -0000 1.5 --- test_sb_imapfilter.py 22 Nov 2004 00:22:55 -0000 1.6 *************** *** 3,6 **** --- 3,7 ---- import sys import time + import email import types import socket *************** *** 13,21 **** 
sb_test_support.fix_sys_path() from spambayes import Dibbler from spambayes.Options import options from spambayes.classifier import Classifier from sb_imapfilter import BadIMAPResponseError - from spambayes.message import message_from_string from sb_imapfilter import IMAPSession, IMAPMessage, IMAPFolder, IMAPFilter --- 14,22 ---- sb_test_support.fix_sys_path() + from spambayes import message from spambayes import Dibbler from spambayes.Options import options from spambayes.classifier import Classifier from sb_imapfilter import BadIMAPResponseError from sb_imapfilter import IMAPSession, IMAPMessage, IMAPFolder, IMAPFilter *************** *** 543,547 **** for msg in self.folder: msg = msg.get_full_message() ! msg_correct = message_from_string(IMAP_MESSAGES[int(keys[0])]) id_header_name = options["Headers", "mailid_header_name"] if msg_correct[id_header_name] is None: --- 544,549 ---- for msg in self.folder: msg = msg.get_full_message() ! msg_correct = email.message_from_string(IMAP_MESSAGES[int(keys[0])], ! _class=message.Message) id_header_name = options["Headers", "mailid_header_name"] if msg_correct[id_header_name] is None: *************** *** 563,567 **** self.assertEqual(msg1.id, SB_ID_1) msg1 = msg1.get_full_message() ! msg1_correct = message_from_string(IMAP_MESSAGES[101]) self.assertNotEqual(msg1[id_header_name], None) msg1_correct[id_header_name] = SB_ID_1 --- 565,570 ---- self.assertEqual(msg1.id, SB_ID_1) msg1 = msg1.get_full_message() ! msg1_correct = email.message_from_string(IMAP_MESSAGES[101], ! message.Message) self.assertNotEqual(msg1[id_header_name], None) msg1_correct[id_header_name] = SB_ID_1 *************** *** 584,588 **** msg3 = self.folder[104] self.assertNotEqual(msg3[id_header_name], None) ! msg_correct = message_from_string(IMAP_MESSAGES[104]) msg_correct[id_header_name] = msg3.id self.assertEqual(msg3.as_string(), msg_correct.as_string()) --- 587,592 ---- msg3 = self.folder[104] self.assertNotEqual(msg3[id_header_name], None) ! 
msg_correct = email.message_from_string(IMAP_MESSAGES[104], ! message.Message) msg_correct[id_header_name] = msg3.id self.assertEqual(msg3.as_string(), msg_correct.as_string()) *************** *** 628,631 **** --- 632,667 ---- + class SFBugsTest(BaseIMAPFilterTest): + def test_802545(self): + # Test that the filter selects each folder before expunging, + # and that it was logged in in the first place. + pass + + def test_816400(self): + # Test that bad dates don't cause an error in appending. + # (also sf #890645) + # e.g. 31-Dec-1969 16:00:18 +0100 + # Date: Mon, 06 May 0102 10:51:16 -0100 + # Date: Sat, 08 Jun 0102 19:44:54 -0700 + # Date: 16 Mar 80 8:16:44 AM + pass + + def test_818552(self): + # Test that, when saving, we remove the RECENT flag including + # the space after it. + pass + + def test_842984(self): + # Confirm that if webbrowser.open_new() fails, we print a + # message saying "Please point your web browser at + # http://localhost:8880/" rather than bombing out. + pass + + def test_886133(self): + # Check that folder names with characters not allowed in XML + # are correctly handled for the web interface. + pass + + def suite(): suite = unittest.TestSuite() *************** *** 634,637 **** --- 670,674 ---- IMAPFolderTest, IMAPFilterTest, + SFBugsTest, ): suite.addTest(unittest.makeSuite(cls)) From anadelonbrin at users.sourceforge.net Mon Nov 22 01:26:47 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Nov 22 01:26:50 2004 Subject: [Spambayes-checkins] spambayes/spambayes Options.py, 1.117, 1.118 storage.py, 1.43, 1.44 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21225/spambayes Modified Files: Options.py storage.py Log Message: Add new storage types: CDBClassifier, ZODBClassifier, ZEOClassifier. ZODB and ZEO need ZODB installed, obviously. ZODB seems to work, but I'm only 50% sure that ZEO is working correctly. I'll keep working on this as I can.
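As a rough illustration of the record layout used by the CDB-backed classifier in the storage.py diff further down: the state record and every word record hold a comma-delimited "ham,spam" string. The sketch below is not the checked-in code. A plain dict stands in for the CDB file, and STATE_KEY, WordInfo, dump_records and load_records are simplified, hypothetical stand-ins written for Python 3.

```python
# Sketch only: a dict plays the role of the on-disk CDB file.
# STATE_KEY's value and this WordInfo class are assumptions made for
# the example, not the definitions used in spambayes.
STATE_KEY = "saved state"


class WordInfo(object):
    def __init__(self, hamcount=0, spamcount=0):
        self.hamcount = hamcount
        self.spamcount = spamcount


def dump_records(nham, nspam, wordinfo):
    """Flatten classifier state into key -> 'ham,spam' strings."""
    records = {STATE_KEY: "%d,%d" % (nham, nspam)}
    for word, wi in wordinfo.items():
        records[word] = "%d,%d" % (wi.hamcount, wi.spamcount)
    return records


def load_records(records):
    """Rebuild (nham, nspam, wordinfo) from the flattened records."""
    nham, nspam = [int(i) for i in records[STATE_KEY].split(",")]
    wordinfo = {}
    for word, counts in records.items():
        if word == STATE_KEY:
            continue  # the state record is not a word
        ham, spam = counts.split(",")
        wordinfo[word] = WordInfo(int(ham), int(spam))
    return nham, nspam, wordinfo
```

Because every record is rewritten on each store, this layout suits the "train rarely, read often" use that the class docstring in the diff describes.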
Add code to allow persistent_storage_name to not be expanded into an absolute path with certain storage types (e.g. the SQL ones). Index: Options.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v retrieving revision 1.117 retrieving revision 1.118 diff -C2 -d -r1.117 -r1.118 *** Options.py 9 Nov 2004 02:37:41 -0000 1.117 --- Options.py 22 Nov 2004 00:26:44 -0000 1.118 *************** *** 518,522 **** with the default.""", # True == "dbm", False == "pickle", "True" == "dbm", "False" == "pickle" ! ("mysql", "pgsql", "dbm", "pickle", "True", "False", True, False), RESTORE), ("persistent_storage_file", "Storage file name", "hammie.db", --- 518,522 ---- with the default.""", # True == "dbm", False == "pickle", "True" == "dbm", "False" == "pickle" ! ("zeo", "zodb", "cdb", "mysql", "pgsql", "dbm", "pickle", "True", "False", True, False), RESTORE), ("persistent_storage_file", "Storage file name", "hammie.db", Index: storage.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/storage.py,v retrieving revision 1.43 retrieving revision 1.44 diff -C2 -d -r1.43 -r1.44 *** storage.py 28 Oct 2004 05:11:19 -0000 1.43 --- storage.py 22 Nov 2004 00:26:44 -0000 1.44 *************** *** 8,11 **** --- 8,14 ---- PGClassifier - Classifier that uses postgres mySQLClassifier - Classifier that uses mySQL + CBDClassifier - Classifier that uses CDB + ZODBClassifier - Classifier that uses ZODB + ZEOClassifier - Classifier that uses ZEO Trainer - Classifier training observer SpamTrainer - Trainer for spam *************** *** 36,40 **** To Do: - o ZODBClassifier o Would Trainer.trainall really want to train with the whole corpus, or just a random subset? --- 39,42 ---- *************** *** 43,47 **** ''' ! 
# This module is part of the spambayes project, which is Copyright 2002 # The Python Software Foundation and is covered by the Python Software # Foundation license. --- 45,49 ---- ''' ! # This module is part of the spambayes project, which is Copyright 2002-5 # The Python Software Foundation and is covered by the Python Software # Foundation license. *************** *** 71,74 **** --- 73,77 ---- import errno import shelve + from spambayes import cdb from spambayes import dbmstorage *************** *** 147,151 **** except IOError, e: if options["globals", "verbose"]: ! print 'Failed update: ' + str(e) if fp is not None: os.remove(tmp) --- 150,154 ---- except IOError, e: if options["globals", "verbose"]: ! print >> sys.stderr, 'Failed update: ' + str(e) if fp is not None: os.remove(tmp) *************** *** 595,598 **** --- 598,761 ---- + class CDBClassifier(classifier.Classifier): + """A classifier that uses a CDB database. + + A CDB wordinfo database is quite small and fast but is slow to update. + It is appropriate if training is done rarely (e.g. monthly or weekly + using archived ham and spam). + """ + def __init__(self, db_name): + classifier.Classifier.__init__(self) + self.db_name = db_name + self.statekey = STATE_KEY + self.load() + + def _WordInfoFactory(self, counts): + # For whatever reason, WordInfo's cannot be created with + # constructor ham/spam counts, so we do the work here. + # Since we're doing the work, we accept the ham/spam count + # in the form of a comma-delimited string, as that's what + # we get. 
+ ham, spam = counts.split(',') + wi = classifier.WordInfo() + wi.hamcount = int(ham) + wi.spamcount = int(spam) + return wi + + def load(self): + if os.path.exists(self.db_name): + db = open(self.db_name, "rb") + data = dict(cdb.Cdb(db)) + db.close() + self.nham, self.nspam = [int(i) for i in \ + data[self.statekey].split(',')] + self.wordinfo = dict([(k, self._WordInfoFactory(v)) \ + for k, v in data.iteritems() \ + if k != self.statekey]) + if options["globals", "verbose"]: + print >> sys.stderr, ('%s is an existing CDB,' + ' with %d ham and %d spam') \ + % (self.db_name, self.nham, + self.nspam) + else: + if options["globals", "verbose"]: + print >> sys.stderr, self.db_name, 'is a new CDB' + self.wordinfo = {} + self.nham = 0 + self.nspam = 0 + + def store(self): + items = [(self.statekey, "%d,%d" % (self.nham, self.nspam))] + for word, wi in self.wordinfo.iteritems(): + items.append((word, "%d,%d" % (wi.hamcount, wi.spamcount))) + db = open(self.db_name, "wb") + cdb.cdb_make(db, items) + db.close() + + def close(self): + # We keep no resources open - nothing to do. + pass + + + # If ZODB isn't available, then this class won't be useable, but we + # still need to be able to import this module. So we pretend that all + # is ok. + try: + Persistent + except NameError: + Persistent = object + class _PersistentClassifier(classifier.Classifier, Persistent): + def __init__(self): + import ZODB + from BTrees.OOBTree import OOBTree + + classifier.Classifier.__init__(self) + self.wordinfo = OOBTree() + + class ZODBClassifier(object): + def __init__(self, db_name): + self.statekey = STATE_KEY + self.db_name = db_name + self.load() + + def __getattr__(self, att): + # We pretend that we are a classifier subclass. + if hasattr(self.classifier, att): + return getattr(self.classifier, att) + raise AttributeError("ZODBClassifier object has no attribute '%s'" + % (att,)) + + def __setattr__(self, att, value): + # For some attributes, we change the classifier instead. 
+ if att in ["nham", "nspam"]: + setattr(self.classifier, att, value) + else: + object.__setattr__(self, att, value) + + def create_storage(self): + import ZODB + from ZODB.FileStorage import FileStorage + self.storage = FileStorage(self.db_name) + + def load(self): + import ZODB + self.create_storage() + self.db = ZODB.DB(self.storage) + root = self.db.open().root() + self.classifier = root.get(self.db_name) + if self.classifier is None: + # There is no classifier, so create one. + if options["globals", "verbose"]: + print >> sys.stderr, self.db_name, 'is a new ZODB' + self.classifier = root[self.db_name] = _PersistentClassifier() + get_transaction().commit() + else: + # It seems to me that the persistent classifier should store + # the nham and nspam values, but that doesn't appear to be the + # case, so work around that. This can be removed once I figure + # out the problem. + self.nham, self.nspam = self.classifier.wordinfo[self.statekey] + if options["globals", "verbose"]: + print >> sys.stderr, '%s is an existing ZODB, with %d ' \ + 'ham and %d spam' % (self.db_name, self.nham, + self.nspam) + + def store(self): + # It seems to me that the persistent classifier should store + # the nham and nspam values, but that doesn't appear to be the + # case, so work around that. This can be removed once I figure + # out the problem. 
+ self.classifier.wordinfo[self.statekey] = (self.nham, self.nspam) + get_transaction().commit() + + def close(self): + self.db.close() + self.storage.close() + + + class ZEOClassifier(ZODBClassifier): + def __init__(self, data_source_name): + source_info = data_source_name.split() + self.host = "localhost" + self.port = None + db_name = "SpamBayes" + for info in source_info: + if info.startswith("host"): + self.host = info[5:] + elif info.startswith("port"): + self.port = int(info[5:]) + elif info.startswith("dbname"): + db_name = info[7:] + ZODBClassifier.__init__(self, db_name) + + def create_storage(self): + from ZEO.ClientStorage import ClientStorage + if self.port: + addr = self.host, self.port + else: + addr = self.host + self.storage = ClientStorage(addr) + + # Flags that the Trainer will recognise. These should be or'able integer # values (i.e. 1, 2, 4, 8, etc.). *************** *** 683,692 **** return "Only one type of database can be specified" ! # values are classifier class and True if it accepts a mode ! # arg, False otherwise ! _storage_types = {"dbm" : (DBDictClassifier, True), ! "pickle" : (PickledClassifier, False), ! "pgsql" : (PGClassifier, False), ! "mysql" : (mySQLClassifier, False), } --- 846,858 ---- return "Only one type of database can be specified" ! # values are classifier class, True if it accepts a mode ! # arg, and True if the argument is a pathname ! _storage_types = {"dbm" : (DBDictClassifier, True, True), ! "pickle" : (PickledClassifier, False, True), ! "pgsql" : (PGClassifier, False, False), ! "mysql" : (mySQLClassifier, False, False), ! "cdb" : (CDBClassifier, False, True), ! "zodb" : (ZODBClassifier, False, True), ! "zeo" : (ZEOClassifier, False, False), } *************** *** 696,705 **** By centralizing this code here, all the applications will behave the same given the same options. - - db_type must be one of the following strings: - dbm, pickle, pgsql, mysql """ try: ! 
klass, supports_mode = _storage_types[db_type] except KeyError: raise NoSuchClassifierError(db_type) --- 862,868 ---- By centralizing this code here, all the applications will behave the same given the same options. """ try: ! klass, supports_mode, unused = _storage_types[db_type] except KeyError: raise NoSuchClassifierError(db_type) *************** *** 727,731 **** } ! def database_type(opts): """Return the name of the database and the type to use. The output of this function can be used as the db_type parameter for the open_storage --- 890,895 ---- } ! def database_type(opts, default_type=("Storage", "persistent_use_database"), ! default_name=("Storage", "persistent_storage_file")): """Return the name of the database and the type to use. The output of this function can be used as the db_type parameter for the open_storage *************** *** 752,761 **** raise MutuallyExclusiveError() if nm is None and typ is None: ! typ = options["Storage", "persistent_use_database"] if typ is True or typ == "True": typ = "dbm" elif typ is False or typ == "False": typ = "pickle" ! nm = get_pathname_option("Storage", "persistent_storage_file") return nm, typ --- 916,933 ---- raise MutuallyExclusiveError() if nm is None and typ is None: ! typ = options[default_type] ! # Backwards compatibility crud. if typ is True or typ == "True": typ = "dbm" elif typ is False or typ == "False": typ = "pickle" ! try: ! unused, unused, is_path = _storage_types[typ] ! except KeyError: ! raise NoSuchClassifierError(db_type) ! if is_path: ! nm = get_pathname_option(*default_name) ! else: ! 
nm = options[default_name] return nm, typ From anadelonbrin at users.sourceforge.net Mon Nov 22 01:27:55 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Nov 22 01:27:57 2004 Subject: [Spambayes-checkins] spambayes/spambayes smtpproxy.py,1.8,1.9 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv21577/spambayes Modified Files: smtpproxy.py Log Message: Switch from using msg.asTokens to msg.tokenize. Index: smtpproxy.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/smtpproxy.py,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** smtpproxy.py 9 Nov 2004 02:30:33 -0000 1.8 --- smtpproxy.py 22 Nov 2004 00:27:52 -0000 1.9 *************** *** 447,457 **** # mean that we didn't need to store the id with the message) # but that might be a little unreliable. ! self.classifier.learn(msg.asTokens(), isSpam) else: if msg.GetTrained() == (not isSpam): ! self.classifier.unlearn(msg.asTokens(), not isSpam) msg.RememberTrained(None) if msg.GetTrained() is None: ! self.classifier.learn(msg.asTokens(), isSpam) msg.RememberTrained(isSpam) --- 447,457 ---- # mean that we didn't need to store the id with the message) # but that might be a little unreliable. ! self.classifier.learn(msg.tokenize(), isSpam) else: if msg.GetTrained() == (not isSpam): ! self.classifier.unlearn(msg.tokenize(), not isSpam) msg.RememberTrained(None) if msg.GetTrained() is None: ! self.classifier.learn(msg.tokenize(), isSpam) msg.RememberTrained(isSpam) *************** *** 491,500 **** msg.get_substance() msg.delSBHeaders() ! self.classifier.unlearn(msg.asTokens(), not isSpam) msg.RememberTrained(None) if msg.GetTrained() is None: msg.get_substance() msg.delSBHeaders() ! self.classifier.learn(msg.asTokens(), isSpam) msg.RememberTrained(isSpam) self.classifier.store() --- 491,500 ---- msg.get_substance() msg.delSBHeaders() ! 
self.classifier.unlearn(msg.tokenize(), not isSpam) msg.RememberTrained(None) if msg.GetTrained() is None: msg.get_substance() msg.delSBHeaders() ! self.classifier.learn(msg.tokenize(), isSpam) msg.RememberTrained(isSpam) self.classifier.store() From anadelonbrin at users.sourceforge.net Tue Nov 23 00:34:48 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:34:51 2004 Subject: [Spambayes-checkins] spambayes/spambayes Stats.py, 1.8, 1.9 message.py, 1.57, 1.58 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26368/spambayes Modified Files: Stats.py message.py Log Message: Tidy up docstring. Change MessageInfoBase's methods so that recording & retrieving a message are not private methods and are more clearly named. Change so that the messageinfodb doesn't get created/opened on import, but rather through utility functions like those in spambayes.storage. Change Stats.py to use the new methods rather than the old global. Remove the asTokens function in favour of the existing tokenize function. Fix the include_evidence header to check for *H* and *S* explicitly rather than any token starting with *. Index: Stats.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Stats.py,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** Stats.py 5 Nov 2004 03:03:00 -0000 1.8 --- Stats.py 22 Nov 2004 23:34:43 -0000 1.9 *************** *** 40,44 **** import types ! from spambayes.message import msginfoDB class Stats(object): --- 40,44 ---- import types ! 
from spambayes.message import database_type, open_storage class Stats(object): *************** *** 64,67 **** --- 64,69 ---- def CalculateStats(self): self.Reset() + nm, typ = database_type() + msginfoDB = open_storage(nm, typ) for msg in msginfoDB.db.keys(): self.total += 1 Index: message.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v retrieving revision 1.57 retrieving revision 1.58 diff -C2 -d -r1.57 -r1.58 *** message.py 9 Nov 2004 02:30:33 -0000 1.57 --- message.py 22 Nov 2004 23:34:43 -0000 1.58 *************** *** 11,29 **** MessageInfoDB is a simple shelve persistency class for the persistent ! state of a Message obect. For the moment, the db name is hard-coded, ! but we'll have to do this a different way. Mark Hammond's idea is to ! have a master database, that simply keeps track of the names and instances ! of other databases, such as the wordinfo and msginfo databases. The ! MessageInfoDB currently does not provide iterators, but should at some ! point. This would allow us to, for example, see how many messages ! have been trained differently than their classification, for fp/fn ! assessment purposes. Message is an extension of the email package Message class, to include persistent message information. The persistent state ! -currently- consists of the message id, its current ! classification, and its current training. The payload is not ! persisted. Payload persistence is left to whatever mail client ! software is being used. SBHeaderMessage extends Message to include spambayes header specific --- 11,23 ---- MessageInfoDB is a simple shelve persistency class for the persistent ! state of a Message obect. The MessageInfoDB currently does not provide ! iterators, but should at some point. This would allow us to, for ! example, see how many messages have been trained differently than their ! classification, for fp/fn assessment purposes. 
Message is an extension of the email package Message class, to include persistent message information. The persistent state ! currently consists of the message id, its current classification, and ! its current training. The payload is not persisted. SBHeaderMessage extends Message to include spambayes header specific *************** *** 33,38 **** A typical classification usage pattern would be something like: ! >>> msg = spambayes.message.SBHeaderMessage() ! >>> msg.setPayload(substance) # substance comes from somewhere else >>> id = msg.setIdFromPayload() --- 27,33 ---- A typical classification usage pattern would be something like: ! >>> import email ! >>> # substance comes from somewhere else ! >>> msg = email.message_from_string(substance, _class=SBHeaderMessage) >>> id = msg.setIdFromPayload() *************** *** 50,55 **** A typical usage pattern to train as spam would be something like: ! >>> msg = spambayes.message.SBHeaderMessage() ! >>> msg.setPayload(substance) # substance comes from somewhere else >>> id = msg.setId(msgid) # id is a fname, outlook msg id, something... --- 45,51 ---- A typical usage pattern to train as spam would be something like: ! >>> import email ! >>> # substance comes from somewhere else ! >>> msg = email.message_from_string(substance, _class=SBHeaderMessage) >>> id = msg.setId(msgid) # id is a fname, outlook msg id, something... *************** *** 64,75 **** To Do: - o Master DB module, or at least make the msginfodb name an options parm - o Figure out how to safely add message id to body (or if it can be done - at all...) o Suggestions? ! """ ! ! # This module is part of the spambayes project, which is Copyright 2002-3 # The Python Software Foundation and is covered by the Python Software # Foundation license. --- 60,67 ---- To Do: o Suggestions? + """ ! # This module is part of the spambayes project, which is Copyright 2002-5 # The Python Software Foundation and is covered by the Python Software # Foundation license. 
*************** *** 102,105 **** --- 94,98 ---- import email.Header + from spambayes import storage from spambayes import dbmstorage from spambayes.Options import options, get_pathname_option *************** *** 117,121 **** self.db_name = db_name ! def _getState(self, msg): if self.db is not None: try: --- 110,114 ---- self.db_name = db_name ! def load_msg(self, msg): if self.db is not None: try: *************** *** 132,136 **** setattr(msg, att, val) ! def _setState(self, msg): if self.db is not None: attributes = [] --- 125,129 ---- setattr(msg, att, val) ! def store_msg(self, msg): if self.db is not None: attributes = [] *************** *** 140,144 **** self.store() ! def _delState(self, msg): if self.db is not None: del self.db[msg.getId()] --- 133,137 ---- self.store() ! def remove_msg(self, msg): if self.db is not None: del self.db[msg.getId()] *************** *** 205,228 **** self.db.sync() ! # This should come from a Mark Hammond idea of a master db ! # For the moment, we get the name of another file from the options, ! # so that these files don't litter lots of working directories. ! # Once there is a master db, this option can be removed. ! message_info_db_name = get_pathname_option("Storage", "messageinfo_storage_file") ! if options["Storage", "persistent_use_database"] is True or \ ! options["Storage", "persistent_use_database"] == "dbm": ! msginfoDB = MessageInfoDB(message_info_db_name) ! elif options["Storage", "persistent_use_database"] is False or \ ! options["Storage", "persistent_use_database"] == "pickle": ! msginfoDB = MessageInfoPickle(message_info_db_name) ! else: ! # Ah - now, what? Maybe the user has mysql or pgsql or zeo, ! # or some other newfangled thing! We don't know what to do ! # in that case, so just use a pickle, since it's the safest ! # option. ! msginfoDB = MessageInfoPickle(message_info_db_name) class Message(email.Message.Message): ! 
'''An email.Message.Message extended for Spambayes''' def __init__(self): --- 198,236 ---- self.db.sync() ! # values are classifier class, True if it accepts a mode ! # arg, and True if the argument is a pathname ! _storage_types = {"dbm" : (MessageInfoDB, True, True), ! "pickle" : (MessageInfoPickle, False, True), ! ## "pgsql" : (MessageInfoPG, False, False), ! ## "mysql" : (MessageInfoMySQL, False, False), ! ## "cdb" : (MessageInfoCDB, False, True), ! ## "zodb" : (MessageInfoZODB, False, True), ! ## "zeo" : (MessageInfoZEO, False, False), ! } ! ! def open_storage(data_source_name, db_type="dbm", mode=None): ! """Return a storage object appropriate to the given parameters.""" ! try: ! klass, supports_mode, unused = _storage_types[db_type] ! except KeyError: ! raise storage.NoSuchClassifierError(db_type) ! if supports_mode and mode is not None: ! return klass(data_source_name, mode) ! else: ! return klass(data_source_name) ! ! def database_type(): ! dn = ("Storage", "messageinfo_storage_file") ! # The storage options here may lag behind those in storage.py, ! # so we try and be more robust. If we can't use the same storage ! # method, then we fall back to pickle. ! nm, typ = storage.database_type((), default_name=dn) ! if typ not in _storage_types.keys(): ! typ = "pickle" ! return nm, typ ! class Message(email.Message.Message): ! '''An email.Message.Message extended for SpamBayes''' def __init__(self): *************** *** 230,233 **** --- 238,243 ---- # persistent state + nm, typ = database_type() + self.message_info_db = open_storage(nm, typ) self.stored_attributes = ['c', 't',] self.id = None *************** *** 271,284 **** self.id = id ! msginfoDB._getState(self) def getId(self): return self.id - def asTokens(self): - return tokenize(self) - def tokenize(self): ! return self.asTokens() def _force_CRLF(self, data): --- 281,291 ---- self.id = id ! self.message_info_db.load_msg(self) def getId(self): return self.id def tokenize(self): ! 
return tokenize(self) def _force_CRLF(self, data): *************** *** 303,307 **** def modified(self): if self.id: # only persist if key is present ! msginfoDB._setState(self) def GetClassification(self): --- 310,314 ---- def modified(self): if self.id: # only persist if key is present ! self.message_info_db.store_msg(self) def GetClassification(self): *************** *** 348,356 **** class SBHeaderMessage(Message): ! '''Message class that is cognizant of Spambayes headers. ! Adds routines to add/remove headers for Spambayes''' ! ! def __init__(self): ! Message.__init__(self) def setIdFromPayload(self): --- 355,360 ---- class SBHeaderMessage(Message): ! '''Message class that is cognizant of SpamBayes headers. ! Adds routines to add/remove headers for SpamBayes''' def setIdFromPayload(self): *************** *** 396,400 **** evd = [] for word, score in clues: ! if (word[0] == '*' or score <= hco or score >= sco): if isinstance(word, types.UnicodeType): word = email.Header.Header(word, --- 400,405 ---- evd = [] for word, score in clues: ! if (word == '*H*' or word == '*S*' \ ! 
or score <= hco or score >= sco): if isinstance(word, types.UnicodeType): word = email.Header.Header(word, From anadelonbrin at users.sourceforge.net Tue Nov 23 00:37:15 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:37:19 2004 Subject: [Spambayes-checkins] spambayes/spambayes/test test_storage.py, 1.5, 1.5.4.1 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes/test In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27019/spambayes/test Modified Files: Tag: release_1_0-branch test_storage.py Log Message: Backport fix for test_storage (True->"dbm") Index: test_storage.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_storage.py,v retrieving revision 1.5 retrieving revision 1.5.4.1 diff -C2 -d -r1.5 -r1.5.4.1 *** test_storage.py 24 Dec 2003 17:16:38 -0000 1.5 --- test_storage.py 22 Nov 2004 23:37:12 -0000 1.5.4.1 *************** *** 152,156 **** try: try: ! open_storage(db_name, True) except SystemExit: pass --- 152,156 ---- try: try: ! open_storage(db_name, "dbm") except SystemExit: pass From anadelonbrin at users.sourceforge.net Tue Nov 23 00:38:37 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:38:40 2004 Subject: [Spambayes-checkins] spambayes/spambayes message.py, 1.49.4.4, 1.49.4.5 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27234/spambayes Modified Files: Tag: release_1_0-branch message.py Log Message: Backport docstring fix. 
Fix usage of StringIO Index: message.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v retrieving revision 1.49.4.4 retrieving revision 1.49.4.5 diff -C2 -d -r1.49.4.4 -r1.49.4.5 *** message.py 22 Oct 2004 05:00:51 -0000 1.49.4.4 --- message.py 22 Nov 2004 23:38:34 -0000 1.49.4.5 *************** *** 11,17 **** MessageInfoDB is a simple shelve persistency class for the persistent ! state of a Message obect. For the moment, the db name is hard-coded, ! but we'll have to do this a different way. Mark Hammond's idea is to ! have a master database, that simply keeps track of the names and instances of other databases, such as the wordinfo and msginfo databases. The MessageInfoDB currently does not provide iterators, but should at some --- 11,16 ---- MessageInfoDB is a simple shelve persistency class for the persistent ! state of a Message obect. Mark Hammond's idea is to have a master ! database, that simply keeps track of the names and instances of other databases, such as the wordinfo and msginfo databases. The MessageInfoDB currently does not provide iterators, but should at some *************** *** 22,29 **** Message is an extension of the email package Message class, to include persistent message information. The persistent state ! -currently- consists of the message id, its current classification, and its current training. The payload is not ! persisted. Payload persistence is left to whatever mail client ! software is being used. SBHeaderMessage extends Message to include spambayes header specific --- 21,27 ---- Message is an extension of the email package Message class, to include persistent message information. The persistent state ! currently consists of the message id, its current classification, and its current training. The payload is not ! persisted. 
SBHeaderMessage extends Message to include spambayes header specific *************** *** 246,250 **** def setPayload(self, payload): prs = email.Parser.Parser() ! fp = StringIO(payload) # this is kindof a hack, due to the fact that the parser creates a # new message object, and we already have the message object --- 244,248 ---- def setPayload(self, payload): prs = email.Parser.Parser() ! fp = StringIO.StringIO(payload) # this is kindof a hack, due to the fact that the parser creates a # new message object, and we already have the message object From anadelonbrin at users.sourceforge.net Tue Nov 23 00:39:14 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:39:18 2004 Subject: [Spambayes-checkins] spambayes/spambayes __init__.py, 1.11.4.2, 1.11.4.3 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27358/spambayes Modified Files: Tag: release_1_0-branch __init__.py Log Message: Prepare for 1.0.1 Index: __init__.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/__init__.py,v retrieving revision 1.11.4.2 retrieving revision 1.11.4.3 diff -C2 -d -r1.11.4.2 -r1.11.4.3 *** __init__.py 8 Jul 2004 23:51:24 -0000 1.11.4.2 --- __init__.py 22 Nov 2004 23:39:12 -0000 1.11.4.3 *************** *** 1,3 **** # package marker. ! __version__ = '1.0' --- 1,3 ---- # package marker. ! __version__ = '1.0.1' From anadelonbrin at users.sourceforge.net Tue Nov 23 00:40:33 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:40:36 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py, 1.12.4.2, 1.12.4.3 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27579/scripts Modified Files: Tag: release_1_0-branch sb_dbexpimp.py Log Message: Backport fix for merging into a dbm database, and fix for opening a nonexistent csv file.
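To make the merge fix easier to follow: as the comment in the diff below notes, with a dbm store wordinfo is only a cache, so looking a word up there directly can miss counts that are already on disk. The following is a toy sketch of that pitfall, not spambayes code. CachedStore and merge_word are hypothetical names, a dict stands in for the dbm file, and only the _wordinfoget name is borrowed from the diff.

```python
class CachedStore(object):
    """Toy stand-in for a dbm-backed classifier (hypothetical)."""

    def __init__(self, db):
        self.db = db          # the full stored data (a dict here)
        self.wordinfo = {}    # cache: only the entries touched so far

    def _wordinfoget(self, word):
        # Check the cache first, then fall back to the backing store.
        try:
            return self.wordinfo[word]
        except KeyError:
            counts = self.db.get(word)
            if counts is not None:
                self.wordinfo[word] = counts
            return counts


def merge_word(store, word, ham, spam):
    """Add imported counts to whatever the store already holds."""
    counts = store._wordinfoget(word)
    if counts is None:
        counts = (0, 0)       # genuinely new word
    merged = (counts[0] + ham, counts[1] + spam)
    store.wordinfo[word] = merged
    store.db[word] = merged
    return merged
```

Going through the cache alone would treat a stored but not yet cached word as new and overwrite its counts instead of adding to them, which appears to be the situation the backported change addresses.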
Index: sb_dbexpimp.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v retrieving revision 1.12.4.2 retrieving revision 1.12.4.3 diff -C2 -d -r1.12.4.2 -r1.12.4.3 *** sb_dbexpimp.py 9 Nov 2004 22:53:02 -0000 1.12.4.2 --- sb_dbexpimp.py 22 Nov 2004 23:40:29 -0000 1.12.4.3 *************** *** 189,198 **** bayes = spambayes.storage.open_storage(dbFN, useDBM) ! try: ! fp = open(inFN, 'rb') ! except IOError, e: ! if e.errno != errno.ENOENT: ! raise ! rdr = csv.reader(fp) (nham, nspam) = rdr.next() --- 189,193 ---- bayes = spambayes.storage.open_storage(dbFN, useDBM) ! fp = open(inFN, 'rb') rdr = csv.reader(fp) (nham, nspam) = rdr.next() *************** *** 215,221 **** word = uunquote(word) ! try: ! wi = bayes.wordinfo[word] ! except KeyError: wi = bayes.WordInfoClass() --- 210,217 ---- word = uunquote(word) ! # Can't use wordinfo[word] here, because wordinfo ! # is only a cache with dbm! Need to use _wordinfoget instead. ! wi = bayes._wordinfoget(word) ! if wi is None: wi = bayes.WordInfoClass() *************** *** 240,245 **** - - if __name__ == '__main__': --- 236,239 ---- From anadelonbrin at users.sourceforge.net Tue Nov 23 00:41:43 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:41:48 2004 Subject: [Spambayes-checkins] spambayes README-DEVEL.txt,1.12,1.12.4.1 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27808 Modified Files: Tag: release_1_0-branch README-DEVEL.txt Log Message: Backport updates about the build process. 
Index: README-DEVEL.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/README-DEVEL.txt,v retrieving revision 1.12 retrieving revision 1.12.4.1 diff -C2 -d -r1.12 -r1.12.4.1 *** README-DEVEL.txt 8 Feb 2004 02:45:37 -0000 1.12 --- README-DEVEL.txt 22 Nov 2004 23:41:40 -0000 1.12.4.1 *************** *** 505,511 **** o Now commit spambayes/__init__.py and tag the whole checkout - see the existing tag names for the tag name format. ! o Update the website News, Download and Application sections. o Update reply.txt in the website repository as needed (it specifies the ! latest version). Then let Tim, Barry or Skip know that they need to update the autoresponder. --- 505,511 ---- o Now commit spambayes/__init__.py and tag the whole checkout - see the existing tag names for the tag name format. ! o Update the website News, Download, Windows and Application sections. o Update reply.txt in the website repository as needed (it specifies the ! latest version). Then let Tim, Barry, Tony, or Skip know that they need to update the autoresponder. *************** *** 525,526 **** --- 525,555 ---- else is left alone. + Making a binary release + ======================= + + The binary release includes both sb_server and the Outlook plug-in and + is an installer for Windows (98 and above) systems. In order to have + COM typelibs that work with Outlook 2000, 2002 and 2003, you need to + build the installer on a system that has Outlook 2000 (not a more recent + version). You also need to have InnoSetup, resourcepackage and py2exe + installed. + + o Get hold of a fresh copy of the source (Windows line endings, + presumably). + o Run sb_server and open the web interface. This gets resourcepackage + to generate the needed files. + o Replace the __init__.py file in spambayes/spambayes/resources with + a blank file to disable resourcepackage. 
+ o Ensure that the version numbers in spambayes/spambayes/__init__.py + and spambayes/spambayes/Version.py are up-to-date. + o Ensure that you don't have any other copies of spambayes in your + PYTHONPATH, or py2exe will pick these up! If in doubt, run + setup.py install. + o Run the "setup_all.py" script in the spambayes/windows/py2exe/ + directory. This uses py2exe to create the files that Inno will install. + o Open (in InnoSetup) the spambayes.iss file in the spambayes/windows/ + directory. Change the version number in the AppVerName and + OutputBaseFilename lines to the new number. + o Compile the spambayes.iss script to get the executable. + o You can now follow the steps in the source release description above, + from the testing step. From anadelonbrin at users.sourceforge.net Tue Nov 23 00:48:39 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:48:41 2004 Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.44.4.3,1.44.4.4 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29218 Modified Files: Tag: release_1_0-branch CHANGELOG.txt Log Message: Bring up-to-date. Index: CHANGELOG.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v retrieving revision 1.44.4.3 retrieving revision 1.44.4.4 diff -C2 -d -r1.44.4.3 -r1.44.4.4 *** CHANGELOG.txt 9 Nov 2004 22:53:55 -0000 1.44.4.3 --- CHANGELOG.txt 22 Nov 2004 23:48:28 -0000 1.44.4.4 *************** *** 3,6 **** --- 3,7 ---- Release 1.0.1 ============= + Tony Meyer 11/11/2004 The installer wasn't offered to install a startup items shortcut, so fix that. This is a non-ideal patch, but appears to be the only way Inno will work. Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file Tony Meyer 03/11/2004 Fix [ 831864 ] sb_mboxtrain.py: flock vs. 
lockf *************** *** 21,24 **** --- 22,26 ---- Tony Meyer 14/07/2004 Fix [ 959937 ] "Invalid server" message not always correct Skip Montanaro 10/07/2004 tte.py: 2.3 compatibility: add reversed() function + Tony Meyer 09/07/2004 Update test_storage.py test to reflect (current) correct way to call open_storage. Fixes part of [ 981970 ] tests failing. Tony Meyer 09/07/2004 Using -u with sb_server had been broken. Fix this. From anadelonbrin at users.sourceforge.net Tue Nov 23 00:49:35 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:49:38 2004 Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.44.4.4,1.44.4.5 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29458 Modified Files: Tag: release_1_0-branch CHANGELOG.txt Log Message: Bring up-to-date. Index: CHANGELOG.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v retrieving revision 1.44.4.4 retrieving revision 1.44.4.5 diff -C2 -d -r1.44.4.4 -r1.44.4.5 *** CHANGELOG.txt 22 Nov 2004 23:48:28 -0000 1.44.4.4 --- CHANGELOG.txt 22 Nov 2004 23:49:32 -0000 1.44.4.5 *************** *** 3,6 **** --- 3,8 ---- Release 1.0.1 ============= + Tony Meyer 15/11/2004 Fix a bug in sb_dbexpimp.py where merging into an existing dbm file might lose training data. + Tony Meyer 15/11/2004 sb_dbexpimp.py: Fail if the csv file doesn't exist that we are trying to import from rather than keeping going, which made no sense. Tony Meyer 11/11/2004 The installer wasn't offered to install a startup items shortcut, so fix that. This is a non-ideal patch, but appears to be the only way Inno will work. 
Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file From anadelonbrin at users.sourceforge.net Tue Nov 23 00:50:01 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:50:04 2004 Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.48,1.49 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29564 Modified Files: CHANGELOG.txt Log Message: Bring up-to-date. Index: CHANGELOG.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v retrieving revision 1.48 retrieving revision 1.49 diff -C2 -d -r1.48 -r1.49 *** CHANGELOG.txt 9 Nov 2004 03:13:00 -0000 1.48 --- CHANGELOG.txt 22 Nov 2004 23:49:58 -0000 1.49 *************** *** 3,6 **** --- 3,23 ---- Release 1.1a1 ============= + Tony Meyer 23/11/2004 message.py: Change MessageInfoBase's methods so that recording & retrieving a message are not private methods and are more clearly named. + Tony Meyer 23/11/2004 message.py: Change so that the messageinfodb doesn't get created/opened on import, but rather through utility functions like those in spambayes.storage. + Tony Meyer 23/11/2004 message.py: Remove the asTokens method in favour of the existing tokenize function. + Tony Meyer 23/11/2004 message.py: Fix the include_evidence header to check for *H* and *S* explicitly rather than any token starting with *. + Tony Meyer 22/11/2004 Add new storage types: CBDClassifier, ZODBClassifier, ZEOClassifier + Tony Meyer 22/11/2004 Add code to allow persistent_storage_name to not be expanded into an absolute path with certain storage types (e.g. the SQL ones). + Tony Meyer 22/11/2004 sb_pop3dnd: Play nicer with win32 gui + Tony Meyer 22/11/2004 sb_pop3dnd: Don't use the deprecated 'strict' kwarg for email messages. + Tony Meyer 22/11/2004 sb_pop3dnd: Add appropriate state createworkers function & call. 
+ Tony Meyer 22/11/2004 sb_pop3dnd: Modify to have the prepare/start/stop API that sb_server has. + Tony Meyer 22/11/2004 sb_filter: Remove the "experimental" marking in the docstring for the training functions. + Tony Meyer 15/11/2004 Fix a bug in sb_dbexpimp.py where merging into an existing dbm file might lose training data. + Tony Meyer 15/11/2004 sb_dbexpimp.py: Fail if the csv file doesn't exist that we are trying to import from rather than keeping going, which made no sense. + Tony Meyer 15/11/2004 sb_dbexpimp.py: Stop bothering to remove the .dat and .dir files that dumbdbm create (long time since they were supported), and remove the verbose flag, which doesn't actually do anything. + Kenny Pitt 12/11/2004 Add a separate Statistics tab to make room for more detailed statistics. + Toby Dickenson 11/11/2004 Add a version of sb_bnfilter in C (for speed). + Tony Meyer 11/11/2004 The installer wasn't offered to install a startup items shortcut, so fix that. This is a non-ideal patch, but appears to be the only way Inno will work. Tony Meyer 09/11/2004 Implement [ 870524 ] Make the message-proxy timeout configurable Tony Meyer 09/11/2004 Use email.message_from_string(text, _class) rather than our wrapper functions. *************** *** 20,24 **** Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file Tony Meyer 03/11/2004 Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf ! Tony Meyer 03/11/2004 Fix [ 922063 ] Intermittent sb_filter.py faliure with URL pickle Tony Meyer 03/11/2004 Outlook: Also add an "X-Exchange-Delivery-Time" header to the faked up Exchange headers. Tony Meyer 02/11/2004 Improve the web interface statistics --- 37,41 ---- Tony Meyer 03/11/2004 Fix [ 1022848 ] sb_dbexpimp.py crashes while importing into pickle file Tony Meyer 03/11/2004 Fix [ 831864 ] sb_mboxtrain.py: flock vs. lockf ! 
Tony Meyer 03/11/2004 Fix [ 922063 ] Intermittent sb_filter.py failure with URL pickle Tony Meyer 03/11/2004 Outlook: Also add an "X-Exchange-Delivery-Time" header to the faked up Exchange headers. Tony Meyer 02/11/2004 Improve the web interface statistics *************** *** 33,36 **** --- 50,55 ---- Tony Meyer 18/10/2004 Copy Skip's -o command line option (available in all the regular scripts) to timcv.py. Tony Meyer 18/10/2004 TestDriver: If show_histograms was False, then the global ham/spam histogram never had the stats computed, but this gets used later, so the script would die with an AtrributeError. Fix that. + Tony Meyer 15/10/2004 Outlook: Add persistent statistics + Tony Meyer 13/10/2004 Implement [ 1039057 ] Diffs for IMAP login problems... Tony Meyer 13/10/2004 Add Classifier.use_bigrams option to the Advanced options page for sb_server and imapfilter. Tony Meyer 13/10/2004 Fix mySQL storage option for the case where the server does not support rollbacks. *************** *** 42,46 **** Tony Meyer 30/09/2004 Fix [ 903905 ] IMAP Configuration Error Tony Meyer 29/09/2004 Fix [ 1036601 ] typo on advanced config web page ! Tony Meyer 15/09/2004 sb_upload: Clarify docstring so that it's mroe clear what this script does. The -n / --null command line option didn't actually do anything; change it so that it does. Sjoerd Mullender 20/08/2004 imapfilter: Fix the regular expression to match the Message-ID header by stopping on newline. Skip Montanaro 18/08/2004 tte.py: Seems better to try and alternate ham/spam scoring instead of scoring all the hams in a batch and all the spams. --- 61,65 ---- Tony Meyer 30/09/2004 Fix [ 903905 ] IMAP Configuration Error Tony Meyer 29/09/2004 Fix [ 1036601 ] typo on advanced config web page ! Tony Meyer 15/09/2004 sb_upload: Clarify docstring so that it's more clear what this script does. The -n / --null command line option didn't actually do anything; change it so that it does. 
Sjoerd Mullender 20/08/2004 imapfilter: Fix the regular expression to match the Message-ID header by stopping on newline. Skip Montanaro 18/08/2004 tte.py: Seems better to try and alternate ham/spam scoring instead of scoring all the hams in a batch and all the spams. *************** *** 84,88 **** Tony Meyer 04/07/2004 Fix [ 933473 ] Unnecessary spam folder hook. Neil Schemenauer 30/06/2004 New script, hammie2cdb.py, that converts hammie databases into cdb databases (usable by CdbClassifier). ! Skip Montanaro 29/06/2004 tte.py: Worm around the extremely rare case during verbose most where the message sneaks through without either a message-id or a subject. Skip Montanaro 26/06/2004 New script, postfixproxy.py, a first cut proxy filter for use with PostFix 2.1's content filter stuff. Skip Montanaro 26/06/2004 hammie: Rename filter() to score_and_filter() and return both the spamprob and the modified message. --- 103,107 ---- Tony Meyer 04/07/2004 Fix [ 933473 ] Unnecessary spam folder hook. Neil Schemenauer 30/06/2004 New script, hammie2cdb.py, that converts hammie databases into cdb databases (usable by CdbClassifier). ! Skip Montanaro 29/06/2004 tte.py: Worm around the extremely rare case during verbose mode where the message sneaks through without either a message-id or a subject. Skip Montanaro 26/06/2004 New script, postfixproxy.py, a first cut proxy filter for use with PostFix 2.1's content filter stuff. Skip Montanaro 26/06/2004 hammie: Rename filter() to score_and_filter() and return both the spamprob and the modified message. *************** *** 342,346 **** Alpha Release 8 =============== ! There is no Alpha Release 8. Alpha Release 7 --- 361,365 ---- Alpha Release 8 =============== ! There was no Alpha Release 8. 
Alpha Release 7 From anadelonbrin at users.sourceforge.net Tue Nov 23 00:57:31 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 00:57:34 2004 Subject: [Spambayes-checkins] website faq.txt,1.82,1.83 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31119 Modified Files: faq.txt Log Message: Add extra information to the "Does it work with Exchange" question based on a suggestion by Scott L Miller. Index: faq.txt =================================================================== RCS file: /cvsroot/spambayes/website/faq.txt,v retrieving revision 1.82 retrieving revision 1.83 diff -C2 -d -r1.82 -r1.83 *** faq.txt 8 Nov 2004 01:22:59 -0000 1.82 --- faq.txt 22 Nov 2004 23:57:29 -0000 1.83 *************** *** 542,545 **** --- 542,551 ---- Yes. + The SpamBayes Outlook plug-in simply watches the folders that you have + instructed it to for new mail. When new mail is received, Outlook informs + SpamBayes, which then scores the message and performs the actions you have + asked it to, depending on the message score. Thus it isn't involved in + the delivery of mail, and so has no idea that it is coming from Exchange. + Can mail marked as spam automatically be marked as read? -------------------------------------------------------- *************** *** 1246,1250 **** ``sb_server.py -u 8881 -b`` (or ``sb_imapfilter.py -u 8881 -b``), or another port that you know is free and available on your machine. ! Known Problems & Workarounds --- 1252,1256 ---- ``sb_server.py -u 8881 -b`` (or ``sb_imapfilter.py -u 8881 -b``), or another port that you know is free and available on your machine. ! 
Known Problems & Workarounds From anadelonbrin at users.sourceforge.net Tue Nov 23 01:12:47 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 01:12:51 2004 Subject: [Spambayes-checkins] website faq.txt,1.83,1.84 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2133 Modified Files: faq.txt Log Message: Add a FAQ about the "no filterable messages" Outlook problem. Index: faq.txt =================================================================== RCS file: /cvsroot/spambayes/website/faq.txt,v retrieving revision 1.83 retrieving revision 1.84 diff -C2 -d -r1.83 -r1.84 *** faq.txt 22 Nov 2004 23:57:29 -0000 1.83 --- faq.txt 23 Nov 2004 00:12:44 -0000 1.84 *************** *** 1420,1423 **** --- 1420,1449 ---- + I get an error message "No filterable messages are selected". + ------------------------------------------------------------- + + This applies to the Outlook plug-in only. SpamBayes only lets you train + on messages that have been received (these are the only messages that + should be trained on). This means that you cannot train on sent messages, + drafts, notes, calendar items, tasks, and so on. + + To check whether a message has been received, SpamBayes checks some of the + Outlook properties for the message. Very seldomly, these can result in a + false classification, where the message has been received, but SpamBayes + does not believe it has. The best move here is to simply move the message + yourself. If this is a recurring problem, please add comments to the + `appropriate SourceForge tracker`_. + + Note that one cause of this problem is that with some versions of Outlook + and Outlook Express, moving mail from Outlook Express to Outlook will strip + the mail of all Internet headers, which means the messages are not able to + be filtered/trained. 
However, this is not a problem with SpamBayes - you + can either work around the export/import problem, or simply not use those + messages for training (we do not recommend pre-training in bulk, in any + case). + + .. _appropriate SourceForge tracker: http://sourceforge.net/tracker/index.php?func=detail&aid=854547&group_id=61702&atid=498103 + + Development =========== From anadelonbrin at users.sourceforge.net Tue Nov 23 01:15:46 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 01:15:49 2004 Subject: [Spambayes-checkins] spambayes/spambayes oe_mailbox.py,1.9,1.10 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv2913/spambayes Modified Files: oe_mailbox.py Log Message: Add modifications/improvements mostly from: [ 800671 ] Windows GUI for easy Outlook Express mailboxes training Index: oe_mailbox.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/oe_mailbox.py,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** oe_mailbox.py 19 Jan 2004 17:58:28 -0000 1.9 --- oe_mailbox.py 23 Nov 2004 00:15:42 -0000 1.10 *************** *** 1,30 **** from __future__ import generators ! # This module is part of the spambayes project, which is Copyright 2002-3 # The Python Software Foundation and is covered by the Python Software # Foundation license. ! # Simple Python library for Outlook Express mailboxes handling ! # Based on C++ work by Arne Schloh ! ! __author__ = "Romain Guy" __credits__ = "All the SpamBayes folk" import binascii import os import struct import msgs ! import StringIO import sys ! from time import gmtime, strftime try: import win32api import win32con from win32com.shell import shell, shellcon except ImportError: ! # Not win32, or win32all not installed. # Some functions will not work, but some will. ! 
win32api = win32con = shell = shellcon = None ########################################################################### --- 1,58 ---- + """ + Simple Python library for Outlook Express mailbox handling, and some + other Outlook Express utility functions. + + Functions: + getDBXFilesList() + Returns a list containing the DBX file names for current user + getMbox(dbxPath) + Returns an mbox converted from a DBX file + getRegistryKey() + Returns the root key for current user's Outlook Express settings + getStorePath() + Returns the path where DBX files are stored for current user + train(dbxPath, isSpam) + Trains a DBX file as spam or ham through Hammie + """ + from __future__ import generators ! # This module is part of the spambayes project, which is Copyright 2002-5 # The Python Software Foundation and is covered by the Python Software # Foundation license. ! __author__ = "Romain Guy " __credits__ = "All the SpamBayes folk" + # Based on C++ work by Arne Schloh + import binascii import os + import re import struct + import mailbox import msgs ! try: ! import cStringIO as StringIO ! except ImportError: ! import StringIO import sys ! from time import * try: import win32api import win32con + import win32gui from win32com.shell import shell, shellcon except ImportError: ! # Not win32, or pywin32 not installed. # Some functions will not work, but some will. ! win32api = win32con = win32gui = shell = shellcon = None ! ! import hammie ! import oe_mailbox ! import mboxutils ! ! from spambayes.Options import options ########################################################################### *************** *** 453,457 **** if address and entries: tree = dbxTree(dbxStream, address, entries) ! dbxBuffer = "" for i in range(entries): --- 481,485 ---- if address and entries: tree = dbxTree(dbxStream, address, entries) ! dbxBuffer = [] for i in range(entries): *************** *** 468,475 **** # data from the message itself, as this will # result in incorrect tokens. ! 
dbxBuffer += "From spambayes@spambayes.org %s\n%s" \ ! % (strftime("%a %b %d %H:%M:%S MET %Y", ! gmtime()), message.getText()) ! content = dbxBuffer dbxStream.close() return content --- 496,504 ---- # data from the message itself, as this will # result in incorrect tokens. ! dbxBuffer.append("From spambayes@spambayes.org %s\n%s" \ ! % (strftime("%a %b %d %H:%M:%S MET %Y", ! gmtime()), ! message.getText())) ! content = "".join(dbxBuffer) dbxStream.close() return content *************** *** 479,491 **** Tested with Outlook Express 6.0 with Windows XP.""" - if sys.platform != "win32": - # AFAIK, there is only a Win32 OE, and a Mac OE. - # The Mac OE should be easy enough, but I don't know - # where the dbx files are stored (I presume they are in the - # same format). - raise NotImplementedError if win32api is None: # Delayed import error from top. ! raise ImportError("win32all not installed") reg = win32api.RegOpenKeyEx(win32con.HKEY_USERS, "") --- 508,514 ---- Tested with Outlook Express 6.0 with Windows XP.""" if win32api is None: # Delayed import error from top. ! raise ImportError("pywin32 not installed") reg = win32api.RegOpenKeyEx(win32con.HKEY_USERS, "") *************** *** 527,544 **** yield subkey def OEStoreRoot(): """Return the path to the Outlook Express Store Root. Tested with Outlook Express 6.0 with Windows XP.""" ! # Run through the identity keys, using the first that ! # works. ! raw = "" ! for identity in OEIdentityKeys(): ! try: ! raw = win32api.RegQueryValueEx(identity, "Store Root") ! except win32api.error: ! pass ! else: ! break # I can't find a shellcon to that is the same as %UserProfile%, # so extract it from CSIDL_LOCAL_APPDATA --- 550,572 ---- yield subkey + def OECurrentUserKey(): + """Returns the root registry key for current user Outlook + Express settings.""" + if win32api is None: + # Delayed import error from top. 
+ raise ImportError("pywin32 not installed") + key = "Identities" + reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, key) + id = win32api.RegQueryValueEx(reg, "Default User ID")[0] + subKey = "%s\\%s\\Software\\Microsoft\\Outlook Express\\5.0" % (key, id) + return subKey + def OEStoreRoot(): """Return the path to the Outlook Express Store Root. Tested with Outlook Express 6.0 with Windows XP.""" ! subKey = OECurrentUserKey() ! reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, subKey) ! path = win32api.RegQueryValueEx(reg, "Store Root")[0] # I can't find a shellcon to that is the same as %UserProfile%, # so extract it from CSIDL_LOCAL_APPDATA *************** *** 547,552 **** parts = UserDirectory.split(os.sep) UserProfile = os.sep.join(parts[:-2]) ! raw = raw[0].replace("%UserProfile%", UserProfile) ! return raw def OEAccountKeys(permission = None): --- 575,586 ---- parts = UserDirectory.split(os.sep) UserProfile = os.sep.join(parts[:-2]) ! return path.replace("%UserProfile%", UserProfile) ! ! def OEDBXFilesList(): ! """Returns a list of DBX files for current user.""" ! path = OEStoreRoot() ! dbx_re = re.compile('.+\.dbx') ! dbxs = [f for f in os.listdir(path) if dbx_re.search(f) != None] ! return dbxs def OEAccountKeys(permission = None): *************** *** 680,690 **** print_message = True ! if args: ! MAILBOX_DIR = args[0] ! else: ! MAILBOX_DIR = OEStoreRoot() ! ! files = [os.path.join(MAILBOX_DIR, file) for file in \ ! os.listdir(MAILBOX_DIR) if os.path.splitext(file)[1] == '.dbx'] for file in files: --- 714,719 ---- print_message = True ! MAILBOX_DIR = OEStoreRoot() ! 
files = [os.path.join(MAILBOX_DIR, f) for f in OEDBXFilesList()] for file in files: *************** *** 724,731 **** print message.getText() except Exception, (strerror): print strerror - dbx.close() if __name__ == '__main__': --- 753,761 ---- print message.getText() + dbx.close() + except Exception, (strerror): print strerror if __name__ == '__main__': From anadelonbrin at users.sourceforge.net Tue Nov 23 04:31:52 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 04:31:56 2004 Subject: [Spambayes-checkins] spambayes/spambayes Version.py, 1.31.4.3, 1.31.4.4 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11109/spambayes Modified Files: Tag: release_1_0-branch Version.py Log Message: Bump some numbers for 1.0.1. Index: Version.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Version.py,v retrieving revision 1.31.4.3 retrieving revision 1.31.4.4 diff -C2 -d -r1.31.4.3 -r1.31.4.4 *** Version.py 9 Nov 2004 22:41:13 -0000 1.31.4.3 --- Version.py 23 Nov 2004 03:31:49 -0000 1.31.4.4 *************** *** 38,48 **** # "description" strings below - they just need to increment # so automated version checking works. ! "Version": 1.0, ! "BinaryVersion": 1.0, "Description": "SpamBayes Outlook Addin", ! "Date": "July 2004", ! "Full Description": "%(Description)s Version 1.0 (%(Date)s)", "Full Description Binary": ! "%(Description)s Binary Version 1.0 (%(Date)s)", # Note this means we can change the download page later, and old # versions will still go to the new page. --- 38,48 ---- # "description" strings below - they just need to increment # so automated version checking works. ! "Version": 1.0.1, ! "BinaryVersion": 1.0.1, "Description": "SpamBayes Outlook Addin", ! "Date": "November 2004", ! "Full Description": "%(Description)s Version 1.0.1 (%(Date)s)", "Full Description Binary": ! 
"%(Description)s Binary Version 1.0.1 (%(Date)s)", # Note this means we can change the download page later, and old # versions will still go to the new page. *************** *** 53,63 **** # Note these version numbers also currently don't appear in the # "description" strings below - see above ! "Version": 1.0, ! "BinaryVersion": 1.0, "Description": "SpamBayes POP3 Proxy", ! "Date": "July 2004", ! "Full Description": """%(Description)s Version 1.0 (%(Date)s)""", "Full Description Binary": ! """%(Description)s Binary Version 1.0 (%(Date)s)""", # Note this means we can change the download page later, and old # versions will still go to the new page. --- 53,63 ---- # Note these version numbers also currently don't appear in the # "description" strings below - see above ! "Version": 1.0.1, ! "BinaryVersion": 1.0.1, "Description": "SpamBayes POP3 Proxy", ! "Date": "November 2004", ! "Full Description": """%(Description)s Version 1.0.1 (%(Date)s)""", "Full Description Binary": ! """%(Description)s Binary Version 1.0.1 (%(Date)s)""", # Note this means we can change the download page later, and old # versions will still go to the new page. *************** *** 72,78 **** }, "IMAP Filter" : { ! "Version": 0.4, "Description": "SpamBayes IMAP Filter", ! "Date": "May 2004", "Full Description": """%(Description)s Version %(Version)s (%(Date)s)""", }, --- 72,78 ---- }, "IMAP Filter" : { ! "Version": 0.5, "Description": "SpamBayes IMAP Filter", ! "Date": "November 2004", "Full Description": """%(Description)s Version %(Version)s (%(Date)s)""", }, From anadelonbrin at users.sourceforge.net Tue Nov 23 04:39:38 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 04:39:42 2004 Subject: [Spambayes-checkins] spambayes/spambayes Version.py, 1.31.4.4, 1.31.4.5 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12491/spambayes Modified Files: Tag: release_1_0-branch Version.py Log Message: Opps. 
Those are floats, not version numbers :) Use 1.01 not 1.0.1. Index: Version.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Version.py,v retrieving revision 1.31.4.4 retrieving revision 1.31.4.5 diff -C2 -d -r1.31.4.4 -r1.31.4.5 *** Version.py 23 Nov 2004 03:31:49 -0000 1.31.4.4 --- Version.py 23 Nov 2004 03:39:35 -0000 1.31.4.5 *************** *** 38,43 **** # "description" strings below - they just need to increment # so automated version checking works. ! "Version": 1.0.1, ! "BinaryVersion": 1.0.1, "Description": "SpamBayes Outlook Addin", "Date": "November 2004", --- 38,43 ---- # "description" strings below - they just need to increment # so automated version checking works. ! "Version": 1.01, ! "BinaryVersion": 1.01, "Description": "SpamBayes Outlook Addin", "Date": "November 2004", *************** *** 53,58 **** # Note these version numbers also currently don't appear in the # "description" strings below - see above ! "Version": 1.0.1, ! "BinaryVersion": 1.0.1, "Description": "SpamBayes POP3 Proxy", "Date": "November 2004", --- 53,58 ---- # Note these version numbers also currently don't appear in the # "description" strings below - see above ! "Version": 1.01, ! "BinaryVersion": 1.01, "Description": "SpamBayes POP3 Proxy", "Date": "November 2004", From anadelonbrin at users.sourceforge.net Tue Nov 23 05:03:03 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 05:03:06 2004 Subject: [Spambayes-checkins] spambayes/windows spambayes.iss, 1.15.4.4, 1.15.4.5 Message-ID: Update of /cvsroot/spambayes/spambayes/windows In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17103/windows Modified Files: Tag: release_1_0-branch spambayes.iss Log Message: When backporting a fix I somehow cut off an "end;" as well. Fix that. 
Index: spambayes.iss =================================================================== RCS file: /cvsroot/spambayes/spambayes/windows/spambayes.iss,v retrieving revision 1.15.4.4 retrieving revision 1.15.4.5 diff -C2 -d -r1.15.4.4 -r1.15.4.5 *** spambayes.iss 10 Nov 2004 22:15:37 -0000 1.15.4.4 --- spambayes.iss 23 Nov 2004 04:02:45 -0000 1.15.4.5 *************** *** 118,121 **** --- 118,122 ---- 'If this message persists, you may need to log off from Windows, and try again.' Result := CheckNoAppMutex('InternetMailTransport', closeit); + end; // And finally, the SpamBayes server if Result then begin From anadelonbrin at users.sourceforge.net Tue Nov 23 05:28:19 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Nov 23 05:28:21 2004 Subject: [Spambayes-checkins] spambayes WHAT_IS_NEW.txt,1.35.4.2,1.35.4.3 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22188 Modified Files: Tag: release_1_0-branch WHAT_IS_NEW.txt Log Message: Update for 1.0.1 Index: WHAT_IS_NEW.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/WHAT_IS_NEW.txt,v retrieving revision 1.35.4.2 retrieving revision 1.35.4.3 diff -C2 -d -r1.35.4.2 -r1.35.4.3 *** WHAT_IS_NEW.txt 19 Jul 2004 03:21:23 -0000 1.35.4.2 --- WHAT_IS_NEW.txt 23 Nov 2004 04:28:16 -0000 1.35.4.3 *************** *** 4,86 **** ) ! Changes are broken into sections, so that it's easier for you to find the ! changes that are relevant to you. ! Any actions necessary to move to this release from the previous release are ! noted in the "Transition" section. You should also read the "Incompatible ! changes" section. ! New in 1.0 ! ========== ! There have been no changes made between 1.0rc2 and 1.0. If you are ! upgrading from an earlier version, you may wish to read the WHAT_IS_NEW ! files from the versions that you skipped, as well. ! Deprecated Options ! ================== ! 
Since 1.0a9, SpamBayes has had a method of noting options that are ! deprecated and which will not be available in future releases (it is ! likely that options will only be deprecated for one release before being ! removed). Deprecated options will not be offered in the graphical ! interfaces (Outlook plugin and web interface), and will be listed in ! the "What's New" file (this file) for each release. ! Deprecated options have the same name as previously, but now begin with ! "x-" (so "extract_dow" is now "x-extract_dow"). You can continue to use ! the original name (e.g. "extract_dow") in your configuration file, but will ! receive warnings in your log file or console window. We recommend that you ! examine this output every time you upgrade SpamBayes to ensure that you are ! not using any newly deprecated options. ! Discussion regarding the deprecation of any particular option can be found ! in the spambayes-dev archives (at ! ). ! No options have been deprecated in this release. ! The following options are still deprecated and will be removed in the near ! future, unless testing indicates otherwise: o [Tokenizer] generate_time_buckets o [Tokenizer] extract_dow o [Classifier] experimental_ham_spam_imbalance_adjustment Experimental Options ==================== ! Since 1.0a9, SpamBayes has had a method of noting options that are ! experimental and which may be removed or made permanent in future releases ! (many experimental options will only be experimental for one release before ! being removed or fully integrated). Experimental options are not exposed ! by the Outlook plugin, and are listed on a separate ! "Experimental Configuration" page in the web interface. The options will ! be listed in the "What's New" file (this file) for each release. ! ! Experimental options begin with "x-" (as do deprecated options). If you ! start using an experimental option and it later becomes permanent you can ! 
continue to use the "x-" name in your configuration file, but will ! receive warnings in your log file or console window. We recommend that you ! examine this output every time you upgrade SpamBayes to ensure that you are ! using the correct name for all options. ! ! Discussion of why experimental options and results from using them can be ! found in the spambayes-dev archives (at ! ). Ideally, we would like ! users to test these options out on their mail and let us know the results. ! This can be as simple as turning on the option and emailing ! spambayes@python.org with anecdotal results after a period of time, or the ! full testtools scripts can be used. For details about using these, please ! read the "README-DEVEL.txt" file that comes with the SpamBayes source ! archive. ! Experimental options are always turned off by default. ! No experimental options have been added in this release. ! Experimental options that are currently available (which we invite you to ! try out and report back your results) include: o [Tokenizer] x-search_for_habeas_headers o [Tokenizer] x-reduce_habeas_headers --- 4,101 ---- ) ! This is a bugfix release, so there are no new features, and you do not need ! to do anything to migrate to the new release (other than install it). There ! are no incompatible changes. ! New in 1.0.1 ! ============ ! o A bug with the import/export script (sb_dbexpimp.py) where merging into ! an existing database in the dbm format might lose training data has been ! fixed. Another minor bug with the script that caused an error to be ! printed when importing into a pickle file (although the import was still ! successful) has also been fixed. ! o The binary installer failed to offer to install a startup items shortcut, ! which is convenient for sb_server binary users. The installer will now ! do this. + o sb_server users who wish to use non-standard strings for classification + (e.g. 
"spambayes-ham" instead of "ham") can now use the "Notate To" and + "Notate Subject" options. This is particularly useful for Outlook + Express users. ! o Users of Windows extensions that automatically expand zip files (such ! as ZipMagic) should now be able to successfully use the binary versions ! of sb_server and the Outlook plug-in. ! o Checking whether a new version is available should now work for users ! who have entered proxy details in their configuration file. ! o Source code users can now use Python 2.4 with SpamBayes, although some ! DeprecationWarnings may still be generated. ! o The '-u' command line option for sb_server (letting you specify which ! port the web interface is served on) was broken, but is now fixed. ! o The tte.py (Train to Exhaustion) script now works with Python 2.3. ! o Various other minor fixes. ! ! ! Reported Bugs Fixed ! =================== ! The following bugs tracked via the SourceForge system were fixed: ! 981970, 990700, 941639, 986353, 790757, 944109, 959937, 903905, ! 1051081, 1036601, 922063, 831864, 1022848, 715248 ! ! A URL containing the details of these bugs can be made by appending the ! bug number to this URL: ! http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498103&aid= ! ! As this is a bugfix release, no feature requests or patches tracked via the ! SourceForge system were added. ! ! ! Deprecated Options ! ================== ! ! The following options are still deprecated and will be removed in the 1.1 ! release: o [Tokenizer] generate_time_buckets o [Tokenizer] extract_dow o [Classifier] experimental_ham_spam_imbalance_adjustment + We recommend that you cease using these options if you still are. If you + have any questions about the deprecated options, please email + spambayes@python.org and we will try and answer them. + Experimental Options ==================== ! We would like to remind users about our set of experimental options. These ! 
are options which we believe may be of benefit to users, but have not been ! tested thoroughly enough to warrant full inclusion. We would greatly ! appreciate feedback from users willing to try these options out as to their ! perceived benefit. Both source code and binary users (including Outlook) ! can try these options out. ! To enable an experimental option, sb_server and sb_imapfilter users should ! click on the "Experimental Configuration" button on the main configuration ! page, and select the option(s) they wish to try. ! To enable an experimental option, Outlook plug-in users should open their ! "Data Directory" (via SpamBayes->SpamBayes Manager->Advanced->Show Data Folder) ! and open the "default_bayes_customize.ini" file in there (create one with ! Notepad if there isn't already one). In this file, add the options that ! you wish to try - for example, to enable searching for "Habeas" headers, ! add a line with "Tokenizer" and, below that, a line with ! "x-search_for_habeas_headers:True". ! If you have any queries about the experimental options, please email ! spambayes@python.org and we will try and answer them. ! ! Experimental options that are currently available include: o [Tokenizer] x-search_for_habeas_headers o [Tokenizer] x-reduce_habeas_headers *************** *** 93,97 **** and bigrams (pairs of words), but uses a 'tiling' scheme, where only the set of unigrams and bigrams that have the strongest effect on ! the message are used. o [URLRetriever] x-slurp_urls --- 108,114 ---- and bigrams (pairs of words), but uses a 'tiling' scheme, where only the set of unigrams and bigrams that have the strongest effect on ! the message are used. Note that this option will no longer be ! experimental (although still off by default) with 1.1 - we recommend ! that you try it out if you want higher accuracy. 
o [URLRetriever] x-slurp_urls From anadelonbrin at users.sourceforge.net Wed Nov 24 00:37:20 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Wed Nov 24 00:37:24 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.28,1.29 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31243/scripts Modified Files: sb_server.py Log Message: Switch from using msg.asTokens to msg.tokenize. Index: sb_server.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v retrieving revision 1.28 retrieving revision 1.29 diff -C2 -d -r1.28 -r1.29 *** sb_server.py 9 Nov 2004 02:37:40 -0000 1.28 --- sb_server.py 23 Nov 2004 23:37:15 -0000 1.29 *************** *** 474,478 **** msg.setId(state.getNewMessageName()) # Now find the spam disposition and add the header. ! (prob, clues) = state.bayes.spamprob(msg.asTokens(),\ evidence=True) --- 474,478 ---- msg.setId(state.getNewMessageName()) # Now find the spam disposition and add the header. ! (prob, clues) = state.bayes.spamprob(msg.tokenize(),\ evidence=True) From anadelonbrin at users.sourceforge.net Wed Nov 24 00:44:42 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Wed Nov 24 00:44:46 2004 Subject: [Spambayes-checkins] spambayes/utilities cleanarch.py,NONE,1.1 Message-ID: Update of /cvsroot/spambayes/spambayes/utilities In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv591/utilities Added Files: cleanarch.py Log Message: I'm not sure why this is still in the root directory when everything else was moved out. It really belongs in contrib or utilities, I think, so moving it to there (there is no CVS history to preserve). Also adding .py to the end of the filename, since it is a Python script. --- NEW FILE: cleanarch.py --- #! /usr/bin/env python # Copyright (C) 2001,2002 by the Free Software Foundation, Inc. 
# # This program is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License # as published by the Free Software Foundation; either version 2 # of the License, or (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. """Clean up an .mbox archive file. The archiver looks for Unix-From lines separating messages in an mbox archive file. For compatibility, it specifically looks for lines that start with "From " -- i.e. the letters capital-F, lowercase-r, o, m, space, ignoring everything else on the line. Normally, any lines that start "From " in the body of a message should be escaped such that a > character is actually the first on a line. It is possible though that body lines are not actually escaped. This script attempts to fix these by doing a stricter test of the Unix-From lines. Any lines that start "From " but do not pass this stricter test are escaped with a > character. Usage: cleanarch [options] < inputfile > outputfile Options: -s n --status=n Print a # character every n lines processed -q / --quiet Don't print changed line information to standard error. -n / --dry-run Don't actually output anything. -h / --help Print this message and exit """ import sys import re import getopt import mailbox cre = re.compile(mailbox.UnixMailbox._fromlinepattern) # From RFC 2822, a header field name must contain only characters from 33-126 # inclusive, excluding colon. I.e. from oct 41 to oct 176 less oct 072. Must # use re.match() so that it's anchored at the beginning of the line. 
fre = re.compile(r'[\041-\071\073-\0176]+') def usage(code, msg=''): print >> sys.stderr, __doc__ if msg: print >> sys.stderr, msg sys.exit(code) def escape_line(line, lineno, quiet, output): if output: sys.stdout.write('>' + line) if not quiet: print >> sys.stderr, '[%d]' % lineno, line[:-1] def main(): try: opts, args = getopt.getopt( sys.argv[1:], 'hqns:', ['help', 'quiet', 'dry-run', 'status=']) except getopt.error, msg: usage(1, msg) quiet = 0 output = 1 status = -1 for opt, arg in opts: if opt in ('-h', '--help'): usage(0) elif opt in ('-q', '--quiet'): quiet = 1 elif opt in ('-n', '--dry-run'): output = 0 elif opt in ('-s', '--status'): try: status = int(arg) except ValueError: usage(1, 'Bad status number: %s' % arg) if args: usage(1) lineno = 0 statuscnt = 0 messages = 0 while 1: lineno += 1 line = sys.stdin.readline() if not line: break if line.startswith('From '): if cre.match(line): # This is a real Unix-From line. But it could be a message # /about/ Unix-From lines, so as a second order test, make # sure there's at least one RFC 2822 header following nextline = sys.stdin.readline() lineno += 1 if not nextline: # It was the last line of the mbox, so it couldn't have # been a Unix-From escape_line(line, lineno, quiet, output) break fieldname = nextline.split(':', 1) if len(fieldname) < 2 or not fre.match(nextline): # The following line was not a header, so this wasn't a # valid Unix-From escape_line(line, lineno, quiet, output) if output: sys.stdout.write(nextline) else: # It's a valid Unix-From line messages += 1 if output: sys.stdout.write(line) sys.stdout.write(nextline) else: # This is a bogus Unix-From line escape_line(line, lineno, quiet, output) elif output: # Any old line sys.stdout.write(line) if status > 0 and (lineno % status) == 0: sys.stderr.write('#') statuscnt += 1 if statuscnt > 50: print >> sys.stderr statuscnt = 0 print >> sys.stderr, messages, 'messages found' if __name__ == '__main__': main() From anadelonbrin at users.sourceforge.net 
Wed Nov 24 00:44:43 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Wed Nov 24 00:44:46 2004 Subject: [Spambayes-checkins] spambayes cleanarch,1.1,NONE Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv591 Removed Files: cleanarch Log Message: I'm not sure why this is still in the root directory when everything else was moved out. It really belongs in contrib or utilities, I think, so moving it to there (there is no CVS history to preserve). Also adding .py to the end of the filename, since it is a Python script. --- cleanarch DELETED --- From anadelonbrin at users.sourceforge.net Thu Nov 25 07:36:25 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Thu Nov 25 07:36:28 2004 Subject: [Spambayes-checkins] website applications.ht,1.30,1.31 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10220 Modified Files: applications.ht Log Message: Including the most recent version number in this file was fairly pointless, and made putting out a release more work, so it's gone. Index: applications.ht =================================================================== RCS file: /cvsroot/spambayes/website/applications.ht,v retrieving revision 1.30 retrieving revision 1.31 diff -C2 -d -r1.30 -r1.31 *** applications.ht 9 Jul 2004 00:35:46 -0000 1.30 --- applications.ht 25 Nov 2004 06:36:21 -0000 1.31 *************** *** 44,48 ****

Availability

!

Download the 1.0 source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

--- 44,48 ----

Availability

!

Download the source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

*************** *** 63,67 **** it.

Alternatively, to run from source, download the ! 1.0 source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

--- 63,67 ---- it.

Alternatively, to run from source, download the ! source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

*************** *** 78,82 ****

Availability

!

Download the 1.0 source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

--- 78,82 ----

Availability

!

Download the source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

*************** *** 94,98 ****

Availability

!

Download the 1.0 source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

--- 94,98 ----

Availability

!

Download the source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

*************** *** 112,115 ****

Availability

!

Download the 1.0 source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

--- 112,115 ----

Availability

!

Download the source archive.

Alternatively, use CVS to get the code - go to the CVS page on the project's sourceforge site for more.

From anadelonbrin at users.sourceforge.net Thu Nov 25 07:38:19 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Thu Nov 25 07:38:22 2004 Subject: [Spambayes-checkins] website download.ht, 1.28, 1.29 index.ht, 1.35, 1.36 reply.txt, 1.15, 1.16 windows.ht, 1.41, 1.42 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10525 Modified Files: download.ht index.ht reply.txt windows.ht Log Message: Update for 1.0.1 Index: download.ht =================================================================== RCS file: /cvsroot/spambayes/website/download.ht,v retrieving revision 1.28 retrieving revision 1.29 diff -C2 -d -r1.28 -r1.29 *** download.ht 28 Sep 2004 07:38:08 -0000 1.28 --- download.ht 25 Nov 2004 06:38:15 -0000 1.29 *************** *** 3,16 **** Author: SpamBayes !

Version 1.0 of the SpamBayes project is now available. !

This is the final 1.0 release of SpamBayes. We expect it to prove to be ! quite stable and usable by most people. As time permits, we will endeavour ! to fix any remaining bugs and eventually a 1.0.1 release will be made. ! However, work can now begin on a 1.1 release, which may include many new (possibly even exciting!) features. Feedback to spambayes@python.org. !

You may like to view the release notes ! or the files that make up this release. --- 3,17 ---- Author: SpamBayes !

Version 1.0.1 of the SpamBayes project is now available. !

This is a bugfix release - it is functionally identical to 1.0, but includes ! fixes for a number of bugs. We expect it to prove to be quite stable and ! usable by most people. As time permits, we will endeavour ! to fix any remaining bugs and eventually a 1.0.2 release will be made. ! However, work has now begun on a 1.1 release, which may include many new (possibly even exciting!) features. Feedback to spambayes@python.org. !

You may like to view the release notes ! or the files that make up this release. Index: index.ht =================================================================== RCS file: /cvsroot/spambayes/website/index.ht,v retrieving revision 1.35 retrieving revision 1.36 diff -C2 -d -r1.35 -r1.36 *** index.ht 9 Jul 2004 00:40:50 -0000 1.35 --- index.ht 25 Nov 2004 06:38:15 -0000 1.36 *************** *** 5,9 ****

News

!

SpamBayes 1.0 is now available! (This includes both the source archives and a Windows binary installer).

See the download page for more.

--- 5,9 ----

News

!

SpamBayes 1.0.1 is now available! (This includes both the source archives and a Windows binary installer).

See the download page for more.

Index: reply.txt =================================================================== RCS file: /cvsroot/spambayes/website/reply.txt,v retrieving revision 1.15 retrieving revision 1.16 diff -C2 -d -r1.15 -r1.16 *** reply.txt 9 Jul 2004 00:42:53 -0000 1.15 --- reply.txt 25 Nov 2004 06:38:15 -0000 1.16 *************** *** 48,55 **** ----------------------------------------------- ! Please ensure that you have the latest version. As of 2004-07-09, this is ! 1.0 for both the source and for the binary installer (for the Outlook ! plug-in and sb_server). If you are still having trouble, try looking at the ! bug reports that are currently open: http://sf.net/tracker/?group_id=61702&atid=498103 --- 48,55 ---- ----------------------------------------------- ! Please ensure that you have the latest version. As of November 25, 2004, ! this is 1.0.1 for both the source and for the binary installer (for the ! Outlook plug-in and sb_server). If you are still having trouble, try ! looking at the bug reports that are currently open: http://sf.net/tracker/?group_id=61702&atid=498103 Index: windows.ht =================================================================== RCS file: /cvsroot/spambayes/website/windows.ht,v retrieving revision 1.41 retrieving revision 1.42 diff -C2 -d -r1.41 -r1.42 *** windows.ht 28 Sep 2004 07:38:09 -0000 1.41 --- windows.ht 25 Nov 2004 06:38:15 -0000 1.42 *************** *** 11,17 ****

Latest Release

!

The latest release is 1.0 - see the ! release notes ! or download the installation program.

--- 11,17 ----

Latest Release

!

The latest release is 1.0.1 - see the ! release notes ! or download the installation program.

*************** *** 74,78 ****

Windows users using other mail clients and retrieving mail via POP3 can now download the same ! installation program and use it to install a binary version of sb_server, including a tray application.

--- 74,78 ----

Windows users using other mail clients and retrieving mail via POP3 can now download the same ! installation program and use it to install a binary version of sb_server, including a tray application.

From anadelonbrin at users.sourceforge.net Thu Nov 25 07:39:07 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Thu Nov 25 07:39:10 2004 Subject: [Spambayes-checkins] spambayes README-DEVEL.txt,1.14,1.15 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10661 Modified Files: README-DEVEL.txt Log Message: Putting out a release got a tiny bit simpler. Index: README-DEVEL.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/README-DEVEL.txt,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** README-DEVEL.txt 30 Sep 2004 02:01:26 -0000 1.14 --- README-DEVEL.txt 25 Nov 2004 06:39:04 -0000 1.15 *************** *** 505,509 **** o Now commit spambayes/__init__.py and tag the whole checkout - see the existing tag names for the tag name format. ! o Update the website News, Download, Windows and Application sections. o Update reply.txt in the website repository as needed (it specifies the latest version). Then let Tim, Barry, Tony, or Skip know that they need to --- 505,509 ---- o Now commit spambayes/__init__.py and tag the whole checkout - see the existing tag names for the tag name format. ! o Update the website News, Download and Windows sections. o Update reply.txt in the website repository as needed (it specifies the latest version). 
Then let Tim, Barry, Tony, or Skip know that they need to From montanaro at users.sourceforge.net Thu Nov 25 16:12:06 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Thu Nov 25 16:12:08 2004 Subject: [Spambayes-checkins] spambayes/spambayes __init__.py,1.11,1.12 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17182 Modified Files: __init__.py Log Message: uprev Index: __init__.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/__init__.py,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 *** __init__.py 5 May 2004 00:38:22 -0000 1.11 --- __init__.py 25 Nov 2004 15:12:03 -0000 1.12 *************** *** 1,3 **** # package marker. ! __version__ = '1.0rc1' --- 1,3 ---- # package marker. ! __version__ = '1.0.1' From anadelonbrin at users.sourceforge.net Fri Nov 26 00:19:07 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Fri Nov 26 00:19:10 2004 Subject: [Spambayes-checkins] spambayes/spambayes message.py,1.58,1.59 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv22222/spambayes Modified Files: message.py Log Message: Use cPickle when possible. Handle loading an Outlook messageinfo database. Add a __len__ function to the messageinfo databases. The messageinfo db now needs messages to have a GetDBKey function to determine the key to store the message under. For our message classes, this is just the same as getId(). 
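As a rough illustration of the change described in this log message (a sketch, not the committed code), the snippet below shows the cPickle-with-fallback import and one way the three record formats found in a message info database could be told apart. The helper name decode_record is hypothetical and exists only for this example; the real handling lives in load_msg in the diff that follows. The sketch is written so that it also runs on current Python, where cPickle no longer exists and the fallback branch is always taken.

```python
# Sketch only: mirrors the shape of the change, not the SpamBayes source.
try:
    import cPickle as pickle  # faster C implementation where available
except ImportError:
    import pickle  # pure-Python fallback


def decode_record(attributes):
    """Hypothetical helper: turn a stored record into a dict of attributes."""
    if isinstance(attributes, list):
        # New style: a list of (name, value) pairs.
        return dict(attributes)
    if isinstance(attributes, tuple):
        # Old sb_server/sb_imapfilter style: only 'c' and 't' were stored.
        c, t = attributes
        return {"c": c, "t": t}
    if isinstance(attributes, str):
        # Old Outlook plug-in style: only 't', stored as the string "0" or "1".
        return {"t": {"0": False, "1": True}[attributes]}
    raise TypeError("Unknown message info type")
```

Raising an exception for an unrecognised record is a choice made for the sketch; the committed code prints a message to stderr and exits instead.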
Index: message.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v retrieving revision 1.58 retrieving revision 1.59 diff -C2 -d -r1.58 -r1.59 *** message.py 22 Nov 2004 23:34:43 -0000 1.58 --- message.py 25 Nov 2004 23:19:04 -0000 1.59 *************** *** 81,84 **** --- 81,85 ---- import os + import sys import types import math *************** *** 86,90 **** import errno import shelve ! import pickle import traceback --- 87,94 ---- import errno import shelve ! try: ! import cPickle as pickle ! except ImportError: ! import pickle import traceback *************** *** 110,125 **** self.db_name = db_name def load_msg(self, msg): if self.db is not None: try: ! attributes = self.db[msg.getId()] except KeyError: ! pass else: if not isinstance(attributes, types.ListType): ! # Old-style message info db, which only ! # handles storing 'c' and 't'. ! (msg.c, msg.t) = attributes ! return for att, val in attributes: setattr(msg, att, val) --- 114,152 ---- self.db_name = db_name + def __len__(self): + return len(self.db) + def load_msg(self, msg): if self.db is not None: try: ! try: ! attributes = self.db[msg.getDBKey()] ! except pickle.UnpicklingError: ! # The old-style Outlook message info db didn't use ! # shelve, so get it straight from the dbm. ! if hasattr(self, "dbm"): ! attributes = self.dbm[msg.getDBKey()] ! else: ! raise except KeyError: ! # Set to None, as it's not there. ! for att in msg.stored_attributes: ! setattr(msg, att, None) else: if not isinstance(attributes, types.ListType): ! # Old-style message info db ! if isinstance(attributes, types.TupleType): ! # sb_server/sb_imapfilter, which only handled ! # storing 'c' and 't'. ! (msg.c, msg.t) = attributes ! return ! elif isinstance(attributes, types.StringTypes): ! # Outlook plug-in, which only handled storing 't', ! # and did it as a string. ! msg.t = {"0" : False, "1" : True}[attributes] ! return ! else: ! 
print >> sys.stderr, "Unknown message info type" ! sys.exit(1) for att, val in attributes: setattr(msg, att, val) *************** *** 130,139 **** for att in msg.stored_attributes: attributes.append((att, getattr(msg, att))) ! self.db[msg.getId()] = attributes self.store() def remove_msg(self, msg): if self.db is not None: ! del self.db[msg.getId()] self.store() --- 157,166 ---- for att in msg.stored_attributes: attributes.append((att, getattr(msg, att))) ! self.db[msg.getDBKey()] = attributes self.store() def remove_msg(self, msg): if self.db is not None: ! del self.db[msg.getDBKey()] self.store() *************** *** 241,244 **** --- 268,272 ---- self.message_info_db = open_storage(nm, typ) self.stored_attributes = ['c', 't',] + self.getDBKey = self.getId self.id = None self.c = None From anadelonbrin at users.sourceforge.net Fri Nov 26 00:27:02 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Fri Nov 26 00:27:05 2004 Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py, 1.138, 1.139 manager.py, 1.98, 1.99 msgstore.py, 1.88, 1.89 tester.py, 1.23, 1.24 train.py, 1.39, 1.40 Message-ID: Update of /cvsroot/spambayes/spambayes/Outlook2000 In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv23924/Outlook2000 Modified Files: addin.py manager.py msgstore.py tester.py train.py Log Message: Stop using the deprecated access to the bayes database and use the manager.classifier_data directly. Switch to using a spambayes.message.MessageInfo database rather than an Outlook specific one. This allows us to store more data than just the 'trained' status that we currently store, and also reduces code duplication and simplifies the Outlook code a little bit. I have tested this as much as possible, and run it for a couple of days here and it appears to work. The old database should still be usable (both old style and new style data can be in the same database) and work, so it should be seamless. 
The change is more-or-less the same as when the sb_server/sb_imapfilter database swapped
 to storing more than just 'c' and 't', and there weren't problems there, so fingers
 crossed...

Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.138
retrieving revision 1.139
diff -C2 -d -r1.138 -r1.139
*** addin.py	17 Nov 2004 00:01:06 -0000	1.138
--- addin.py	25 Nov 2004 23:26:57 -0000	1.139
***************
*** 125,130 ****
      # If the message has been trained on, we certainly have seen it before.
      import train
!     if train.been_trained_as_ham(msgstore_message, manager.classifier_data) or \
!        train.been_trained_as_spam(msgstore_message, manager.classifier_data):
          return True
      # I considered checking if the "save spam score" option is enabled - but
--- 125,131 ----
      # If the message has been trained on, we certainly have seen it before.
      import train
!     manager.classifier_data.message_db.load_msg(msgstore_message)
!     if train.been_trained_as_ham(msgstore_message) or \
!        train.been_trained_as_spam(msgstore_message):
          return True
      # I considered checking if the "save spam score" option is enabled - but
***************
*** 149,155 ****
      else:
          print "already was trained as good"
!     assert train.been_trained_as_ham(msgstore_message, manager.classifier_data)
      if save_db:
!         manager.SaveBayesPostIncrementalTrain()
  
  def TrainAsSpam(msgstore_message, manager, rescore = True, save_db = True):
--- 150,157 ----
      else:
          print "already was trained as good"
!     manager.classifier_data.message_db.load_msg(msgstore_message)
!     assert train.been_trained_as_ham(msgstore_message)
      if save_db:
!         manager.classifier_data.SavePostIncrementalTrain()
  
  def TrainAsSpam(msgstore_message, manager, rescore = True, save_db = True):
***************
*** 167,174 ****
      else:
          print "already was trained as spam"
!     assert train.been_trained_as_spam(msgstore_message, manager.classifier_data)
      # And if the DB can save itself incrementally, do it now
      if save_db:
!         manager.SaveBayesPostIncrementalTrain()
  
  # Function to filter a message - note it is a msgstore msg, not an
--- 169,177 ----
      else:
          print "already was trained as spam"
!     manager.classifier_data.message_db.load_msg(msgstore_message)
!     assert train.been_trained_as_spam(msgstore_message)
      # And if the DB can save itself incrementally, do it now
      if save_db:
!         manager.classifier_data.SavePostIncrementalTrain()
  
  # Function to filter a message - note it is a msgstore msg, not an
***************
*** 190,194 ****
          if manager.config.training.train_recovered_spam:
              import train
!             if train.been_trained_as_spam(msgstore_message, manager.classifier_data):
                  need_train = True
              else:
--- 193,198 ----
          if manager.config.training.train_recovered_spam:
              import train
!             manager.classifier_data.message_db.load_msg(msgstore_message)
!             if train.been_trained_as_spam(msgstore_message):
                  need_train = True
              else:
***************
*** 200,204 ****
                  # 'Unsure', then this event is unlikely to be the user
                  # re-classifying (and in fact it may simply be the Outlook
!                 # rules moving the item.
                  need_train = manager.config.filter.unsure_threshold < prop * 100
  
--- 204,208 ----
                  # 'Unsure', then this event is unlikely to be the user
                  # re-classifying (and in fact it may simply be the Outlook
!                 # rules moving the item).
                  need_train = manager.config.filter.unsure_threshold < prop * 100
  
***************
*** 422,426 ****
              # previously trained, try and optimize.
              import train
!             if train.been_trained_as_ham(msgstore_message, self.manager.classifier_data):
                  need_train = True
              else:
--- 426,431 ----
              # previously trained, try and optimize.
              import train
!             self.manager.classifier_data.message_db.load_msg(msgstore_message)
!             if train.been_trained_as_ham(msgstore_message):
                  need_train = True
              else:
***************
*** 441,444 ****
--- 446,450 ----
      if msgstore_message is None:
          return
+     mgr.classifier_data.message_db.load_msg(msgstore_message)
  
      item = msgstore_message.GetOutlookItem()
***************
*** 479,486 ****
      # Report whether this message has been trained or not.
      push("
\n")
-     trained_as = mgr.classifier_data.message_db.get(msgstore_message.searchkey)
      push("This message has %sbeen trained%s." % \
!          {'0' : ("", " as ham"), '1' : ("", " as spam"), None : ("not ", "")}
!          [trained_as])
      # Format the clues.
      push("

%s Significant Tokens

\n
" % len(clues))
--- 485,491 ----
      # Report whether this message has been trained or not.
      push("
\n")
      push("This message has %sbeen trained%s." % \
!          {False : ("", " as ham"), True : ("", " as spam"),
!           None : ("not ", "")}[msgstore_message.t])
      # Format the clues.
      push("

%s Significant Tokens

\n
" % len(clues))
***************
*** 707,711 ****
              # but we are smart enough to know we have already done it.
          # And if the DB can save itself incrementally, do it now
!         self.manager.SaveBayesPostIncrementalTrain()
          SetWaitCursor(0)
  
--- 712,716 ----
              # but we are smart enough to know we have already done it.
          # And if the DB can save itself incrementally, do it now
!         self.manager.classifier_data.SavePostIncrementalTrain()
          SetWaitCursor(0)
  
***************
*** 774,778 ****
              # but we are smart enough to know we have already done it.
          # And if the DB can save itself incrementally, do it now
!         self.manager.SaveBayesPostIncrementalTrain()
          SetWaitCursor(0)
  
--- 779,783 ----
              # but we are smart enough to know we have already done it.
          # And if the DB can save itself incrementally, do it now
!         self.manager.classifier_data.SavePostIncrementalTrain()
          SetWaitCursor(0)
  

Index: manager.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/manager.py,v
retrieving revision 1.98
retrieving revision 1.99
diff -C2 -d -r1.98 -r1.99
*** manager.py	2 Nov 2004 21:33:46 -0000	1.98
--- manager.py	25 Nov 2004 23:26:58 -0000	1.99
***************
*** 118,122 ****
  
  def import_core_spambayes_stuff(ini_filenames):
!     global bayes_classifier, bayes_tokenize, bayes_storage, bayes_options
      if "spambayes.Options" in sys.modules:
          # The only thing we are worried about here is spambayes.Options
--- 118,123 ----
  
  def import_core_spambayes_stuff(ini_filenames):
!     global bayes_classifier, bayes_tokenize, bayes_storage, bayes_options, \
!            bayes_message
      if "spambayes.Options" in sys.modules:
          # The only thing we are worried about here is spambayes.Options
***************
*** 144,150 ****
--- 145,153 ----
      from spambayes.tokenizer import tokenize
      from spambayes import storage
+     from spambayes import message
      bayes_classifier = classifier
      bayes_tokenize = tokenize
      bayes_storage = storage
+     bayes_message = message
      assert "spambayes.Options" in sys.modules, \
          "Expected 'spambayes.Options' to be loaded here"
***************
*** 170,174 ****
  # Base class for our "storage manager" - we choose between the pickle
  # and DB versions at runtime.  As our bayes uses spambayes.storage,
! # our base class can share common bayes loading code.
  class BasicStorageManager:
      db_extension = None # for pychecker - overwritten by subclass
--- 173,179 ----
  # Base class for our "storage manager" - we choose between the pickle
  # and DB versions at runtime.  As our bayes uses spambayes.storage,
! # our base class can share common bayes loading code, and we use
! # spambayes.message, so the base class can share common message info
! # code, too.
  class BasicStorageManager:
      db_extension = None # for pychecker - overwritten by subclass
***************
*** 186,205 ****
          bayes.store()
      def open_bayes(self):
!         raise NotImplementedError
      def close_bayes(self, bayes):
          bayes.close()
  
  class PickleStorageManager(BasicStorageManager):
      db_extension = ".pck"
!     def open_bayes(self):
!         return bayes_storage.PickledClassifier(self.bayes_filename)
!     def open_mdb(self):
!         return cPickle.load(open(self.mdb_filename, 'rb'))
      def new_mdb(self):
          return {}
-     def store_mdb(self, mdb):
-         SavePickle(mdb, self.mdb_filename)
-     def close_mdb(self, mdb):
-         pass
      def is_incremental(self):
          return False # False means we always save the entire DB
--- 191,209 ----
          bayes.store()
      def open_bayes(self):
!         return bayes_storage.open_storage(self.bayes_filename, self.klass)
      def close_bayes(self, bayes):
          bayes.close()
+     def open_mdb(self):
+         return bayes_message.open_storage(self.mdb_filename, self.klass)
+     def store_mdb(self, mdb):
+         mdb.store()
+     def close_mdb(self, mdb):
+         mdb.close()
  
  class PickleStorageManager(BasicStorageManager):
      db_extension = ".pck"
!     klass = "pickle"
      def new_mdb(self):
          return {}
      def is_incremental(self):
          return False # False means we always save the entire DB
***************
*** 207,217 ****
  class DBStorageManager(BasicStorageManager):
      db_extension = ".db"
!     def open_bayes(self):
!         # bsddb doesn't handle unicode filenames yet :(
!         fname = self.bayes_filename.encode(filesystem_encoding)
!         return bayes_storage.DBDictClassifier(fname)
!     def open_mdb(self):
!         fname = self.mdb_filename.encode(filesystem_encoding)
!         return bsddb.hashopen(fname)
      def new_mdb(self):
          try:
--- 211,220 ----
  class DBStorageManager(BasicStorageManager):
      db_extension = ".db"
!     klass = "dbm"
!     def __init__(self, bayes_base_name, mdb_base_name):
!         self.bayes_filename = bayes_base_name.encode(filesystem_encoding) + \
!                               self.db_extension
!         self.mdb_filename = mdb_base_name.encode(filesystem_encoding) + \
!                             self.db_extension
      def new_mdb(self):
          try:
***************
*** 220,227 ****
              if e.errno != errno.ENOENT: raise
          return self.open_mdb()
-     def store_mdb(self, mdb):
-         mdb.sync()
-     def close_mdb(self, mdb):
-         mdb.close()
      def is_incremental(self):
          return True # True means only changed records get actually written
--- 223,226 ----
***************
*** 424,432 ****
          db_manager = ManagerClass(bayes_base, mdb_base)
          self.classifier_data = ClassifierData(db_manager, self)
-         self.LoadBayes()
-         self.stats = oastats.Stats(self.config, self.data_directory)
- 
-     # "old" bayes functions - new code should use "classifier_data" directly
-     def LoadBayes(self):
          try:
              self.classifier_data.Load()
--- 423,426 ----
***************
*** 434,445 ****
              self.ReportFatalStartupError("Failed to load bayes database")
              self.classifier_data.InitNew()
  
!     def InitNewBayes(self):
!         self.classifier_data.InitNew()
!     def SaveBayes(self):
!         self.classifier_data.Save()
!     def SaveBayesPostIncrementalTrain(self):
!         self.classifier_data.SavePostIncrementalTrain()
!     # Logging - this too should be somewhere else.
      def LogDebug(self, level, *args):
          if self.verbose >= level:
--- 428,434 ----
              self.ReportFatalStartupError("Failed to load bayes database")
              self.classifier_data.InitNew()
+         self.stats = oastats.Stats(self.config, self.data_directory)
  
!     # Logging - this should be somewhere else.
      def LogDebug(self, level, *args):
          if self.verbose >= level:

Index: msgstore.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/msgstore.py,v
retrieving revision 1.88
retrieving revision 1.89
diff -C2 -d -r1.88 -r1.89
*** msgstore.py	2 Nov 2004 21:34:56 -0000	1.88
--- msgstore.py	25 Nov 2004 23:26:58 -0000	1.89
***************
*** 807,810 ****
--- 807,817 ----
          self.dirty = False
  
+         # For use with the spambayes.message messageinfo database.
+         self.stored_attributes = ['t',]
+ 
+     def getDBKey(self):
+         # Long lived search key.
+         return self.searchkey
+ 
      def __repr__(self):
          if self.id is None:

Index: tester.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/tester.py,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** tester.py	24 Dec 2003 04:08:38 -0000	1.23
--- tester.py	25 Nov 2004 23:26:58 -0000	1.24
***************
*** 258,265 ****
          # Now move the message back to the inbox - it should get trained.
          store_msg = driver.manager.message_store.GetMessage(spam_msg)
          import train
!         if train.been_trained_as_ham(store_msg, driver.manager.classifier_data):
              TestFailed("This new spam message should not have been trained as ham yet")
!         if train.been_trained_as_spam(store_msg, driver.manager.classifier_data):
              TestFailed("This new spam message should not have been trained as spam yet")
          spam_msg.Move(folder_watch)
--- 258,266 ----
          # Now move the message back to the inbox - it should get trained.
          store_msg = driver.manager.message_store.GetMessage(spam_msg)
+         driver.manager.classifier_data.message_db.load_msg(store_msg)
          import train
!         if train.been_trained_as_ham(store_msg):
              TestFailed("This new spam message should not have been trained as ham yet")
!         if train.been_trained_as_spam(store_msg):
              TestFailed("This new spam message should not have been trained as spam yet")
          spam_msg.Move(folder_watch)
***************
*** 269,272 ****
--- 270,274 ----
              TestFailed("The message appears to have been filtered out of the watch folder")
          store_msg = driver.manager.message_store.GetMessage(spam_msg)
+         driver.manager.classifier_data.message_db.load_msg(store_msg)
          need_untrain = True
          try:
***************
*** 275,281 ****
              if nham+1 != bayes.nham:
                  TestFailed("There was not one more ham messages after a re-train")
!             if train.been_trained_as_spam(store_msg, driver.manager.classifier_data):
                  TestFailed("This new spam message should not have been trained as spam yet")
!             if not train.been_trained_as_ham(store_msg, driver.manager.classifier_data):
                  TestFailed("This new spam message should have been trained as ham now")
              # word infos should have one extra ham
--- 277,283 ----
              if nham+1 != bayes.nham:
                  TestFailed("There was not one more ham messages after a re-train")
!             if train.been_trained_as_spam(store_msg):
                  TestFailed("This new spam message should not have been trained as spam yet")
!             if not train.been_trained_as_ham(store_msg):
                  TestFailed("This new spam message should have been trained as ham now")
              # word infos should have one extra ham
***************
*** 289,299 ****
                  TestFailed("Could not find the message in the Spam folder")
              store_msg = driver.manager.message_store.GetMessage(spam_msg)
              if nspam +1 != bayes.nspam:
                  TestFailed("There should be one more spam now")
              if nham != bayes.nham:
                  TestFailed("There should be the same number of hams again")
!             if not train.been_trained_as_spam(store_msg, driver.manager.classifier_data):
                  TestFailed("This new spam message should have been trained as spam by now")
!             if train.been_trained_as_ham(store_msg, driver.manager.classifier_data):
                  TestFailed("This new spam message should have been un-trained as ham")
              # word infos should have one extra spam, no extra ham
--- 291,302 ----
                  TestFailed("Could not find the message in the Spam folder")
              store_msg = driver.manager.message_store.GetMessage(spam_msg)
+             driver.manager.classifier_data.message_db.load_msg(store_msg)
              if nspam +1 != bayes.nspam:
                  TestFailed("There should be one more spam now")
              if nham != bayes.nham:
                  TestFailed("There should be the same number of hams again")
!             if not train.been_trained_as_spam(store_msg):
                  TestFailed("This new spam message should have been trained as spam by now")
!             if train.been_trained_as_ham(store_msg):
                  TestFailed("This new spam message should have been un-trained as ham")
              # word infos should have one extra spam, no extra ham
***************
*** 308,312 ****
                  TestFailed("Could not find the message in the Unsure folder")
              store_msg = driver.manager.message_store.GetMessage(spam_msg)
!             if not train.been_trained_as_spam(store_msg, driver.manager.classifier_data):
                  TestFailed("Message was not identified as Spam after moving")
  
--- 311,316 ----
                  TestFailed("Could not find the message in the Unsure folder")
              store_msg = driver.manager.message_store.GetMessage(spam_msg)
!             driver.manager.classifier_data.message_db.load_msg(store_msg)
!             if not train.been_trained_as_spam(store_msg):
                  TestFailed("Message was not identified as Spam after moving")
  
***************
*** 316,323 ****
              # Now undo the damage we did.
              was_spam = train.untrain_message(store_msg, driver.manager.classifier_data)
              if not was_spam:
                  TestFailed("Untraining this message did not indicate it was spam")
!             if train.been_trained_as_spam(store_msg, driver.manager.classifier_data) or \
!                train.been_trained_as_ham(store_msg, driver.manager.classifier_data):
                  TestFailed("Untraining this message kept it has ham/spam")
              need_untrain = False
--- 320,328 ----
              # Now undo the damage we did.
              was_spam = train.untrain_message(store_msg, driver.manager.classifier_data)
+             driver.manager.classifier_data.message_db.load_msg(store_msg)
              if not was_spam:
                  TestFailed("Untraining this message did not indicate it was spam")
!             if train.been_trained_as_spam(store_msg) or \
!                train.been_trained_as_ham(store_msg):
                  TestFailed("Untraining this message kept it has ham/spam")
              need_untrain = False

Index: train.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/train.py,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** train.py	2 Nov 2004 21:36:54 -0000	1.39
--- train.py	25 Nov 2004 23:26:58 -0000	1.40
***************
*** 5,8 ****
--- 5,9 ----
  # Copyright PSF, license under the PSF license
  
+ import sys
  import traceback
  from win32com.mapi import mapi
***************
*** 17,29 ****
  # Note our Message Database uses PR_SEARCH_KEY, *not* PR_ENTRYID, as the
  # latter changes after a Move operation - see msgstore.py
! def been_trained_as_ham(msg, cdata):
!     if not cdata.message_db.has_key(msg.searchkey):
          return False
!     return cdata.message_db[msg.searchkey]=='0'
  
! def been_trained_as_spam(msg, cdata):
!     if not cdata.message_db.has_key(msg.searchkey):
          return False
!     return cdata.message_db[msg.searchkey]=='1'
  
  def train_message(msg, is_spam, cdata):
--- 18,30 ----
  # Note our Message Database uses PR_SEARCH_KEY, *not* PR_ENTRYID, as the
  # latter changes after a Move operation - see msgstore.py
! def been_trained_as_ham(msg):
!     if msg.t is None:
          return False
!     return msg.t == False
  
! def been_trained_as_spam(msg):
!     if msg.t is None:
          return False
!     return msg.t == True
  
  def train_message(msg, is_spam, cdata):
***************
*** 36,43 ****
      from spambayes.tokenizer import tokenize
  
!     if not cdata.message_db.has_key(msg.searchkey):
!         was_spam = None
!     else:
!         was_spam = cdata.message_db[msg.searchkey]=='1'
      if was_spam == is_spam:
          return False    # already correctly classified
--- 37,42 ----
      from spambayes.tokenizer import tokenize
  
!     cdata.message_db.load_msg(msg)
!     was_spam = msg.t
      if was_spam == is_spam:
          return False    # already correctly classified
***************
*** 51,55 ****
      # Learn the correct classification.
      cdata.bayes.learn(tokenize(stream), is_spam)
!     cdata.message_db[msg.searchkey] = ['0', '1'][is_spam]
      cdata.dirty = True
      return True
--- 50,55 ----
      # Learn the correct classification.
      cdata.bayes.learn(tokenize(stream), is_spam)
!     msg.t = is_spam
!     cdata.message_db.store_msg(msg)
      cdata.dirty = True
      return True
***************
*** 62,75 ****
      from spambayes.tokenizer import tokenize
      stream = msg.GetEmailPackageObject()
!     if been_trained_as_spam(msg, cdata):
!         assert not been_trained_as_ham(msg, cdata), "Can't have been both!"
          cdata.bayes.unlearn(tokenize(stream), True)
!         del cdata.message_db[msg.searchkey]
          cdata.dirty = True
          return True
!     if been_trained_as_ham(msg, cdata):
!         assert not been_trained_as_spam(msg, cdata), "Can't have been both!"
          cdata.bayes.unlearn(tokenize(stream), False)
!         del cdata.message_db[msg.searchkey]
          cdata.dirty = True
          return False
--- 62,76 ----
      from spambayes.tokenizer import tokenize
      stream = msg.GetEmailPackageObject()
!     cdata.message_db.load_msg(msg)
!     if been_trained_as_spam(msg):
!         assert not been_trained_as_ham(msg), "Can't have been both!"
          cdata.bayes.unlearn(tokenize(stream), True)
!         cdata.message_db.remove_msg(msg)
          cdata.dirty = True
          return True
!     if been_trained_as_ham(msg):
!         assert not been_trained_as_spam(msg), "Can't have been both!"
          cdata.bayes.unlearn(tokenize(stream), False)
!         cdata.message_db.remove_msg(msg)
          cdata.dirty = True
          return False
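
The messageinfo pattern used in the train.py change above can be sketched in
isolation. This is only a minimal sketch, not the spambayes implementation:
FakeMessage and FakeMessageDB are hypothetical stand-ins, and only the names t,
stored_attributes, getDBKey, load_msg, store_msg and remove_msg are taken from
the diffs. It is written for Python 3, whereas the checked-in code targets
Python 2.

```python
class FakeMessage:
    """Hypothetical stand-in for an msgstore message."""
    def __init__(self, key):
        self.key = key
        self.stored_attributes = ['t']
        self.t = None  # None = never trained, False = ham, True = spam

    def getDBKey(self):
        return self.key


class FakeMessageDB:
    """Hypothetical dict-backed stand-in for the messageinfo database."""
    def __init__(self):
        self._db = {}

    def load_msg(self, msg):
        # Copy every stored attribute onto the message; missing ones become None.
        state = self._db.get(msg.getDBKey(), {})
        for attr in msg.stored_attributes:
            setattr(msg, attr, state.get(attr))

    def store_msg(self, msg):
        self._db[msg.getDBKey()] = dict(
            (attr, getattr(msg, attr)) for attr in msg.stored_attributes)

    def remove_msg(self, msg):
        self._db.pop(msg.getDBKey(), None)


def been_trained_as_ham(msg):
    if msg.t is None:
        return False
    return msg.t == False


def been_trained_as_spam(msg):
    if msg.t is None:
        return False
    return msg.t == True


db = FakeMessageDB()
msg = FakeMessage("search-key-1")
db.load_msg(msg)      # nothing stored yet, so msg.t stays None
msg.t = True          # pretend the message was trained as spam
db.store_msg(msg)
```

The point of the three-valued t is that "not trained" and "trained as ham" stay
distinct, which is why callers load the message from the database before asking
either question.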

From anadelonbrin at users.sourceforge.net  Fri Nov 26 04:06:47 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 26 04:06:49 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 README.txt,1.12,1.13
Message-ID: 

Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv543/Outlook2000

Modified Files:
	README.txt 
Log Message:
Encourage people to mail the list, not Mark personally.

Index: README.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/README.txt,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** README.txt	3 Oct 2003 05:23:15 -0000	1.12
--- README.txt	26 Nov 2004 03:06:43 -0000	1.13
***************
*** 35,39 ****
  labyrinth of Outlook preference dialogs.)  If this happens and you have
  the Python exception that caused the failure (via the tracing mentioned 
! above) please send it to Mark.
  
  To unregister the addin, execute "addin.py --unregister", then optionally
--- 35,39 ----
  labyrinth of Outlook preference dialogs.)  If this happens and you have
  the Python exception that caused the failure (via the tracing mentioned 
! above) please send it to spambayes@python.org.
  
  To unregister the addin, execute "addin.py --unregister", then optionally

From anadelonbrin at users.sourceforge.net  Fri Nov 26 04:11:46 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Fri Nov 26 04:11:49 2004
Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py, 1.139,
	1.140 filter.py, 1.39, 1.40 msgstore.py, 1.89, 1.90
Message-ID: 

Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv1423/Outlook2000

Modified Files:
	addin.py filter.py msgstore.py 
Log Message:
Save the current folder when doing a "delete as spam", because the message may not
 be in the folder it was in when it was filtered, or it may not have been filtered at
 all, but we do really want to recover it to wherever it was last.

Save the original folder data in the messageinfo database as well.  This does mean
 it'll end up being somewhat larger - if this is a problem, then it could be an option.
  However, it does mean that recovery now goes to the right place, even with an IMAP
 store.  So closes [ 1071319 ] Outlook plug in for IMAP boxes
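
The recovery order described here (message properties first, then the copy kept
in the messageinfo database) can be sketched as follows. This is an
illustration under assumptions, not the msgstore.py code:
get_remembered_folder and its two lookup callables are hypothetical names, and
the folder ids are made-up tuples. Python 3 is assumed.

```python
def get_remembered_folder(lookup_from_properties, lookup_by_id, original_folder):
    """Return the folder a message should be recovered to, or None.

    lookup_from_properties: callable that reads the folder from properties
        saved on the message itself; it may raise if they are missing.
    lookup_by_id: callable that maps a (store_id, folder_id) tuple to a folder.
    original_folder: the tuple saved in the messageinfo database, or None.
    """
    try:
        return lookup_from_properties()
    except Exception:
        # The properties were not available, so fall back to the database copy.
        if original_folder:
            return lookup_by_id(original_folder)
        return None


def missing_properties():
    # Simulates a store where the saved properties cannot be read back.
    raise KeyError("no saved folder properties on this message")


folders = {("store-1", "folder-7"): "Inbox"}
recovered = get_remembered_folder(missing_properties, folders.get,
                                  ("store-1", "folder-7"))
```

Keeping the property lookup first means stores that already worked keep their
existing behaviour; the database copy only matters when that lookup fails.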

Index: addin.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v
retrieving revision 1.139
retrieving revision 1.140
diff -C2 -d -r1.139 -r1.140
*** addin.py	25 Nov 2004 23:26:57 -0000	1.139
--- addin.py	26 Nov 2004 03:11:43 -0000	1.140
***************
*** 692,695 ****
--- 692,698 ----
              self.manager.stats.RecordManualClassification(False,
                                      self.manager.score(msgstore_message))
+             # Record the original folder, in case this message is not where
+             # it was after filtering, or has never been filtered.
+             msgstore_message.RememberMessageCurrentFolder()
              # Must train before moving, else we lose the message!
              subject = msgstore_message.GetSubject()
***************
*** 747,750 ****
--- 750,754 ----
              try:
                  subject = msgstore_message.GetSubject()
+                 self.manager.classifier_data.message_db.load_msg(msgstore_message)
                  restore_folder = msgstore_message.GetRememberedFolder()
                  if restore_folder is None or \

Index: filter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/filter.py,v
retrieving revision 1.39
retrieving revision 1.40
diff -C2 -d -r1.39 -r1.40
*** filter.py	2 Nov 2004 21:33:46 -0000	1.39
--- filter.py	26 Nov 2004 03:11:43 -0000	1.40
***************
*** 44,47 ****
--- 44,48 ----
                          if all_actions:
                              msg.RememberMessageCurrentFolder()
+                             mgr.classifier_data.message_db.store_msg(msg)
                          msg.Save()
                          break

Index: msgstore.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/msgstore.py,v
retrieving revision 1.89
retrieving revision 1.90
diff -C2 -d -r1.89 -r1.90
*** msgstore.py	25 Nov 2004 23:26:58 -0000	1.89
--- msgstore.py	26 Nov 2004 03:11:43 -0000	1.90
***************
*** 808,812 ****
  
          # For use with the spambayes.message messageinfo database.
!         self.stored_attributes = ['t',]
  
      def getDBKey(self):
--- 808,814 ----
  
          # For use with the spambayes.message messageinfo database.
!         self.stored_attributes = ['t', 'original_folder']
!         self.t = None
!         self.original_folder = None
  
      def getDBKey(self):
***************
*** 1244,1247 ****
--- 1246,1252 ----
          try:
              folder = self.GetFolder()
+             # Also save this information in our messageinfo database, which
+             # means that restoring should work even with IMAP.
+             self.original_folder = folder.id[0], folder.id[1]
              props = ( (mapi.PS_PUBLIC_STRINGS, "SpamBayesOriginalFolderStoreID"),
                        (mapi.PS_PUBLIC_STRINGS, "SpamBayesOriginalFolderID")
***************
*** 1274,1277 ****
--- 1279,1285 ----
              return self.msgstore.GetFolder(folder_id)
          except:
+             # Try to get it from the message info database, if possible
+             if self.original_folder:
+                 return self.msgstore.GetFolder(self.original_folder)
              print "Error locating origin of message", self
              return None

From anadelonbrin at users.sourceforge.net  Mon Nov 29 00:38:19 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 00:38:24 2004
Subject: [Spambayes-checkins] spambayes/spambayes ProxyUI.py,1.52,1.53
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29550/spambayes

Modified Files:
	ProxyUI.py 
Log Message:
Fix error reported by Hatuka*nezumi:

Subject lines are not cgi.escape()d in the web interface, which might cause errors.
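
As a small illustration of why the escaping matters, the sketch below renders a
header value into a table cell. It assumes Python 3, where html.escape replaces
cgi.escape (which was removed in Python 3.8); render_header_cell is a
hypothetical helper, not part of ProxyUI.py.

```python
import html


def render_header_cell(text):
    # quote=False escapes &, < and > only, like cgi.escape(text) did by default.
    return "<td>%s</td>" % html.escape(text, quote=False)


cell = render_header_cell("<b>Win</b> & more")
```

Without the escape call, markup in a subject line would be inserted into the
review page as live HTML.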

Index: ProxyUI.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/ProxyUI.py,v
retrieving revision 1.52
retrieving revision 1.53
diff -C2 -d -r1.52 -r1.53
*** ProxyUI.py	9 Nov 2004 02:37:41 -0000	1.52
--- ProxyUI.py	28 Nov 2004 23:38:17 -0000	1.53
***************
*** 340,344 ****
                  else:
                      h = self.html.reviewRow.headerValue.clone()
!                 h.text = text
                  row.optionalHeadersValues += h
  
--- 340,344 ----
                  else:
                      h = self.html.reviewRow.headerValue.clone()
!                 h.text = cgi.escape(text)
                  row.optionalHeadersValues += h
  

From anadelonbrin at users.sourceforge.net  Mon Nov 29 01:11:50 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 01:11:53 2004
Subject: [Spambayes-checkins] 
	spambayes/spambayes/test test_sb_server.py, 1.2, 1.3
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes/test
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6041/spambayes/test

Modified Files:
	test_sb_server.py 
Log Message:
Fix error reported by Hatuka*nezumi:

Messages that did not have the required \r?\n\r?\n separator would just pass through
 spambayes unproxied.  Change this so that they are proxied (everything will be a header,
 though, so it may not proxy well, and might generate an exception header - but that's
 better than just letting it through).
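
The test-server side of this fix comes down to tolerating a message with no
blank line between headers and body. A minimal sketch of that fallback,
assuming Python 3: top_response is a hypothetical function name, and the final
framing of the normal case is an assumption, since the diff only shows the
split and the fallback return.

```python
def top_response(message, max_lines):
    """Build a POP3-style TOP reply for one message."""
    try:
        headers, body = message.split('\n\n', 1)
    except ValueError:
        # No header/body separator: return the whole text as-is.
        return "+OK\r\n%s\r\n.\r\n" % message
    body_lines = body.split('\n')[:max_lines]
    return "+OK\r\n%s\r\n\r\n%s\r\n.\r\n" % (headers, '\n'.join(body_lines))


malformed = "From: someone@example.com\nSubject: No body, and no separator"
reply = top_response(malformed, 5)
```

The split raises ValueError only because the result is unpacked into exactly
two names, so the except branch is reached precisely when the separator is
missing.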

Index: test_sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/test/test_sb_server.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** test_sb_server.py	9 Nov 2004 02:37:41 -0000	1.2
--- test_sb_server.py	29 Nov 2004 00:11:47 -0000	1.3
***************
*** 77,80 ****
--- 77,83 ----
  """
  
+ malformed1 = """From: ta-meyer@ihug.co.nz
+ Subject: No body, and no separator"""
+ 
  import asyncore
  import socket
***************
*** 123,127 ****
          Dibbler.BrighterAsyncChat.__init__(self, map=socketMap)
          Dibbler.BrighterAsyncChat.set_socket(self, clientSocket, socketMap)
!         self.maildrop = [spam1, good1]
          self.set_terminator('\r\n')
          self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP', 'SLOW',
--- 126,130 ----
          Dibbler.BrighterAsyncChat.__init__(self, map=socketMap)
          Dibbler.BrighterAsyncChat.set_socket(self, clientSocket, socketMap)
!         self.maildrop = [spam1, good1, malformed1]
          self.set_terminator('\r\n')
          self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP', 'SLOW',
***************
*** 219,223 ****
          if 0 < number <= len(self.maildrop):
              message = self.maildrop[number-1]
!             headers, body = message.split('\n\n', 1)
              bodyLines = body.split('\n')[:maxLines]
              message = headers + '\r\n\r\n' + '\n'.join(bodyLines)
--- 222,229 ----
          if 0 < number <= len(self.maildrop):
              message = self.maildrop[number-1]
!             try:
!                 headers, body = message.split('\n\n', 1)
!             except ValueError:
!                 return "+OK\r\n%s\r\n.\r\n" % message
              bodyLines = body.split('\n')[:maxLines]
              message = headers + '\r\n\r\n' + '\n'.join(bodyLines)
***************
*** 314,318 ****
      response = proxy.recv(100)
      count, totalSize = map(int, response.split()[1:3])
!     assert count == 2
  
      # Loop through the messages ensuring that they have judgement
--- 320,324 ----
      response = proxy.recv(100)
      count, totalSize = map(int, response.split()[1:3])
!     assert count == 3
  
      # Loop through the messages ensuring that they have judgement

From anadelonbrin at users.sourceforge.net  Mon Nov 29 01:11:50 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 01:11:54 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.29,1.30
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6041/scripts

Modified Files:
	sb_server.py 
Log Message:
Fix error reported by Hatuka*nezumi:

Messages that did not have the required \r?\n\r?\n separator would just pass through
 spambayes unproxied.  Change this so that they are proxied (everything will be a header,
 though, so it may not proxy well, and might generate an exception header - but that's
 better than just letting it through).

Index: sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** sb_server.py	23 Nov 2004 23:37:15 -0000	1.29
--- sb_server.py	29 Nov 2004 00:11:46 -0000	1.30
***************
*** 457,566 ****
          """Adds the judgement header based on the raw headers and body
          of the message."""
!         # Use '\n\r?\n' to detect the end of the headers in case of
!         # broken emails that don't use the proper line separators.
!         if re.search(r'\n\r?\n', response):
!             # Remove the trailing .\r\n before passing to the email parser.
!             # Thanks to Scott Schlesier for this fix.
!             terminatingDotPresent = (response[-4:] == '\n.\r\n')
!             if terminatingDotPresent:
!                 response = response[:-3]
  
!             # Break off the first line, which will be '+OK'.
!             ok, messageText = response.split('\n', 1)
  
!             try:
!                 msg = email.message_from_string(messageText,
!                           _class=spambayes.message.SBHeaderMessage)
!                 msg.setId(state.getNewMessageName())
!                 # Now find the spam disposition and add the header.
!                 (prob, clues) = state.bayes.spamprob(msg.tokenize(),\
!                                  evidence=True)
  
!                 msg.addSBHeaders(prob, clues)
  
!                 # Check for "RETR" or "TOP N 99999999" - fetchmail without
!                 # the 'fetchall' option uses the latter to retrieve messages.
!                 if (command == 'RETR' or
!                     (command == 'TOP' and
!                      len(args) == 2 and args[1] == '99999999')):
!                     cls = msg.GetClassification()
!                     if cls == options["Headers", "header_ham_string"]:
!                         state.numHams += 1
!                     elif cls == options["Headers", "header_spam_string"]:
!                         state.numSpams += 1
!                     else:
!                         state.numUnsure += 1
  
!                     # Suppress caching of "Precedence: bulk" or
!                     # "Precedence: list" ham if the options say so.
!                     isSuppressedBulkHam = \
!                         (cls == options["Headers", "header_ham_string"] and
!                          options["Storage", "no_cache_bulk_ham"] and
!                          msg.get('precedence') in ['bulk', 'list'])
  
!                     # Suppress large messages if the options say so.
!                     size_limit = options["Storage",
!                                          "no_cache_large_messages"]
!                     isTooBig = size_limit > 0 and \
!                                len(messageText) > size_limit
  
!                     # Cache the message.  Don't pollute the cache with test
!                     # messages or suppressed bulk ham.
!                     if (not state.isTest and
!                         options["Storage", "cache_messages"] and
!                         not isSuppressedBulkHam and not isTooBig):
!                         # Write the message into the Unknown cache.
!                         makeMessage = state.unknownCorpus.makeMessage
!                         message = makeMessage(msg.getId(), msg.as_string())
!                         state.unknownCorpus.addMessage(message)
  
!                 # We'll return the message with the headers added.  We take
!                 # all the headers from the SBHeaderMessage, but take the body
!                 # directly from the POP3 conversation, because the
!                 # SBHeaderMessage might have "fixed" a partial message by
!                 # appending a closing boundary separator.  Remember we can
!                 # be dealing with partial message here because of the timeout
!                 # code in onServerLine.
!                 headers = []
!                 for name, value in msg.items():
!                     header = "%s: %s" % (name, value)
!                     headers.append(re.sub(r'\r?\n', '\r\n', header))
                  body = re.split(r'\n\r?\n', messageText, 1)[1]
                  messageText = "\r\n".join(headers) + "\r\n\r\n" + body
!             except:
!                 # Something nasty happened while parsing or classifying -
!                 # report the exception in a hand-appended header and recover.
!                 # This is one case where an unqualified 'except' is OK, 'cos
!                 # anything's better than destroying people's email...
!                 stream = cStringIO.StringIO()
!                 traceback.print_exc(None, stream)
!                 details = stream.getvalue()
! 
!                 # Build the header.  This will strip leading whitespace from
!                 # the lines, so we add a leading dot to maintain indentation.
!                 detailLines = details.strip().split('\n')
!                 dottedDetails = '\n.'.join(detailLines)
!                 headerName = 'X-Spambayes-Exception'
!                 header = Header(dottedDetails, header_name=headerName)
  
!                 # Insert the header, converting email.Header's '\n' line
!                 # breaks to POP3's '\r\n'.
!                 headers, body = re.split(r'\n\r?\n', messageText, 1)
!                 header = re.sub(r'\r?\n', '\r\n', str(header))
!                 headers += "\n%s: %s\r\n\r\n" % (headerName, header)
!                 messageText = headers + body
  
!                 # Print the exception and a traceback.
!                 print >>sys.stderr, details
  
!             # Restore the +OK and the POP3 .\r\n terminator if there was one.
!             retval = ok + "\n" + messageText
!             if terminatingDotPresent:
!                 retval += '.\r\n'
!             return retval
  
!         else:
!             # Must be an error response.
!             return response
  
      def onTop(self, command, args, response):
--- 457,578 ----
          """Adds the judgement header based on the raw headers and body
          of the message."""
!         # Previously, we used '\n\r?\n' to detect the end of the headers in
!         # case of broken emails that don't use the proper line separators,
!         # and if we couldn't find it, then we assumed that the response was
!         # an error response and passed it unfiltered.  However, if the
!         # message doesn't contain the separator (malformed mail), then this
!         # would mean the message was passed straight through the proxy.
!         # Since all the content is then in the headers, this probably
!         # doesn't do a spammer much good, but, just in case, we now just
!         # check for "+OK" and assume no error response will be given if
!         # that is (which seems reasonable).
!         # Remove the trailing .\r\n before passing to the email parser.
!         # Thanks to Scott Schlesier for this fix.
!         terminatingDotPresent = (response[-4:] == '\n.\r\n')
!         if terminatingDotPresent:
!             response = response[:-3]
  
!         # Break off the first line, which will be '+OK'.
!         ok, messageText = response.split('\n', 1)
!         if ok.strip().upper() != "+OK":
!             # Must be an error response.  Return unproxied.
!             return response
  
!         try:
!             msg = email.message_from_string(messageText,
!                       _class=spambayes.message.SBHeaderMessage)
!             msg.setId(state.getNewMessageName())
!             # Now find the spam disposition and add the header.
!             (prob, clues) = state.bayes.spamprob(msg.tokenize(),\
!                              evidence=True)
  
!             msg.addSBHeaders(prob, clues)
  
!             # Check for "RETR" or "TOP N 99999999" - fetchmail without
!             # the 'fetchall' option uses the latter to retrieve messages.
!             if (command == 'RETR' or
!                 (command == 'TOP' and
!                  len(args) == 2 and args[1] == '99999999')):
!                 cls = msg.GetClassification()
!                 if cls == options["Headers", "header_ham_string"]:
!                     state.numHams += 1
!                 elif cls == options["Headers", "header_spam_string"]:
!                     state.numSpams += 1
!                 else:
!                     state.numUnsure += 1
  
!                 # Suppress caching of "Precedence: bulk" or
!                 # "Precedence: list" ham if the options say so.
!                 isSuppressedBulkHam = \
!                     (cls == options["Headers", "header_ham_string"] and
!                      options["Storage", "no_cache_bulk_ham"] and
!                      msg.get('precedence') in ['bulk', 'list'])
  
!                 # Suppress large messages if the options say so.
!                 size_limit = options["Storage",
!                                      "no_cache_large_messages"]
!                 isTooBig = size_limit > 0 and \
!                            len(messageText) > size_limit
  
!                 # Cache the message.  Don't pollute the cache with test
!                 # messages or suppressed bulk ham.
!                 if (not state.isTest and
!                     options["Storage", "cache_messages"] and
!                     not isSuppressedBulkHam and not isTooBig):
!                     # Write the message into the Unknown cache.
!                     makeMessage = state.unknownCorpus.makeMessage
!                     message = makeMessage(msg.getId(), msg.as_string())
!                     state.unknownCorpus.addMessage(message)
  
!             # We'll return the message with the headers added.  We take
!             # all the headers from the SBHeaderMessage, but take the body
!             # directly from the POP3 conversation, because the
!             # SBHeaderMessage might have "fixed" a partial message by
!             # appending a closing boundary separator.  Remember we can
!             # be dealing with partial message here because of the timeout
!             # code in onServerLine.
!             headers = []
!             for name, value in msg.items():
!                 header = "%s: %s" % (name, value)
!                 headers.append(re.sub(r'\r?\n', '\r\n', header))
!             try:
                  body = re.split(r'\n\r?\n', messageText, 1)[1]
+             except IndexError:
+                 # No separator, so no body.  Bad message, but proxy it
+                 # through anyway (adding the missing separator).
+                 messageText = "\r\n".join(headers) + "\r\n\r\n"
+             else:
                  messageText = "\r\n".join(headers) + "\r\n\r\n" + body
!         except:
!             # Something nasty happened while parsing or classifying -
!             # report the exception in a hand-appended header and recover.
!             # This is one case where an unqualified 'except' is OK, 'cos
!             # anything's better than destroying people's email...
!             stream = cStringIO.StringIO()
!             traceback.print_exc(None, stream)
!             details = stream.getvalue()
  
!             # Build the header.  This will strip leading whitespace from
!             # the lines, so we add a leading dot to maintain indentation.
!             detailLines = details.strip().split('\n')
!             dottedDetails = '\n.'.join(detailLines)
!             headerName = 'X-Spambayes-Exception'
!             header = Header(dottedDetails, header_name=headerName)
  
!             # Insert the header, converting email.Header's '\n' line
!             # breaks to POP3's '\r\n'.
!             headers, body = re.split(r'\n\r?\n', messageText, 1)
!             header = re.sub(r'\r?\n', '\r\n', str(header))
!             headers += "\n%s: %s\r\n\r\n" % (headerName, header)
!             messageText = headers + body
  
!             # Print the exception and a traceback.
!             print >>sys.stderr, details
  
!         # Restore the +OK and the POP3 .\r\n terminator if there was one.
!         retval = ok + "\n" + messageText
!         if terminatingDotPresent:
!             retval += '.\r\n'
!         return retval
  
      def onTop(self, command, args, response):
***************
*** 656,659 ****
--- 668,672 ----
          if options["globals", "verbose"]:
              self.logFile = open('_pop3proxy.log', 'wb', 0)
+ 
          self.servers = []
          self.proxyPorts = []
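
The revised status check in the diff above can be tried outside the proxy. What follows is a minimal Python 3 sketch, not the code from sb_server.py. The function name `split_pop3_response` is hypothetical. Unlike the diff, it compares only the first whitespace-delimited token of the status line with "+OK", on the assumption that a server may append text after the status indicator (RFC 1939 permits this).

```python
def split_pop3_response(response):
    """Split a raw POP3 response into (status_line, message_text, had_dot).

    Returns None when the status line does not start with '+OK', which is
    taken to mean an error response that should be passed through as-is.
    Hypothetical helper for illustration only.
    """
    # Remember whether the POP3 terminator was present, then strip the
    # final '.\r\n' so that a parser never sees it.
    had_dot = response[-4:] == '\n.\r\n'
    if had_dot:
        response = response[:-3]

    # partition() never raises, even if the response has no newline at all.
    status_line, _, message_text = response.partition('\n')

    # Assumption: only the first token matters ('+OK 120 octets' is fine).
    tokens = status_line.split()
    if not tokens or tokens[0].upper() != '+OK':
        return None
    return status_line, message_text, had_dot


if __name__ == '__main__':
    raw = "+OK 120 octets\r\nSubject: hi\r\n\r\nbody\r\n.\r\n"
    print(split_pop3_response(raw))
    print(split_pop3_response("-ERR no such message\r\n"))
```

A caller would return the original response untouched when the helper returns None, and otherwise classify `message_text` and re-append '.\r\n' if `had_dot` is true.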

From anadelonbrin at users.sourceforge.net  Mon Nov 29 01:18:02 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 01:18:04 2004
Subject: [Spambayes-checkins] spambayes/spambayes message.py,1.59,1.60
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7566/spambayes

Modified Files:
	message.py 
Log Message:
Handle message not having a proper separator in insert_exception_header.

Change sb_server to use the centralised insert_exception_header code.

Index: message.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v
retrieving revision 1.59
retrieving revision 1.60
diff -C2 -d -r1.59 -r1.60
*** message.py	25 Nov 2004 23:19:04 -0000	1.59
--- message.py	29 Nov 2004 00:17:59 -0000	1.60
***************
*** 539,546 ****
      # otherwise we might keep doing this message over and over again.
      # We also ensure that the line endings are \r\n as RFC822 requires.
!     headers, body = re.split(r'\n\r?\n', string_msg, 1)
      header = re.sub(r'\r?\n', '\r\n', str(header))
!     headers += "\n%s: %s\r\n" % \
!                (headerName, header)
      if msg_id:
          headers += "%s: %s\r\n" % \
--- 539,550 ----
      # otherwise we might keep doing this message over and over again.
      # We also ensure that the line endings are \r\n as RFC822 requires.
!     try:
!         headers, body = re.split(r'\n\r?\n', string_msg, 1)
!     except ValueError:
!         # No body - this is a bad message!
!         headers = string_msg
!         body = ""
      header = re.sub(r'\r?\n', '\r\n', str(header))
!     headers += "\n%s: %s\r\n" % (headerName, header)
      if msg_id:
          headers += "%s: %s\r\n" % \
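
The ValueError handling above guards against a message with no blank line between headers and body. A minimal Python 3 sketch of the same idea follows. The names `split_headers_and_body` and `append_header` are hypothetical, and this is not the project's insert_exception_header. It checks the length of the split result instead of catching the unpacking error, which is a stylistic choice for this sketch.

```python
import re

# Same pattern as in the diff: tolerate '\n\n' as well as '\n\r\n'.
_SEPARATOR = re.compile(r'\n\r?\n')


def split_headers_and_body(string_msg):
    """Return (headers, body); body is '' if there is no separator."""
    parts = _SEPARATOR.split(string_msg, maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    # No separator: treat the whole text as headers (a malformed message).
    return string_msg, ""


def append_header(string_msg, name, value):
    """Append one header and restore the header/body separator."""
    headers, body = split_headers_and_body(string_msg)
    # Drop any trailing CR/LF so exactly one CRLF precedes the new header.
    headers = headers.rstrip("\r\n")
    return "%s\r\n%s: %s\r\n\r\n%s" % (headers, name, value, body)


if __name__ == '__main__':
    print(repr(append_header("A: b\r\n\r\nbody", "X-Test", "1")))
    print(repr(append_header("A: b\r\n", "X-Test", "1")))
```

In both calls the result ends the header block with a blank line, so a message that arrived without a separator leaves with one.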

From anadelonbrin at users.sourceforge.net  Mon Nov 29 01:18:02 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Mon Nov 29 01:18:05 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.30,1.31
Message-ID: 

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv7566/scripts

Modified Files:
	sb_server.py 
Log Message:
Handle message not having a proper separator in insert_exception_header.

Change sb_server to use the centralised insert_exception_header code.

Index: sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** sb_server.py	29 Nov 2004 00:11:46 -0000	1.30
--- sb_server.py	29 Nov 2004 00:17:58 -0000	1.31
***************
*** 549,569 ****
              # This is one case where an unqualified 'except' is OK, 'cos
              # anything's better than destroying people's email...
!             stream = cStringIO.StringIO()
!             traceback.print_exc(None, stream)
!             details = stream.getvalue()
! 
!             # Build the header.  This will strip leading whitespace from
!             # the lines, so we add a leading dot to maintain indentation.
!             detailLines = details.strip().split('\n')
!             dottedDetails = '\n.'.join(detailLines)
!             headerName = 'X-Spambayes-Exception'
!             header = Header(dottedDetails, header_name=headerName)
! 
!             # Insert the header, converting email.Header's '\n' line
!             # breaks to POP3's '\r\n'.
!             headers, body = re.split(r'\n\r?\n', messageText, 1)
!             header = re.sub(r'\r?\n', '\r\n', str(header))
!             headers += "\n%s: %s\r\n\r\n" % (headerName, header)
!             messageText = headers + body
  
              # Print the exception and a traceback.
--- 549,554 ----
              # This is one case where an unqualified 'except' is OK, 'cos
              # anything's better than destroying people's email...
!             messageText, details = spambayes.message.\
!                                    insert_exception_header(messageText)
  
              # Print the exception and a traceback.

From anadelonbrin at users.sourceforge.net  Tue Nov 30 07:00:42 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:00:45 2004
Subject: [Spambayes-checkins] website/sigs - New directory
Message-ID: 

Update of /cvsroot/spambayes/website/sigs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17161/sigs

Log Message:
Directory /cvsroot/spambayes/website/sigs added to the repository


From anadelonbrin at users.sourceforge.net  Tue Nov 30 07:02:30 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:02:32 2004
Subject: [Spambayes-checkins] website/sigs Makefile, NONE,
	1.1 spambayes-1.0.1.exe.asc, NONE,
	1.1 spambayes-1.0.1.tar.gz.asc, NONE,
	1.1 spambayes-1.0.1.zip.asc, NONE, 1.1
Message-ID: 

Update of /cvsroot/spambayes/website/sigs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17435/sigs

Added Files:
	Makefile spambayes-1.0.1.exe.asc spambayes-1.0.1.tar.gz.asc 
	spambayes-1.0.1.zip.asc 
Log Message:
Put OpenPGP sigs for released files on the website in this directory.

I think that by default the file permissions for new files will be wrong, so
they have to be manually corrected - maybe this should be built into the
makefile?

--- NEW FILE: Makefile ---
include ../scripts/make.rules
ROOT_DIR = ..
ROOT_OFFSET = sigs

--- NEW FILE: spambayes-1.0.1.exe.asc ---
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (MingW32)

iD8DBQBBrAMJVcUzCvI/cyoRArlUAKDyvXLO8fs2eTdO/bpLOFKIfmhaxACfaiWJ
47p7KL2Ov6kKnrCUbpfwXdM=
=Hb0X
-----END PGP SIGNATURE-----

--- NEW FILE: spambayes-1.0.1.tar.gz.asc ---
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (MingW32)

iD8DBQBBrAMRVcUzCvI/cyoRAowgAKC34ZWloV85YEnLZIwdik9HHBO2ogCgsSBE
nGX0sYG6yd7R9Lni+2r+5tc=
=E//e
-----END PGP SIGNATURE-----

--- NEW FILE: spambayes-1.0.1.zip.asc ---
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (MingW32)

iD8DBQBBrAMZVcUzCvI/cyoRAsX8AJ9duCjCjYUjJGYaJvB8JjXRzWcwhQCdFvUK
qU2vENJocIzrNoFW0ZFLQck=
=I45v
-----END PGP SIGNATURE-----

From anadelonbrin at users.sourceforge.net  Tue Nov 30 07:03:45 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:03:48 2004
Subject: [Spambayes-checkins] website download.ht,1.29,1.30
Message-ID: 

Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17643

Modified Files:
	download.ht 
Log Message:
Update to include OpenPGP sigs, MD5 checksums and file sizes.

(Heavily based on the Python download pages).
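
The MD5 checksums and file sizes mentioned in this log message can be computed with the Python standard library. A minimal Python 3 sketch follows. The function name `md5_and_size` and the chunk size are assumptions made for illustration, not anything taken from the website scripts.

```python
import hashlib
import os


def md5_and_size(path, chunk_size=65536):
    """Return (hex MD5 digest, size in bytes) for the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so that large installers need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


if __name__ == '__main__':
    import sys
    for name in sys.argv[1:]:
        checksum, size = md5_and_size(name)
        print("%s  %s  %d bytes" % (checksum, name, size))
```

Run as a script, it prints one line per file named on the command line, for example the three spambayes-1.0.1 release files.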

Index: download.ht
===================================================================
RCS file: /cvsroot/spambayes/website/download.ht,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** download.ht	25 Nov 2004 06:38:15 -0000	1.29
--- download.ht	30 Nov 2004 06:03:35 -0000	1.30
***************
*** 42,45 ****
--- 42,91 ----
  

+ Files, MD5 checksums, signatures and sizes
+
+ The signatures below were generated with
+ GnuPG using release manager
+ Tony Meyer's
+ public key, which has a key id of F23F732A.
+
+ You can import the release manager public keys by either downloading the
+ public key file and then running
+
+     % gpg --import TonyMeyer.asc
+
+ or by grabbing the key directly from the keyserver network by running
+ this command:
+
+     % gpg --recv-keys F23F732A
+
+ To verify the authenticity of the download, grab both the file(s) and the
+ signature(s) (above) and then run this command:
+
+     % gpg --verify spambayes-1.0.1.exe.asc
+
+ Note that you must use the name of the signature file, and you should
+ use the one that's appropriate to the download you're verifying.
+
+ These instructions are geared to GnuPG and command-line weenies.
+ Suggestions are welcome for other OpenPGP applications.
+
  CVS Access
  The code is currently available from sourceforge's CVS server -

From anadelonbrin at users.sourceforge.net  Tue Nov 30 07:04:27 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:04:29 2004
Subject: [Spambayes-checkins] website Makefile,1.18,1.19
Message-ID: 

Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17817

Modified Files:
	Makefile 
Log Message:
Also sync the sigs directory.

Index: Makefile
===================================================================
RCS file: /cvsroot/spambayes/website/Makefile,v
retrieving revision 1.18
retrieving revision 1.19
diff -C2 -d -r1.18 -r1.19
*** Makefile	14 Feb 2004 00:44:38 -0000	1.18
--- Makefile	30 Nov 2004 06:04:24 -0000	1.19
***************
*** 39,46 ****
  	-cd apps; $(MAKE) install
  	-cd download ; $(MAKE) install
  
  
  subdirs:
  	cd apps; $(MAKE)
! 	cd download ; $(MAKE)
! 
--- 39,47 ----
  	-cd apps; $(MAKE) install
  	-cd download ; $(MAKE) install
+ 	-cd sigs; $(MAKE) install
  
  
  subdirs:
  	cd apps; $(MAKE)
! 	cd download ; $(MAKE)
! 	cd sigs ; $(MAKE)

From anadelonbrin at users.sourceforge.net  Tue Nov 30 07:05:48 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 07:05:51 2004
Subject: [Spambayes-checkins] spambayes README-DEVEL.txt,1.15,1.16
Message-ID: 

Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv18034

Modified Files:
	README-DEVEL.txt 
Log Message:
Add instructions for generating OpenPGP sigs, MD5 checksums and file sizes.

Add instructions for modifying __init__.py's __version__ after the release,
based on the discussion on spambayes@python.org.

Index: README-DEVEL.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/README-DEVEL.txt,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** README-DEVEL.txt	25 Nov 2004 06:39:04 -0000	1.15
--- README-DEVEL.txt	30 Nov 2004 06:05:45 -0000	1.16
***************
*** 505,513 ****
    o Now commit spambayes/__init__.py and tag the whole checkout - see the
      existing tag names for the tag name format.
    o Update the website News, Download and Windows sections.
    o Update reply.txt in the website repository as needed (it specifies the
!     latest version).  Then let Tim, Barry, Tony, or Skip know that they need to
!     update the autoresponder.
! 
  Then announce the release on the mailing lists and watch the bug reports
  roll in.  8-)
--- 505,536 ----
    o Now commit spambayes/__init__.py and tag the whole checkout - see the
      existing tag names for the tag name format.
+   o Create MD5 checksums for the files, and update download.ht with these.
+     Tony uses wxChecksums (http://wxchecksums.sourceforge.net) for this,
+     but you could just do
+       >>> import md5
+       >>> print md5.md5(file("spambayes-1.0.1.exe", "rb").read()).hexdigest()
+   o Calculate the sizes of the files, and update download.ht with these.
+   o Create OpenPGP/PGP signatures for the files.  Using GnuPG:
+       % gpg -sab spambayes-1.0.1.zip
+       % gpg -sab spambayes-1.0.1.tar.gz
+       % gpg -sab spambayes-1.0.1.exe
+     Put the created *.asc files in the "sigs" directory of the website.
+   o If your public key isn't already linked to on the Download page, put
+     it there.
    o Update the website News, Download and Windows sections.
    o Update reply.txt in the website repository as needed (it specifies the
!     latest version).  Then let Tim, Barry, Tony, or Skip know that they need
!     to update the autoresponder.
!   o Run "make install version" in the website directory to push the new
!     version file, so that "Check for new version" works.
!   o Add '+' to the end of spambayes/__init__.py's __version__, to
!     differentiate CVS users, and check this change in.  After a number of
!     changes have been checked in, this can be incremented and have "a0"
!     added to the end. For example, with a 1.1 release:
!        [before the release process] '1.1rc1'
!        [during the release process] '1.1'
!        [after the release process]  '1.1+'
!        [later]                      '1.2a0'
! 
  Then announce the release on the mailing lists and watch the bug reports
  roll in.  8-)

From anadelonbrin at users.sourceforge.net  Tue Nov 30 22:45:06 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 22:45:09 2004
Subject: [Spambayes-checkins] spambayes/spambayes __init__.py, 1.11.4.3, 1.11.4.4
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11647/spambayes

Modified Files:
      Tag: release_1_0-branch
	__init__.py 
Log Message:
Update to match the new versioning scheme.

Index: __init__.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/__init__.py,v
retrieving revision 1.11.4.3
retrieving revision 1.11.4.4
diff -C2 -d -r1.11.4.3 -r1.11.4.4
*** __init__.py	22 Nov 2004 23:39:12 -0000	1.11.4.3
--- __init__.py	30 Nov 2004 21:44:55 -0000	1.11.4.4
***************
*** 1,3 ****
  # package marker.
  
! __version__ = '1.0.1'
--- 1,3 ----
  # package marker.
  
! __version__ = '1.0.1+'

From anadelonbrin at users.sourceforge.net  Tue Nov 30 22:49:22 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Tue Nov 30 22:49:26 2004
Subject: [Spambayes-checkins] spambayes/spambayes __init__.py,1.12,1.13
Message-ID: 

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv12772/spambayes

Modified Files:
	__init__.py 
Log Message:
Update to reflect where things really are at the moment.

Index: __init__.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/__init__.py,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** __init__.py	25 Nov 2004 15:12:03 -0000	1.12
--- __init__.py	30 Nov 2004 21:49:19 -0000	1.13
***************
*** 1,3 ****
  # package marker.
  
! __version__ = '1.0.1'
--- 1,3 ----
  # package marker.
  
! __version__ = '1.1a0'
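
The versioning scheme described in the README-DEVEL.txt change ('1.1' becomes '1.1+' after release, then later '1.2a0') can be written down as two small string helpers. This is a minimal Python 3 sketch. The function names are hypothetical, and the rule that the second numeric component is the one incremented is an assumption inferred from the two examples in these check-ins ('1.1+' to '1.2a0' and '1.0.1' to '1.1a0').

```python
import re


def post_release_version(version):
    """Mark a released version as 'CVS after release', e.g. '1.1' -> '1.1+'."""
    return version if version.endswith('+') else version + '+'


def next_alpha_version(version):
    """Return the first alpha of the next minor version.

    Assumption: only the leading 'major.minor' digits are used and minor
    is incremented, so both '1.1+' and '1.1' give '1.2a0'.
    """
    match = re.match(r'(\d+)\.(\d+)', version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    major, minor = int(match.group(1)), int(match.group(2))
    return '%d.%da0' % (major, minor + 1)


if __name__ == '__main__':
    print(post_release_version('1.0.1'))
    print(next_alpha_version('1.1+'))
    print(next_alpha_version('1.0.1'))
```

The two diffs above correspond to one call each: the release branch gets the '+' suffix, and the trunk moves on to the next alpha.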