From tim_one at users.sourceforge.net Thu Jan 1 19:19:47 2004 From: tim_one at users.sourceforge.net (Tim Peters) Date: Thu Jan 1 19:19:53 2004 Subject: [Spambayes-checkins] website contact.ht,1.1,1.2 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1:/tmp/cvs-serv13966/website Modified Files: contact.ht Log Message: Beefed up the descriptions of the mailing lists, and emphasized that messages sent to them are visible to the world. Index: contact.ht =================================================================== RCS file: /cvsroot/spambayes/website/contact.ht,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** contact.ht 12 Aug 2003 04:55:32 -0000 1.1 --- contact.ht 2 Jan 2004 00:19:45 -0000 1.2 *************** *** 5,32 ****

Mailing lists

!

There are currently five mailing lists of interest:

--- 5,45 ----

Mailing lists

!

There are currently five mailing lists of interest. All lists are public, and all ! are publicly archived. This is normal practice for open-source projects, and you ! should be aware that all email sent to one of these addresses will be visible ! to the world:

From anadelonbrin at users.sourceforge.net Thu Jan 1 19:20:58 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Thu Jan 1 19:21:02 2004 Subject: [Spambayes-checkins] spambayes/spambayes UserInterface.py, 1.40, 1.41 oe_mailbox.py, 1.5, 1.6 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv14313/spambayes Modified Files: UserInterface.py oe_mailbox.py Log Message: Fix import error reported by Paul Sorenson. Index: UserInterface.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/UserInterface.py,v retrieving revision 1.40 retrieving revision 1.41 diff -C2 -d -r1.40 -r1.41 *** UserInterface.py 29 Dec 2003 03:41:10 -0000 1.40 --- UserInterface.py 2 Jan 2004 00:20:56 -0000 1.41 *************** *** 23,26 **** --- 23,28 ---- onAdvancedconfig - present the appropriate advanced configuration page onHelp - present the help page + onStats - present statistics information + onBugreport - help the user fill out a bug report To Do: *************** *** 79,83 **** import oe_mailbox - from time import gmtime, strftime import PyMeldLite --- 81,84 ---- Index: oe_mailbox.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/oe_mailbox.py,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** oe_mailbox.py 30 Dec 2003 16:26:33 -0000 1.5 --- oe_mailbox.py 2 Jan 2004 00:20:56 -0000 1.6 *************** *** 17,25 **** import StringIO import sys ! if sys.platform == "win32": import win32api import win32con from win32com.shell import shell, shellcon ########################################################################### --- 17,30 ---- import StringIO import sys + from time import gmtime, strftime ! try: import win32api import win32con from win32com.shell import shell, shellcon + except ImportError: + # Not win32, or win32all not installed. 
+ # Some functions will not work, but some will. + win32api = win32con = shell = shellcon = None ########################################################################### *************** *** 480,483 **** --- 485,491 ---- # same format). raise NotImplementedError + if win32api is None: + # Delayed import error from top. + raise ImportError("win32all not installed") reg = win32api.RegOpenKeyEx(win32con.HKEY_USERS, "") From tim_one at users.sourceforge.net Thu Jan 1 19:21:21 2004 From: tim_one at users.sourceforge.net (Tim Peters) Date: Thu Jan 1 19:21:23 2004 Subject: [Spambayes-checkins] website contact.ht,1.2,1.3 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1:/tmp/cvs-serv14388/website Modified Files: contact.ht Log Message: Repaired typos in new text. Index: contact.ht =================================================================== RCS file: /cvsroot/spambayes/website/contact.ht,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** contact.ht 2 Jan 2004 00:19:45 -0000 1.2 --- contact.ht 2 Jan 2004 00:21:19 -0000 1.3 *************** *** 22,26 **** href="http://mail.python.org/mailman/listinfo/spambayes">spambayes user list. General discussions and user queries go here. (This was also the developer list until late May, 2003). ! This list is not moderated, although "unusual" message may be get held automatically, for moderator review. For example, very large messages are held for review, and the moderator may reject such messages, asking you to trim their size. --- 22,26 ---- href="http://mail.python.org/mailman/listinfo/spambayes">spambayes user list. General discussions and user queries go here. (This was also the developer list until late May, 2003). ! This list is not moderated, although "unusual" messages may get held automatically, for moderator review. For example, very large messages are held for review, and the moderator may reject such messages, asking you to trim their size. 
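Note on the oe_mailbox.py 1.5 to 1.6 check-in above: it swaps the sys.platform test for a try/except around the optional win32 imports, sets the names to None on failure, and raises the ImportError only inside the function that needs them. A minimal sketch of that pattern follows. It is written for current Python 3 rather than the Python 2 of the original, and the module name `hypothetical_win32_shim`, its `open_key` call, and the two function names are invented for illustration; none of them come from SpamBayes or win32all.

```python
# Sketch of the "optional import, delayed error" pattern.
# Assumption: `hypothetical_win32_shim` is an invented stand-in for an
# optional dependency such as win32api; it is not a real module.
try:
    import hypothetical_win32_shim as optional_mod
except ImportError:
    # Dependency missing: keep this module importable and let only the
    # functions that really need the dependency fail later.
    optional_mod = None


def platform_independent():
    """Works whether or not the optional dependency is present."""
    return "ok"


def open_users_key():
    """Needs the optional dependency; raises the delayed ImportError."""
    if optional_mod is None:
        # Delayed import error from the top of the module.
        raise ImportError("hypothetical_win32_shim not installed")
    # `open_key` is hypothetical too; it only marks where the real
    # dependency would be used.
    return optional_mod.open_key("HKEY_USERS")
```

Callers that only use platform_independent() are unaffected by the missing dependency, which seems to be the intent of the comment in the diff ("Some functions will not work, but some will").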
From tim_one at users.sourceforge.net Sun Jan 4 13:17:30 2004 From: tim_one at users.sourceforge.net (Tim Peters) Date: Sun Jan 4 13:17:32 2004 Subject: [Spambayes-checkins] website faq.txt,1.55,1.56 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1:/tmp/cvs-serv18610/website Modified Files: faq.txt Log Message: s/PSA/PSF/g Index: faq.txt =================================================================== RCS file: /cvsroot/spambayes/website/faq.txt,v retrieving revision 1.55 retrieving revision 1.56 diff -C2 -d -r1.55 -r1.56 *** faq.txt 31 Dec 2003 04:07:36 -0000 1.55 --- faq.txt 4 Jan 2004 18:17:27 -0000 1.56 *************** *** 60,64 **** SpamBayes is free and open-source - there is no charge. The software ! is released under `the PSA license`_. If you really feel that your life would be incomplete without giving --- 60,64 ---- SpamBayes is free and open-source - there is no charge. The software ! is released under `the PSF license`_. If you really feel that your life would be incomplete without giving *************** *** 78,82 **** ease-of-use. ! .. _the PSA license: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/LICENSE.txt .. _I'm not a programmer but still want to help: #i-m-not-a-programmer-but-want-to-help-out-what-can-i-do .. _Python Software Foundation: http://www.python.org/psf/ --- 78,82 ---- ease-of-use. ! .. _the PSF license: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/LICENSE.txt .. _I'm not a programmer but still want to help: #i-m-not-a-programmer-but-want-to-help-out-what-can-i-do .. 
_Python Software Foundation: http://www.python.org/psf/ From xenogeist at users.sourceforge.net Sun Jan 4 13:27:01 2004 From: xenogeist at users.sourceforge.net (Adam Walker) Date: Sun Jan 4 13:27:03 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.16,1.17 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv20244/scripts Modified Files: sb_server.py Log Message: Start SMTP proxy in a trainable state. Index: sb_server.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v retrieving revision 1.16 retrieving revision 1.17 diff -C2 -d -r1.16 -r1.17 *** sb_server.py 31 Dec 2003 03:32:37 -0000 1.16 --- sb_server.py 4 Jan 2004 18:26:59 -0000 1.17 *************** *** 859,863 **** servers, proxyPorts = smtpproxy.LoadServerInfo() proxyListeners.extend(smtpproxy.CreateProxies(servers, proxyPorts, ! state)) # setup info for the web interface --- 859,863 ---- servers, proxyPorts = smtpproxy.LoadServerInfo() proxyListeners.extend(smtpproxy.CreateProxies(servers, proxyPorts, ! smtpproxy.SMTPTrainer(state.bayes, state))) # setup info for the web interface From anadelonbrin at users.sourceforge.net Sun Jan 4 21:18:31 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Sun Jan 4 21:18:37 2004 Subject: [Spambayes-checkins] spambayes/windows autoconfigure.py,1.9,1.10 Message-ID: Update of /cvsroot/spambayes/spambayes/windows In directory sc8-pr-cvs1:/tmp/cvs-serv9109/windows Modified Files: autoconfigure.py Log Message: Add extra utility functions to oe_mailbox for dealing with Outlook Express. OEAccountKeys() returns a generator of all the registry keys of OE Accounts. OEIdentityKeys() returns a generator of all the registry keys of Identities. OEIsInstalled() makes a guess at whether OE is installed *and setup*. Get autoconfigure.py to use these new functions. Have autoconfigure confirm that configuration has occured. 
Do better 'is installed' checks in autoconfigure. I *think* that this means that autoconfigure for OE will work on Win98 as well as WinXP now, but will have to check when I get home tonight. Index: autoconfigure.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/windows/autoconfigure.py,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** autoconfigure.py 30 Dec 2003 01:54:59 -0000 1.9 --- autoconfigure.py 5 Jan 2004 02:18:29 -0000 1.10 *************** *** 70,73 **** --- 70,74 ---- import win32api import win32con + import pywintypes from win32com.shell import shell, shellcon except ImportError: *************** *** 76,81 **** # fail. (And having "import win32api" in lots of functions # didn't seem to make much sense). ! win32api = win32con = shell = shellcon = win32ui = None from spambayes import OptionsClass from spambayes.Options import options, optionsPathname --- 77,83 ---- # fail. (And having "import win32api" in lots of functions # didn't seem to make much sense). ! win32api = win32con = shell = shellcon = win32ui = pywintypes = None + from spambayes import oe_mailbox from spambayes import OptionsClass from spambayes.Options import options, optionsPathname *************** *** 121,124 **** --- 123,127 ---- smtp_proxy = smtp_proxy_port + results = [] for sect in c.sections(): if sect.startswith("Persona-") or sect == "Settings": *************** *** 149,155 **** options[us_name, "remote_servers"] += (server,) options[us_name, "listen_ports"] += (proxy_port,) ! if options["globals", "verbose"]: ! print "[%s] Proxy %s on localhost:%s" % \ ! (sect, server, proxy_port) c.set(sect, eud_name, "localhost") c.set(sect, eud_port, proxy_port) --- 152,157 ---- options[us_name, "remote_servers"] += (server,) options[us_name, "listen_ports"] += (proxy_port,) ! results.append("[%s] Proxy %s on localhost:%s" % \ ! 
(sect, server, proxy_port)) c.set(sect, eud_name, "localhost") c.set(sect, eud_port, proxy_port) *************** *** 200,203 **** --- 202,206 ---- filter_file.write(filter_rules) filter_file.close() + return results def configure_mozilla(config_location): *************** *** 214,217 **** --- 217,221 ---- r = re.compile(r"user_pref\(\"mail.server.server(\d+).(real)?hostname\", \"([^\"]*)\"\);") current_pos = 0 + results = [] while True: m = r.search(prefs[current_pos:]) *************** *** 272,278 **** save_prefs = save_prefs.replace(old_port, port_pref) save_prefs = save_prefs.replace(old_pref, pref) ! if options["globals", "verbose"]: ! print "[%s] Proxy %s on localhost:%s" % \ ! (num, server, proxy_port) # Do the SMTP server. --- 276,281 ---- save_prefs = save_prefs.replace(old_port, port_pref) save_prefs = save_prefs.replace(old_pref, pref) ! results.append("[%s] Proxy %s on localhost:%s" % \ ! (num, server, proxy_port)) # Do the SMTP server. *************** *** 323,329 **** save_prefs = save_prefs.replace(old_port, port_pref) save_prefs = save_prefs.replace(old_pref, pref) ! if options["globals", "verbose"]: ! print "[%s] Proxy %s on localhost:%s" % \ ! (num, server, proxy_port) prefs_file = file("%s%sprefs.js" % (config_location, os.sep), "w") --- 326,331 ---- save_prefs = save_prefs.replace(old_port, port_pref) save_prefs = save_prefs.replace(old_pref, pref) ! results.append("[%s] Proxy %s on localhost:%s" % \ ! (num, server, proxy_port)) prefs_file = file("%s%sprefs.js" % (config_location, os.sep), "w") *************** *** 361,364 **** --- 363,367 ---- # We are assuming that a rules file already exists, otherwise there # is a bit more to go at the top. 
+ return results def configure_m2(config_location): *************** *** 383,386 **** --- 386,390 ---- smtp_proxy = smtp_proxy_port + results = [] for sect in c.sections(): if sect.startswith("Account") and sect != "Accounts": *************** *** 405,411 **** options[us_name, "remote_servers"] += (server,) options[us_name, "listen_ports"] += (proxy_port,) ! if options["globals", "verbose"]: ! print "[%s] Proxy %s on localhost:%s" % \ ! (sect, server, proxy_port) c.set(sect, m2_name, "localhost") c.set(sect, m2_port, proxy_port) --- 409,414 ---- options[us_name, "remote_servers"] += (server,) options[us_name, "listen_ports"] += (proxy_port,) ! results.append("[%s] Proxy %s on localhost:%s" % \ ! (sect, server, proxy_port)) c.set(sect, m2_name, "localhost") c.set(sect, m2_port, proxy_port) *************** *** 423,426 **** --- 426,430 ---- # If someone can describe the best all-purpose rule, I'll pop it in # here. + return results def configure_outlook_express(unused): *************** *** 432,437 **** raise ImportError("win32 extensions required") ! # OE stores its configuration in the registry, not a file. ! key = "Software\\Microsoft\\Internet Account Manager\\Accounts" translate = {("POP3 Server", "POP3 Port") : "pop3proxy", --- 436,440 ---- raise ImportError("win32 extensions required") ! accounts = oe_mailbox.OEAccountKeys() translate = {("POP3 Server", "POP3 Port") : "pop3proxy", *************** *** 442,475 **** smtp_proxy = smtp_proxy_port ! reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, key) ! account_index = 0 ! while True: ! # Loop through all the accounts ! config = {} ! try: ! subkey_name = "%s\\%s" % (key, ! win32api.RegEnumKey(reg, ! account_index)) ! except win32api.error: ! break ! account_index += 1 ! index = 0 ! subkey = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, ! subkey_name, 0, win32con.KEY_READ | ! win32con.KEY_SET_VALUE) ! while True: ! # Loop through all the keys ! try: ! raw = win32api.RegEnumValue(subkey, index) ! 
except win32api.error: ! break ! config[raw[0]] = (raw[1], raw[2]) ! index += 1 ! ! # Process this account ! if config.has_key("POP3 Server"): for (server_key, port_key), sect in translate.items(): ! server = "%s:%s" % (config[server_key][0], ! config[port_key][0]) if sect[:4] == "pop3": pop_proxy = move_to_next_free_port(pop_proxy) --- 445,454 ---- smtp_proxy = smtp_proxy_port ! results = [] ! for proto, subkey, account in accounts: ! if proto == "POP3": for (server_key, port_key), sect in translate.items(): ! server = "%s:%s" % (account[server_key][0], ! account[port_key][0]) if sect[:4] == "pop3": pop_proxy = move_to_next_free_port(pop_proxy) *************** *** 484,491 **** win32api.RegSetValueEx(subkey, port_key, 0, win32con.REG_SZ, str(proxy)) ! if options["globals", "verbose"]: ! print "[%s] Proxy %s on localhost:%s" % \ ! (config["Account Name"][0], server, proxy) ! elif config.has_key("IMAP Server"): # Setup imapfilter instead. pass --- 463,469 ---- win32api.RegSetValueEx(subkey, port_key, 0, win32con.REG_SZ, str(proxy)) ! results.append("[%s] Proxy %s on localhost:%s" % \ ! (account["Account Name"][0], server, proxy)) ! elif proto == "IMAP4": # Setup imapfilter instead. pass *************** *** 496,499 **** --- 474,478 ---- # be set up to work with notate_to or notate_subject? (and set that # option, obviously) + return results def configure_pegasus_mail(config_location): *************** *** 509,512 **** --- 488,492 ---- smtp_proxy = smtp_proxy_port + results = [] for filename in os.listdir(config_location): if filename.lower().startswith("pop") or filename.lower().startswith("smt"): *************** *** 531,537 **** c.set("all", "port", proxy) c.update_file(working_filename) ! if options["globals", "verbose"]: ! print "[%s] Proxy %s on localhost:%s" % \ ! (c.get("all", "title"), server, proxy) elif filename.lower() == "IMAP.PM": # Setup imapfilter instead. --- 511,516 ---- c.set("all", "port", proxy) c.update_file(working_filename) ! 
results.append("[%s] Proxy %s on localhost:%s" % \ ! (c.get("all", "title"), server, proxy)) elif filename.lower() == "IMAP.PM": # Setup imapfilter instead. *************** *** 558,566 **** rules_file.write(rule) rules_file.close() ! def configure_pocomail(unused): ! # Requires win32all to be available. if win32api is None: ! raise ImportError("win32 extensions required") key = "Software\\Poco Systems Inc" --- 537,546 ---- rules_file.write(rule) rules_file.close() + return results ! def pocomail_accounts_filename(): if win32api is None: ! # If we don't have win32, then we don't know. ! return "" key = "Software\\Poco Systems Inc" *************** *** 568,579 **** smtp_proxy = smtp_proxy_port ! reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, key) ! subkey_name = "%s\\%s" % (key, win32api.RegEnumKey(reg, 0)) ! reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, ! subkey_name) ! pocomail_path = win32api.RegQueryValueEx(reg, "Path")[0] ! pocomail_accounts_file = os.path.join(pocomail_path, "accounts.ini") if os.path.exists(pocomail_accounts_file): f = open(pocomail_accounts_file, "r") --- 548,565 ---- smtp_proxy = smtp_proxy_port ! try: ! reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, key) ! except pywintypes.error: ! # It seems that we don't have PocoMail ! return "" ! else: ! subkey_name = "%s\\%s" % (key, win32api.RegEnumKey(reg, 0)) ! reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, ! subkey_name) ! pocomail_path = win32api.RegQueryValueEx(reg, "Path")[0] ! return os.path.join(pocomail_path, "accounts.ini") + def configure_pocomail(pocomail_accounts_file): if os.path.exists(pocomail_accounts_file): f = open(pocomail_accounts_file, "r") *************** *** 668,671 **** --- 654,658 ---- f.write(filter + '\n') f.close() + return [] *************** *** 678,691 **** if win32api is None: raise ImportError("win32 extensions required") ! if mailer in ["Outlook Express", "PocoMail"]: ! # Outlook Express and PocoMail can be configured without a ! 
# config location, because it's all in the registry return "" windowsUserDirectory = shell.SHGetFolderPath(0,shellcon.CSIDL_APPDATA,0,0) ! potential_locations = {"Eudora" : ("Qualcomm%(sep)sEudora",), ! "Mozilla" : ("Mozilla%(sep)sProfiles%(sep)s%(user)s", ! "Mozilla%(sep)sProfiles%(sep)sdefault",), ! "M2" : ("Opera%(sep)sOpera7",), ! } # We try with the username that the user uses # for Windows, even though that might not be the same as their profile --- 665,681 ---- if win32api is None: raise ImportError("win32 extensions required") ! if mailer in ["Outlook Express", ]: ! # Outlook Express can be configured without a ! # config location, because it's all in the registry. return "" windowsUserDirectory = shell.SHGetFolderPath(0,shellcon.CSIDL_APPDATA,0,0) ! potential_locations = \ ! {"Eudora" : ("%(wud)s%(sep)sQualcomm%(sep)sEudora",), ! "Mozilla" : \ ! ("%(wud)s%(sep)sMozilla%(sep)sProfiles%(sep)s%(user)s", ! "%(wud)s%(sep)sMozilla%(sep)sProfiles%(sep)sdefault",), ! "M2" : ("%(wud)s%(sep)sOpera%(sep)sOpera7",), ! "PocoMail" : (pocomail_accounts_filename(),), ! } # We try with the username that the user uses # for Windows, even though that might not be the same as their profile *************** *** 693,712 **** username = win32api.GetUserName() loc_dict = {"sep" : os.sep, "user" : username} for loc in potential_locations[mailer]: loc = loc % loc_dict - loc = os.path.join(windowsUserDirectory, loc) if os.path.exists(loc): return loc return None - def configure(mailer): ! """Automatically configure the specified mailer and SpamBayes. ! Return True if successful, False otherwise. ! """ loc = find_config_location(mailer) if loc is None: ! 
return False funcs = {"Eudora" : configure_eudora, "Mozilla" : configure_mozilla, --- 683,700 ---- username = win32api.GetUserName() loc_dict = {"sep" : os.sep, + "wud" : windowsUserDirectory, "user" : username} for loc in potential_locations[mailer]: loc = loc % loc_dict if os.path.exists(loc): return loc return None def configure(mailer): ! """Automatically configure the specified mailer and SpamBayes.""" loc = find_config_location(mailer) if loc is None: ! # Can't set it up, so do nothing. ! return funcs = {"Eudora" : configure_eudora, "Mozilla" : configure_mozilla, *************** *** 715,729 **** "PocoMail" : configure_pocomail, } ! funcs[mailer](loc) ! return True def offer_to_configure(mailer): """If the mailer appears to be installed, offer to set it up for SpamBayes (and SpamBayes for it).""" - # Requires win32all to be available, or someone to write a version - # with a different gui. - if win32api is None: - raise ImportError("win32 extensions required") # At the moment, the test we use to check if the mailer is installed # is whether a valid path to the configuration file can be found. --- 703,730 ---- "PocoMail" : configure_pocomail, } ! return funcs[mailer](loc) ! ! def is_installed(mailer): ! """Return True if we believe that the mailer is installed.""" ! # For the simpler mailers, we believe it is installed if the ! # configuration path can be found and exists. ! config_location = find_config_location(mailer) ! if config_location: ! if os.path.exists(config_location): ! return True ! return False ! # For the ones based in the registry, we have different ! # techniques. ! if mailer == "Outlook Express": ! if oe_mailbox.OEIsInstalled(): ! return True ! return False + # If we don't know, guess that it isn't. 
+ return False def offer_to_configure(mailer): """If the mailer appears to be installed, offer to set it up for SpamBayes (and SpamBayes for it).""" # At the moment, the test we use to check if the mailer is installed # is whether a valid path to the configuration file can be found. *************** *** 739,743 **** win32con.MB_YESNO) if ans == win32con.IDYES: ! configure(mailer) --- 740,750 ---- win32con.MB_YESNO) if ans == win32con.IDYES: ! results = configure(mailer) ! if results is None: ! win32ui.MessageBox("Configuration unsuccessful.", "Error", ! win32con.MB_OK) ! else: ! text = "Configuration complete.\n\n" + "\n".join(results) ! win32ui.MessageBox(text, "Complete", win32con.MB_OK) From anadelonbrin at users.sourceforge.net Sun Jan 4 21:18:31 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Sun Jan 4 21:18:38 2004 Subject: [Spambayes-checkins] spambayes/spambayes oe_mailbox.py,1.6,1.7 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv9109/spambayes Modified Files: oe_mailbox.py Log Message: Add extra utility functions to oe_mailbox for dealing with Outlook Express. OEAccountKeys() returns a generator of all the registry keys of OE Accounts. OEIdentityKeys() returns a generator of all the registry keys of Identities. OEIsInstalled() makes a guess at whether OE is installed *and setup*. Get autoconfigure.py to use these new functions. Have autoconfigure confirm that configuration has occured. Do better 'is installed' checks in autoconfigure. I *think* that this means that autoconfigure for OE will work on Win98 as well as WinXP now, but will have to check when I get home tonight. 
Index: oe_mailbox.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/oe_mailbox.py,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** oe_mailbox.py 2 Jan 2004 00:20:56 -0000 1.6 --- oe_mailbox.py 5 Jan 2004 02:18:29 -0000 1.7 *************** *** 475,482 **** return content ! def OEStoreRoot(): ! """Return the path to the Outlook Express Store Root. ! Tested with Outlook Express 5.0 with Windows XP.""" if sys.platform != "win32": # AFAIK, there is only a Win32 OE, and a Mac OE. --- 475,482 ---- return content ! def OEIdentityKeys(): ! """Return the OE identity keys. ! Tested with Outlook Express 6.0 with Windows XP.""" if sys.platform != "win32": # AFAIK, there is only a Win32 OE, and a Mac OE. *************** *** 525,538 **** # Not this user continue try: ! raw = win32api.RegQueryValueEx(subkey, "Store Root") except win32api.error: break ! UserDirectory = shell.SHGetFolderPath \ ! (0, shellcon.CSIDL_LOCAL_APPDATA, 0, 0) ! raw = raw[0].replace("%UserProfile%\\Local Settings\\" \ ! "Application Data", UserDirectory) ! return raw ## For use by the test tools. --- 525,607 ---- # Not this user continue + yield subkey + def OEStoreRoot(): + """Return the path to the Outlook Express Store Root. + + Tested with Outlook Express 6.0 with Windows XP.""" + # Run through the identity keys, using the first that + # works. 
+ raw = "" + for identity in OEIdentityKeys(): + try: + raw = win32api.RegQueryValueEx(identity, "Store Root") + except win32api.error: + pass + else: + break + # I can't find a shellcon to that is the same as %UserProfile%, + # so extract it from CSIDL_LOCAL_APPDATA + UserDirectory = shell.SHGetFolderPath \ + (0, shellcon.CSIDL_LOCAL_APPDATA, 0, 0) + parts = UserDirectory.split(os.sep) + UserProfile = os.sep.join(parts[:-2]) + raw = raw[0].replace("%UserProfile%", UserProfile) + return raw + + def OEAccountKeys(permission = win32con.KEY_READ | win32con.KEY_SET_VALUE): + """Return registry keys for each of the OE mail accounts, along + with information about what type of mail account it is.""" + possible_root_keys = [] + + # This appears to be the place for OE6 and WinXP + # (So I'm guessing also for NT4) + if sys.getwindowsversion()[0] >= 4: + possible_root_keys = ["Software\\Microsoft\\" \ + "Internet Account Manager\\Accounts"] + else: + # This appears to be the place for OE6 and Win98 + # (So I'm guessing also for Win95) + possible_root_keys = oe_mailbox.OEIdentityKeys() + + for key in possible_root_keys: + reg = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, key) + account_index = 0 + while True: + # Loop through all the accounts + account = {} try: ! subkey_name = "%s\\%s" % \ ! (key, win32api.RegEnumKey(reg, account_index)) except win32api.error: break ! account_index += 1 ! index = 0 ! subkey = win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER, ! subkey_name, 0, permission) ! while True: ! # Loop through all the keys so that we can determine ! # what type of account this is. ! try: ! name, value, typ = win32api.RegEnumValue(subkey, index) ! except win32api.error: ! break ! account[name] = (value, typ) ! index += 1 ! ! # Yield, as appropriate. ! if account.has_key("POP3 Server"): ! yield("POP3", subkey, account) ! elif account.has_key("IMAP Server"): ! yield("IMAP4", subkey, account) ! ! def OEIsInstalled(): ! 
"""Return True if Outlook Express appears to be installed, ! and in use (I think if sys.platform == "win32" would say if ! it was installed at all).""" ! # Our heuristic is that there is at least one mail account setup. ! if len(list(OEAccountKeys)) > 0: ! return True ! return False ## For use by the test tools. *************** *** 588,592 **** ########################################################################### ! if __name__ == '__main__': import sys import getopt --- 657,661 ---- ########################################################################### ! def test(): import sys import getopt *************** *** 606,614 **** print_message = True ! if not args: ! print "Please enter a directory with dbx files." ! sys.exit() ! ! MAILBOX_DIR = args[0] files = [os.path.join(MAILBOX_DIR, file) for file in \ --- 675,682 ---- print_message = True ! if args: ! MAILBOX_DIR = args[0] ! else: ! MAILBOX_DIR = OEStoreRoot() files = [os.path.join(MAILBOX_DIR, file) for file in \ *************** *** 655,656 **** --- 723,727 ---- dbx.close() + + if __name__ == '__main__': + test() From bwarsaw at users.sourceforge.net Mon Jan 5 09:34:33 2004 From: bwarsaw at users.sourceforge.net (Barry A. Warsaw) Date: Mon Jan 5 09:34:36 2004 Subject: [Spambayes-checkins] spambayes README.txt,1.62,1.63 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv29961 Modified Files: README.txt Log Message: Update where to get the email package, as pointed out by Harri Pasanen. Index: README.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/README.txt,v retrieving revision 1.62 retrieving revision 1.63 diff -C2 -d -r1.62 -r1.63 *** README.txt 29 Dec 2003 05:20:34 -0000 1.62 --- README.txt 5 Jan 2004 14:34:31 -0000 1.63 *************** *** 43,52 **** version of the email package. ! If not, you can download email version 2.5 from ! 
and install it - unpack the archive, cd to the email-2.5 directory and ! type "python setup.py install". This will install it into your Python ! site-packages directory. You'll also need to move aside the standard ! "email" library - go to your Python "Lib" directory and rename "email" ! to "email_old". To run the Outlook plug-in from source, you also need have the win32com --- 43,53 ---- version of the email package. ! If not, you can download email version 2.5 from the email SIG at ! and install it - unpack the ! archive, cd to the email-2.5 directory and type "python setup.py ! install". This will install it into your Python site-packages ! directory. You'll also need to move aside the standard "email" ! library - go to your Python "Lib" directory and rename "email" to ! "email_old". To run the Outlook plug-in from source, you also need have the win32com From montanaro at users.sourceforge.net Mon Jan 5 12:40:33 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Mon Jan 5 12:40:37 2004 Subject: [Spambayes-checkins] spambayes/spambayes Options.py, 1.97, 1.98 tokenizer.py, 1.27, 1.28 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv31327/spambayes Modified Files: Options.py tokenizer.py Log Message: Add experimental option and code to pick out some semantic bits from URLs based upon a few examples which suggest some recent spam scams use non-standard ports, raw ips for hostnames, embedded usernames and %-entity-based obfuscation to coax people to click through to what appears to be a valid site. Tests so far show this to be a small win, but it is well-isolated and should add at most 300 new tokens to the database. This checkin allows easier widespread testing. 
Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.97
retrieving revision 1.98
diff -C2 -d -r1.97 -r1.98
*** Options.py 30 Dec 2003 16:26:33 -0000 1.97
--- Options.py 5 Jan 2004 17:40:25 -0000 1.98
***************
*** 146,149 ****
--- 146,154 ----
       BOOLEAN, RESTORE),

+     ("x-pick_apart_urls", "Extract clues about url structure", False,
+      """(EXPERIMENTAL) Note whether url contains non-standard port or
+      user/password elements.""",
+      BOOLEAN, RESTORE),
+
      ("replace_nonascii_chars", "Replace non-ascii characters", False,
       """If true, replace high-bit characters (ord(c) >= 128) and control

Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.27
retrieving revision 1.28
diff -C2 -d -r1.27 -r1.28
*** tokenizer.py 30 Dec 2003 16:26:33 -0000 1.27
--- tokenizer.py 5 Jan 2004 17:40:30 -0000 1.28
***************
*** 14,17 ****
--- 14,19 ----
  import os
  import binascii
+ import urlparse
+ import urllib
  try:
      from sets import Set
***************
*** 1013,1024 ****
      def tokenize(self, m):
          proto, guts = m.groups()
          tokens = ["proto:" + proto]
          pushclue = tokens.append

          # Lose the trailing punctuation for casual embedding, like:
          #     The code is at http://mystuff.org/here? Didn't resolve.
          # or
          #     I found it at http://mystuff.org/there/. Thanks!
-         assert guts
          while guts and guts[-1] in '.:?!/':
              guts = guts[:-1]
--- 1015,1072 ----
      def tokenize(self, m):
          proto, guts = m.groups()
+         assert guts
          tokens = ["proto:" + proto]
          pushclue = tokens.append

+         if options["Tokenizer", "x-pick_apart_urls"]:
+             url = proto + "://" + guts
+
+             escapes = re.findall(r'%..', guts)
+             # roughly how many %nn escapes are there?
+             if escapes:
+                 pushclue("url:%%%d" % int(log2(len(escapes))))
+             # %nn escapes are usually intentional obfuscation. Generate a
+             # lot of correlated tokens if the URL contains a lot of them.
+             # The classifier will learn which specific ones are and aren't
+             # spammy.
+             tokens.extend(["url:" + escape for escape in escapes])
+
+             # now remove any obfuscation and probe around a bit
+             url = urllib.unquote(url)
+             scheme, netloc, path, params, query, frag = urlparse.urlparse(url)
+
+             # one common technique in bogus "please (re-)authorize yourself"
+             # scams is to make it appear as if you're visiting a valid
+             # payment-oriented site like PayPal, CitiBank or eBay, when you
+             # actually aren't. The company's web server appears as the
+             # beginning of an often long username element in the URL such as
+             # http://www.paypal.com%65%43%99%35@10.0.1.1/iwantyourccinfo
+             # generally with an innocuous-looking fragment of text or a
+             # valid URL as the highlighted link. Usernames should rarely
+             # appear in URLs (perhaps in a local bookmark you established),
+             # and never in a URL you receive from an unsolicited email or
+             # another website.
+             user_pwd, host_port = urllib.splituser(netloc)
+             if user_pwd is not None:
+                 pushclue("url:has user")
+
+             host, port = urllib.splitport(host_port)
+             # web servers listening on non-standard ports are suspicious ...
+             if port is not None:
+                 if (scheme == "http" and port != '80' or
+                     scheme == "https" and port != '443'):
+                     pushclue("url:non-standard %s port" % scheme)
+
+             # ... as are web servers associated with raw ip addresses
+             if re.match("(\d+\.?){4,4}$", host) is not None:
+                 pushclue("url:ip addr")
+
+             # make sure we later tokenize the unobfuscated url bits
+             proto, guts = url.split("://", 1)
+
          # Lose the trailing punctuation for casual embedding, like:
          #     The code is at http://mystuff.org/here? Didn't resolve.
          # or
          #     I found it at http://mystuff.org/there/. Thanks!
          while guts and guts[-1] in '.:?!/':
              guts = guts[:-1]

From montanaro at users.sourceforge.net Mon Jan 5 12:44:36 2004
From: montanaro at users.sourceforge.net (Skip Montanaro)
Date: Mon Jan 5 12:44:39 2004
Subject: [Spambayes-checkins] spambayes/spambayes ImapUI.py, 1.33, 1.34 ProxyUI.py, 1.36, 1.37
Message-ID:

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv356/spambayes

Modified Files:
        ImapUI.py ProxyUI.py
Log Message:
Add x-pick_apart_urls to advanced config page. (Are ImapUI.py and
ProxyUI.py supposed to be so different in the advanced options they
expose? Should x- options *not* be exposed?)

Index: ImapUI.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/ImapUI.py,v
retrieving revision 1.33
retrieving revision 1.34
diff -C2 -d -r1.33 -r1.34
*** ImapUI.py 30 Dec 2003 22:57:23 -0000 1.33
--- ImapUI.py 5 Jan 2004 17:44:33 -0000 1.34
***************
*** 99,102 ****
--- 99,103 ----
      ('Tokenizer', 'summarize_email_prefixes'),
      ('Tokenizer', 'summarize_email_suffixes'),
+     ('Tokenizer', 'x-pick_apart_urls'),
      ('Interface Options', None),
      ('html_ui', 'display_adv_find'),

Index: ProxyUI.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/ProxyUI.py,v
retrieving revision 1.36
retrieving revision 1.37
diff -C2 -d -r1.36 -r1.37
*** ProxyUI.py 30 Dec 2003 22:57:23 -0000 1.36
--- ProxyUI.py 5 Jan 2004 17:44:33 -0000 1.37
***************
*** 128,131 ****
--- 128,132 ----
      ('Tokenizer', 'summarize_email_prefixes'),
      ('Tokenizer', 'summarize_email_suffixes'),
+     ('Tokenizer', 'x-pick_apart_urls'),
      ('Training Options', None),
      ('Hammie', 'train_on_filter'),

From montanaro at users.sourceforge.net Tue Jan 6 10:27:20 2004
From: montanaro at users.sourceforge.net (Skip Montanaro)
Date: Tue Jan 6 10:27:24 2004
Subject: [Spambayes-checkins] spambayes/utilities mkreversemap.py,NONE,1.1
Message-ID:

Update of
/cvsroot/spambayes/spambayes/utilities
In directory sc8-pr-cvs1:/tmp/cvs-serv22747

Added Files:
        mkreversemap.py
Log Message:
New script which generates a pickle file mapping features to mailbox files
and message-id's. Use with extractmessages.py.

--- NEW FILE: mkreversemap.py ---
#!/usr/bin/env python

"""
Create mapping from features to message ids

usage %(prog)s [ -h ] -t ham|spam -d mapfile mailbox ...

-d mapfile - identify file which will hold mapping information

-t ham|spam - identify the type of messages in the input mailbox(es)

-h - print this documentation and exit
"""

import sys
import getopt
import anydbm
import cPickle as pickle

from spambayes.mboxutils import getmbox
from spambayes.tokenizer import tokenize

prog = sys.argv[0]

def usage(msg=None):
    if msg is not None:
        print >> sys.stderr, msg
    print >> sys.stderr, __doc__.strip() % globals()

def mapmessages(f, mboxtype, mapdb):
    i = 0
    for msg in getmbox(f):
        i += 1
        sys.stdout.write('\r%s: %d' % (f, i))
        sys.stdout.flush()
        msgid = msg.get("message-id")
        if msgid is None:
            continue
        for t in tokenize(msg):
            ham, spam = mapdb.get(t, ({}, {}))
            if mboxtype == "ham":
                msgids = ham.get(f, set())
                msgids.add(msgid)
                ham[f] = msgids
            else:
                msgids = spam.get(f, set())
                msgids.add(msgid)
                spam[f] = msgids
            mapdb[t] = (ham, spam)
    sys.stdout.write("\n")

def main(args):
    try:
        opts, args = getopt.getopt(args, "hd:t:",
                                   ["type=", "help", "database="])
    except getopt.GetoptError, msg:
        usage(msg)
        return 1

    mapfile = None
    mboxtype = None
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            return 0
        elif opt in ("-d", "--database"):
            mapfile = arg
        elif opt in ("-t", "--type"):
            mboxtype = arg

    if mapfile is None:
        usage("'-d mapfile' is required")
        return 1

    if mboxtype is None:
        usage("'-t ham|spam' is required")
        return 1

    if mboxtype not in ("ham", "spam"):
        usage("mboxtype must be 'ham' or 'spam'")
        return 1

    try:
        mapd = pickle.load(file(mapfile))
    except IOError:
        mapd = {}

    for f in args:
        mapmessages(f, mboxtype, mapd)

    pickle.dump(mapd, file(mapfile, "w"))

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))

From montanaro at users.sourceforge.net Tue Jan 6 10:28:00 2004
From: montanaro at users.sourceforge.net (Skip Montanaro)
Date: Tue Jan 6 10:28:03 2004
Subject: [Spambayes-checkins] spambayes/utilities extractmessages.py, NONE, 1.1
Message-ID:

Update of /cvsroot/spambayes/spambayes/utilities
In directory sc8-pr-cvs1:/tmp/cvs-serv22825

Added Files:
        extractmessages.py
Log Message:
Use with mkreversemap.py to identify messages in your training database
which contain interesting tokens.

--- NEW FILE: extractmessages.py ---
#!/usr/bin/env python

"""
Extract messages which share extreme tokens with message(s) on cmd line

usage %(prog)s [ -h ] -d mapfile [ -f feature [ -S file ] [ -H file ] file ...

If no features are given on the command line, one or more files containing
messages with X-Spambayes-Evidence headers must be present.

-d mapfile - specify file which holds feature mapping information

-S file - output spam message file

-H file - output ham message file

-f feature - specify feature to locate (may be given more than once)

-h - print this documentation and exit
"""

import sys
import getopt
import re
import cPickle as pickle

from spambayes.mboxutils import getmbox
from spambayes.tokenizer import tokenize

prog = sys.argv[0]

def usage(msg=None):
    if msg is not None:
        print >> sys.stderr, msg
    print >> sys.stderr, __doc__.strip() % globals()

def extractmessages(mapdb, hamfile, spamfile, features):
    """tokenize messages in f and extract messages with tokens to match"""
    i = 0
    hamids = {}
    spamids = {}

    for feature in features:
        ham, spam = mapdb.get(feature, ([], []))
        if hamfile is not None:
            for mbox in ham:
                msgids = hamids.get(mbox, set())
                msgids.update(ham.get(mbox, set()))
                hamids[mbox] = msgids
        if spamfile is not None:
            for mbox in spam:
                msgids = spamids.get(mbox, set())
                msgids.update(spam.get(mbox, set()))
                spamids[mbox] = msgids

    # now run through each mailbox in hamids and spamids and print
    # matching messages to relevant ham or spam files
    i = 0
    for mailfile in hamids:
        msgids = hamids[mailfile]
        for msg in getmbox(mailfile):
            if msg.get("message-id") in msgids:
                i += 1
                sys.stdout.write('\r%s: %5d' % (mailfile, i))
                sys.stdout.flush()
                print >> hamfile, msg
        print

    for mailfile in spamids:
        msgids = spamids[mailfile]
        for msg in getmbox(mailfile):
            if msg.get("message-id") in msgids:
                i += 1
                sys.stdout.write('\r%s: %5d' % (mailfile, i))
                sys.stdout.flush()
                print >> spamfile, msg
        print

def main(args):
    try:
        opts, args = getopt.getopt(args, "hd:S:H:f:",
                                   ["help", "database=", "spamfile=",
                                    "hamfile=", "feature="])
    except getopt.GetoptError, msg:
        usage(msg)
        return 1

    mapfile = spamfile = hamfile = None
    features = []
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            return 0
        elif opt in ("-d", "--database"):
            mapfile = arg
        elif opt in ("-H", "--hamfile"):
            hamfile = arg
        elif opt in ("-S", "--spamfile"):
            spamfile = arg
        elif opt in ("-f", "--feature"):
            features.append(arg)

    if hamfile is None and spamfile is None:
        usage("At least one of -S or -H are required")
        return 1

    if mapfile is None:
        usage("'-d mapfile' is required")
        return 1

    try:
        mapd = pickle.load(file(mapfile))
    except IOError:
        usage("Mapfile %s does not exist" % mapfile)
        return 1

    if not features and not args:
        usage("Require at least one feature (-f) arg or one message file")
        return 1

    if not features:
        # extract significant tokens from each message and identify
        # where they came from
        for f in args:
            for msg in getmbox(f):
                evidence = msg.get("X-Spambayes-Evidence", "")
                evidence = re.sub(r"\s+", " ", evidence)
                features = [e.rsplit(": ", 1)[0]
                            for e in evidence.split("; ")[2:]]
                features = [eval(f) for f in features]
        if not features:
            usage("No X-Spambayes-Evidence headers found")
            return 1

    if spamfile is not None:
        spamfile = file(spamfile, "w")
    if hamfile is not None:
        hamfile = file(hamfile, "w")

    extractmessages(mapd, hamfile, spamfile, features)

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))

From montanaro at
users.sourceforge.net Tue Jan 6 10:45:13 2004
From: montanaro at users.sourceforge.net (Skip Montanaro)
Date: Tue Jan 6 10:45:17 2004
Subject: [Spambayes-checkins] spambayes/utilities extractmessages.py, 1.1, 1.2
Message-ID:

Update of /cvsroot/spambayes/spambayes/utilities
In directory sc8-pr-cvs1:/tmp/cvs-serv26794

Modified Files:
        extractmessages.py
Log Message:
correct usage message, dump unused tokenize variable, reorder args to
extractmessages() to reflect relative input/output relationship

Index: extractmessages.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/utilities/extractmessages.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** extractmessages.py 6 Jan 2004 15:27:57 -0000 1.1
--- extractmessages.py 6 Jan 2004 15:45:10 -0000 1.2
***************
*** 2,13 ****

  """
! Extract messages which share extreme tokens with message(s) on cmd line
!
! usage %(prog)s [ -h ] -d mapfile [ -f feature [ -S file ] [ -H file ] file ...

! If no features are given on the command line, one or more files containing
! messages with X-Spambayes-Evidence headers must be present.

! -d mapfile - specify file which holds feature mapping information

  -S file - output spam message file
--- 2,10 ----

  """
! Extract messages which contain given features

! usage: %(prog)s [ options ]

! -d mapfile - specify file which holds feature mapping information (required)

  -S file - output spam message file
***************
*** 18,21 ****
--- 15,22 ----

  -h - print this documentation and exit
+
+ At least one of either the -H or -S flags must be given on the command line.
+ If no features are given on the command line with the -f flag, one or more
+ files containing messages with X-Spambayes-Evidence headers must be given.
  """

***************
*** 26,30 ****

  from spambayes.mboxutils import getmbox
- from spambayes.tokenizer import tokenize

  prog = sys.argv[0]
--- 27,30 ----
***************
*** 35,40 ****
      print >> sys.stderr, __doc__.strip() % globals()

! def extractmessages(mapdb, hamfile, spamfile, features):
!     """tokenize messages in f and extract messages with tokens to match"""
      i = 0
      hamids = {}
--- 35,40 ----
      print >> sys.stderr, __doc__.strip() % globals()

! def extractmessages(features, mapdb, hamfile, spamfile):
!     """extract messages which contain given features"""
      i = 0
      hamids = {}
***************
*** 87,91 ****

      mapfile = spamfile = hamfile = None
!     features = []
      for opt, arg in opts:
          if opt in ("-h", "--help"):
--- 87,91 ----

      mapfile = spamfile = hamfile = None
!     features = set()
      for opt, arg in opts:
          if opt in ("-h", "--help"):
***************
*** 99,103 ****
              spamfile = arg
          elif opt in ("-f", "--feature"):
!             features.append(arg)

      if hamfile is None and spamfile is None:
--- 99,103 ----
              spamfile = arg
          elif opt in ("-f", "--feature"):
!             features.add(arg)

      if hamfile is None and spamfile is None:
***************
*** 128,132 ****
                  features = [e.rsplit(": ", 1)[0]
                              for e in evidence.split("; ")[2:]]
!                 features = [eval(f) for f in features]
          if not features:
              usage("No X-Spambayes-Evidence headers found")
--- 128,132 ----
                  features = [e.rsplit(": ", 1)[0]
                              for e in evidence.split("; ")[2:]]
!                 features = set([eval(f) for f in features])
          if not features:
              usage("No X-Spambayes-Evidence headers found")
***************
*** 138,142 ****
          hamfile = file(hamfile, "w")

!     extractmessages(mapd, hamfile, spamfile, features)

  if __name__ == "__main__":
--- 138,142 ----
          hamfile = file(hamfile, "w")

!     extractmessages(features, mapd, hamfile, spamfile)

  if __name__ == "__main__":

From montanaro at users.sourceforge.net Tue Jan 6 10:47:28 2004
From: montanaro at users.sourceforge.net (Skip Montanaro)
Date: Tue Jan 6 10:47:31 2004
Subject: [Spambayes-checkins] spambayes/utilities mkreversemap.py,1.1,1.2
Message-ID:

Update of /cvsroot/spambayes/spambayes/utilities
In directory sc8-pr-cvs1:/tmp/cvs-serv27380

Modified Files:
        mkreversemap.py
Log Message:
tweak usage message

Index: mkreversemap.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/utilities/mkreversemap.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** mkreversemap.py 6 Jan 2004 15:27:18 -0000 1.1
--- mkreversemap.py 6 Jan 2004 15:47:26 -0000 1.2
***************
*** 4,14 ****
  Create mapping from features to message ids

! usage %(prog)s [ -h ] -t ham|spam -d mapfile mailbox ...

! -d mapfile - identify file which will hold mapping information

  -t ham|spam - identify the type of messages in the input mailbox(es)

  -h - print this documentation and exit
  """

--- 4,17 ----
  Create mapping from features to message ids

! usage %(prog)s [ options ] mailbox ...

! -d mapfile - identify file which will hold mapping information (required)

  -t ham|spam - identify the type of messages in the input mailbox(es)

  -h - print this documentation and exit
+
+ One of '-t ham' or '-t spam' must be given, as must one or more message
+ sources.
  """

From montanaro at users.sourceforge.net Tue Jan 6 12:17:24 2004
From: montanaro at users.sourceforge.net (Skip Montanaro)
Date: Tue Jan 6 12:17:27 2004
Subject: [Spambayes-checkins] spambayes/utilities mkreversemap.py,1.2,1.3
Message-ID:

Update of /cvsroot/spambayes/spambayes/utilities
In directory sc8-pr-cvs1:/tmp/cvs-serv16382

Modified Files:
        mkreversemap.py
Log Message:
forgot that bigrams need to be generated

Index: mkreversemap.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/utilities/mkreversemap.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** mkreversemap.py 6 Jan 2004 15:47:26 -0000 1.2
--- mkreversemap.py 6 Jan 2004 17:17:22 -0000 1.3
***************
*** 23,26 ****
--- 23,28 ----
  from spambayes.mboxutils import getmbox
  from spambayes.tokenizer import tokenize
+ from spambayes.Options import options
+ from spambayes.classifier import Classifier

  prog = sys.argv[0]
***************
*** 51,54 ****
--- 53,68 ----
                  spam[f] = msgids
              mapdb[t] = (ham, spam)
+         if options["Classifier", "x-use_bigrams"]:
+             for t in Classifier()._enhance_wordstream(tokenize(msg)):
+                 ham, spam = mapdb.get(t, ({}, {}))
+                 if mboxtype == "ham":
+                     msgids = ham.get(f, set())
+                     msgids.add(msgid)
+                     ham[f] = msgids
+                 else:
+                     msgids = spam.get(f, set())
+                     msgids.add(msgid)
+                     spam[f] = msgids
+                 mapdb[t] = (ham, spam)
      sys.stdout.write("\n")

From anadelonbrin at users.sourceforge.net Wed Jan 7 01:04:46 2004
From: anadelonbrin at users.sourceforge.net (Tony Meyer)
Date: Wed Jan 7 01:04:48 2004
Subject: [Spambayes-checkins] spambayes/spambayes ProxyUI.py,1.37,1.38
Message-ID:

Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv5637/spambayes

Modified Files:
        ProxyUI.py
Log Message:
When making the changes that allow messages on a review page to be sorted
by column, I introduced a bug that could cause problems in separating the
messages by day (which is done before the
other sort). Problem found and patch provided by Brendon Whateley, thanks. Fixes [ 872044 ] HTTP review page date problems Index: ProxyUI.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/ProxyUI.py,v retrieving revision 1.37 retrieving revision 1.38 diff -C2 -d -r1.37 -r1.38 *** ProxyUI.py 5 Jan 2004 17:44:33 -0000 1.37 --- ProxyUI.py 7 Jan 2004 06:04:44 -0000 1.38 *************** *** 225,228 **** --- 225,232 ---- # Fetch all the message keys allKeys = state.unknownCorpus.keys() + # We have to sort here to split into days. + # Later on, we also sort the messages that will be on the page + # (by whatever column we wish). + allKeys.sort() # The default start timestamp is derived from the most recent message, From anadelonbrin at users.sourceforge.net Wed Jan 7 01:19:10 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Wed Jan 7 01:19:13 2004 Subject: [Spambayes-checkins] website related.ht, 1.13, 1.14 server_side.ht, 1.3, 1.4 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1:/tmp/cvs-serv7816 Modified Files: related.ht server_side.ht Log Message: Add link to TUGAST on related page, and link to related page from server-side. Index: related.ht =================================================================== RCS file: /cvsroot/spambayes/website/related.ht,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** related.ht 18 Aug 2003 23:13:17 -0000 1.13 --- related.ht 7 Jan 2004 06:19:08 -0000 1.14 *************** *** 63,67 **** the normal SpamBayes web interface for training. The 'punishment' takes the form of slowing down the SMTP connection. !

Other Commercial Products using "Bayesian" style filtering

--- 63,70 ---- the normal SpamBayes web interface for training. The 'punishment' takes the form of slowing down the SMTP connection. !
  • TUGAST ! is a server-side anti-spam (open source) filter based on ! SpamBayes, started by Simone Piunno, who is "open for discussion, help, ! critics, everything.".

    Other Commercial Products using "Bayesian" style filtering

    Index: server_side.ht =================================================================== RCS file: /cvsroot/spambayes/website/server_side.ht,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** server_side.ht 15 Oct 2003 04:44:41 -0000 1.3 --- server_side.ht 7 Jan 2004 06:19:08 -0000 1.4 *************** *** 6,10 ****

    This page includes notes from users that have successfully managed to ! get SpamBayes working server-side.

    postfix notes from Jonathan St-Andre

    --- 6,12 ----

    This page includes notes from users that have successfully managed to ! get SpamBayes working server-side. You should also see the ! related projects page for server-side projects ! based on SpamBayes.

    postfix notes from Jonathan St-Andre

    From montanaro at users.sourceforge.net Wed Jan 7 17:01:16 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 7 17:01:19 2004 Subject: [Spambayes-checkins] spambayes/testtools table.py,1.2,1.3 Message-ID: Update of /cvsroot/spambayes/spambayes/testtools In directory sc8-pr-cvs1:/tmp/cvs-serv30485 Modified Files: table.py Log Message: space the table out a little more - how often do people display more than two or three columns anyway? Index: table.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/table.py,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** table.py 17 Jan 2003 06:42:54 -0000 1.2 --- table.py 7 Jan 2004 22:01:14 -0000 1.3 *************** *** 145,191 **** filename = filename[filename.rfind("\\")+1:] if len(fname) > len(fnam2): ! fname += " " ! fname = fname[0:(len(fnam2) + 8)] ! fnam2 += " %7s" % filename else: ! fnam2 += " " ! fnam2 = fnam2[0:(len(fname) + 8)] ! fname += " %7s" % filename if len(ratio) > len(rat2): ! ratio += " " ! ratio = ratio[0:(len(rat2) + 8)] ! rat2 += " %7s" % ("%d:%d" % (htest, stest)) else: ! rat2 += " " ! rat2 = rat2[0:(len(ratio) + 8)] ! ratio += " %7s" % ("%d:%d" % (htest, stest)) ! fptot += "%8d" % fp tfptot += fp ! fpper += "%8.2f" % fpp tfpper += fpp ! fntot += "%8d" % fn tfntot += fn ! fnper += "%8.2f" % fnp tfnper += fnp ! untot += "%8d" % un tuntot += un ! unper += "%8.2f" % unp tunper += unp ! rcost += "%8s" % ("$%.2f" % cost) trcost += cost ! bcost += "%8s" % ("$%.2f" % bestcost) tbcost += bestcost ! hmean += "%8.2f" % hamdevall[0] thmean += hamdevall[0] ! hsdev += "%8.2f" % hamdevall[1] thsdev += hamdevall[1] ! smean += "%8.2f" % spamdevall[0] tsmean += spamdevall[0] ! ssdev += "%8.2f" % spamdevall[1] tssdev += spamdevall[1] ! meand += "%8.2f" % (spamdevall[0] - hamdevall[0]) tmeand += (spamdevall[0] - hamdevall[0]) k = (spamdevall[0] - hamdevall[0]) / (spamdevall[1] + hamdevall[1]) ! 
kval += "%8.2f" % k tkval += k --- 145,191 ---- filename = filename[filename.rfind("\\")+1:] if len(fname) > len(fnam2): ! fname += " " ! fname = fname[0:(len(fnam2) + 12)] ! fnam2 += " %11s" % filename else: ! fnam2 += " " ! fnam2 = fnam2[0:(len(fname) + 12)] ! fname += " %11s" % filename if len(ratio) > len(rat2): ! ratio += " " ! ratio = ratio[0:(len(rat2) + 12)] ! rat2 += " %11s" % ("%d:%d" % (htest, stest)) else: ! rat2 += " " ! rat2 = rat2[0:(len(ratio) + 12)] ! ratio += " %11s" % ("%d:%d" % (htest, stest)) ! fptot += "%12d" % fp tfptot += fp ! fpper += "%12.2f" % fpp tfpper += fpp ! fntot += "%12d" % fn tfntot += fn ! fnper += "%12.2f" % fnp tfnper += fnp ! untot += "%12d" % un tuntot += un ! unper += "%12.2f" % unp tunper += unp ! rcost += "%12s" % ("$%.2f" % cost) trcost += cost ! bcost += "%12s" % ("$%.2f" % bestcost) tbcost += bestcost ! hmean += "%12.2f" % hamdevall[0] thmean += hamdevall[0] ! hsdev += "%12.2f" % hamdevall[1] thsdev += hamdevall[1] ! smean += "%12.2f" % spamdevall[0] tsmean += spamdevall[0] ! ssdev += "%12.2f" % spamdevall[1] tssdev += spamdevall[1] ! meand += "%12.2f" % (spamdevall[0] - hamdevall[0]) tmeand += (spamdevall[0] - hamdevall[0]) k = (spamdevall[0] - hamdevall[0]) / (spamdevall[1] + hamdevall[1]) ! 
kval += "%12.2f" % k tkval += k From anadelonbrin at users.sourceforge.net Wed Jan 7 18:54:20 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Wed Jan 7 18:54:22 2004 Subject: [Spambayes-checkins] spambayes README.txt,1.63,1.64 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv25301 Modified Files: README.txt Log Message: Fix [ 805852 ] need python-dev package on Debian Index: README.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/README.txt,v retrieving revision 1.63 retrieving revision 1.64 diff -C2 -d -r1.63 -r1.64 *** README.txt 5 Jan 2004 14:34:31 -0000 1.63 --- README.txt 7 Jan 2004 23:54:17 -0000 1.64 *************** *** 55,58 **** --- 55,62 ---- . + When installing SpamBayes on some *nix systems, such as Debian, you may need + to install the python-dev package. This can be done with a command like + "apt-get install python-dev" (this may vary between distributions). + Getting the software From anadelonbrin at users.sourceforge.net Wed Jan 7 19:48:49 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Wed Jan 7 19:48:52 2004 Subject: [Spambayes-checkins] spambayes/windows/py2exe setup_all.py, 1.13, 1.14 Message-ID: Update of /cvsroot/spambayes/spambayes/windows/py2exe In directory sc8-pr-cvs1:/tmp/cvs-serv4411/windows/py2exe Modified Files: setup_all.py Log Message: Generate the dialogs.resources.dialogs.py file so that setup_all.py works with a fresh setup. Index: setup_all.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/windows/py2exe/setup_all.py,v retrieving revision 1.13 retrieving revision 1.14 diff -C2 -d -r1.13 -r1.14 *** setup_all.py 30 Dec 2003 01:57:18 -0000 1.13 --- setup_all.py 8 Jan 2004 00:48:47 -0000 1.14 *************** *** 10,13 **** --- 10,17 ---- sys.path.append(os.path.join(sb_top_dir, "Outlook2000/sandbox")) + # Generate the dialogs.py file. 
+ import dialogs + dialogs.LoadDialogs() + # ModuleFinder can't handle runtime changes to __path__, but win32com uses them, # particularly for people who build from sources. Hook this in. From kpitt at users.sourceforge.net Thu Jan 8 15:35:55 2004 From: kpitt at users.sourceforge.net (Kenny Pitt) Date: Thu Jan 8 15:35:58 2004 Subject: [Spambayes-checkins] spambayes/windows autoconfigure.py,1.10,1.11 Message-ID: Update of /cvsroot/spambayes/spambayes/windows In directory sc8-pr-cvs1:/tmp/cvs-serv29776 Modified Files: autoconfigure.py Log Message: Mark went to great pains to remove win32ui/MFC dependencies from the Outlook plugin, and it would be a shame to regrow them here just for a few message boxes. Do the same thing that win32ui.MessageBox would have done using only win32gui and win32api. While we're at it, add some question and error icons to the message boxes. Index: autoconfigure.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/windows/autoconfigure.py,v retrieving revision 1.10 retrieving revision 1.11 diff -C2 -d -r1.10 -r1.11 *** autoconfigure.py 5 Jan 2004 02:18:29 -0000 1.10 --- autoconfigure.py 8 Jan 2004 20:35:53 -0000 1.11 *************** *** 67,71 **** try: ! import win32ui import win32api import win32con --- 67,71 ---- try: ! import win32gui import win32api import win32con *************** *** 77,81 **** # fail. (And having "import win32api" in lots of functions # didn't seem to make much sense). ! win32api = win32con = shell = shellcon = win32ui = pywintypes = None from spambayes import oe_mailbox --- 77,81 ---- # fail. (And having "import win32api" in lots of functions # didn't seem to make much sense). ! win32api = win32con = shell = shellcon = win32gui = pywintypes = None from spambayes import oe_mailbox *************** *** 737,750 **** "only do this if you know how to re-setup %s " \ "if necessary.)" % (mailer, mailer, mailer) ! ans = win32ui.MessageBox(confirm_text, "Configure?", ! 
win32con.MB_YESNO) if ans == win32con.IDYES: results = configure(mailer) if results is None: ! win32ui.MessageBox("Configuration unsuccessful.", "Error", ! win32con.MB_OK) else: text = "Configuration complete.\n\n" + "\n".join(results) ! win32ui.MessageBox(text, "Complete", win32con.MB_OK) --- 737,785 ---- "only do this if you know how to re-setup %s " \ "if necessary.)" % (mailer, mailer, mailer) ! ans = MessageBox(confirm_text, "Configure?", ! win32con.MB_YESNO | win32con.MB_ICONQUESTION) if ans == win32con.IDYES: results = configure(mailer) if results is None: ! MessageBox("Configuration unsuccessful.", "Error", ! win32con.MB_OK | win32con.MB_ICONERROR) else: text = "Configuration complete.\n\n" + "\n".join(results) ! MessageBox(text, "Complete", win32con.MB_OK) ! ! def GetConsoleHwnd(): ! """Returns the window handle of the console window in which this script is ! running, or 0 if not running in a console window. This function is taken ! directly from Pythonwin\dllmain.cpp in the win32all source, ported to ! Python.""" ! ! # fetch current window title ! try: ! oldWindowTitle = win32api.GetConsoleTitle() ! except: ! return 0 ! ! # format a "unique" NewWindowTitle ! newWindowTitle = "%d/%d" % (win32api.GetTickCount(), ! win32api.GetCurrentProcessId()) ! ! # change current window title ! win32api.SetConsoleTitle(newWindowTitle) ! ! # ensure window title has been updated ! import time ! time.sleep(0.040) ! ! # look for NewWindowTitle ! hwndFound = win32gui.FindWindow(0, newWindowTitle) ! ! # restore original window title ! win32api.SetConsoleTitle(oldWindowTitle) ! ! return hwndFound ! ! hwndOwner = GetConsoleHwnd() ! def MessageBox(message, title=None, style=win32con.MB_OK): ! 
return win32gui.MessageBox(hwndOwner, message, title, style) From kpitt at users.sourceforge.net Thu Jan 8 15:41:20 2004 From: kpitt at users.sourceforge.net (Kenny Pitt) Date: Thu Jan 8 15:41:27 2004 Subject: [Spambayes-checkins] spambayes/windows/py2exe setup_all.py, 1.14, 1.15 Message-ID: Update of /cvsroot/spambayes/spambayes/windows/py2exe In directory sc8-pr-cvs1:/tmp/cvs-serv31500 Modified Files: setup_all.py Log Message: Modified autoconfigure.py so that we are back to not needing win32ui. Index: setup_all.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/windows/py2exe/setup_all.py,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** setup_all.py 8 Jan 2004 00:48:47 -0000 1.14 --- setup_all.py 8 Jan 2004 20:41:18 -0000 1.15 *************** *** 35,39 **** py2exe_options = dict( packages = "spambayes.resources,encodings", ! excludes = "pywin,pywin.debugger", # pywin is a package, and still seems to be included. includes = "dialogs.resources.dialogs", # Outlook dynamic dialogs dll_excludes = "dapi.dll,mapi32.dll", --- 35,39 ---- py2exe_options = dict( packages = "spambayes.resources,encodings", ! excludes = "win32ui,pywin,pywin.debugger", # pywin is a package, and still seems to be included. includes = "dialogs.resources.dialogs", # Outlook dynamic dialogs dll_excludes = "dapi.dll,mapi32.dll", From tim_one at users.sourceforge.net Thu Jan 8 23:19:53 2004 From: tim_one at users.sourceforge.net (Tim Peters) Date: Thu Jan 8 23:19:57 2004 Subject: [Spambayes-checkins] website related.ht,1.14,1.15 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1:/tmp/cvs-serv15613/website Modified Files: related.ht Log Message: Added Richie's SpamBayes Wiki to the Related Websites section. 
Index: related.ht =================================================================== RCS file: /cvsroot/spambayes/website/related.ht,v retrieving revision 1.14 retrieving revision 1.15 diff -C2 -d -r1.14 -r1.15 *** related.ht 7 Jan 2004 06:19:08 -0000 1.14 --- related.ht 9 Jan 2004 04:19:51 -0000 1.15 *************** *** 5,8 **** --- 5,13 ----

    Related Websites

      +
    • + Our own Richie Hindle runs a + SpamBayes Wiki, + where everyone is welcome to contribute. +
    • Gary Robinson has a well-organized Spam Wiki. From anadelonbrin at users.sourceforge.net Sat Jan 10 19:06:26 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Sat Jan 10 19:06:29 2004 Subject: [Spambayes-checkins] spambayes/spambayes oe_mailbox.py,1.7,1.8 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv17714/spambayes Modified Files: oe_mailbox.py Log Message: I introduced a bug that meant that oe_mailbox couldn't be imported if win32all wasn't available. Fix this (UserInterface.py imports oe_mailbox, among others). Index: oe_mailbox.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/oe_mailbox.py,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** oe_mailbox.py 5 Jan 2004 02:18:29 -0000 1.7 --- oe_mailbox.py 11 Jan 2004 00:06:24 -0000 1.8 *************** *** 550,556 **** return raw ! def OEAccountKeys(permission = win32con.KEY_READ | win32con.KEY_SET_VALUE): """Return registry keys for each of the OE mail accounts, along with information about what type of mail account it is.""" possible_root_keys = [] --- 550,561 ---- return raw ! def OEAccountKeys(permission = None): """Return registry keys for each of the OE mail accounts, along with information about what type of mail account it is.""" + if permission is None: + # Can't do this in the parameter, because then it requires + # win32con to be available for the module to be imported. 
+ permission = win32con.KEY_READ | win32con.KEY_SET_VALUE + possible_root_keys = [] From anadelonbrin at users.sourceforge.net Sat Jan 10 20:34:39 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Sat Jan 10 20:34:42 2004 Subject: [Spambayes-checkins] spambayes/testtools incremental.py, 1.6, 1.7 regimes.py, 1.4, 1.5 Message-ID: Update of /cvsroot/spambayes/spambayes/testtools In directory sc8-pr-cvs1:/tmp/cvs-serv30522/testtools Modified Files: incremental.py regimes.py Log Message: Add a docstring and the ability to print it with -h or --help to incremental.py (interestingly, it already checked for --help and --examples, but neither did anything). Add a docstring to regimes.py that outlines the various regimes in hopefully easy to understand terms (based on a spambayes-dev post by Alex). Print this out if regimes.py is executed. Add a new regime - balanced_corrected. Details have been on spambayes-dev and are in the docstring. Index: incremental.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/incremental.py,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** incremental.py 16 Dec 2003 05:06:34 -0000 1.6 --- incremental.py 11 Jan 2004 01:34:36 -0000 1.7 *************** *** 1,2 **** --- 1,14 ---- + """incremental.py + + This is a test harness for doing testing of incremental + training regimes. The individual regimes used should + be specified in regime.py. + + Options: + -h --help Display this message. + -r [regime] Use this regime (default: perfect). + -s [number] Run only this set. + """ + ### ### This is a test harness for doing testing of incremental *************** *** 285,294 **** which = None ! opts, args = getopt.getopt(sys.argv[1:], 's:r:', ['help', 'examples']) for opt, arg in opts: if opt == '-s': which = int(arg) - 1 ! if opt == '-r': regime = arg nsets = len(glob.glob("Data/Ham/Set*")) --- 297,309 ---- which = None ! 
opts, args = getopt.getopt(sys.argv[1:], 'hs:r:', ['help', 'examples']) for opt, arg in opts: if opt == '-s': which = int(arg) - 1 ! elif opt == '-r': regime = arg + elif opt == '-h' or opt == '--help': + print __doc__ + sys.exit() nsets = len(glob.glob("Data/Ham/Set*")) Index: regimes.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/regimes.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** regimes.py 22 Dec 2003 23:32:53 -0000 1.4 --- regimes.py 11 Jan 2004 01:34:36 -0000 1.5 *************** *** 1,2 **** --- 1,37 ---- + """regimes.py + + This module is not executable - it contains regime definitions + for use with incremental.py. Pass the name of any regime to + incremental.py with the "-r" switch, and it will be loaded from + this module. + + Existing regimes are: + 'perfect' A train-on-everything regime. The trainer is given + perfect and immediate knowledge of the proper + classification. + 'corrected' A train-on-everything regime. The trainer trusts the + classifier result until end-of-group, at which point + all mistrained and non-trained items (fp, fn, and + unsure) are corrected to be trained with their proper + classification. + 'balanced_corrected' + A partial-training regime. Works just like the + 'corrected' regime, except that if the database is + imbalanced more than 2::1 (or 1::2) then messages are + not used for training. + 'expire4months' This is like 'perfect', except that messages are + untrained after 120 groups have passed. + 'nonedge' A partial-training regime, which trains only on messages + which are not properly classified with scores of 1.00 or + 0.00 (rounded). All false positives and false negatives + *are* trained. + 'fpfnunsure' A partial-training regime, which trains only on + false positives, false negatives and unsures. + 'fnunsure' A partial-training regime, which trains only on + false negatives and unsures. 
This simulates, for + example, a user who deletes all mail classified as spam + without ever examining it for false positives. + """ + ### ### This is a training regime for the incremental.py harness. *************** *** 52,55 **** --- 87,118 ---- ### ### This is a training regime for the incremental.py harness. + ### It does guess-based training on all messages, as long + ### as the ham::spam ratio stays roughly even (not more than 2::1), + ### followed by correction to perfect at the end of each group. + ### + + class balanced_corrected(corrected): + ratio_maximum = 2.0 + def guess_action(self, which, test, guess, actual, msg): + # In some situations, we just do the 'corrected' regime: + # If we haven't trained any ham/spam (regardless of + # the guess because if all we know is one, everything + # will look like it). + # If the guess is unsure. + if not (guess[0] == 0 or test.nham_trained == 0 or \ + test.nspam_trained == 0): + # Otherwise, we only train if it doesn't screw up the + # balance. + ratio = test.nham_trained / float(test.nspam_trained) + if ratio > self.ratio_maximum and guess[0] == 1: + # Too much ham, and this is ham - don't train. + return 0 + elif ratio < (1/self.ratio_maximum) and guess[0] == -1: + # Too much spam, and this is spam - don't train. + return 0 + return corrected(self, which, test, guess, actual, msg) + + ### + ### This is a training regime for the incremental.py harness. ### It does perfect training for fp, fn, and unsures. 
### *************** *** 130,131 **** --- 193,197 ---- self.ham[0].append(msg) return actual + + if __name__ == "__main__": + print __doc__ From anadelonbrin at users.sourceforge.net Sat Jan 10 22:15:10 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Sat Jan 10 22:15:13 2004 Subject: [Spambayes-checkins] spambayes/testtools mkgraph.py,1.3,1.4 Message-ID: Update of /cvsroot/spambayes/spambayes/testtools In directory sc8-pr-cvs1:/tmp/cvs-serv15220/testtools Modified Files: mkgraph.py Log Message: Add a docstring. Add -f command line arg to pass a filename rather than reading from stdin (which is still the default) Add a training_is_ham line to the error graph which shows the percentage of training data that is ham (i.e. shows the imbalance). Modify the outputing so that it can be in different formats for those of us without plotmtv. The -c command line option outputs all the lines in the same set of rows, rather than in their own set as is the default. The -s arg specifies the separator for this sort of output (defaults to a comma, so that csv files are output). Comparing output from this in default mode and the previous mkgraph.py output gives me identical results except for one extra newline at the top, so this should still work with plotmtv, I think (but can't test more than that). Index: mkgraph.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/mkgraph.py,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** mkgraph.py 16 Dec 2003 05:06:34 -0000 1.3 --- mkgraph.py 11 Jan 2004 03:15:07 -0000 1.4 *************** *** 1,2 **** --- 1,23 ---- + """ + This takes incremental.py output and outputs a file to be + used to create a graph (by default by plotmtv). + + Options: + -h Display this message. + -r [report type] Output this type of report. + Currently supported: "error", "counts" + Defaults to "error". + -s [number] Span of days to average counts over. 
+ If not specified then culmulative counts are + output (this is the default). + -f [file] Input file (if not specified, stdin is used) + -c Rather than outputting in plotmtv format, + where each line is described separately, + output each line in a separate column, which + is easier to create an Excel graph from. + -s [sep] If -c is used, then this is the column + separator (defaults to comma). + """ + import sys import getopt *************** *** 17,31 **** nspam_unsure = [] ! def line(vals): ! global span ! for k in range(0, len(vals)): ! n = vals[k] ! if span and k - span >= 0: ! n -= vals[k - span] ! print '%d %d' % (k, n) ! print ! ! ! def outputset(): global report global span --- 38,42 ---- nspam_unsure = [] ! def outputset(Output): global report global span *************** *** 51,84 **** if report == "counts": ! print '$ Data=Curve2d name="%s Counts"' % (title) ! print '% linetype=1 linelabel="ham_tested" markertype=0 linecolor=0' ! line(nham_tested) ! print '% linetype=1 linelabel="ham_trained" markertype=0 linecolor=1' ! line(nham_trained) ! print '% linetype=1 linelabel="ham_right" markertype=0 linecolor=2' ! line(nham_right) ! print '% linetype=1 linelabel="ham_wrong" markertype=0 linecolor=3' ! line(nham_wrong) ! print '% linetype=1 linelabel="ham_unsure" markertype=0 linecolor=4' ! line(nham_unsure) ! print '% linetype=1 linelabel="spam_tested" markertype=0 linecolor=5' ! line(nspam_tested) ! print '% linetype=1 linelabel="spam_trained" markertype=0 linecolor=6' ! line(nspam_trained) ! print '% linetype=1 linelabel="spam_right" markertype=0 linecolor=7' ! line(nspam_right) ! print '% linetype=1 linelabel="spam_wrong" markertype=0 linecolor=8' ! line(nspam_wrong) ! print '% linetype=1 linelabel="spam_unsure" markertype=0 linecolor=9' ! line(nspam_unsure) if report == "error": ! print '$ Data=Curve2d' ! print '% toplabel="%s Error Rates"' % (title) ! print '% ymax=5' ! print '% xlabel="Days"' ! print '% ylabel="Percent"' ! 
print '% linetype=1 linelabel="fp" markertype=0 linecolor=0' ! for k in range(0, len(nham_wrong)): n = nham_wrong[k] d = nham_tested[k] --- 62,86 ---- if report == "counts": ! Output.output_title(title) ! color = 0 ! for data, label in [(nham_tested, "ham_tested"), ! (nham_trained, "ham_trained"), ! (nham_right, "ham_right"), ! (nham_wrong, "ham_wrong"), ! (nham_unsure, "ham_unsure"), ! (nspam_tested, "spam_tested"), ! (nspam_trained, "spam_trained"), ! (nspam_right, "spam_right"), ! (nspam_wrong, "spam_wrong"), ! (nspam_unsure, "spam_unsure"), ! ]: ! Output.add_line(data, linelabel=label, linecolor=color) ! color += 1 ! Output.output() if report == "error": ! Output.output_title(title) ! Output.line_title(linelabel="fp", linecolor=0) ! for k in xrange(len(nham_wrong)): n = nham_wrong[k] d = nham_tested[k] *************** *** 86,93 **** n -= nham_wrong[k - span] d -= nham_tested[k - span] ! print '%d %f' % (k, (n * 100.0 / (d or 1))) ! print ! print '% linetype=1 linelabel="fn" markertype=0 linecolor=1' ! for k in range(0, len(nspam_wrong)): n = nspam_wrong[k] d = nspam_tested[k] --- 88,95 ---- n -= nham_wrong[k - span] d -= nham_tested[k - span] ! Output.add_line(k, (n * 100.0 / (d or 1))) ! ! Output.line_title(linelabel="fn", linecolor=1) ! for k in xrange(len(nspam_wrong)): n = nspam_wrong[k] d = nspam_tested[k] *************** *** 95,102 **** n -= nspam_wrong[k - span] d -= nspam_tested[k - span] ! print '%d %f' % (k, (n * 100.0 / (d or 1))) ! print ! print '% linetype=1 linelabel="unsure" markertype=0 linecolor=2' ! for k in range(0, len(nspam_unsure)): n = nham_unsure[k] + nspam_unsure[k] d = nham_tested[k] + nspam_tested[k] --- 97,104 ---- n -= nspam_wrong[k - span] d -= nspam_tested[k - span] ! Output.add_line(k, (n * 100.0 / (d or 1))) ! ! Output.line_title(linelabel="unsure", linecolor=2) ! 
for k in xrange(len(nspam_unsure)): n = nham_unsure[k] + nspam_unsure[k] d = nham_tested[k] + nspam_tested[k] *************** *** 104,109 **** n -= nham_unsure[k - span] + nspam_unsure[k - span] d -= nham_tested[k - span] + nspam_tested[k - span] ! print '%d %f' % (k, (n * 100.0 / (d or 1))) ! print set = "" --- 106,120 ---- n -= nham_unsure[k - span] + nspam_unsure[k - span] d -= nham_tested[k - span] + nspam_tested[k - span] ! Output.add_line(k, (n * 100.0 / (d or 1))) ! ! Output.line_title(linelabel="training_is_ham", linecolor=3) ! for k in xrange(len(nspam_unsure)): ! n = nham_trained[k] ! d = nham_trained[k] + nspam_trained[k] ! if span and k - span >= 0: ! n -= nham_trained[k - span] ! d -= nham_trained[k - span] + nspam_trained[k - span] ! Output.add_line(k, (n * 100.0 / (d or 1))) ! Output.output() set = "" *************** *** 119,122 **** --- 130,213 ---- nspam_unsure = [] + class SetOutputter(object): + """Class to output set data in the correct format.""" + def __init__(self, sep=',', immediate_print=False): + self.sep = sep + self.immediate_print = immediate_print + self.reset() + + def output_title(self, title): + if self.immediate_print: + title = '$ Data=Curve2d name="%s Counts"' % (title) + print title + if not self.immediate_print: + print self.sep.join(["group", "ham_tested", "ham_trained", + "ham_right", "ham_wrong", "ham_unsure", + "spam_tested", "spam_trained", + "spam_right", "spam_wrong", + "spam_unsure"]) + + def add_line(self, vals, linetype=1, linelabel="", markertype=0, + linecolor=0): + if self.immediate_print: + print + print '%% linetype=%d linelabel="%s" markertype=%d linecolor=%s' % \ + (linetype, linelabel, markertype, linecolor) + for k in xrange(len(vals)): + n = vals[k] + if span and k - span >= 0: + n -= vals[k - span] + if self.lines.has_key(k): + self.lines[k].append(str(n)) + else: + self.lines[k] = [str(n)] + if self.immediate_print: + print '%d %d' % (k, n) + + def output(self): + if not self.immediate_print: + keys = 
self.lines.keys() + keys.sort() + for k in keys: + vals = [str(k)] + vals.extend(self.lines.get(k, [])) + print self.sep.join(vals) + else: + print + self.reset() + + def reset(self): + self.lines = {} + + class ErrorSetOutputter(SetOutputter): + """Class to output set error data in the correct format.""" + def output_title(self, title): + if self.immediate_print: + print '$ Data=Curve2d' + print '%% toplabel="%s Error Rates"' % (title) + print '% ymax=5' + print '% xlabel="Days"' + print '% ylabel="Percent"' + else: + print title + print self.sep.join(["group", "fp", "fn", "unsure", + "training_is_ham"]) + + def line_title(self, linetype=1, linelabel="", markertype=0, + linecolor=0): + if self.immediate_print: + print '\n%% linetype=%d linelabel="%s" markertype=%d ' \ + 'linecolor=%d' % (linetype, linelabel, markertype, + linecolor) + + def add_line(self, k, v): + if self.immediate_print: + print '%d %f' % (k, v) + else: + if self.lines.has_key(k): + self.lines[k].append(str(v)) + else: + self.lines[k] = [str(v)] + def main(): global report *************** *** 134,143 **** global nspam_unsure ! opts, args = getopt.getopt(sys.argv[1:], 's:r:') for opt, arg in opts: if opt == '-s': span = int(arg) ! if opt == '-r': report = arg if report not in ("error", "counts"): --- 225,246 ---- global nspam_unsure ! filename = None ! sep = ',' ! all_together = False ! opts, args = getopt.getopt(sys.argv[1:], 's:r:f:hcs:') for opt, arg in opts: if opt == '-s': span = int(arg) ! elif opt == '-r': report = arg + elif opt == '-f': + filename = arg + elif opt == '-c': + all_together = True + elif opt == '-s': + sep = arg + elif opt == '-h': + print __doc__ + sys.exit() if report not in ("error", "counts"): *************** *** 145,150 **** sys.exit(1) while 1: ! 
line = sys.stdin.readline() if line == "": break --- 248,263 ---- sys.exit(1) + if report == "counts": + Output = SetOutputter(sep, not all_together) + elif report == "error": + Output = ErrorSetOutputter(sep, not all_together) + + if filename: + source = file(filename) + else: + source = sys.stdin + while 1: ! line = source.readline() if line == "": break *************** *** 152,156 **** line = line[:-1] if line.startswith("Set "): ! outputset() set = line[4:] if len(line) > 0 and (line[0] in ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')): --- 265,269 ---- line = line[:-1] if line.startswith("Set "): ! outputset(Output) set = line[4:] if len(line) > 0 and (line[0] in ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')): *************** *** 167,171 **** nspam_unsure.append(int(vals[9])) ! outputset() if __name__ == "__main__": --- 280,284 ---- nspam_unsure.append(int(vals[9])) ! outputset(Output) if __name__ == "__main__": From anadelonbrin at users.sourceforge.net Sat Jan 10 22:37:57 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Sat Jan 10 22:38:01 2004 Subject: [Spambayes-checkins] spambayes/testtools .cvsignore,NONE,1.1 Message-ID: Update of /cvsroot/spambayes/spambayes/testtools In directory sc8-pr-cvs1:/tmp/cvs-serv19624/testtools Added Files: .cvsignore Log Message: Ignore autogenerated and configuration files. --- NEW FILE: .cvsignore --- *.py[co] *.ini From anadelonbrin at users.sourceforge.net Sat Jan 10 22:39:58 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Sat Jan 10 22:40:01 2004 Subject: [Spambayes-checkins] spambayes/testtools regimes.py,1.5,1.6 Message-ID: Update of /cvsroot/spambayes/spambayes/testtools In directory sc8-pr-cvs1:/tmp/cvs-serv19948/testtools Modified Files: regimes.py Log Message: Was calling the parent class, not the parent class's function. 
Index: regimes.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/regimes.py,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** regimes.py 11 Jan 2004 01:34:36 -0000 1.5 --- regimes.py 11 Jan 2004 03:39:56 -0000 1.6 *************** *** 111,115 **** # Too much spam, and this is spam - don't train. return 0 ! return corrected(self, which, test, guess, actual, msg) ### --- 111,115 ---- # Too much spam, and this is spam - don't train. return 0 ! return corrected.guess_action(self, which, test, guess, actual, msg) ### From anthonybaxter at users.sourceforge.net Mon Jan 12 01:46:35 2004 From: anthonybaxter at users.sourceforge.net (Anthony Baxter) Date: Mon Jan 12 01:46:41 2004 Subject: [Spambayes-checkins] spambayes/spambayes Options.py, 1.98, 1.99 ProxyUI.py, 1.38, 1.39 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv5305 Modified Files: Options.py ProxyUI.py Log Message: New options ham_discard_level and spam_discard_level. These make the interface default to discard hams/spams in the training interface. Obviously, if you have the system training on everything as it filters, it will make no difference at all. 
Index: Options.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v retrieving revision 1.98 retrieving revision 1.99 diff -C2 -d -r1.98 -r1.99 *** Options.py 5 Jan 2004 17:40:25 -0000 1.98 --- Options.py 12 Jan 2004 06:46:33 -0000 1.99 *************** *** 911,914 **** --- 911,924 ---- ("ham", "spam", "discard", "defer"), RESTORE), + ("ham_discard_level", "Ham Discard Level", 0.0, + """Hams scoring less than this percentage will default to being + discarded in the training interface (they won't be trained).""", + REAL, RESTORE), + + ("spam_discard_level", "Spam Discard Level", 100.0, + """Spams scoring more than this percentage will default to being + discarded in the training interface (they won't be trained).""", + REAL, RESTORE), + ("http_authentication", "HTTP Authentication", "None", """This option lets you choose the security level of the web interface. Index: ProxyUI.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/ProxyUI.py,v retrieving revision 1.38 retrieving revision 1.39 diff -C2 -d -r1.38 -r1.39 *** ProxyUI.py 7 Jan 2004 06:04:44 -0000 1.38 --- ProxyUI.py 12 Jan 2004 06:46:33 -0000 1.39 *************** *** 139,142 **** --- 139,144 ---- ('html_ui', 'default_spam_action'), ('html_ui', 'default_unsure_action'), + ('html_ui', 'ham_discard_level'), + ('html_ui', 'spam_discard_level'), ('html_ui', 'allow_remote_connections'), ('html_ui', 'http_authentication'), *************** *** 290,298 **** self._getTimeRange(self._keyToTimestamp(key)) row = self.html.reviewRow.clone() if label == 'Spam': ! r_att = getattr(row, options["html_ui", "default_spam_action"]) elif label == 'Ham': ! r_att = getattr(row, options["html_ui", "default_ham_action"]) else: --- 292,307 ---- self._getTimeRange(self._keyToTimestamp(key)) row = self.html.reviewRow.clone() + score = float(messageInfo.score.rstrip('%')) if label == 'Spam': ! 
if score > options["html_ui", "spam_discard_level"]: ! r_att = getattr(row, 'discard') ! else: ! r_att = getattr(row, options["html_ui", "default_spam_action"]) elif label == 'Ham': ! if score < options["html_ui", "ham_discard_level"]: ! r_att = getattr(row, 'discard') ! else: ! r_att = getattr(row, options["html_ui", "default_ham_action"]) else: From anthonybaxter at users.sourceforge.net Mon Jan 12 01:49:39 2004 From: anthonybaxter at users.sourceforge.net (Anthony Baxter) Date: Mon Jan 12 01:49:43 2004 Subject: [Spambayes-checkins] spambayes/spambayes Options.py,1.99,1.100 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv5752 Modified Files: Options.py Log Message: note requirement of turn off train when filtering Index: Options.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v retrieving revision 1.99 retrieving revision 1.100 diff -C2 -d -r1.99 -r1.100 *** Options.py 12 Jan 2004 06:46:33 -0000 1.99 --- Options.py 12 Jan 2004 06:49:37 -0000 1.100 *************** *** 913,922 **** ("ham_discard_level", "Ham Discard Level", 0.0, """Hams scoring less than this percentage will default to being ! discarded in the training interface (they won't be trained).""", REAL, RESTORE), ("spam_discard_level", "Spam Discard Level", 100.0, """Spams scoring more than this percentage will default to being ! discarded in the training interface (they won't be trained).""", REAL, RESTORE), --- 913,926 ---- ("ham_discard_level", "Ham Discard Level", 0.0, """Hams scoring less than this percentage will default to being ! discarded in the training interface (they won't be trained). You'll ! need to turn off the 'Train when filtering' option, above, for this ! to have any effect""", REAL, RESTORE), ("spam_discard_level", "Spam Discard Level", 100.0, """Spams scoring more than this percentage will default to being ! 
discarded in the training interface (they won't be trained). You'll ! need to turn off the 'Train when filtering' option, above, for this ! to have any effect""", REAL, RESTORE), From anadelonbrin at users.sourceforge.net Mon Jan 12 03:36:17 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 03:36:21 2004 Subject: [Spambayes-checkins] spambayes/contrib mod_spambayes.py, 1.3, 1.4 spamcounts.py, 1.5, 1.6 Message-ID: Update of /cvsroot/spambayes/spambayes/contrib In directory sc8-pr-cvs1:/tmp/cvs-serv21286/contrib Modified Files: mod_spambayes.py spamcounts.py Log Message: Well, it's been some time, and no-one disagreed, and three people liked the idea, so here it is. MAY BREAK THINGS! From now, path/file options are not relative to the current working directory, they are relative to the last configuration file loaded. This still defaults to the current working directory (except on Windows with win32all, where it's still the User Application Data directory). So this should mean no noticeable change, unless your last configuration file is not in the current working directory, and your data is. Index: mod_spambayes.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/contrib/mod_spambayes.py,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** mod_spambayes.py 16 Dec 2003 05:06:33 -0000 1.3 --- mod_spambayes.py 12 Jan 2004 08:36:15 -0000 1.4 *************** *** 12,16 **** from spambayes import hammie, Options, mboxutils ! dbf = os.path.expanduser(Options.options["Storage", "persistent_storage_file"]) class SpambayesFilter(BufferAllFilter): --- 12,16 ---- from spambayes import hammie, Options, mboxutils !
bdf = Options.get_pathname_option("Storage", "persistent_storage_file") class SpambayesFilter(BufferAllFilter): Index: spamcounts.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/contrib/spamcounts.py,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** spamcounts.py 24 Dec 2003 04:08:38 -0000 1.5 --- spamcounts.py 12 Jan 2004 08:36:15 -0000 1.6 *************** *** 25,29 **** import csv ! from spambayes.Options import options from spambayes.tokenizer import tokenize from spambayes.storage import STATE_KEY --- 25,29 ---- import csv ! from spambayes.Options import options, get_pathname_option from spambayes.tokenizer import tokenize from spambayes.storage import STATE_KEY *************** *** 97,101 **** usere = False ! dbname = options["Storage", "persistent_storage_file"] ispickle = not options["Storage", "persistent_use_database"] tokenizestdin = False --- 97,101 ---- usere = False ! dbname = get_pathname_option("Storage", "persistent_storage_file") ispickle = not options["Storage", "persistent_use_database"] tokenizestdin = False From anadelonbrin at users.sourceforge.net Mon Jan 12 03:36:17 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 03:36:23 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_filter.py, 1.8, 1.9 sb_imapfilter.py, 1.17, 1.18 sb_server.py, 1.17, 1.18 sb_xmlrpcserver.py, 1.3, 1.4 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv21286/scripts Modified Files: sb_filter.py sb_imapfilter.py sb_server.py sb_xmlrpcserver.py Log Message: Well, it's been some time, and no-one disagreed, and three people liked the idea, so here it is. MAY BREAK THINGS! From now, path/file options are not relative to the current working directory, they are relative to the last configuration file loaded.
This still defaults to the current working directory (except on Windows with win32all, where it's still the User Application Data directory). So this should mean no noticeable change, unless your last configuration file is not in the current working directory, and your data is. Index: sb_filter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_filter.py,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** sb_filter.py 26 Nov 2003 20:43:43 -0000 1.8 --- sb_filter.py 12 Jan 2004 08:36:15 -0000 1.9 *************** *** 146,151 **** options.merge_files(['/etc/hammierc', os.path.expanduser('~/.hammierc')]) ! self.dbname = options["Storage", "persistent_storage_file"] ! self.dbname = os.path.expanduser(self.dbname) self.usedb = options["Storage", "persistent_use_database"] --- 146,151 ---- options.merge_files(['/etc/hammierc', os.path.expanduser('~/.hammierc')]) ! self.dbname = Options.get_pathname_option("Storage", ! "persistent_storage_file") self.usedb = options["Storage", "persistent_use_database"] Index: sb_imapfilter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v retrieving revision 1.17 retrieving revision 1.18 diff -C2 -d -r1.17 -r1.18 *** sb_imapfilter.py 16 Dec 2003 05:06:33 -0000 1.17 --- sb_imapfilter.py 12 Jan 2004 08:36:15 -0000 1.18 *************** *** 101,105 **** from email.Utils import parsedate ! from spambayes.Options import options from spambayes import tokenizer, storage, message, Dibbler from spambayes.UserInterface import UserInterfaceServer --- 101,105 ---- from email.Utils import parsedate ! from spambayes.Options import options, get_pathname_option from spambayes import tokenizer, storage, message, Dibbler from spambayes.UserInterface import UserInterfaceServer *************** *** 698,702 **** sys.exit() ! 
bdbname = options["Storage", "persistent_storage_file"] useDBM = options["Storage", "persistent_use_database"] doTrain = False --- 698,702 ---- sys.exit() ! bdbname = get_pathname_option("Storage", "persistent_storage_file") useDBM = options["Storage", "persistent_use_database"] doTrain = False Index: sb_server.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v retrieving revision 1.17 retrieving revision 1.18 diff -C2 -d -r1.17 -r1.18 *** sb_server.py 4 Jan 2004 18:26:59 -0000 1.17 --- sb_server.py 12 Jan 2004 08:36:15 -0000 1.18 *************** *** 103,107 **** from spambayes.FileCorpus import FileCorpus, ExpiryFileCorpus from spambayes.FileCorpus import FileMessageFactory, GzipFileMessageFactory ! from spambayes.Options import options from spambayes.UserInterface import UserInterfaceServer from spambayes.ProxyUI import ProxyUserInterface --- 103,107 ---- from spambayes.FileCorpus import FileCorpus, ExpiryFileCorpus from spambayes.FileCorpus import FileMessageFactory, GzipFileMessageFactory ! from spambayes.Options import options, get_pathname_option from spambayes.UserInterface import UserInterfaceServer from spambayes.ProxyUI import ProxyUserInterface *************** *** 725,729 **** options["Storage", "persistent_storage_file"] = \ '_pop3proxy_test.pickle' # This is never saved. ! filename = options["Storage", "persistent_storage_file"] filename = os.path.expanduser(filename) self.bayes = storage.open_storage(filename, self.useDB) --- 725,729 ---- options["Storage", "persistent_storage_file"] = \ '_pop3proxy_test.pickle' # This is never saved. ! filename = get_pathname_option("Storage", "persistent_storage_file") filename = os.path.expanduser(filename) self.bayes = storage.open_storage(filename, self.useDB) *************** *** 743,749 **** # Create/open the Corpuses. Use small cache sizes to avoid hogging # lots of memory. ! map(ensureDir, [options["Storage", "spam_cache"], ! 
options["Storage", "ham_cache"], ! options["Storage", "unknown_cache"]]) if self.gzipCache: factory = GzipFileMessageFactory() --- 743,750 ---- # Create/open the Corpuses. Use small cache sizes to avoid hogging # lots of memory. ! sc = get_pathname_option("Storage", "spam_cache") ! hc = get_pathname_option("Storage", "ham_cache") ! uc = get_pathname_option("Storage", "unknown_cache") ! map(ensureDir, [sc, hc, uc]) if self.gzipCache: factory = GzipFileMessageFactory() *************** *** 751,768 **** factory = FileMessageFactory() age = options["Storage", "cache_expiry_days"]*24*60*60 ! self.spamCorpus = ExpiryFileCorpus(age, factory, ! options["Storage", ! "spam_cache"], '[0123456789\-]*', cacheSize=20) ! self.hamCorpus = ExpiryFileCorpus(age, factory, ! options["Storage", ! "ham_cache"], '[0123456789\-]*', cacheSize=20) ! self.unknownCorpus = ExpiryFileCorpus(age, factory, ! options["Storage", ! "unknown_cache"], ! '[0123456789\-]*', cacheSize=20) --- 752,763 ---- factory = FileMessageFactory() age = options["Storage", "cache_expiry_days"]*24*60*60 ! self.spamCorpus = ExpiryFileCorpus(age, factory, sc, '[0123456789\-]*', cacheSize=20) ! self.hamCorpus = ExpiryFileCorpus(age, factory, hc, '[0123456789\-]*', cacheSize=20) ! self.unknownCorpus = ExpiryFileCorpus(age, factory, uc, ! '[0123456789\-]*', cacheSize=20) Index: sb_xmlrpcserver.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_xmlrpcserver.py,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** sb_xmlrpcserver.py 25 Nov 2003 03:58:14 -0000 1.3 --- sb_xmlrpcserver.py 12 Jan 2004 08:36:15 -0000 1.4 *************** *** 78,83 **** options = Options.options ! dbname = options["Storage", "persistent_storage_file"] ! dbname = os.path.expanduser(dbname) usedb = options["Storage", "persistent_use_database"] for opt, arg in opts: --- 78,83 ---- options = Options.options ! dbname = Options.get_pathname_option("Storage", ! 
"persistent_storage_file") usedb = options["Storage", "persistent_use_database"] for opt, arg in opts: From anadelonbrin at users.sourceforge.net Mon Jan 12 03:36:17 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 03:36:24 2004 Subject: [Spambayes-checkins] spambayes WHAT_IS_NEW.txt,1.23,1.24 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv21286 Modified Files: WHAT_IS_NEW.txt Log Message: Well, it's been some time, and no-one disagreed, and three people liked the idea, so here it is. MAY BREAK THINGS! From now, path/file options are not relative to the current working directory, they are relative to the last configuration file loaded. This still defaults to the current working directory (except on Windows with win32all, where it's still the User Application Data directory). So this should mean no noticeable change, unless your last configuration file is not in the current working directory, and your data is. Index: WHAT_IS_NEW.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/WHAT_IS_NEW.txt,v retrieving revision 1.23 retrieving revision 1.24 diff -C2 -d -r1.23 -r1.24 *** WHAT_IS_NEW.txt 29 Dec 2003 04:46:31 -0000 1.23 --- WHAT_IS_NEW.txt 12 Jan 2004 08:36:14 -0000 1.24 *************** *** 17,21 **** -------------------------- ! o XXX There should be no other incompatible changes (from 1.0a7) in this release. --- 17,32 ---- -------------------------- ! o The way pathnames in option files are handled has changed, as has the ! default values for some pathname options, in some situations. All ! pathnames in option values that are not absolute (with Windows, this ! means they will start with a drive letter) are now relative to the ! directory of the last configuration file to be loaded, rather than to ! the current working directory. ! ! What does this mean for you? Nothing, as long as your pathnames !
(the cache directories and databases, primarily) are either absolute ! or in the same directory as your configuration file. If, after ! upgrading, your database is suddenly empty, then you need to fix your ! configuration so that it points to the correct place. There should be no other incompatible changes (from 1.0a7) in this release. From anadelonbrin at users.sourceforge.net Mon Jan 12 03:36:18 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 03:36:26 2004 Subject: [Spambayes-checkins] spambayes/testtools timcv.py, 1.4, 1.5 timtest.py, 1.5, 1.6 weaktest.py, 1.4, 1.5 Message-ID: Update of /cvsroot/spambayes/spambayes/testtools In directory sc8-pr-cvs1:/tmp/cvs-serv21286/testtools Modified Files: timcv.py timtest.py weaktest.py Log Message: Well, it's been some time, and no-one disagreed, and three people liked the idea, so here it is. MAY BREAK THINGS! From now, path/file options are not relative to the current working directory, they are relative to the last configuration file loaded. This still defaults to the current working directory (except on Windows with win32all, where it's still the User Application Data directory). So this should mean no noticeable change, unless your last configuration file is not in the current working directory, and your data is. Index: timcv.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/timcv.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** timcv.py 5 Sep 2003 01:15:29 -0000 1.4 --- timcv.py 12 Jan 2004 08:36:15 -0000 1.5 *************** *** 56,60 **** ! from spambayes.Options import options from spambayes import TestDriver from spambayes import msgs --- 56,60 ---- ! from spambayes.Options import options, get_pathname_option from spambayes import TestDriver from spambayes import msgs *************** *** 73,79 **** print options.display() ! 
hamdirs = [options["TestDriver", "ham_directories"] % \ i for i in range(1, nsets+1)] ! spamdirs = [options["TestDriver", "spam_directories"] % \ i for i in range(1, nsets+1)] --- 73,79 ---- print options.display() ! hamdirs = [get_pathname_options("TestDriver", "ham_directories") % \ i for i in range(1, nsets+1)] ! spamdirs = [get_pathname_options("TestDriver", "spam_directories") % \ i for i in range(1, nsets+1)] Index: timtest.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/timtest.py,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** timtest.py 5 Sep 2003 01:15:29 -0000 1.5 --- timtest.py 12 Jan 2004 08:36:16 -0000 1.6 *************** *** 41,45 **** sys.path.insert(-1, os.path.dirname(os.getcwd())) ! from spambayes.Options import options from spambayes import TestDriver from spambayes import msgs --- 41,45 ---- sys.path.insert(-1, os.path.dirname(os.getcwd())) ! from spambayes.Options import options, get_pathname_option from spambayes import TestDriver from spambayes import msgs *************** *** 58,64 **** print options.display() ! spamdirs = [options["TestDriver", "spam_directories"] % \ i for i in range(1, nsets+1)] ! hamdirs = [options["TestDriver", "ham_directories"] % \ i for i in range(1, nsets+1)] spamhamdirs = zip(spamdirs, hamdirs) --- 58,64 ---- print options.display() ! spamdirs = [get_pathname_option("TestDriver", "spam_directories") % \ i for i in range(1, nsets+1)] ! 
hamdirs = [get_pathname_option("TestDriver", "ham_directories") % \ i for i in range(1, nsets+1)] spamhamdirs = zip(spamdirs, hamdirs) Index: weaktest.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/weaktest.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** weaktest.py 5 Sep 2003 01:15:29 -0000 1.4 --- weaktest.py 12 Jan 2004 08:36:16 -0000 1.5 *************** *** 34,38 **** import sys,os ! from spambayes.Options import options from spambayes import hammie, msgs, CostCounter --- 34,38 ---- import sys,os ! from spambayes.Options import options, get_pathname_option from spambayes import hammie, msgs, CostCounter *************** *** 148,154 **** print options.display() ! spamdirs = [options["TestDriver", "spam_directories"] % \ i for i in range(1, nsets+1)] ! hamdirs = [options["TestDriver", "ham_directories"] % \ i for i in range(1, nsets+1)] --- 148,154 ---- print options.display() ! spamdirs = [get_pathname_option("TestDriver", "spam_directories") % \ i for i in range(1, nsets+1)] ! hamdirs = [get_pathname_option("TestDriver", "ham_directories") % \ i for i in range(1, nsets+1)] From anadelonbrin at users.sourceforge.net Mon Jan 12 03:36:18 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 03:36:27 2004 Subject: [Spambayes-checkins] spambayes/utilities pop3graph.py, 1.3, 1.4 which_database.py, 1.7, 1.8 Message-ID: Update of /cvsroot/spambayes/spambayes/utilities In directory sc8-pr-cvs1:/tmp/cvs-serv21286/utilities Modified Files: pop3graph.py which_database.py Log Message: Well, it's been some time, and no-one disagreed, and three people liked the idea, so here it is. MAY BREAK THINGS! From now, path/file options are not relative to the current working directory, they are relative to the last configuration file loaded. 
This still defaults to the current working directory (except on Windows with win32all, where it's still the User Application Data directory). So this should mean no noticeable change, unless your last configuration file is not in the current working directory, and your data is. Index: pop3graph.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/utilities/pop3graph.py,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** pop3graph.py 4 May 2003 03:16:45 -0000 1.3 --- pop3graph.py 12 Jan 2004 08:36:16 -0000 1.4 *************** *** 29,36 **** else: messageFactory = FileMessageFactory() ! spamCorpus = FileCorpus(messageFactory, options["pop3proxy", ! "spam_cache"]) ! hamCorpus = FileCorpus(messageFactory, options["pop3proxy", ! "ham_cache"]) # Read in all the trained messages. --- 29,36 ---- else: messageFactory = FileMessageFactory() ! sc = get_pathname_option("Storage", "spam_cache") ! hc = get_pathname_option("Storage", "ham_cache") ! spamCorpus = FileCorpus(messageFactory, sc) ! hamCorpus = FileCorpus(messageFactory, hc) # Read in all the trained messages. Index: which_database.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/utilities/which_database.py,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** which_database.py 16 Dec 2003 05:06:34 -0000 1.7 --- which_database.py 12 Jan 2004 08:36:16 -0000 1.8 *************** *** 27,31 **** sys.path.insert(-1, os.path.dirname(os.getcwd())) ! from spambayes.Options import options import dumbdbm import dbhash --- 27,31 ---- sys.path.insert(-1, os.path.dirname(os.getcwd())) ! from spambayes.Options import options, get_pathname_option import dumbdbm import dbhash *************** *** 73,77 **** print ! 
hammie = os.path.expanduser(options["Storage", "persistent_storage_file"]) use_dbm = options["Storage", "persistent_use_database"] if not use_dbm: --- 73,77 ---- print ! hammie = get_pathname_option("Storage", "persistent_storage_file") use_dbm = options["Storage", "persistent_use_database"] if not use_dbm: From anadelonbrin at users.sourceforge.net Mon Jan 12 03:36:17 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 03:36:28 2004 Subject: [Spambayes-checkins] spambayes/spambayes Options.py, 1.100, 1.101 hammiebulk.py, 1.11, 1.12 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv21286/spambayes Modified Files: Options.py hammiebulk.py Log Message: Well, it's been some time, and no-one disagreed, and three people liked the idea, so here it is. MAY BREAK THINGS! From now, path/file options are not relative to the current working directory, they are relative to the last configuration file loaded. This still defaults to the current working directory (except on Windows with win32all, where it's still the User Application Data directory). So this should mean no noticeable change, unless your last configuration file is not in the current working directory, and your data is. 
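Editorial note: the rule this log message describes can be sketched roughly as follows. This is a minimal Python 3 illustration, not the committed code (the commit targets Python 2 and reads the value from the options object). The names resolve_pathname and config_path are hypothetical and exist only for this example.

```python
import os


def resolve_pathname(value, config_path):
    """Resolve an option value against the configuration file's directory.

    Hypothetical helper for illustration only: `value` is the raw option
    string and `config_path` is the path of the last configuration file
    that was loaded.
    """
    # Expand a leading "~" first, so "~/.hammiedb" becomes absolute.
    expanded = os.path.expanduser(value)
    # Absolute paths are returned untouched.
    if os.path.isabs(expanded):
        return expanded
    # Relative paths are joined to the directory holding the config file.
    # If config_path has no directory part, dirname() is "" and the result
    # is effectively relative to the current working directory.
    return os.path.join(os.path.dirname(config_path), expanded)
```

Under these assumptions, a relative value such as "hammie.db" follows the configuration file around, while an absolute value is never rewritten. That matches the note that nothing changes unless the configuration file and the data live in different directories.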
Index: Options.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v retrieving revision 1.100 retrieving revision 1.101 diff -C2 -d -r1.100 -r1.101 *** Options.py 12 Jan 2004 06:49:37 -0000 1.100 --- Options.py 12 Jan 2004 08:36:15 -0000 1.101 *************** *** 1159,1162 **** --- 1159,1173 ---- try: from win32com.shell import shell, shellcon + except ImportError: + # We are on Windows, with no BAYESCUSTOMIZE set, no ini file + # in the current directory, and no win32 extensions installed + # to locate the "user" directory - seeing things are so lamely + # setup, it is worth printing a warning + print >>sys.stderr, "NOTE: We can not locate an INI file " \ + "for SpamBayes, and the Python for Windows extensions " \ + "are not installed, meaning we can't locate your " \ + "'user' directory. An empty configuration file at " \ + "'%s' will be used." % optionsPathname.encode('mbcs') + else: windowsUserDirectory = os.path.join( shell.SHGetFolderPath(0,shellcon.CSIDL_APPDATA,0,0), *************** *** 1165,1168 **** --- 1176,1183 ---- if not os.path.isdir(windowsUserDirectory): os.makedirs(windowsUserDirectory) + except os.error: + # unable to make the directory - stick to default. + pass + else: optionsPathname = os.path.join(windowsUserDirectory, 'bayescustomize.ini') *************** *** 1172,1209 **** if os.path.exists(optionsPathname): options.merge_file(optionsPathname) ! else: ! # If the file doesn't exist, then let's get the user to ! # store their databases and caches here as well, by ! # default, and save the file. ! db_name = os.path.join(windowsUserDirectory, ! "statistics_database.db") ! mi_db = os.path.join(windowsUserDirectory, ! "message_info_database.db") ! h_cache = os.path.join(windowsUserDirectory, ! "ham_cache").encode("mbcs") ! u_cache = os.path.join(windowsUserDirectory, ! "unknown_cache").encode("mbcs") ! s_cache = os.path.join(windowsUserDirectory, ! 
"spam_cache").encode("mbcs") ! options["Storage", "spam_cache"] = s_cache ! options["Storage", "ham_cache"] = h_cache ! options["Storage", "unknown_cache"] = u_cache ! options["Storage", "persistent_storage_file"] = \ ! db_name.encode("mbcs") ! options["Storage", "messageinfo_storage_file"] = \ ! mi_db.encode("mbcs") ! options.update_file(optionsPathname) ! except os.error: ! # unable to make the directory - stick to default. ! pass ! except ImportError: ! # We are on Windows, with no BAYESCUSTOMIZE set, no ini file ! # in the current directory, and no win32 extensions installed ! # to locate the "user" directory - seeing things are so lamely ! # setup, it is worth printing a warning ! print "NOTE: We can not locate an INI file for SpamBayes, and the" ! print "Python for Windows extensions are not installed, meaning we" ! print "can't locate your 'user' directory. An empty configuration" ! print "file at '%s' will be used." % optionsPathname.encode('mbcs') # Ideally, we should not create the objects at import time - but we have --- 1187,1198 ---- if os.path.exists(optionsPathname): options.merge_file(optionsPathname) ! ! def get_pathname_option(section, option): ! """Return the option relative to the path specified in the ! gloabl optionsPathname, unless it is already an absolute path.""" ! filename = os.path.expanduser(options.get(section, option)) ! if os.path.isabs(filename): ! return filename ! return os.path.join(os.path.dirname(optionsPathname), filename) # Ideally, we should not create the objects at import time - but we have Index: hammiebulk.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/hammiebulk.py,v retrieving revision 1.11 retrieving revision 1.12 diff -C2 -d -r1.11 -r1.12 *** hammiebulk.py 9 Oct 2003 03:04:48 -0000 1.11 --- hammiebulk.py 12 Jan 2004 08:36:15 -0000 1.12 *************** *** 52,56 **** import getopt ! 
from spambayes.Options import options from spambayes import classifier, mboxutils, hammie, Corpus --- 52,56 ---- import getopt ! from spambayes.Options import options, get_pathname_option from spambayes import classifier, mboxutils, hammie, Corpus *************** *** 60,70 **** # Default database name - DEFAULTDB = os.path.expanduser(options["Storage", "persistent_storage_file"]) # This is a bit of a hack to counter the default for # persistent_storage_file changing from ~/.hammiedb to hammie.db # This will work unless a user had hammie.db as their value for # persistent_storage_file ! if DEFAULTDB == options.default("Storage", "persistent_storage_file"): ! DEFAULTDB = os.path.expanduser(os.path.join("~", ".hammiedb")) # Probability at which a message is considered spam --- 60,72 ---- # Default database name # This is a bit of a hack to counter the default for # persistent_storage_file changing from ~/.hammiedb to hammie.db # This will work unless a user had hammie.db as their value for # persistent_storage_file ! if options["Storage", "persistent_storage_file"] == \ ! options.default("Storage", "persistent_storage_file"): ! options["Storage", "persistent_storage_file"] = \ ! os.path.join("~", ".hammiedb")) ! DEFAULTDB = get_pathname_option("Storage", "persistent_storage_file") # Probability at which a message is considered spam From anadelonbrin at users.sourceforge.net Mon Jan 12 03:38:25 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 03:38:31 2004 Subject: [Spambayes-checkins] spambayes/spambayes classifier.py, 1.20, 1.21 tokenizer.py, 1.28, 1.29 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv21783/spambayes Modified Files: classifier.py tokenizer.py Log Message: Fix the experimental slurping option so that it only retrieves the text from the URL if necessary, as was the original intent. 
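Editorial note: the idea behind this fix, deferring the URL fetch until the score shows it is needed, can be sketched as below. This is a simplified Python 3 illustration under stated assumptions, not the committed classifier code. The function classify, its parameters, and the cutoff values are hypothetical, and the real code additionally checks the number of clues and rescoring uses the clue words rather than all tokens.

```python
def classify(tokens, pending_url, score_fn, fetch_tokens,
             ham_cutoff=0.2, spam_cutoff=0.9):
    """Score a message, fetching URL content only for unsure results.

    All names here are hypothetical:
      tokens       -- tokens already generated from the message
      pending_url  -- (proto, url) recorded by the tokenizer, or None
      score_fn     -- callable mapping a token list to a probability
      fetch_tokens -- callable(proto, url) returning tokens from the page
    """
    prob = score_fn(tokens)
    # Only touch the network when the message is unsure and a URL was seen.
    if pending_url is not None and ham_cutoff < prob < spam_cutoff:
        extra = list(fetch_tokens(*pending_url))
        new_prob = score_fn(list(tokens) + extra)
        # Keep the new score only if it moves the message out of unsure.
        if new_prob < ham_cutoff or new_prob > spam_cutoff:
            prob = new_prob
    return prob
```

The point of the design is that the tokenizer only records which URL it saw, and the expensive retrieval is left to the one place that knows whether the extra tokens could change the outcome.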
Index: classifier.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v retrieving revision 1.20 retrieving revision 1.21 diff -C2 -d -r1.20 -r1.21 *** classifier.py 30 Dec 2003 16:26:33 -0000 1.20 --- classifier.py 12 Jan 2004 08:38:23 -0000 1.21 *************** *** 45,48 **** --- 45,65 ---- from spambayes.compatsets import Set + # XXX At time of writing, these are only necessary for the + # XXX experimental url retrieving/slurping code. If that + # XXX gets ripped out, either rip these out, or run + # XXX PyChecker over the code. + import re + import os + import sys + import socket + import pickle + import urllib2 + from email import message_from_string + + DOMAIN_AND_PORT_RE = re.compile(r"([^:/\\]+)(:([\d]+))?") + HTTP_ERROR_RE = re.compile(r"HTTP Error ([\d]+)") + URL_KEY_RE = re.compile(r"[\W]") + # XXX ---- ends ---- + from spambayes.Options import options from spambayes.chi2 import chi2Q *************** *** 57,61 **** LN2 = math.log(2) # used frequently by chi-combining ! slurp_wordstream = [] PICKLE_VERSION = 5 --- 74,78 ---- LN2 = math.log(2) # used frequently by chi-combining ! slurp_wordstream = None PICKLE_VERSION = 5 *************** *** 217,222 **** # at the URL's destination. if len(clues) < options["Classifier", "max_discriminators"] and \ ! prob > h_cut and prob < s_cut: ! sprob, sclues = self.chi2_spamprob(slurp_wordstream, True) if sprob < h_cut or sprob > s_cut: prob = sprob --- 234,241 ---- # at the URL's destination. if len(clues) < options["Classifier", "max_discriminators"] and \ ! prob > h_cut and prob < s_cut and slurp_wordstream: ! slurp_tokens = list(self._generate_slurp()) ! slurp_tokens.extend([w for (w,p) in clues]) ! 
sprob, sclues = self.chi2_spamprob(slurp_tokens, True) if sprob < h_cut or sprob > s_cut: prob = sprob *************** *** 516,519 **** --- 535,748 ---- last = token + def _generate_slurp(self): + # We don't want to do this recursively and check URLs + # on webpages, so we have this little cheat. + if not hasattr(self, "setup_done"): + self.setup() + self.setup_done = True + if not hasattr(self, "do_slurp") or self.do_slurp: + if slurp_wordstream: + self.do_slurp = False + + tokens = self.slurp(*slurp_wordstream) + self.do_slurp = True + self._save_caches() + return tokens + return [] + + def setup(self): + # Can't import this at the top because it's circular. + # XXX Someone smarter than me, please figure out the right + # XXX way to do this. + from spambayes.FileCorpus import ExpiryFileCorpus, FileMessageFactory + + username = options["globals", "proxy_username"] + password = options["globals", "proxy_password"] + server = options["globals", "proxy_server"] + if server.find(":") != -1: + server, port = server.split(':', 1) + else: + port = 8080 + if server: + # Build a new opener that uses a proxy requiring authorization + proxy_support = urllib2.ProxyHandler({"http" : \ + "http://%s:%s@%s:%d" % \ + (username, password, + server, port)}) + opener = urllib2.build_opener(proxy_support, + urllib2.HTTPHandler) + else: + # Build a new opener without any proxy information. + opener = urllib2.build_opener(urllib2.HTTPHandler) + + # Install it + urllib2.install_opener(opener) + + # Setup the cache for retrieved urls + age = options["URLRetriever", "x-cache_expiry_days"]*24*60*60 + dir = options["URLRetriever", "x-cache_directory"] + if not os.path.exists(dir): + # Create the directory. 
+ if options["globals", "verbose"]: + print >>sys.stderr, "Creating URL cache directory" + os.makedirs(dir) + + self.urlCorpus = ExpiryFileCorpus(age, FileMessageFactory(), + dir, cacheSize=20) + # Kill any old information in the cache + self.urlCorpus.removeExpiredMessages() + + # Setup caches for unretrievable urls + self.bad_url_cache_name = os.path.join(dir, "bad_urls.pck") + self.http_error_cache_name = os.path.join(dir, "http_error_urls.pck") + if os.path.exists(self.bad_url_cache_name): + b_file = file(self.bad_url_cache_name, "r") + self.bad_urls = pickle.load(b_file) + b_file.close() + else: + if options["globals", "verbose"]: + print "URL caches don't exist: creating" + self.bad_urls = {"url:non_resolving": (), + "url:non_html": (), + "url:unknown_error": ()} + if os.path.exists(self.http_error_cache_name): + h_file = file(self.http_error_cache_name, "r") + self.http_error_urls = pickle.load(h_file) + h_file.close() + else: + self.http_error_urls = {} + + def _save_caches(self): + # XXX Note that these caches are never refreshed, which might not + # XXX be a good thing long-term (if a previously invalid URL + # XXX becomes valid, for example). + b_file = file(self.bad_url_cache_name, "w") + pickle.dump(self.bad_urls, b_file) + b_file.close() + h_file = file(self.http_error_cache_name, "w") + pickle.dump(self.http_error_urls, h_file) + h_file.close() + + def slurp(self, proto, url): + # We generate these tokens: + # url:non_resolving + # url:non_html + # url:http_XXX (for each type of http error encounted, + # for example 404, 403, ...) + # And tokenise the received page (but we do not slurp this). + # Actually, the special url: tokens barely showed up in my testing, + # although I would have thought that they would more - this might + # be due to an error, although they do turn up on occasion. 
In + # any case, we have to do the test, so generating an extra token + # doesn't cost us anything apart from another entry in the db, and + # it's only two entries, plus one for each type of http error + # encountered, so it's pretty neglible. + from spambayes.tokenizer import Tokenizer + + if options["URLRetriever", "x-only_slurp_base"]: + url = self._base_url(url) + + # Check the unretrievable caches + for err in self.bad_urls.keys(): + if url in self.bad_urls[err]: + return [err] + if self.http_error_urls.has_key(url): + return self.http_error_urls[url] + + # We check if the url will resolve first + mo = DOMAIN_AND_PORT_RE.match(url) + domain = mo.group(1) + if mo.group(3) is None: + port = 80 + else: + port = mo.group(3) + try: + not_used = socket.getaddrinfo(domain, port) + except socket.error: + self.bad_urls["url:non_resolving"] += (url,) + return ["url:non_resolving"] + + # If the message is in our cache, then we can just skip over + # retrieving it from the network, and get it from there, instead. + url_key = URL_KEY_RE.sub('_', url) + cached_message = self.urlCorpus.get(url_key) + + if cached_message is None: + # We're going to ignore everything that isn't text/html, + # so we might as well not bother retrieving anything with + # these extensions. 
+ parts = url.split('.') + if parts[-1] in ('jpg', 'gif', 'png', 'css', 'js'): + self.bad_urls["url:non_html"] += (url,) + return ["url:non_html"] + + try: + if options["globals", "verbose"]: + print >>sys.stderr, "Slurping", url + f = urllib2.urlopen("%s://%s" % (proto, url)) + except (urllib2.URLError, socket.error), details: + mo = HTTP_ERROR_RE.match(str(details)) + if mo: + self.http_error_urls[url] = "url:http_" + mo.group(1) + return ["url:http_" + mo.group(1)] + self.bad_urls["url:unknown_error"] += (url,) + return ["url:unknown_error"] + + # Anything that isn't text/html is ignored + content_type = f.info().get('content-type') + if content_type is None or \ + not content_type.startswith("text/html"): + self.bad_urls["url:non_html"] += (url,) + return ["url:non_html"] + + page = f.read() + headers = str(f.info()) + f.close() + fake_message_string = headers + "\r\n" + page + + # Retrieving the same messages over and over again will tire + # us out, so we store them in our own wee cache. + message = self.urlCorpus.makeMessage(url_key) + message.setPayload(fake_message_string) + self.urlCorpus.addMessage(message) + else: + fake_message_string = cached_message.as_string() + + msg = message_from_string(fake_message_string) + + # We don't want to do full header tokenising, as this is + # optimised for messages, not webpages, so we just do the + # basic stuff. 
+ bht = options["Tokenizer", "basic_header_tokenize"] + bhto = options["Tokenizer", "basic_header_tokenize_only"] + options["Tokenizer", "basic_header_tokenize"] = True + options["Tokenizer", "basic_header_tokenize_only"] = True + + tokens = Tokenizer().tokenize(msg) + pf = options["URLRetriever", "x-web_prefix"] + tokens = ["%s%s" % (pf, tok) for tok in tokens] + + # Undo the changes + options["Tokenizer", "basic_header_tokenize"] = bht + options["Tokenizer", "basic_header_tokenize_only"] = bhto + return tokens + + def _base_url(self, url): + # To try and speed things up, and to avoid following + # unique URLS, we convert the URL to as basic a form + # as we can - so http://www.massey.ac.nz/~tameyer/index.html?you=me + # would become http://massey.ac.nz and http://id.example.com + # would become http://example.com + url += '/' + domain, garbage = url.split('/', 1) + parts = domain.split('.') + if len(parts) > 2: + base_domain = parts[-2] + '.' + parts[-1] + if len(parts[-1]) < 3: + base_domain = parts[-3] + '.' + base_domain + else: + base_domain = domain + return base_domain + def _add_slurped(self, wordstream): """Add tokens generated by 'slurping' (i.e. tokenizing *************** *** 522,526 **** for token in wordstream: yield token ! for token in slurp_wordstream: yield token --- 751,756 ---- for token in wordstream: yield token ! slurped_tokens = self._generate_slurp() ! for token in slurped_tokens: yield token Index: tokenizer.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/tokenizer.py,v retrieving revision 1.28 retrieving revision 1.29 diff -C2 -d -r1.28 -r1.29 *** tokenizer.py 5 Jan 2004 17:40:30 -0000 1.28 --- tokenizer.py 12 Jan 2004 08:38:23 -0000 1.29 *************** *** 21,36 **** from compatsets import Set - # XXX At time of writing, these are only necessary for the - # XXX experimental url retrieving/slurping code. 
If that - # XXX gets ripped out, either rip these out, or run - # XXX PyChecker over the code. - import sys - import socket - import pickle - import urllib2 from spambayes import classifier - from email import message_from_string - # XXX ---- ends ---- - from spambayes.Options import options --- 21,25 ---- *************** *** 1076,1198 **** return tokens - DOMAIN_AND_PORT_RE = re.compile(r"([^:/\\]+)(:([\d]+))?") - HTTP_ERROR_RE = re.compile(r"HTTP Error ([\d]+)") - URL_KEY_RE = re.compile(r"[\W]") - class SlurpingURLStripper(URLStripper): def __init__(self): URLStripper.__init__(self) - self.setup_done = False - self.do_slurp = True - - def setup(self): - # Can't import this at the top because it's circular. - # XXX Someone smarter than me, please figure out the right - # XXX way to do this. - from spambayes.FileCorpus import ExpiryFileCorpus, FileMessageFactory - username = options["globals", "proxy_username"] - password = options["globals", "proxy_password"] - server = options["globals", "proxy_server"] - if server.find(":") != -1: - server, port = server.split(':', 1) - else: - port = 8080 - if server: - # Build a new opener that uses a proxy requiring authorization - proxy_support = urllib2.ProxyHandler({"http" : \ - "http://%s:%s@%s:%d" % \ - (username, password, - server, port)}) - opener = urllib2.build_opener(proxy_support, - urllib2.HTTPHandler) - else: - # Build a new opener without any proxy information. - opener = urllib2.build_opener(urllib2.HTTPHandler) - - # Install it - urllib2.install_opener(opener) - - # Setup the cache for retrieved urls - age = options["URLRetriever", "x-cache_expiry_days"]*24*60*60 - dir = options["URLRetriever", "x-cache_directory"] - if not os.path.exists(dir): - # Create the directory. 
- if options["globals", "verbose"]: - print >>sys.stderr, "Creating URL cache directory" - os.makedirs(dir) - - self.urlCorpus = ExpiryFileCorpus(age, FileMessageFactory(), - dir, cacheSize=20) - # Kill any old information in the cache - self.urlCorpus.removeExpiredMessages() - - # Setup caches for unretrievable urls - self.bad_url_cache_name = os.path.join(dir, "bad_urls.pck") - self.http_error_cache_name = os.path.join(dir, "http_error_urls.pck") - if os.path.exists(self.bad_url_cache_name): - b_file = file(self.bad_url_cache_name, "r") - self.bad_urls = pickle.load(b_file) - b_file.close() - else: - self.bad_urls = {"url:non_resolving": (), - "url:non_html": (), - "url:unknown_error": ()} - if os.path.exists(self.http_error_cache_name): - h_file = file(self.http_error_cache_name, "r") - self.http_error_urls = pickle.load(h_file) - h_file.close() - else: - self.http_error_urls = {} ! def _save_caches(self): ! # XXX Note that these caches are never refreshed, which might not ! # XXX be a good thing long-term (if a previously invalid URL ! # XXX becomes valid, for example). ! b_file = file(self.bad_url_cache_name, "w") ! pickle.dump(self.bad_urls, b_file) ! b_file.close() ! h_file = file(self.http_error_cache_name, "w") ! pickle.dump(self.http_error_urls, h_file) ! h_file.close() def tokenize(self, m): ! # XXX A weakness of this is that the text from URLs is ! # XXX always retrieved, even if it won't be used (if the ! # XXX raw score is outside unsure, for example). The ! # XXX problem is that when tokenizing, we have no idea ! # XXX what the score of the message should be, and so ! # XXX if we need the tokens or not. But when calculating ! # XXX the spamprob, we have no idea what the content of ! # XXX the message is - just the tokens we generated from it ! # XXX (and we can't reverse-engineer the URLs from that). ! # XXX I've (Tony) played around with various ways to get ! # XXX around this, but can't really come up with anything ! 
# XXX good, apart from moving the decision whether to ! # XXX recalculate the score 'higher' up (out of classifier's ! # XXX spamprob()), but then it seems that code in a *lot* ! # XXX of places will need to be changed to call the new ! # XXX function; not nice given that this is experimental. ! # XXX Either someone else will point out a good way to do this ! # XXX or it can be moved higher up if this ever makes it out ! # XXX of experimental status. ! # XXX This might not matter so much because of the local ! # XXX cache of the 'slurped' content, especially if the cache ! # XXX isn't set to expire content regularly, and if your ham ! # XXX (likely) and spam (unlikely) messages tend to have the ! # XXX same URLs in them, and only unsure change. ! # XXX Also note that the 'slurped' tokens are *always* trained # XXX on; it would be simple to change/parameterize this. - if not self.setup_done: - self.setup() - self.setup_done = True tokens = URLStripper.tokenize(self, m) ! if not (options["URLRetriever", "x-slurp_urls"] and \ ! self.do_slurp): return tokens - # We don't want to do this recursively and check URLs - # on webpages, so we have this little cheat. - self.do_slurp = False - proto, guts = m.groups() if proto != "http": --- 1065,1087 ---- return tokens class SlurpingURLStripper(URLStripper): def __init__(self): URLStripper.__init__(self) ! def analyze(self, text): ! # If there are no URLS, then we need to clear the ! # wordstream, or whatever was there from the last message ! # will be used. ! classifier.slurp_wordstream = None ! # Continue as normal. ! return URLStripper.analyze(self, text) def tokenize(self, m): ! # XXX Note that the 'slurped' tokens are *always* trained # XXX on; it would be simple to change/parameterize this. tokens = URLStripper.tokenize(self, m) ! if not options["URLRetriever", "x-slurp_urls"]: return tokens proto, guts = m.groups() if proto != "http": *************** *** 1203,1329 **** guts = guts[:-1] ! 
classifier.slurp_wordstream = self.slurp(proto, guts) ! self.do_slurp = True ! self._save_caches() ! return tokens ! ! def slurp(self, proto, url): ! # We generate these tokens: ! # url:non_resolving ! # url:non_html ! # url:http_XXX (for each type of http error encounted, ! # for example 404, 403, ...) ! # And tokenise the received page (but we do not slurp this). ! # Actually, the special url: tokens barely showed up in my testing, ! # although I would have thought that they would more - this might ! # be due to an error, although they do turn up on occasion. In ! # any case, we have to do the test, so generating an extra token ! # doesn't cost us anything apart from another entry in the db, and ! # it's only two entries, plus one for each type of http error ! # encountered, so it's pretty neglible. ! if options["URLRetriever", "x-only_slurp_base"]: ! url = self._base_url(url) ! ! # Check the unretrievable caches ! for err in self.bad_urls.keys(): ! if url in self.bad_urls[err]: ! return [err] ! if self.http_error_urls.has_key(url): ! return self.http_error_urls[url] ! ! # We check if the url will resolve first ! mo = DOMAIN_AND_PORT_RE.match(url) ! domain = mo.group(1) ! if mo.group(3) is None: ! port = 80 ! else: ! port = mo.group(3) ! try: ! not_used = socket.getaddrinfo(domain, port) ! except socket.error: ! self.bad_urls["url:non_resolving"] += (url,) ! return ["url:non_resolving"] ! ! # If the message is in our cache, then we can just skip over ! # retrieving it from the network, and get it from there, instead. ! url_key = URL_KEY_RE.sub('_', url) ! cached_message = self.urlCorpus.get(url_key) ! ! if cached_message is None: ! # We're going to ignore everything that isn't text/html, ! # so we might as well not bother retrieving anything with ! # these extensions. ! parts = url.split('.') ! if parts[-1] in ('jpg', 'gif', 'png', 'css', 'js'): ! self.bad_urls["url:non_html"] += (url,) ! return ["url:non_html"] ! ! try: ! if options["globals", "verbose"]: ! 
print >>sys.stderr, "Slurping", url ! f = urllib2.urlopen("%s://%s" % (proto, url)) ! except (urllib2.URLError, socket.error), details: ! mo = HTTP_ERROR_RE.match(str(details)) ! if mo: ! self.http_error_urls[url] = "url:http_" + mo.group(1) ! return ["url:http_" + mo.group(1)] ! self.bad_urls["url:unknown_error"] += (url,) ! return ["url:unknown_error"] ! ! # Anything that isn't text/html is ignored ! content_type = f.info().get('content-type') ! if content_type is None or \ ! not content_type.startswith("text/html"): ! self.bad_urls["url:non_html"] += (url,) ! return ["url:non_html"] ! ! page = f.read() ! headers = str(f.info()) ! f.close() ! fake_message_string = headers + "\r\n" + page ! ! # Retrieving the same messages over and over again will tire ! # us out, so we store them in our own wee cache. ! message = self.urlCorpus.makeMessage(url_key) ! message.setPayload(fake_message_string) ! self.urlCorpus.addMessage(message) ! else: ! fake_message_string = cached_message.as_string() ! ! msg = message_from_string(fake_message_string) ! ! # We don't want to do full header tokenising, as this is ! # optimised for messages, not webpages, so we just do the ! # basic stuff. ! bht = options["Tokenizer", "basic_header_tokenize"] ! bhto = options["Tokenizer", "basic_header_tokenize_only"] ! options["Tokenizer", "basic_header_tokenize"] = True ! options["Tokenizer", "basic_header_tokenize_only"] = True ! ! tokens = Tokenizer().tokenize(msg) ! pf = options["URLRetriever", "x-web_prefix"] ! tokens = ["%s%s" % (pf, tok) for tok in tokens] ! ! # Undo the changes ! options["Tokenizer", "basic_header_tokenize"] = bht ! 
options["Tokenizer", "basic_header_tokenize_only"] = bhto return tokens - - def _base_url(self, url): - # To try and speed things up, and to avoid following - # unique URLS, we convert the URL to as basic a form - # as we can - so http://www.massey.ac.nz/~tameyer/index.html?you=me - # would become http://massey.ac.nz and http://id.example.com - # would become http://example.com - url += '/' - domain, garbage = url.split('/', 1) - parts = domain.split('.') - if len(parts) > 2: - base_domain = parts[-2] + '.' + parts[-1] - if len(parts[-1]) < 3: - base_domain = parts[-3] + '.' + base_domain - else: - base_domain = domain - return base_domain if options["URLRetriever", "x-slurp_urls"]: --- 1092,1097 ---- guts = guts[:-1] ! classifier.slurp_wordstream = (proto, guts) return tokens if options["URLRetriever", "x-slurp_urls"]: From anadelonbrin at users.sourceforge.net Mon Jan 12 03:42:55 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 03:43:02 2004 Subject: [Spambayes-checkins] spambayes/spambayes message.py,1.45,1.46 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv22429/spambayes Modified Files: message.py Log Message: Improve the messageinfo database. Previously it was fixed to store two attributes (classification and training), but the idea was that it was meant to be expandable so that other attributes can be saved (for example, sb_pop3dnd saves flags, and Anthony wants to save message-ids). As it was, this meant that the db's couldn't be backwards compatible, which isn't nice. The scheme now works for old messageinfo dbs (so nothing should break), but will let you save any number of different attributes, simply by changing an attribute of the message object (which means that different messages can even save different things, if that's what is wanted). Also use the relative-to-ini not relative-to-cwd pathnames stuff with the options. 
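As a rough illustration of the scheme this log message describes (not the committed code, which follows in the diff), the sketch below uses a plain dict in place of the real messageinfo database. FakeMessage, save_state and load_state are hypothetical names chosen for the example; only stored_attributes, c and t come from the check-in. It is written for a current Python, so it tests against list rather than types.ListType.

```python
class FakeMessage(object):
    """Hypothetical stand-in for a SpamBayes message object."""

    def __init__(self, msg_id):
        self.id = msg_id
        # Names of the attributes that should be persisted.
        self.stored_attributes = ['c', 't']
        self.c = None
        self.t = None


def save_state(db, msg):
    # New-style record: a list of (attribute, value) pairs, so each
    # message can persist whatever it lists in stored_attributes.
    db[msg.id] = [(att, getattr(msg, att)) for att in msg.stored_attributes]


def load_state(db, msg):
    try:
        attributes = db[msg.id]
    except KeyError:
        return
    if not isinstance(attributes, list):
        # Old-style record: a bare (c, t) tuple.
        msg.c, msg.t = attributes
        return
    for att, val in attributes:
        setattr(msg, att, val)


# An old-style entry still loads.
db = {"old": ("s", True)}
old = FakeMessage("old")
load_state(db, old)

# A message that wants to persist one extra attribute.
new = FakeMessage("new")
new.stored_attributes.append("date")
new.c, new.t, new.date = "h", False, "12-Jan-2004"
save_state(db, new)

restored = FakeMessage("new")
load_state(db, restored)
```

The point of the design is that the reader never needs to know in advance which attributes were saved: old tuples are unpacked positionally, and anything newer is restored by name.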
Index: message.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/message.py,v retrieving revision 1.45 retrieving revision 1.46 diff -C2 -d -r1.45 -r1.46 *** message.py 16 Dec 2003 05:06:34 -0000 1.45 --- message.py 12 Jan 2004 08:42:53 -0000 1.46 *************** *** 102,106 **** from spambayes import dbmstorage ! from spambayes.Options import options from spambayes.tokenizer import tokenize --- 102,106 ---- from spambayes import dbmstorage ! from spambayes.Options import options, get_pathname_option from spambayes.tokenizer import tokenize *************** *** 116,126 **** if self.db is not None: try: ! (msg.c, msg.t) = self.db[msg.getId()] except KeyError: pass def _setState(self, msg): if self.db is not None: ! self.db[msg.getId()] = (msg.c, msg.t) self.store() --- 116,137 ---- if self.db is not None: try: ! attributes = self.db[msg.getId()] except KeyError: pass + else: + if not isinstance(attributes, types.ListType): + # Old-style message info db, which only + # handles storing 'c' and 't'. + (msg.c, msg.t) = attributes + return + for att, val in attributes: + setattr(msg, att, val) def _setState(self, msg): if self.db is not None: ! attributes = [] ! for att in msg.stored_attributes: ! attributes.append((att, getattr(msg, att))) ! self.db[msg.getId()] = attributes self.store() *************** *** 194,199 **** # so that these files don't litter lots of working directories. # Once there is a master db, this option can be removed. ! message_info_db_name = options["Storage", "messageinfo_storage_file"] ! message_info_db_name = os.path.expanduser(message_info_db_name) if options["Storage", "persistent_use_database"]: msginfoDB = MessageInfoDB(message_info_db_name) --- 205,209 ---- # so that these files don't litter lots of working directories. # Once there is a master db, this option can be removed. ! 
message_info_db_name = get_pathname_option("Storage", "messageinfo_storage_file") if options["Storage", "persistent_use_database"]: msginfoDB = MessageInfoDB(message_info_db_name) *************** *** 208,211 **** --- 218,222 ---- # persistent state + self.stored_attributes = ['c', 't',] self.id = None self.c = None From anadelonbrin at users.sourceforge.net Mon Jan 12 03:43:57 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 03:44:02 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_pop3dnd.py,1.6,1.7 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv22580/scripts Modified Files: sb_pop3dnd.py Log Message: Store the IMAP flags. Fix an incorrectly loaded flag (typo). Index: sb_pop3dnd.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_pop3dnd.py,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** sb_pop3dnd.py 31 Dec 2003 02:59:51 -0000 1.6 --- sb_pop3dnd.py 12 Jan 2004 08:43:55 -0000 1.7 *************** *** 62,69 **** todo = """ - o Message flags are currently not persisted, but should be. The - IMAPFileMessage class should be extended to do this. The same - goes for the 'internaldate' of the message. These could be put - in the message info database, no doubt. o The RECENT flag should be unset at some point, but when? The RFC says that a message is recent if this is the first session --- 62,65 ---- *************** *** 92,96 **** """ ! # This module is part of the spambayes project, which is Copyright 2002-3 # The Python Software Foundation and is covered by the Python Software # Foundation license. --- 88,92 ---- """ ! # This module is part of the spambayes project, which is Copyright 2002-4 # The Python Software Foundation and is covered by the Python Software # Foundation license. 
*************** *** 159,162 **** --- 155,163 ---- def __init__(self, date): message.Message.__init__(self) + # We want to persist more information than the generic + # Message class. + self.stored_attributes.extend(["date", "deleted", "flagged", + "seen", "draft", "recent", + "answered"]) self.date = date self.clear_flags() *************** *** 186,190 **** if self.draft: yield "\\DRAFT" ! if self.draft: yield "\\RECENT" --- 187,191 ---- if self.draft: yield "\\DRAFT" ! if self.recent: yield "\\RECENT" From montanaro at users.sourceforge.net Mon Jan 12 09:13:03 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Mon Jan 12 09:13:07 2004 Subject: [Spambayes-checkins] spambayes/spambayes Dibbler.py,1.12,1.13 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv14404 Modified Files: Dibbler.py Log Message: Split digest auth info properly. Simple split-on-comma fails if the uri contains commas. Patch from Ian on the spambayes-dev mailing list. Index: Dibbler.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Dibbler.py,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** Dibbler.py 31 Dec 2003 14:09:36 -0000 1.12 --- Dibbler.py 12 Jan 2004 14:13:01 -0000 1.13 *************** *** 345,348 **** --- 345,352 ---- and driving the plugins.""" + # RE to extract option="value" fields from + # digest auth login field + _login_splitter = re.compile('([a-zA-Z])+=(".*?"|.*?),?') + def __init__(self, clientSocket, server, context): # Grumble: asynchat.__init__ doesn't take a 'map' argument, *************** *** 614,618 **** return (s[0] == '"' and s[-1] == '"') and s[1:-1] or s ! options = dict([s.split('=') for s in login.split(", ")]) userName = stripQuotes(options["username"]) password = self._server.getPasswordForUser(userName) --- 618,622 ---- return (s[0] == '"' and s[-1] == '"') and s[1:-1] or s ! 
options = dict(self._login_splitter.findall(login)) userName = stripQuotes(options["username"]) password = self._server.getPasswordForUser(userName) From bwarsaw at users.sourceforge.net Mon Jan 12 09:15:40 2004 From: bwarsaw at users.sourceforge.net (Barry A. Warsaw) Date: Mon Jan 12 09:15:43 2004 Subject: [Spambayes-checkins] spambayes/spambayes hammiebulk.py,1.12,1.13 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv14869 Modified Files: hammiebulk.py Log Message: Fixed typo causing syntax error. Index: hammiebulk.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/hammiebulk.py,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** hammiebulk.py 12 Jan 2004 08:36:15 -0000 1.12 --- hammiebulk.py 12 Jan 2004 14:15:38 -0000 1.13 *************** *** 67,71 **** options.default("Storage", "persistent_storage_file"): options["Storage", "persistent_storage_file"] = \ ! os.path.join("~", ".hammiedb")) DEFAULTDB = get_pathname_option("Storage", "persistent_storage_file") --- 67,71 ---- options.default("Storage", "persistent_storage_file"): options["Storage", "persistent_storage_file"] = \ ! os.path.join("~", ".hammiedb") DEFAULTDB = get_pathname_option("Storage", "persistent_storage_file") From bwarsaw at users.sourceforge.net Mon Jan 12 09:19:30 2004 From: bwarsaw at users.sourceforge.net (Barry A. Warsaw) Date: Mon Jan 12 09:19:33 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py,1.18,1.19 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv15976 Modified Files: sb_imapfilter.py Log Message: get_substance(): Catch and ignore MessageParseErrors when parsing the data['RFC822'] text. 
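The "catch and ignore" behaviour described in this log message can be sketched roughly as follows. This is an illustration only, written against the current standard library spelling (email.parser, email.errors) rather than the Python 2 names (email.Parser, email.Errors) used in the check-in. parse_or_skip and AlwaysFails are hypothetical names; the stub parser exists purely to exercise the error path, since the modern parser seldom raises this exception.

```python
import email.errors
import email.parser


def parse_or_skip(text, parser=None):
    """Return a parsed message, or None if the text cannot be parsed."""
    if parser is None:
        parser = email.parser.Parser()
    try:
        return parser.parsestr(text)
    except email.errors.MessageParseError as e:
        # Report the problem and carry on, rather than letting one bad
        # message abort the whole run.
        print('Skipping unparseable message: %s' % e)
        return None


class AlwaysFails(object):
    """Hypothetical parser used only to simulate a parse failure."""

    def parsestr(self, text):
        raise email.errors.MessageParseError("simulated failure")


good = parse_or_skip("Subject: hello\n\nbody\n")
bad = parse_or_skip("anything", parser=AlwaysFails())
```

A caller then only has to check for None before copying over the parsed message's internals.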
Index: sb_imapfilter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v retrieving revision 1.18 retrieving revision 1.19 diff -C2 -d -r1.18 -r1.19 *** sb_imapfilter.py 12 Jan 2004 08:36:15 -0000 1.18 --- sb_imapfilter.py 12 Jan 2004 14:19:28 -0000 1.19 *************** *** 368,372 **** # we go through the hoops of creating a new message, and then # copying over all its internals. ! new_msg = email.Parser.Parser().parsestr(data["RFC822"]) self._headers = new_msg._headers self._unixfrom = new_msg._unixfrom --- 368,376 ---- # we go through the hoops of creating a new message, and then # copying over all its internals. ! try: ! new_msg = email.Parser.Parser().parsestr(data["RFC822"]) ! except email.Errors.MessageParseError, e: ! print 'Skipping unparseable message: %s' % e ! return self._headers = new_msg._headers self._unixfrom = new_msg._unixfrom From anadelonbrin at users.sourceforge.net Mon Jan 12 17:27:41 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 17:27:44 2004 Subject: [Spambayes-checkins] spambayes/spambayes ProxyUI.py,1.39,1.40 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv10664/spambayes Modified Files: ProxyUI.py Log Message: We don't always have the score available, so handle that case for the new 'discard for edges' code. Also, fix [ 874784 ] Error in onReview code Index: ProxyUI.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/ProxyUI.py,v retrieving revision 1.39 retrieving revision 1.40 diff -C2 -d -r1.39 -r1.40 *** ProxyUI.py 12 Jan 2004 06:46:33 -0000 1.39 --- ProxyUI.py 12 Jan 2004 22:27:39 -0000 1.40 *************** *** 292,298 **** self._getTimeRange(self._keyToTimestamp(key)) row = self.html.reviewRow.clone() ! score = float(messageInfo.score.rstrip('%')) if label == 'Spam': ! 
if score > options["html_ui", "spam_discard_level"]: r_att = getattr(row, 'discard') else: --- 292,302 ---- self._getTimeRange(self._keyToTimestamp(key)) row = self.html.reviewRow.clone() ! try: ! score = float(messageInfo.score.rstrip('%')) ! except ValueError: ! score = None if label == 'Spam': ! if score is not None \ ! and score > options["html_ui", "spam_discard_level"]: r_att = getattr(row, 'discard') else: *************** *** 300,304 **** "default_spam_action"]) elif label == 'Ham': ! if score < options["html_ui", "ham_discard_level"]: r_att = getattr(row, 'discard') else: --- 304,309 ---- "default_spam_action"]) elif label == 'Ham': ! if score is not None \ ! and score < options["html_ui", "ham_discard_level"]: r_att = getattr(row, 'discard') else: *************** *** 352,356 **** numTrained = 0 numDeferred = 0 ! if params.get('go') != 'refresh': for key, value in params.items(): if key.startswith('classify:'): --- 357,361 ---- numTrained = 0 numDeferred = 0 ! if params.get('go') != 'Refresh': for key, value in params.items(): if key.startswith('classify:'): From anadelonbrin at users.sourceforge.net Mon Jan 12 18:39:44 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Mon Jan 12 18:39:52 2004 Subject: [Spambayes-checkins] spambayes CHANGELOG.txt, 1.30, 1.31 WHAT_IS_NEW.txt, 1.24, 1.25 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv25690 Modified Files: CHANGELOG.txt WHAT_IS_NEW.txt Log Message: Bring up to date. 
Index: CHANGELOG.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v retrieving revision 1.30 retrieving revision 1.31 diff -C2 -d -r1.30 -r1.31 *** CHANGELOG.txt 29 Dec 2003 04:46:31 -0000 1.30 --- CHANGELOG.txt 12 Jan 2004 23:39:42 -0000 1.31 *************** *** 3,8 **** --- 3,54 ---- Alpha Release 8 =============== + Tony Meyer 13/01/2004 Fix [ 874784 ] Error in onReview code + Skip Montanaro 13/01/2004 UserInterface: Split digest auth info properly. Simple split-on-comma fails if the uri contains commas. + Barry A. Warsaw 13/01/2004 imapfilter: Catch and ignore MessageParseErrors when parsing the data['RFC822'] text. + Tony Meyer 12/01/2004 Improve the messageinfo database so that more than two attributes can be saved (it's still backwards compatible with the old type). + Tony Meyer 12/01/2004 Path/file options are no longer relative to the current working directory, they are relative to the last configuration file loaded. + Tony Meyer 12/01/2004 pop3dnd: Store the IMAP flags. + Tony Meyer 12/01/2004 pop3dnd: Fix an incorrectly checked flag (typo). + Anthony Baxter 12/01/2004 New options ham_discard_level and spam_discard_level. These make the interface default to discard hams/spams in the training interface. + Tony Meyer 11/01/2004 mkgraph.py: Add a docstring. + Tony Meyer 11/01/2004 mkgraph.py: Add -f command line arg to pass a filename rather than reading from stdin (which is still the default) + Tony Meyer 11/01/2004 mkgraph.py: Add a training_is_ham line to the error graph which shows the percentage of training data that is ham (i.e. shows the imbalance). + Tony Meyer 11/01/2004 mkgraph.py: Modify the outputing so that it can be in different formats for those of us without plotmtv. The -c command line option outputs all the lines in the same set of rows, rather than in their own set as is the default. 
The -s arg specifies the separator for this sort of output (defaults to a comma, so that csv files are output). + Tony Meyer 11/01/2004 incremental.py: Add a docstring and the ability to print it with -h or --help to incremental.py + Tony Meyer 11/01/2004 regimes.py: Add a docstring that outlines the various regimes in hopefully easy to understand terms. Print this out if regimes.py is executed. + Tony Meyer 11/01/2004 regimes.py: Add a new regime - balanced_corrected. + Tony Meyer 08/01/2004 Fix [ 805852 ] need python-dev package on Debian + Skip Montanaro 08/01/2004 table.py: space the table out a little more. + Skip Montanaro 07/01/2004 mkreversemap.py: New script which generates a pickle file mapping features to mailbox files and message-id's. Use with extractmessages.py. + Skip Montanaro 07/01/2004 extractmessages.py: New script; use with mkreversemap.py to identify messages in your training database which contain interesting tokens. + Tony Meyer 07/01/2004 Fix [ 872044 ] HTTP review page date problems. + Skip Montanaro 06/01/2004 Add experimental option and code to pick out some semantic bits from URLs + Tony Meyer 05/01/2004 Add extra utility functions to oe_mailbox for dealing with Outlook Express. + Tony Meyer 05/01/2004 Have autoconfigure confirm that configuration has occurred. + Tony Meyer 05/01/2004 Do better 'is installed' checks in autoconfigure. + Adam Walker 05/01/2004 Start SMTP proxy in a trainable state. + Tony Meyer 02/01/2004 UserInterface: Fix import error reported. + Richie Hindle 01/01/2004 Default to twenty search results on web interface rather than just one. + Richie Hindle 01/01/2004 Made the search form do a GET rather than a POST. + Richie Hindle 01/01/2004 Fix for 842984: If webbrowser.open_new() fails, print a message saying "Please point your web browser at http://localhost:8880/" rather than bombing out. 
+ Richie Hindle 01/01/2004 New script: utilities/hammer.py: Hammers the core SpamBayes code, repeatedly training and classifying using faked-up messages. + Tony Meyer 31/12/2003 pop3dnd: Fix fetching an envelope. + Tony Meyer 31/12/2003 pop3dnd: Handle storing no flags. + Tony Meyer 31/12/2003 pop3dnd: Update the RETR'ing of messages to reflect what sb_server currently does. + Tony Meyer 31/12/2003 pop3dnd: Clean out some cruft that isn't necessary with the latest version of twisted (1.1) + Tony Meyer 31/12/2003 pop3dnd: Add two new Message classes, one for messages that are stored in memory, and one for messages that are re-generated each time the message is loaded. + Tony Meyer 31/12/2003 pop3dnd: Start our UIDs at 1, not 0, because Eudora likes this more. + Tony Meyer 31/12/2003 pop3dnd: Don't override the imported name "message". + Tony Meyer 31/12/2003 pop3dnd: Add a new folder - INBOX - this holds any messages from SpamBayes to the user. (Having INBOX as an alias for Spam wasn't working well, and being able to communicate within the confines of the mailer is nice, too). + Tony Meyer 31/12/2003 pop3dnd: Don't let the user set so much via the command line. Use a config file, you lazy person. + Tony Meyer 31/12/2003 pop3proxy_tray: When we stopped sb_server and then restarted, we didn't init the state, so it wouldn't work. Fix that. + Tony Meyer 31/12/2003 Web interface: As Richie pointed out, the status message was only updated when the state was recreated. Fix this. + Tony Meyer 31/12/2003 Web interface: Output plurals correctly in stats information. + Tony Meyer 31/12/2003 We printed out false positive numbers in the false negatives section, and vice versa. Fix. + Tony Meyer 30/12/2003 IMAP interface: Quote folder names when displaying them - otherwise if the folder names contained certain characters it could result in bad html (if the name was ">foo", for example). 
Tony Meyer 29/12/2003 Web interface: Improve the 'online' help message for the review page, and add messages for the stats and home pages. Tim Peters 29/12/2003 Many improvements to the mksets.py testtools script. + Tim Peters 28/12/2003 sort+group.py: Sort msgs by full-precision timestamp (not just by day). Normalize Received time to UTC first. Use email.Utils to parse dates instead of hand-rolling our own parser + Tim Peters 28/12/2003 sort+group.py: Preserve files' extensions (if any) during renaming. Tim Peters 28/12/2003 Outlook: export.py - the -n option now gives the number of Set subdirectories desired, instead of a number of msgs per Set subdir "to shoot for". Tim Peters 28/12/2003 Added a new -t option to rebal.py, may have broken -s and -r options. *************** *** 20,23 **** --- 66,70 ---- Mark Hammond 21/12/2003 Outlook: When doing a "batch train" (eg, selecting multiple messages and saying "Delete as" or "Recover from",) the DB was saved in between each and every message. Now only saved at the end (which was always the intent) Mark Hammond 21/12/2003 Outlook: As part of checking our configuration is invalid, make sure the user hasn't set us up such that either Spam/Unsure folders isn't also being watched for new messages + Mark Hammond 21/12/2003 Outlook: If the user attempts to close the Manager dialog while there is a problem preventing us being enabled, confirm they really want to close it Mark Hammond 20/12/2003 dump_props.py: Add -c option, which writes output to the Windows clipboard. Mark Hammond 20/12/2003 Outlook: Include the foldername in many messages, to help track down weird bugs from user logs. Say what we are watching a folder for. 
*************** *** 29,33 **** --- 76,84 ---- Mark Hammond 19/12/2003 Outlook: Don't record in the training database unless we are successful in the filter - otherwise future attempts to filter will get all screwed up, as it will think it already was Mark Hammond 19/12/2003 Outlook: Move some of our init code from OnConnection to OnStartupComplete + Mark Hammond 19/12/2003 Outlook: Try and tone down the toolbar message in the log to prevent people reporting it as a bug + Mark Hammond 19/12/2003 Outlook: Handle situations where Outlook starts up in a confused state, which then confused us. + Mark Hammond 19/12/2003 Outlook: Ask if you want the slow, non-filter tests run, and add E_OBJECT_CHANGED tests, as per [ 803798 ] MAPI_E_OBJECT_CHANGED error saving spam score Tony Meyer 18/12/2003 Bring pspam into the modern SpamBayes world. + Mark Hammond 18/12/2003 Outlook: When the 'New Folder' button was used to create a folder, that folder was not used when you closed the dialog, even though it was selected. Mark Hammond 17/12/2003 Tray app: Better icons and icon loading code. Tony Meyer 17/12/2003 Add the basis of a new experimental (and highly debatable) option to 'slurp' URLs. *************** *** 41,44 **** --- 92,96 ---- Mark Hammond 15/12/2003 Fix [ 833439 ] default_bayes_customize.ini is confusing. Mark Hammond 14/12/2003 Move the option loading code to a function, then call this function as the module loads. + Mark Hammond 14/12/2003 test_programs: When "calling" URLs, check the output for tracebacks, check the exit code of processes we spawn, and add test for "[ 859215 ] "Restore Defaults" causes assertion error at exit". Tim Peters 14/12/2003 Removed support code for the defunct experimental_ham_spam_imbalance_adjustment option Mark Hammond 14/12/2003 Fix [ 856628 ] reload(Options) fails in windows binary *************** *** 48,51 **** --- 100,107 ---- Skip Montanaro 10/12/2003 Add ability for "x-" options (deprecated, or experimental). 
Mark Hammond 10/12/2003 Outlook: Try and add the Spam field to the 'Unsure' folder in the same way we do for the Spam and watch folders. + Mark Hammond 10/12/2003 Fix [ 856141 ] Spam field not added to unsure or empty folders + Mark Hammond 08/12/2003 Outlook: Add/Fix a number of 'unicode file' related comments. + Mark Hammond 08/12/2003 Outlook: Allow multiple manager objects to work in the same process (but not at the same time): + Mark Hammond 08/12/2003 Outlook: A number of changes to better support us existing in the 'COM Addins' list when running the binary version Tony Meyer 04/12/2003 Tray app: Change the default (double-click) behaviour of the tray to "review messages" rather than "display information". Tony Meyer 04/12/2003 Tray app: use SetDefaultItem (so the default action is in bold in the menu). Index: WHAT_IS_NEW.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/WHAT_IS_NEW.txt,v retrieving revision 1.24 retrieving revision 1.25 diff -C2 -d -r1.24 -r1.25 *** WHAT_IS_NEW.txt 12 Jan 2004 08:36:14 -0000 1.24 --- WHAT_IS_NEW.txt 12 Jan 2004 23:39:42 -0000 1.25 *************** *** 1,9 **** This file covers the major changes between each release. For more details, the reader is referred to the changelog (changelog.txt in the main directory ! of the archive), or for extreme details, to the check-ins archive (please see ) ! Changes are broken into sections for each application, plus one that will ! probably only interest developers, and one for everything else. Any actions necessary to move to this release from the previous release are --- 1,9 ---- This file covers the major changes between each release. For more details, the reader is referred to the changelog (changelog.txt in the main directory ! of the archive), or for extreme details, to the check-ins archive (see ) ! Changes are broken into sections, so that it's easier for you to find the ! changes that are relevant to you. 
Any actions necessary to move to this release from the previous release are *************** *** 57,60 **** --- 57,70 ---- watched folders. o Improve matters when the default (Outlook message) store is offline. + o If the user attempts to close the Manager dialog while there is a + problem preventing us being enabled, confirm they really want to close + it. + o Try and tone down the toolbar message in the log to prevent people + reporting it as a bug. + o When the 'New Folder' button was used to create a folder, that folder + was not used when you closed the dialog, even though it was selected. + o Add Spam field to unsure and empty folders. + o Fix things so that the plug-in should better appear in the "COM Addins" + list when running the binary version. POP3 Proxy / SMTP Proxy *************** *** 85,88 **** --- 95,107 ---- o Fixed an infinite loop when you break the browser connection to sb_server when sb_server is busy training. + o New options "Ham Discard Level" and "Spam Discard Level". These make the + interface default to discarding hams/spams in the training interface. + o UserInterface: Split digest auth info properly. + o Default to twenty search results rather than just one. + o The status message wasn't updated as often as it should have been. + o Output plurals correctly in stats information. + o We printed out false positive numbers in the false negatives section of + the stats, and vice versa. + o Quote IMAP folder names when displaying them. POP3 Proxy Service / POP3 Proxy Tray Application *************** *** 93,96 **** --- 112,116 ---- display the default in bold. o If a proxy is already running, don't start the service. + o When we stopped the proxy and then restarted it didn't work. IMAP Filter *************** *** 100,103 **** --- 120,124 ---- o If sb_imapfilter.py is run without any switches, just serve the web interface (but don't launch a browser). + o Ignore errors that occur when parsing a message. 
sb_filter *************** *** 119,124 **** --- 140,150 ---- o Many improvements to the mksets.py script. o Many improvements to the rebal.py script. + o Many improvements to the sort+group.py script. o Many improvements to the export.py script (for Outlook). + o Added additional input/output methods to mkgraph.py. + o Improvements to the documentation for mkgraph.py, regimes.py and + incremental.py. o Added a makefile to the testtools directory to make using timcv.py easier. + o Added a new regime - "balanced_corrected". Tokenizer *************** *** 138,141 **** --- 164,169 ---- ------- o Option names are always case insensitive, no matter what. + o Non-absolute file/path options are relative to the last configuration + file loaded, not the current working directory, as previously. o Moved the option loading code to a function. o Generalized the DirOfTxtFileMailbox class in mboxutils to assume all *************** *** 147,154 **** o Fix bug where if one was using Python 2.2, Windows and bsddb the database would never open correctly. - o New script: sb_evoscore.py - A shim script between sb_xmlrpcserver.py - and Ximian Evolution. o Fix the pspam scripts, muttrc and spambayes.el so that they work with the current SpamBayes package. --- 175,193 ---- o Fix bug where if one was using Python 2.2, Windows and bsddb the database would never open correctly. o Fix the pspam scripts, muttrc and spambayes.el so that they work with the current SpamBayes package. + o New script: sb_evoscore.py - A shim script between sb_xmlrpcserver.py + and Ximian Evolution. + o New script: mkreversemap.py - generates a pickle file mapping features + to mailbox files and message-id's. + o New script: extractmessages.py - use with mkreversemap.py to identify + messages in your training database which contain interesting tokens. + o New script: hammer.py: Hammers the core SpamBayes code, repeatedly + training and classifying using faked-up messages. 
+ o Previous releases have included the sb_pop3dnd.py script (once named + sb_overkill.py). With this release, this script should be fully + usable. It provides the same POP3 proxy as sb_server, but also + provides a local IMAP server so that you can train messages by dragging + and dropping them within the mail client. *************** *** 173,177 **** =================== The following bugs tracked via the Sourceforge system were fixed: ! 818871, 833439, 803798, 787676, 860410, 856628, 859215 A URL containing the details of these bugs can be made by appending the --- 212,217 ---- =================== The following bugs tracked via the Sourceforge system were fixed: ! 818871, 833439, 803798, 787676, 860410, 856628, 859215, 856141, 842984, ! 872044, 805852, 874784 A URL containing the details of these bugs can be made by appending the *************** *** 182,189 **** Feature Requests Added ====================== ! The following feature requests tracked via the Sourceforge system were added for this release: 827138 Patches integrated =================== --- 222,233 ---- Feature Requests Added ====================== ! The following feature request tracked via the Sourceforge system was added for this release: 827138 + A url containing the details of these feature requests can be made by + appending the request number to this url: + http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498104&aid= + Patches integrated =================== *************** *** 192,195 **** --- 236,243 ---- 842464, 831388, 809008, 831388 + A url containing the details of these feature requests can be made by + appending the request number to this url: + http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498105&aid= + Deprecated Options ================== *************** *** 242,246 **** spambayes@python.org with anacdotal results after a period of time, or the full testtools scripts can be used. For details about using these, please ! 
read the "README-DEVEL.txt" file that comes with the SpamBayes archive. Experimental options are always turned off by default. --- 290,295 ---- spambayes@python.org with anacdotal results after a period of time, or the full testtools scripts can be used. For details about using these, please ! read the "README-DEVEL.txt" file that comes with the SpamBayes source ! archive. Experimental options are always turned off by default. *************** *** 268,269 **** --- 317,321 ---- message is retrieved and used, if it makes a difference to the classification. + + o [Tokenizer] x-pick_apart_urls + Pick out some semantic bits from URLs. From montanaro at users.sourceforge.net Wed Jan 14 22:05:24 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 14 22:05:28 2004 Subject: [Spambayes-checkins] spambayes/utilities extractmessages.py, 1.2, 1.3 Message-ID: Update of /cvsroot/spambayes/spambayes/utilities In directory sc8-pr-cvs1:/tmp/cvs-serv10971 Modified Files: extractmessages.py Log Message: restart the message counter for each mailbox Index: extractmessages.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/utilities/extractmessages.py,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** extractmessages.py 6 Jan 2004 15:45:10 -0000 1.2 --- extractmessages.py 15 Jan 2004 03:05:22 -0000 1.3 *************** *** 37,41 **** def extractmessages(features, mapdb, hamfile, spamfile): """extract messages which contain given features""" - i = 0 hamids = {} spamids = {} --- 37,40 ---- *************** *** 56,61 **** # now run through each mailbox in hamids and spamids and print # matching messages to relevant ham or spam files - i = 0 for mailfile in hamids: msgids = hamids[mailfile] for msg in getmbox(mailfile): --- 55,60 ---- # now run through each mailbox in hamids and spamids and print # matching messages to relevant ham or spam files for mailfile in hamids: + i = 0 msgids = 
hamids[mailfile] for msg in getmbox(mailfile): *************** *** 68,71 **** --- 67,71 ---- for mailfile in spamids: + i = 0 msgids = spamids[mailfile] for msg in getmbox(mailfile): From montanaro at users.sourceforge.net Wed Jan 14 22:15:40 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 14 22:15:44 2004 Subject: [Spambayes-checkins] spambayes/utilities loosecksum.py,1.3,1.4 Message-ID: Update of /cvsroot/spambayes/spambayes/utilities In directory sc8-pr-cvs1:/tmp/cvs-serv12621 Modified Files: loosecksum.py Log Message: allow multiple mailboxes on the command line, not just a single message on stdin or a file containing one message Index: loosecksum.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/utilities/loosecksum.py,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** loosecksum.py 21 Jan 2003 21:19:09 -0000 1.3 --- loosecksum.py 15 Jan 2004 03:15:38 -0000 1.4 *************** *** 33,36 **** --- 33,38 ---- import binascii + from spambayes.mboxutils import getmbox + def flatten(obj): # I do not know how to use the email package very well - all I want here *************** *** 46,51 **** raise TypeError, ("unrecognized body type: %s" % type(obj)) ! def generate_checksum(f): ! data = flatten(email.Parser.Parser().parse(f)) # modelled after Justin Mason's fuzzy checksummer for SpamAssassin. --- 48,53 ---- raise TypeError, ("unrecognized body type: %s" % type(obj)) ! def generate_checksum(msg): ! data = flatten(msg) # modelled after Justin Mason's fuzzy checksummer for SpamAssassin. *************** *** 87,95 **** pass if not args: ! inf = sys.stdin else: ! inf = file(args[0]) ! print generate_checksum(inf) if __name__ == "__main__": --- 89,99 ---- pass if not args: ! mboxes = [getmbox("-")] else: ! mboxes = [getmbox(a) for a in args] ! for mbox in mboxes: ! for msg in mbox: ! 
print generate_checksum(msg) if __name__ == "__main__": From montanaro at users.sourceforge.net Wed Jan 14 22:18:08 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 14 22:18:12 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_dbexpimp.py,1.4,1.5 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv13021 Modified Files: sb_dbexpimp.py Log Message: add -o option to allow users to set arbitrary global options from the command line Index: sb_dbexpimp.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_dbexpimp.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** sb_dbexpimp.py 16 Dec 2003 05:06:33 -0000 1.4 --- sb_dbexpimp.py 15 Jan 2004 03:18:06 -0000 1.5 *************** *** 55,58 **** --- 55,61 ---- wordinfo will be merged into an existing database. Run dbExpImp -h for more information. + -o: section:option:value : + set [section, option] in the options database to value + -h : help *************** *** 233,237 **** try: ! opts, args = getopt.getopt(sys.argv[1:], 'iehmvd:D:f:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ --- 236,240 ---- try: ! 
opts, args = getopt.getopt(sys.argv[1:], 'iehmvd:D:f:o:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ *************** *** 266,269 **** --- 269,274 ---- elif opt == '-v': options["globals", "verbose"] = True + elif opt in ('-o', '--option'): + options.set_from_cmdline(arg, sys.stderr) if (dbFN and flatFN): From montanaro at users.sourceforge.net Wed Jan 14 22:23:28 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 14 22:23:31 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py,1.19,1.20 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv13800 Modified Files: sb_imapfilter.py Log Message: add -o option to allow users to set arbitrary global options from the command line Index: sb_imapfilter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v retrieving revision 1.19 retrieving revision 1.20 diff -C2 -d -r1.19 -r1.20 *** sb_imapfilter.py 12 Jan 2004 14:19:28 -0000 1.19 --- sb_imapfilter.py 15 Jan 2004 03:23:26 -0000 1.20 *************** *** 26,29 **** --- 26,33 ---- -l minutes : period of time between filtering operations -b : Launch a web browser showing the user interface. + -o section:option:value : + set [section, option] in the options database + to value + Examples: *************** *** 697,701 **** global imap try: ! opts, args = getopt.getopt(sys.argv[1:], 'hbtcvpl:e:i:d:D:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ --- 701,705 ---- global imap try: ! opts, args = getopt.getopt(sys.argv[1:], 'hbtcvpl:e:i:d:D:o:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ *************** *** 743,746 **** --- 747,752 ---- elif opt == '-l': sleepTime = int(arg) * 60 + elif opt == '-o': + options.set_from_cmdline(arg, sys.stderr) # Let the user know what they are using... 
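
The -o switch added in these commits takes a single section:option:value argument, and sb_dbexpimp.py also checks for a long --option spelling. As a rough illustration of how such an argument might be split and collected, here is a minimal Python 3 sketch. It is not the SpamBayes implementation: the scripts above simply hand the raw argument to options.set_from_cmdline(arg, sys.stderr), whose internals are not shown in these diffs. The names split_option_arg and parse_overrides are hypothetical, and splitting at most twice (so the value may contain colons) is an assumption.

```python
import getopt


def split_option_arg(arg):
    """Split 'section:option:value' into its three parts.

    Hypothetical helper for illustration only. The string is split at
    most twice, so a value that itself contains colons stays intact.
    """
    parts = arg.split(":", 2)
    if len(parts) != 3:
        raise ValueError("expected section:option:value, got %r" % (arg,))
    section, option, value = parts
    return section, option, value


def parse_overrides(argv):
    """Collect every -o/--option override from argv into a dict.

    Returns (overrides, remaining_args); overrides is keyed by the
    (section, option) pair, mirroring the options["globals", "verbose"]
    indexing style visible in the diffs. Values are left as strings.
    """
    opts, args = getopt.getopt(argv, "ho:", ["help", "option="])
    overrides = {}
    for opt, arg in opts:
        if opt in ("-o", "--option"):
            section, option, value = split_option_arg(arg)
            overrides[section, option] = value
    return overrides, args


if __name__ == "__main__":
    overrides, rest = parse_overrides(
        ["-o", "Tokenizer:x-pick_apart_urls:True", "mbox"])
    print(overrides, rest)
```

Because the switch can be repeated, a later -o for the same section and option would overwrite an earlier one in this sketch; whether the real options database behaves the same way is not stated in the log messages.
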
From montanaro at users.sourceforge.net Wed Jan 14 22:27:42 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 14 22:27:45 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_mboxtrain.py,1.7,1.8 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv14475a Modified Files: sb_mboxtrain.py Log Message: add -o option to allow users to set arbitrary global options from the command line Index: sb_mboxtrain.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_mboxtrain.py,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** sb_mboxtrain.py 17 Nov 2003 21:47:47 -0000 1.7 --- sb_mboxtrain.py 15 Jan 2004 03:27:40 -0000 1.8 *************** *** 38,41 **** --- 38,44 ---- -r remove mail which was trained on (Maildir only) + + -o section:option:value + set [section, option] in the options database to value """ *************** *** 283,287 **** try: ! opts, args = getopt.getopt(sys.argv[1:], 'hfqnrd:D:g:s:') except getopt.error, msg: usage(2, msg) --- 286,290 ---- try: ! 
opts, args = getopt.getopt(sys.argv[1:], 'hfqnrd:D:g:s:o:') except getopt.error, msg: usage(2, msg) *************** *** 318,321 **** --- 321,326 ---- usedb = False pck = arg + elif opt == '-o': + options.set_from_cmdline(arg, sys.stderr) if args: usage(2, "Positional arguments not allowed") From montanaro at users.sourceforge.net Wed Jan 14 22:29:18 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 14 22:29:20 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_notesfilter.py,1.5,1.6 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv14833 Modified Files: sb_notesfilter.py Log Message: add -o option to allow users to set arbitrary global options from the command line Index: sb_notesfilter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_notesfilter.py,v retrieving revision 1.5 retrieving revision 1.6 diff -C2 -d -r1.5 -r1.6 *** sb_notesfilter.py 16 Dec 2003 05:06:33 -0000 1.5 --- sb_notesfilter.py 15 Jan 2004 03:29:16 -0000 1.6 *************** *** 103,106 **** --- 103,109 ---- statistics output would otherwise be lost when the window closes. + -o section:option:value : + set [section, option] in the options database + to value Examples: *************** *** 344,348 **** try: ! opts, args = getopt.getopt(sys.argv[1:], 'htcpd:D:l:r:f:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ --- 347,351 ---- try: ! 
opts, args = getopt.getopt(sys.argv[1:], 'htcpd:D:l:r:f:o:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ *************** *** 379,382 **** --- 382,387 ---- elif opt == '-p': doPrompt = True + elif opt == '-o': + options.set_from_cmdline(arg, sys.stderr) if (bdbname and ldbname and sbfname and (doTrain or doClassify)): From montanaro at users.sourceforge.net Wed Jan 14 22:30:27 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 14 22:30:31 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_pop3dnd.py,1.7,1.8 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv15135 Modified Files: sb_pop3dnd.py Log Message: add -o option to allow users to set arbitrary global options from the command line Index: sb_pop3dnd.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_pop3dnd.py,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** sb_pop3dnd.py 12 Jan 2004 08:43:55 -0000 1.7 --- sb_pop3dnd.py 15 Jan 2004 03:30:25 -0000 1.8 *************** *** 968,972 **** # Read the arguments. try: ! opts, args = getopt.getopt(sys.argv[1:], 'hbd:D:u:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ --- 968,972 ---- # Read the arguments. try: ! opts, args = getopt.getopt(sys.argv[1:], 'hbd:D:u:o:') except getopt.error, msg: print >>sys.stderr, str(msg) + '\n\n' + __doc__ *************** *** 980,983 **** --- 980,985 ---- elif opt == '-b': launchUI = True + elif opt == '-o': + options.set_from_cmdline(arg, sys.stderr) # Let the user know what they are using... 
From montanaro at users.sourceforge.net Wed Jan 14 22:34:58 2004
From: montanaro at users.sourceforge.net (Skip Montanaro)
Date: Wed Jan 14 22:35:01 2004
Subject: [Spambayes-checkins] spambayes/scripts sb_server.py,1.18,1.19
Message-ID:

Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1:/tmp/cvs-serv15793

Modified Files:
    sb_server.py
Log Message:
add -o option to allow users to set arbitrary global options from the command line

Index: sb_server.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v
retrieving revision 1.18
retrieving revision 1.19
diff -C2 -d -r1.18 -r1.19
*** sb_server.py 12 Jan 2004 08:36:15 -0000 1.18
--- sb_server.py 15 Jan 2004 03:34:56 -0000 1.19
***************
*** 26,29 ****
--- 26,33 ----
              -b      : Launch a web browser showing the user interface.
  
+             -o section:option:value :
+                       set [section, option] in the options database
+                       to value
+ 
          All command line arguments and switches take their default
          values from the [pop3proxy] and [html_ui] sections of
***************
*** 882,886 ****
      # Read the arguments.
      try:
!         opts, args = getopt.getopt(sys.argv[1:], 'hbpsd:D:l:u:')
      except getopt.error, msg:
          print >>sys.stderr, str(msg) + '\n\n' + __doc__
--- 886,890 ----
      # Read the arguments.
      try:
!         opts, args = getopt.getopt(sys.argv[1:], 'hbpsd:D:l:u:o:')
      except getopt.error, msg:
          print >>sys.stderr, str(msg) + '\n\n' + __doc__
***************
*** 908,911 ****
--- 912,917 ----
          elif opt == '-u':
              state.uiPort = int(arg)
+         elif opt == '-o':
+             options.set_from_cmdline(arg, sys.stderr)
  
      # Let the user know what they are using...

From montanaro at users.sourceforge.net Wed Jan 14 22:39:13 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 14 22:39:16 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_upload.py,1.1,1.2 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv16383 Modified Files: sb_upload.py Log Message: add -o option to allow users to set arbitrary global options from the command line Index: sb_upload.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_upload.py,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** sb_upload.py 5 Sep 2003 01:16:46 -0000 1.1 --- sb_upload.py 15 Jan 2004 03:39:11 -0000 1.2 *************** *** 6,9 **** --- 6,10 ---- usage: %(progname)s [-h] [-n] [-s server] [-p port] [-r N] + [-o section:option:value] Options: *************** *** 13,16 **** --- 14,18 ---- -p, --port= - provide alternate server port (default %(port)s) -r, --prob= - feed the message to the trainer w/ prob N [0.0...1.0] + -o, --option= - set [section, option] in the options database to value """ *************** *** 96,102 **** try: ! opts, args = getopt.getopt(argv, "hns:p:r:", ["help", "null", "server=", "port=", ! "prob="]) except getopt.error: usage(globals(), locals()) --- 98,104 ---- try: ! opts, args = getopt.getopt(argv, "hns:p:r:o:", ["help", "null", "server=", "port=", ! 
"prob=", "option="]) except getopt.error: usage(globals(), locals()) *************** *** 119,122 **** --- 121,126 ---- sys.exit(1) prob = n + elif opt in ('-o', '--option'): + options.set_from_cmdline(arg, sys.stderr) if args: From montanaro at users.sourceforge.net Wed Jan 14 22:40:26 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 14 22:40:30 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_xmlrpcserver.py,1.4,1.5 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv16598 Modified Files: sb_xmlrpcserver.py Log Message: add -o option to allow users to set arbitrary global options from the command line Index: sb_xmlrpcserver.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_xmlrpcserver.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** sb_xmlrpcserver.py 12 Jan 2004 08:36:15 -0000 1.4 --- sb_xmlrpcserver.py 15 Jan 2004 03:40:24 -0000 1.5 *************** *** 14,17 **** --- 14,19 ---- -d use the DBM store instead of cPickle. + -o section:option:value + set [section, option] in the options database to value IP *************** *** 72,76 **** """Main program; parse options and go.""" try: ! opts, args = getopt.getopt(sys.argv[1:], 'hdp:') except getopt.error, msg: usage(2, msg) --- 74,78 ---- """Main program; parse options and go.""" try: ! 
opts, args = getopt.getopt(sys.argv[1:], 'hdp:o:') except getopt.error, msg: usage(2, msg) *************** *** 88,91 **** --- 90,95 ---- elif opt == "-d": usedb = True + elif opt == '-o': + options.set_from_cmdline(arg, sys.stderr) if len(args) != 1: From anadelonbrin at users.sourceforge.net Thu Jan 15 18:10:38 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Thu Jan 15 18:10:42 2004 Subject: [Spambayes-checkins] website faq.txt, 1.56, 1.57 windows.ht, 1.30, 1.31 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1:/tmp/cvs-serv14283 Modified Files: faq.txt windows.ht Log Message: We *think* that the plug-in works with Win95, so make that consistent. Also change the FAQ wording a bit, because I'm sure we don't work with Win3.1 . Index: faq.txt =================================================================== RCS file: /cvsroot/spambayes/website/faq.txt,v retrieving revision 1.56 retrieving revision 1.57 diff -C2 -d -r1.56 -r1.57 *** faq.txt 4 Jan 2004 18:17:27 -0000 1.56 --- faq.txt 15 Jan 2004 23:10:36 -0000 1.57 *************** *** 439,444 **** ----------------------------------------------------- ! To our knowledge, the current version of the plug-in should work with any ! version of Windows and Outlook 2000 or above. The `troubleshooting guide`_ for the Outlook plugin contains the most up-to-date help for working around known problems. --- 439,444 ---- ----------------------------------------------------- ! To our knowledge, the current version of the plug-in should work with ! Windows 95 and above and Outlook 2000 or above. The `troubleshooting guide`_ for the Outlook plugin contains the most up-to-date help for working around known problems. 
Index: windows.ht =================================================================== RCS file: /cvsroot/spambayes/website/windows.ht,v retrieving revision 1.30 retrieving revision 1.31 diff -C2 -d -r1.30 -r1.31 *** windows.ht 18 Dec 2003 17:56:13 -0000 1.30 --- windows.ht 15 Jan 2004 23:10:36 -0000 1.31 *************** *** 19,23 ****

      The Outlook addin is an application of the SpamBayes project. It works with ! Outlook 2000 and later on Windows 98 and later. It does not work with Outlook Express - see below for other options

      --- 19,23 ----

      The Outlook addin is an application of the SpamBayes project. It works with ! Outlook 2000 and later on Windows 95 and later. It does not work with Outlook Express - see below for other options

      From tim_one at users.sourceforge.net Mon Jan 19 12:56:20 2004 From: tim_one at users.sourceforge.net (Tim Peters) Date: Mon Jan 19 12:56:35 2004 Subject: [Spambayes-checkins] spambayes/testtools simplexloop.py,1.4,1.5 Message-ID: Update of /cvsroot/spambayes/spambayes/testtools In directory sc8-pr-cvs1:/tmp/cvs-serv477/testtools Modified Files: simplexloop.py Log Message: The code was syntactically invalid (missing a "]"). Index: simplexloop.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/simplexloop.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** simplexloop.py 5 Sep 2003 01:15:29 -0000 1.4 --- simplexloop.py 19 Jan 2004 17:56:16 -0000 1.5 *************** *** 33,37 **** from spambayes import Options ! start = (Options.options["Tokenizer", "unknown_word_prob", Options.options["Tokenzier", "minimum_prob_strength"], Options.options["Tokenizer", "unknown_word_strength"]) --- 33,37 ---- from spambayes import Options ! start = (Options.options["Tokenizer", "unknown_word_prob"], Options.options["Tokenzier", "minimum_prob_strength"], Options.options["Tokenizer", "unknown_word_strength"]) From tim_one at users.sourceforge.net Mon Jan 19 12:58:33 2004 From: tim_one at users.sourceforge.net (Tim Peters) Date: Mon Jan 19 12:58:37 2004 Subject: [Spambayes-checkins] spambayes/testtools mkgraph.py,1.4,1.5 Message-ID: Update of /cvsroot/spambayes/spambayes/testtools In directory sc8-pr-cvs1:/tmp/cvs-serv1021/testtools Modified Files: mkgraph.py Log Message: Whitespace normalization. Index: mkgraph.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/testtools/mkgraph.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** mkgraph.py 11 Jan 2004 03:15:07 -0000 1.4 --- mkgraph.py 19 Jan 2004 17:58:31 -0000 1.5 *************** *** 99,103 **** Output.add_line(k, (n * 100.0 / (d or 1))) ! 
Output.line_title(linelabel="unsure", linecolor=2) for k in xrange(len(nspam_unsure)): n = nham_unsure[k] + nspam_unsure[k] --- 99,103 ---- Output.add_line(k, (n * 100.0 / (d or 1))) ! Output.line_title(linelabel="unsure", linecolor=2) for k in xrange(len(nspam_unsure)): n = nham_unsure[k] + nspam_unsure[k] From tim_one at users.sourceforge.net Mon Jan 19 12:58:33 2004 From: tim_one at users.sourceforge.net (Tim Peters) Date: Mon Jan 19 12:58:39 2004 Subject: [Spambayes-checkins] spambayes/spambayes Options.py, 1.101, 1.102 Stats.py, 1.4, 1.5 oe_mailbox.py, 1.8, 1.9 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1:/tmp/cvs-serv1021/spambayes Modified Files: Options.py Stats.py oe_mailbox.py Log Message: Whitespace normalization. Index: Options.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v retrieving revision 1.101 retrieving revision 1.102 diff -C2 -d -r1.101 -r1.102 *** Options.py 12 Jan 2004 08:36:15 -0000 1.101 --- Options.py 19 Jan 2004 17:58:20 -0000 1.102 *************** *** 913,925 **** ("ham_discard_level", "Ham Discard Level", 0.0, """Hams scoring less than this percentage will default to being ! discarded in the training interface (they won't be trained). You'll ! need to turn off the 'Train when filtering' option, above, for this to have any effect""", REAL, RESTORE), ("spam_discard_level", "Spam Discard Level", 100.0, ! """Spams scoring more than this percentage will default to being ! discarded in the training interface (they won't be trained). You'll ! need to turn off the 'Train when filtering' option, above, for this to have any effect""", REAL, RESTORE), --- 913,925 ---- ("ham_discard_level", "Ham Discard Level", 0.0, """Hams scoring less than this percentage will default to being ! discarded in the training interface (they won't be trained). You'll ! 
need to turn off the 'Train when filtering' option, above, for this to have any effect""", REAL, RESTORE), ("spam_discard_level", "Spam Discard Level", 100.0, ! """Spams scoring more than this percentage will default to being ! discarded in the training interface (they won't be trained). You'll ! need to turn off the 'Train when filtering' option, above, for this to have any effect""", REAL, RESTORE), Index: Stats.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Stats.py,v retrieving revision 1.4 retrieving revision 1.5 diff -C2 -d -r1.4 -r1.5 *** Stats.py 30 Dec 2003 22:41:49 -0000 1.4 --- Stats.py 19 Jan 2004 17:58:28 -0000 1.5 *************** *** 109,113 **** else: format_dict[key] = 'were' ! push("SpamBayes has processed %(num_seen)d message%(sp1)s - " \ "%(cls_ham)d (%(perc_ham).0f%%) good, " \ --- 109,113 ---- else: format_dict[key] = 'were' ! push("SpamBayes has processed %(num_seen)d message%(sp1)s - " \ "%(cls_ham)d (%(perc_ham).0f%%) good, " \ Index: oe_mailbox.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/oe_mailbox.py,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** oe_mailbox.py 11 Jan 2004 00:06:24 -0000 1.8 --- oe_mailbox.py 19 Jan 2004 17:58:28 -0000 1.9 *************** *** 557,563 **** # win32con to be available for the module to be imported. permission = win32con.KEY_READ | win32con.KEY_SET_VALUE ! possible_root_keys = [] ! # This appears to be the place for OE6 and WinXP # (So I'm guessing also for NT4) --- 557,563 ---- # win32con to be available for the module to be imported. permission = win32con.KEY_READ | win32con.KEY_SET_VALUE ! possible_root_keys = [] ! 
# This appears to be the place for OE6 and WinXP # (So I'm guessing also for NT4) From tim_one at users.sourceforge.net Mon Jan 19 12:58:52 2004 From: tim_one at users.sourceforge.net (Tim Peters) Date: Mon Jan 19 12:58:56 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py, 1.20, 1.21 sb_pop3dnd.py, 1.8, 1.9 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1:/tmp/cvs-serv1021/scripts Modified Files: sb_imapfilter.py sb_pop3dnd.py Log Message: Whitespace normalization. Index: sb_imapfilter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v retrieving revision 1.20 retrieving revision 1.21 diff -C2 -d -r1.20 -r1.21 *** sb_imapfilter.py 15 Jan 2004 03:23:26 -0000 1.20 --- sb_imapfilter.py 19 Jan 2004 17:58:16 -0000 1.21 *************** *** 28,32 **** -o section:option:value : set [section, option] in the options database ! to value --- 28,32 ---- -o section:option:value : set [section, option] in the options database ! to value Index: sb_pop3dnd.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_pop3dnd.py,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** sb_pop3dnd.py 15 Jan 2004 03:30:25 -0000 1.8 --- sb_pop3dnd.py 19 Jan 2004 17:58:20 -0000 1.9 *************** *** 213,217 **** """Indicate whether this message has subparts.""" return False ! def getSubPart(self, part): """Retrieve a MIME sub-message --- 213,217 ---- """Indicate whether this message has subparts.""" return False ! 
def getSubPart(self, part): """Retrieve a MIME sub-message From anadelonbrin at users.sourceforge.net Tue Jan 20 21:26:00 2004 From: anadelonbrin at users.sourceforge.net (Tony Meyer) Date: Tue Jan 20 21:26:04 2004 Subject: [Spambayes-checkins] website faq.txt, 1.57, 1.58 windows.ht, 1.31, 1.32 Message-ID: Update of /cvsroot/spambayes/website In directory sc8-pr-cvs1:/tmp/cvs-serv31914 Modified Files: faq.txt windows.ht Log Message: OK, so we *don't* think that the plug-in will necessarily work with Win95, according to Mark, although maybe it will if IE is up-to-date enough (Mark also mentioned Office, but that must be, surely, if they're using Outlook 2k). Anyway, retract the promise and add a glimmer of hope for those few souls left in the Win95 world. Index: faq.txt =================================================================== RCS file: /cvsroot/spambayes/website/faq.txt,v retrieving revision 1.57 retrieving revision 1.58 diff -C2 -d -r1.57 -r1.58 *** faq.txt 15 Jan 2004 23:10:36 -0000 1.57 --- faq.txt 21 Jan 2004 02:25:58 -0000 1.58 *************** *** 440,446 **** To our knowledge, the current version of the plug-in should work with ! Windows 95 and above and Outlook 2000 or above. The `troubleshooting guide`_ ! for the Outlook plugin contains the most up-to-date help for working around ! known problems. .. _troubleshooting guide: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html?rev=HEAD&content-type=text/html --- 440,448 ---- To our knowledge, the current version of the plug-in should work with ! Windows 98 and above and Outlook 2000 or above. You may be able to get the ! plug-in to work with Windows 95 if you install the most recent version of ! Internet Explorer possible, but we are not certain about this. The ! `troubleshooting guide`_ for the Outlook plugin contains the most ! up-to-date help for working around known problems. .. 
_troubleshooting guide: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html?rev=HEAD&content-type=text/html Index: windows.ht =================================================================== RCS file: /cvsroot/spambayes/website/windows.ht,v retrieving revision 1.31 retrieving revision 1.32 diff -C2 -d -r1.31 -r1.32 *** windows.ht 15 Jan 2004 23:10:36 -0000 1.31 --- windows.ht 21 Jan 2004 02:25:58 -0000 1.32 *************** *** 18,25 ****

      General Information

      ! The Outlook addin is an application of the SpamBayes project. It works with ! Outlook 2000 and later on Windows 95 and later. It does not work with ! Outlook Express - see below for other options !

      In general, you should download and install the latest version, as shown above. Older versions and the release notes can be viewed at the --- 18,28 ----

      General Information

      ! The Outlook addin is an application of the SpamBayes project. To our ! knowledge, the current version of the plug-in should work with ! Windows 98 and above and Outlook 2000 or above. You may be able to get the ! plug-in to work with Windows 95 if you install the most recent version of ! Internet Explorer possible, but we are not certain about this. It does ! not work with Outlook Express - see below for ! other options.

      In general, you should download and install the latest version, as shown above. Older versions and the release notes can be viewed at the From kpitt at users.sourceforge.net Wed Jan 21 14:40:53 2004 From: kpitt at users.sourceforge.net (Kenny Pitt) Date: Wed Jan 21 14:40:56 2004 Subject: [Spambayes-checkins] spambayes/Outlook2000 addin.py,1.124,1.125 Message-ID: Update of /cvsroot/spambayes/spambayes/Outlook2000 In directory sc8-pr-cvs1:/tmp/cvs-serv27832 Modified Files: addin.py Log Message: Fix confusing error log message. Timer delay values are configured in seconds, not milliseconds. Index: addin.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v retrieving revision 1.124 retrieving revision 1.125 diff -C2 -d -r1.124 -r1.125 *** addin.py 22 Dec 2003 03:16:29 -0000 1.124 --- addin.py 21 Jan 2004 19:40:51 -0000 1.125 *************** *** 293,297 **** print "*" * 50 print "The timer is configured to fire way " + too + \ ! "(delay=%s milliseconds, interval=%s milliseconds)" \ % (start_delay, interval) print "Please adjust your configuration. The timer is NOT enabled..." --- 293,297 ---- print "*" * 50 print "The timer is configured to fire way " + too + \ ! " (delay=%s seconds, interval=%s seconds)" \ % (start_delay, interval) print "Please adjust your configuration. The timer is NOT enabled..." From kpitt at users.sourceforge.net Wed Jan 21 14:48:55 2004 From: kpitt at users.sourceforge.net (Kenny Pitt) Date: Wed Jan 21 14:48:58 2004 Subject: [Spambayes-checkins] spambayes/windows spambayes.iss,1.6,1.7 Message-ID: Update of /cvsroot/spambayes/spambayes/windows In directory sc8-pr-cvs1:/tmp/cvs-serv29607 Modified Files: spambayes.iss Log Message: Minor fixes to capitalization, spelling, and punctuation. 
Index: spambayes.iss =================================================================== RCS file: /cvsroot/spambayes/spambayes/windows/spambayes.iss,v retrieving revision 1.6 retrieving revision 1.7 diff -C2 -d -r1.6 -r1.7 *** spambayes.iss 30 Dec 2003 01:57:18 -0000 1.6 --- spambayes.iss 21 Jan 2004 19:48:53 -0000 1.7 *************** *** 109,114 **** if Result then begin closeit := 'The Outlook mail delivery agent is still running.' + #13 + #13 + ! 'If you only recently closed Outlook, wait a few seconds and click Retry' + #13 + #13 + ! 'If this message persists, you may need to log off from Windows, and try again' Result := CheckNoAppMutex('InternetMailTransport', closeit); end; --- 109,114 ---- if Result then begin closeit := 'The Outlook mail delivery agent is still running.' + #13 + #13 + ! 'If you only recently closed Outlook, wait a few seconds and click Retry.' + #13 + #13 + ! 'If this message persists, you may need to log off from Windows, and try again.' Result := CheckNoAppMutex('InternetMailTransport', closeit); end; *************** *** 118,125 **** closeit:= 'An existing SpamBayes server is already running.' + #13 + #13 + 'Please shutdown this server before installing. If the SpamBayes tray icon' + #13 + ! 'is running, Right-click it and select "Exit SpamBayes"' + #13 + 'If the Windows Service version of SpamBayes is running, please stop' + #13 + ! 'it via "Control Panel->Administrative Tools->Services"' + #13 + #13 ! 'If this message persists, you may need to restart Windows' Result := CheckNoAppMutex('SpamBayesServer', closeit); end; --- 118,125 ---- closeit:= 'An existing SpamBayes server is already running.' + #13 + #13 + 'Please shutdown this server before installing. If the SpamBayes tray icon' + #13 + ! 'is running, Right-click it and select "Exit SpamBayes".' + #13 + 'If the Windows Service version of SpamBayes is running, please stop' + #13 + ! 'it via "Control Panel->Administrative Tools->Services".' + #13 + #13 ! 
'If this message persists, you may need to restart Windows.' Result := CheckNoAppMutex('SpamBayesServer', closeit); end; *************** *** 170,177 **** if InstallOutlook and not IsOutlookInstalled and not WarnedNoOutlook then begin if MsgBox( ! 'Outlook appears to not be installed.' + #13 + #13 + 'This addin only works with Microsoft Outlook 2000 and later - it' + #13 + ! 'does not work with Outlook express.' + #13 + #13 + ! 'If you know that Outlook is installed, you may with to continue.' + #13 + #13 + 'Would you like to change your selection?', mbConfirmation, MB_YESNO) = idNo then begin --- 170,177 ---- if InstallOutlook and not IsOutlookInstalled and not WarnedNoOutlook then begin if MsgBox( ! 'Outlook does not appear to be installed.' + #13 + #13 + 'This addin only works with Microsoft Outlook 2000 and later - it' + #13 + ! 'does not work with Outlook Express.' + #13 + #13 + ! 'If you know that Outlook is installed, you may wish to continue.' + #13 + #13 + 'Would you like to change your selection?', mbConfirmation, MB_YESNO) = idNo then begin *************** *** 185,189 **** if MsgBox( 'You have selected to install both the Outlook Addin and the Server/Proxy Applications.' + #13 + #13 + ! 'Unless you regularly use both Outlook and another mailer on the same system' + #13 + 'you do not need both applications.' + #13 + #13 + 'Would you like to change your selection?', --- 185,189 ---- if MsgBox( 'You have selected to install both the Outlook Addin and the Server/Proxy Applications.' + #13 + #13 + ! 'Unless you regularly use both Outlook and another mailer on the same system,' + #13 + 'you do not need both applications.' + #13 + #13 + 'Would you like to change your selection?', *************** *** 196,200 **** if not InstallOutlook and not InstallProxy then begin ! MsgBox('You must select one of the applications', mbError, MB_OK); Continue; end --- 196,200 ---- if not InstallOutlook and not InstallProxy then begin ! 
MsgBox('You must select one of the applications.', mbError, MB_OK); Continue; end From montanaro at users.sourceforge.net Wed Jan 21 16:38:52 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Wed Jan 21 16:38:55 2004 Subject: [Spambayes-checkins] spambayes/contrib findbest.py,NONE,1.1 Message-ID: Update of /cvsroot/spambayes/spambayes/contrib In directory sc8-pr-cvs1:/tmp/cvs-serv21468 Added Files: findbest.py Log Message: script to help choose the next "best" unsure to train on --- NEW FILE: findbest.py --- #!/usr/bin/env python ''' Find the next "best" unsure message to train on. %(prog)s [ -h ] [ -s ] [ -b N ] ham spam unsure Given a number of unsure messages and a desire to keep your training database small, the question naturally arises, "Which message should I add to my database next?". A common approach might be to sort the unsures by their SpamBayes scores and train on the one which scores lowest. That is a reasonable approach, but there is no guarantee the lowest scoring unsure is in any way related to the other unsure messages. This script offers a different approach. Given an existing pile of ham and spam, it trains on them to establish a baseline, then for each message in the unsure pile, it trains on that message, scores the entire unsure pile against the resulting training database, then untrains on that message. For each such message the following output is generated: * spamprob of the candidate message * number of other unsure messages which would score as spam if it was added to the training database * overall mean of all scored messages after training * standard deviation of all scored messages after training * message-id of the candidate message With no options, all candidate unsure messages are trained and scored against. At the end of the run, a file, "best.pck" is written out which is a dictionary keyed by the overall mean rounded to three decimal places. The values are lists of message-ids which generate that mean. 
Three options affect the behavior of the program. If the -h flag is given, this help message is displayed and the program exits. If the -s flag is given, no messages which score as spam are tested as candidates. If the -b N flag is given, only the messages which generated the N highest means in the last run without the -b flag are tested as candidates. Because the program runtime can be very slow (O(n^2) in the number of unsure messages), if you have a fairly large pile of unsure messages, these options can speed things up dramatically. If the -b flag is used, a new "best.pck" file is not written. Typically you would run once without the -b flag, then several times with the -b flag, adding one message to the spam pile after each run. After adding several messages to your spam file, you might then redistribute the unsure pile to move spams and hams to their respective folders, then start again with a smaller unsure pile. The ham, spam and unsure command line arguments can be anything suitable for feeding to spambayes.mboxutils.getmbox(). 
The "best.pck" file is searched for and written to these files in this order: * best.pck in the current directory * $HOME/tmp/best.pck * $HOME/best.pck ''' import sys import os import cPickle as pickle import getopt import math from spambayes.mboxutils import getmbox from spambayes.classifier import Classifier from spambayes.hammie import Hammie from spambayes.tokenizer import tokenize from spambayes.Options import options cls = Classifier() h = Hammie(cls) def counter(tag, i): if tag: sys.stdout.write("\r%s: %4d" % (tag, i)) else: sys.stdout.write("\r%4d" % i) sys.stdout.flush() def learn(mbox, h, is_spam): i = 0 tag = is_spam and "Spam" or "Ham" for msg in getmbox(mbox): counter(tag, i) i += 1 h.train(msg, is_spam) print def score(unsure, h, cls, scores, msgids=None, skipspam=False): """See what effect on others each msg in unsure has""" ham_cutoff = options["Categorization", "ham_cutoff"] spam_cutoff = options["Categorization", "spam_cutoff"] # compute a base - number of messages in unsure already in the # region of interest n = 0 total = 0.0 okalready = set() add = okalready.add for msg in getmbox(unsure): prob = cls.spamprob(tokenize(msg)) if prob >= spam_cutoff: n += 1 add(msg['message-id']) else: n += 1 total += prob first_mean = total/n print len(okalready), "messages already score as spam" print "initial mean spam prob: %.3f" % first_mean print "%5s %3s %5s %5s %s" % ("prob", "new", "mean", "sdev", "msgid") # one by one, train on each message and see what effect it has on # the other messages in the mailbox for msg in getmbox(unsure): msgid = msg['message-id'] if msgids is not None and msgid not in msgids: continue msgprob = cls.spamprob(tokenize(msg)) if skipspam and msgprob >= spam_cutoff: continue n = j = 0 h.train(msg, True) # see how many other messages in unsure now score as spam total = 0.0 probs = [] for trial in getmbox(unsure): # don't score messages which previously scored as spam if trial['message-id'] in okalready: continue n += 1 if n % 10 == 
0: counter("", n) prob = cls.spamprob(tokenize(trial)) probs.append(prob) total += prob if prob >= spam_cutoff: j += 1 counter("", n) h.untrain(msg, True) mean = total/n meankey = round(mean, 3) scores.setdefault(meankey, []).append(msgid) sdev = math.sqrt(sum([(mean-prob)**2 for prob in probs])/n) print "\r%.3f %3d %.3f %.3f %s" % (msgprob, j, mean, sdev, msgid) prog = os.path.basename(sys.argv[0]) def usage(msg=None): if msg is not None: print >> sys.stderr, msg print >> sys.stderr, __doc__.strip() % globals() def main(args): try: opts, args = getopt.getopt(args, "b:sh") except getopt.error, msg: usage(msg) return 1 best = 0 skipspam = False for opt, arg in opts: if opt == "-h": usage() return 0 if opt == "-b": best = int(arg) elif opt == "-s": skipspam = True if len(args) != 3: usage("require ham, spam and unsure message piles") return 1 ham, spam, unsure = args choices = ["best.pck"] if "HOME" in os.environ: home = os.environ["HOME"] choices.append(os.path.join(home, "tmp", "best.pck")) choices.append(os.path.join(home, "best.pck")) choices.append(None) for bestfile in choices: if bestfile is None: break if os.path.exists(bestfile): break try: file(bestfile, "w") except IOError: pass else: os.unlink(bestfile) if bestfile is None: usage("can't find a place to write best.pck file") return 1 print "establish base training" learn(ham, h, False) learn(spam, h, True) print "scoring" if best: last_scores = pickle.load(file(bestfile)) last_scores = last_scores.items() last_scores.sort() msgids = set() for (k, v) in last_scores[-best:]: msgids.update(set(v)) else: msgids = None scores = {} try: score(unsure, h, cls, scores, msgids, skipspam) except KeyboardInterrupt: # allow early termination without loss of computed scores pass if not best: pickle.dump(scores, file(picklefile, 'w')) return 0 if __name__ == "__main__": sys.exit(main(sys.argv[1:])) From montanaro at users.sourceforge.net Wed Jan 21 16:44:07 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) 
Date: Wed Jan 21 16:44:11 2004 Subject: [Spambayes-checkins] spambayes/contrib findbest.py,1.1,1.2 Message-ID: Update of /cvsroot/spambayes/spambayes/contrib In directory sc8-pr-cvs1:/tmp/cvs-serv22782 Modified Files: findbest.py Log Message: wordsmith the doc string a bit. add a small challenge. Index: findbest.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/contrib/findbest.py,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** findbest.py 21 Jan 2004 21:38:49 -0000 1.1 --- findbest.py 21 Jan 2004 21:44:05 -0000 1.2 *************** *** 8,13 **** Given a number of unsure messages and a desire to keep your training database small, the question naturally arises, "Which message should I add ! to my database next?". A common approach might be to sort the unsures by ! their SpamBayes scores and train on the one which scores lowest. That is a reasonable approach, but there is no guarantee the lowest scoring unsure is in any way related to the other unsure messages. --- 8,13 ---- Given a number of unsure messages and a desire to keep your training database small, the question naturally arises, "Which message should I add ! to my database next?". A common approach is to sort the unsures by their ! SpamBayes scores and train on the one which scores lowest. This is a reasonable approach, but there is no guarantee the lowest scoring unsure is in any way related to the other unsure messages. *************** *** 59,62 **** --- 59,66 ---- * $HOME/best.pck + [To do? Someone might consider the reverse operation. Given a pile of ham + and spam, which message can be removed with the least impact? What pile of + mail should that removal be tested against?] 
+ ''' From montanaro at users.sourceforge.net Fri Jan 23 07:37:07 2004 From: montanaro at users.sourceforge.net (Skip Montanaro) Date: Fri Jan 23 07:37:25 2004 Subject: [Spambayes-checkins] spambayes/contrib findbest.py,1.2,1.3 Message-ID: Update of /cvsroot/spambayes/spambayes/contrib In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5888 Modified Files: findbest.py Log Message: missed an instance when renaming a variable... Index: findbest.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/contrib/findbest.py,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** findbest.py 21 Jan 2004 21:44:05 -0000 1.2 --- findbest.py 23 Jan 2004 12:35:59 -0000 1.3 *************** *** 242,246 **** if not best: ! pickle.dump(scores, file(picklefile, 'w')) return 0 --- 242,246 ---- if not best: ! pickle.dump(scores, file(bestfile, 'w')) return 0 From anadelonbrin at projects.sourceforge.net Mon Jan 26 16:32:01 2004 From: anadelonbrin at projects.sourceforge.net (Tony Meyer) Date: Mon Jan 26 17:06:29 2004 Subject: [Spambayes-checkins] spambayes/spambayes ImapUI.py,1.34,1.35 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv28959/spambayes Modified Files: ImapUI.py Log Message: Add options that the ProxyUI has that ImapUI should have. 
Index: ImapUI.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/ImapUI.py,v retrieving revision 1.34 retrieving revision 1.35 diff -C2 -d -r1.34 -r1.35 *** ImapUI.py 5 Jan 2004 17:44:33 -0000 1.34 --- ImapUI.py 26 Jan 2004 21:31:40 -0000 1.35 *************** *** 64,69 **** ('imap', 'password'), ('imap', 'use_ssl'), - ('Interface Options', None), - ('html_ui', 'allow_remote_connections'), ('Header Options', None), ('Headers', 'notate_to'), --- 64,67 ---- *************** *** 102,105 **** --- 100,107 ---- ('Interface Options', None), ('html_ui', 'display_adv_find'), + ('html_ui', 'allow_remote_connections'), + ('html_ui', 'http_authentication'), + ('html_ui', 'http_user_name'), + ('html_ui', 'http_password'), ) From anadelonbrin at projects.sourceforge.net Tue Jan 27 02:53:39 2004 From: anadelonbrin at projects.sourceforge.net (Tony Meyer) Date: Tue Jan 27 02:54:33 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_mboxtrain.py,1.8,1.9 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30101/scripts Modified Files: sb_mboxtrain.py Log Message: Fix [ 881427 ] sb_mboxtrain.py requires -d or -D Index: sb_mboxtrain.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_mboxtrain.py,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** sb_mboxtrain.py 15 Jan 2004 03:27:40 -0000 1.8 --- sb_mboxtrain.py 27 Jan 2004 07:53:35 -0000 1.9 *************** *** 17,25 **** creating it is slower, but loading it is much faster, especially for large word databases. Recommended for use with ! hammiefilter or any procmail-based filter. -D DBNAME use the pickle store. A pickle is smaller and faster to create, ! but much slower to load. Recommended for use with pop3proxy and ! hammiesrv. -g PATH mbox or directory of known good messages (non-spam) to train on. 
--- 17,25 ---- creating it is slower, but loading it is much faster, especially for large word databases. Recommended for use with ! s_filter or any procmail-based filter. -D DBNAME use the pickle store. A pickle is smaller and faster to create, ! but much slower to load. Recommended for use with sb_server and ! sb_xmlrpcserver. -g PATH mbox or directory of known good messages (non-spam) to train on. *************** *** 51,55 **** import sys, os, getopt, email from spambayes import hammie ! from spambayes.Options import options program = sys.argv[0] --- 51,55 ---- import sys, os, getopt, email from spambayes import hammie ! from spambayes.Options import options, get_pathname_option program = sys.argv[0] *************** *** 327,331 **** if usedb == None: ! usage(2, "Must specify one of -d or -D") h = hammie.open(pck, usedb, "c") --- 327,334 ---- if usedb == None: ! # Use settings in configuration file. ! usedb = options["Storage", "persistent_use_database"] ! pck = get_pathname_option("Storage", ! 
"persistent_storage_file") h = hammie.open(pck, usedb, "c") From anadelonbrin at projects.sourceforge.net Tue Jan 27 03:36:54 2004 From: anadelonbrin at projects.sourceforge.net (Tony Meyer) Date: Tue Jan 27 03:37:47 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_imapfilter.py,1.21,1.22 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6805/scripts Modified Files: sb_imapfilter.py Log Message: Fix [ 870799 ] imap trying to fetch invalid message UID Index: sb_imapfilter.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v retrieving revision 1.21 retrieving revision 1.22 diff -C2 -d -r1.21 -r1.22 *** sb_imapfilter.py 19 Jan 2004 17:58:16 -0000 1.21 --- sb_imapfilter.py 27 Jan 2004 08:36:51 -0000 1.22 *************** *** 449,452 **** --- 449,467 ---- self._check(response, 'search') new_id = response[1][0] + + # See [ 870799 ] imap trying to fetch invalid message UID + # It seems that although the save gave a "NO" response to the + # first save, the message was still saved (without the flags, + # probably). This isn't really good behaviour on the server's + # part, but, as usual, we try and deal with it. So, if we get + # more than one undeleted message with the same SpamBayes id, + # delete all of them apart from the last one, and use that. 
+ multiple_ids = new_id.split() + for id_to_remove in multiple_ids[:-1]: + response = imap.uid("STORE", id_to_remove, "+FLAGS.SILENT", + "(\\Deleted \\Seen)") + self._check(response, 'store') + new_id = multiple_ids[-1] + # Let's hope it doesn't, but, just in case, if the search # turns up empty, we make the assumption that the new From anadelonbrin at projects.sourceforge.net Tue Jan 27 03:37:17 2004 From: anadelonbrin at projects.sourceforge.net (Tony Meyer) Date: Tue Jan 27 03:38:12 2004 Subject: [Spambayes-checkins] spambayes/spambayes ImapUI.py,1.35,1.36 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6915/spambayes Modified Files: ImapUI.py Log Message: Remove a line that snuck in from ProxyUI.py at some point. Index: ImapUI.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/ImapUI.py,v retrieving revision 1.35 retrieving revision 1.36 diff -C2 -d -r1.35 -r1.36 *** ImapUI.py 26 Jan 2004 21:31:40 -0000 1.35 --- ImapUI.py 27 Jan 2004 08:37:14 -0000 1.36 *************** *** 129,133 **** def onHome(self): """Serve up the homepage.""" - state.buildStatusStrings() stateDict = self.classifier.__dict__.copy() stateDict["warning"] = "" --- 129,132 ---- From montanaro at projects.sourceforge.net Thu Jan 29 09:37:26 2004 From: montanaro at projects.sourceforge.net (Skip Montanaro) Date: Thu Jan 29 10:21:45 2004 Subject: [Spambayes-checkins] spambayes/scripts sb_mboxtrain.py,1.9,1.10 Message-ID: Update of /cvsroot/spambayes/spambayes/scripts In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv23128 Modified Files: sb_mboxtrain.py Log Message: Copy stat info from input file to output file for Maildir and MHdir training. From Shawn Dyer, who wrote on spambayes-dev: When I ran sb_mboxtrain.py on my Maildir inbox, the timestamp on all of the files was touched, which messed up the way Courier IMAP sorts by receive date. 
Here is a suggested patch for sb_mboxtrain.py to preserve the timestamps on the Maildir messages that it trains. It is a fairly trivial patch. I now use it locally and can confirm that it preserves the original modified time from the original email files when it adds the trained header. The same technique would probably be useful for MHDir messages, but I do not use that and so cannot test. Index: sb_mboxtrain.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_mboxtrain.py,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** sb_mboxtrain.py 27 Jan 2004 07:53:35 -0000 1.9 --- sb_mboxtrain.py 29 Jan 2004 14:37:23 -0000 1.10 *************** *** 50,53 **** --- 50,54 ---- import sys, os, getopt, email + import shutil from spambayes import hammie from spambayes.Options import options, get_pathname_option *************** *** 146,149 **** --- 147,152 ---- f.write(msg.as_string()) f.close() + shutil.copystat(cfn, tfn) + # XXX: This will raise an exception on Windows. Do any Windows # people actually use Maildirs? *************** *** 245,248 **** --- 248,252 ---- f.write(msg.as_string()) f.close() + shutil.copystat(cfn, tfn) # XXX: This will raise an exception on Windows. 
Do any Windows From montanaro at projects.sourceforge.net Thu Jan 29 09:40:30 2004 From: montanaro at projects.sourceforge.net (Skip Montanaro) Date: Thu Jan 29 10:21:47 2004 Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.31,1.32 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv23807 Modified Files: CHANGELOG.txt Log Message: note modtime change to sb_mboxtrain.py Index: CHANGELOG.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v retrieving revision 1.31 retrieving revision 1.32 diff -C2 -d -r1.31 -r1.32 *** CHANGELOG.txt 12 Jan 2004 23:39:42 -0000 1.31 --- CHANGELOG.txt 29 Jan 2004 14:40:28 -0000 1.32 *************** *** 3,6 **** --- 3,7 ---- Alpha Release 8 =============== + Skip Montanaro 29/01/2004 sb_mboxtrain.py: preserve modtimes in Maildir & MH mailboxes Tony Meyer 13/01/2004 Fix [ 874784 ] Error in onReview code Skip Montanaro 13/01/2004 UserInterface: Split digest auth info properly. Simple split-on-comma fails if the uri contains commas. 
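The modtime preservation noted in this CHANGELOG entry comes down to calling shutil.copystat after the message file has been rewritten. A minimal sketch, assuming Python 3 rather than the Python 2 of these check-ins: the names cfn and tfn follow the sb_mboxtrain.py diff, while rewrite_preserving_stat, demo, and the X-Example-Trained header are hypothetical and not taken from the script.

```python
import os
import shutil
import tempfile


def rewrite_preserving_stat(cfn, tfn, new_text):
    """Write new_text to tfn, then copy cfn's mode bits and timestamps to tfn.

    Hypothetical helper for illustration; not the code in sb_mboxtrain.py.
    """
    with open(tfn, "w") as f:
        f.write(new_text)
    # copystat copies permission bits, atime and mtime (not contents or owner).
    shutil.copystat(cfn, tfn)


def demo():
    """Return (original mtime, rewritten mtime) for a throwaway message file."""
    tmpdir = tempfile.mkdtemp()
    try:
        cfn = os.path.join(tmpdir, "cur_msg")
        tfn = os.path.join(tmpdir, "tmp_msg")
        with open(cfn, "w") as f:
            f.write("Subject: hello\n\nbody\n")
        # Pretend the message was delivered some time ago.
        os.utime(cfn, (1075386000, 1075386000))
        rewrite_preserving_stat(
            cfn, tfn, "X-Example-Trained: ham\nSubject: hello\n\nbody\n")
        return int(os.stat(cfn).st_mtime), int(os.stat(tfn).st_mtime)
    finally:
        shutil.rmtree(tmpdir)
```

Without the copystat call the rewritten file would carry the time of the training run, which is what upset date-based sorting in the report quoted in the sb_mboxtrain.py check-in.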
From montanaro at projects.sourceforge.net Thu Jan 29 09:43:44 2004 From: montanaro at projects.sourceforge.net (Skip Montanaro) Date: Thu Jan 29 10:21:48 2004 Subject: [Spambayes-checkins] spambayes/contrib findbest.py,1.3,1.4 Message-ID: Update of /cvsroot/spambayes/spambayes/contrib In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24366 Modified Files: findbest.py Log Message: minor tweakage Index: findbest.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/contrib/findbest.py,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** findbest.py 23 Jan 2004 12:35:59 -0000 1.3 --- findbest.py 29 Jan 2004 14:43:42 -0000 1.4 *************** *** 110,122 **** for msg in getmbox(unsure): prob = cls.spamprob(tokenize(msg)) if prob >= spam_cutoff: - n += 1 add(msg['message-id']) else: - n += 1 total += prob first_mean = total/n ! print len(okalready), "messages already score as spam" print "initial mean spam prob: %.3f" % first_mean --- 110,121 ---- for msg in getmbox(unsure): prob = cls.spamprob(tokenize(msg)) + n += 1 if prob >= spam_cutoff: add(msg['message-id']) else: total += prob first_mean = total/n ! 
print len(okalready), "out of", n, "messages already score as spam" print "initial mean spam prob: %.3f" % first_mean From montanaro at projects.sourceforge.net Thu Jan 29 09:44:21 2004 From: montanaro at projects.sourceforge.net (Skip Montanaro) Date: Thu Jan 29 10:21:49 2004 Subject: [Spambayes-checkins] spambayes CHANGELOG.txt,1.32,1.33 Message-ID: Update of /cvsroot/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24604 Modified Files: CHANGELOG.txt Log Message: mention findbest.py Index: CHANGELOG.txt =================================================================== RCS file: /cvsroot/spambayes/spambayes/CHANGELOG.txt,v retrieving revision 1.32 retrieving revision 1.33 diff -C2 -d -r1.32 -r1.33 *** CHANGELOG.txt 29 Jan 2004 14:40:28 -0000 1.32 --- CHANGELOG.txt 29 Jan 2004 14:44:19 -0000 1.33 *************** *** 4,7 **** --- 4,8 ---- =============== Skip Montanaro 29/01/2004 sb_mboxtrain.py: preserve modtimes in Maildir & MH mailboxes + Skip Montanaro 21/01/2004 added findbest.py to contrib/ Tony Meyer 13/01/2004 Fix [ 874784 ] Error in onReview code Skip Montanaro 13/01/2004 UserInterface: Split digest auth info properly. Simple split-on-comma fails if the uri contains commas. From montanaro at projects.sourceforge.net Thu Jan 29 10:02:14 2004 From: montanaro at projects.sourceforge.net (Skip Montanaro) Date: Thu Jan 29 10:21:51 2004 Subject: [Spambayes-checkins] spambayes/spambayes tokenizer.py, 1.29, 1.30 Options.py, 1.102, 1.103 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27571/spambayes Modified Files: tokenizer.py Options.py Log Message: From a suggestion by someone whose name I forgot... Recognize "abbreviated" URLs of the form www.xyz.com or ftp.xyz.com as http://www.xyz.com and ftp://ftp.xyz.com, respectively. This gets rid of some fairly common "skip:w NNN" tokens. Enabled by the new tokenizer option, x-fancy_url_recognition.
I don't see any particular reason not to make this the default, but guarding it with the option allows people to more easily test for negative side effects. Index: tokenizer.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/tokenizer.py,v retrieving revision 1.29 retrieving revision 1.30 diff -C2 -d -r1.29 -r1.30 *** tokenizer.py 12 Jan 2004 08:38:23 -0000 1.29 --- tokenizer.py 29 Jan 2004 15:02:11 -0000 1.30 *************** *** 986,989 **** --- 986,1005 ---- # Strip and specially tokenize embedded URLish thingies. + url_fancy_re = re.compile(r""" + \b # the preceeding character must not be alphanumeric + (?: + (?: + (https? | ftp) # capture the protocol + :// # skip the boilerplate + )| + (?= ftp\.[^\.\s<>"'\x7f-\xff] )| # allow the protocol to be missing, but only if + (?= www\.[^\.\s<>"'\x7f-\xff] ) # the rest of the url starts "www.x" or "ftp.x" + ) + # Do a reasonable attempt at detecting the end. It may or may not + # be in HTML, may or may not be in quotes, etc. If it's full of % + # escapes, cool -- that's a clue too. + ([^\s<>"'\x7f-\xff]+) # capture the guts + """, re.VERBOSE) # ' + url_re = re.compile(r""" (https? | ftp) # capture the protocol *************** *** 995,998 **** --- 1011,1015 ---- """, re.VERBOSE) # ' + urlsep_re = re.compile(r"[;?:@&=+,$.]") *************** *** 1000,1008 **** def __init__(self): # The empty regexp matches anything at once. ! Stripper.__init__(self, url_re.search, re.compile("").search) def tokenize(self, m): proto, guts = m.groups() assert guts tokens = ["proto:" + proto] pushclue = tokens.append --- 1017,1036 ---- def __init__(self): # The empty regexp matches anything at once. ! if options["Tokenizer", "x-fancy_url_recognition"]: ! search = url_fancy_re.search ! else: ! search = url_re.search ! 
Stripper.__init__(self, search, re.compile("").search) def tokenize(self, m): proto, guts = m.groups() assert guts + if proto is None: + if guts.lower().startswith("www"): + proto = "http" + elif guts.lower().startswith("ftp"): + proto = "ftp" + else: + proto = "unknown" tokens = ["proto:" + proto] pushclue = tokens.append Index: Options.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v retrieving revision 1.102 retrieving revision 1.103 diff -C2 -d -r1.102 -r1.103 *** Options.py 19 Jan 2004 17:58:20 -0000 1.102 --- Options.py 29 Jan 2004 15:02:11 -0000 1.103 *************** *** 151,154 **** --- 151,159 ---- BOOLEAN, RESTORE), + ("x-fancy_url_recognition", "Extract URLs without http:// prefix", False, + """(EXPERIMENTAL) Recognize 'www.python.org' or ftp.python.org as URLs + instead of just long words.""", + BOOLEAN, RESTORE), + ("replace_nonascii_chars", "Replace non-ascii characters", False, """If true, replace high-bit characters (ord(c) >= 128) and control From kpitt at projects.sourceforge.net Thu Jan 29 13:27:57 2004 From: kpitt at projects.sourceforge.net (Kenny Pitt) Date: Thu Jan 29 14:21:24 2004 Subject: [Spambayes-checkins] spambayes/windows spambayes.iss,1.7,1.8 Message-ID: Update of /cvsroot/spambayes/spambayes/windows In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv24614 Modified Files: spambayes.iss Log Message: pythoncom23.dll and pywintypes32.dll don't need to be installed in the bin directory. See CVS revision 1.34 of build_exe.py in py2exe. 
http://cvs.sourceforge.net/viewcvs.py/py2exe/py2exe/sandbox/py2exe/Attic/build_exe.py Index: spambayes.iss =================================================================== RCS file: /cvsroot/spambayes/spambayes/windows/spambayes.iss,v retrieving revision 1.7 retrieving revision 1.8 diff -C2 -d -r1.7 -r1.8 *** spambayes.iss 21 Jan 2004 19:48:53 -0000 1.7 --- spambayes.iss 29 Jan 2004 18:27:54 -0000 1.8 *************** *** 19,24 **** Source: "py2exe\dist\lib\*.*"; DestDir: "{app}\lib"; Flags: ignoreversion Source: "py2exe\dist\bin\python23.dll"; DestDir: "{app}\bin"; Flags: ignoreversion - Source: "py2exe\dist\lib\pythoncom23.dll"; DestDir: "{app}\bin"; Flags: ignoreversion - Source: "py2exe\dist\lib\PyWinTypes23.dll"; DestDir: "{app}\bin"; Flags: ignoreversion Source: "py2exe\dist\bin\outlook_addin.dll"; DestDir: "{app}\bin"; Check: InstallingOutlook; Flags: ignoreversion --- 19,22 ---- From montanaro at projects.sourceforge.net Fri Jan 30 15:07:43 2004 From: montanaro at projects.sourceforge.net (Skip Montanaro) Date: Fri Jan 30 15:09:14 2004 Subject: [Spambayes-checkins] spambayes/spambayes TestToolsUI.py,1.2,1.3 Message-ID: Update of /cvsroot/spambayes/spambayes/spambayes In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30090 Modified Files: TestToolsUI.py Log Message: 2.2 compatibility Index: TestToolsUI.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/spambayes/TestToolsUI.py,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** TestToolsUI.py 30 Dec 2003 16:26:33 -0000 1.2 --- TestToolsUI.py 30 Jan 2004 20:07:40 -0000 1.3 *************** *** 21,24 **** --- 21,26 ---- # Foundation license. + from __future__ import generators + __author__ = "Tony Meyer " __credits__ = "All the Spambayes folk."
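The abbreviated-URL recognition from the tokenizer.py 1.30 check-in earlier in this batch can be tried in isolation. A minimal sketch, assuming Python 3: the regular expression is copied from that diff (with its comments shortened), and the protocol guess mirrors the lines added to tokenize(), but the wrapper name split_url is hypothetical and the rest of the tokenizer (the Stripper class, the options lookup, the clue tokens) is left out.

```python
import re

# Pattern taken from the tokenizer.py 1.30 diff.
url_fancy_re = re.compile(r"""
    \b                      # must start at a word boundary
    (?:
        (?:
            (https? | ftp)  # capture the protocol
            ://             # skip the boilerplate
        )|
        (?= ftp\.[^\.\s<>"'\x7f-\xff] )|  # protocol may be missing if the
        (?= www\.[^\.\s<>"'\x7f-\xff] )   # text starts "ftp.x" or "www.x"
    )
    ([^\s<>"'\x7f-\xff]+)   # capture the guts
""", re.VERBOSE)


def split_url(text):
    """Return (proto, guts) for the first URL-like thing in text, or None.

    Hypothetical wrapper for illustration only.
    """
    m = url_fancy_re.search(text)
    if m is None:
        return None
    proto, guts = m.groups()
    if proto is None:
        # Same guess the check-in adds to tokenize().
        if guts.lower().startswith("www"):
            proto = "http"
        elif guts.lower().startswith("ftp"):
            proto = "ftp"
        else:
            proto = "unknown"
    return proto, guts
```

With this pattern a bare "www.python.org" yields the protocol "http" instead of being tokenized as one long word, which is the effect the log message describes.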