From T.A.Meyer at massey.ac.nz Mon Sep 1 11:53:05 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 31 18:53:57 2003 Subject: [spambayes-dev] Bug messages Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C653@its-xchg4.massey.ac.nz> > If there's an ongoing problem here, it has to be due to > something SF is doing. [...] Oh well, it's hardly important, anyway. Thanks for the clarification :) =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Sep 1 12:34:06 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 31 19:34:48 2003 Subject: [spambayes-dev] Bug messages Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C68F@its-xchg4.massey.ac.nz> > If it's any consolation, it's not just spambayes -- I just > noticed that the same thing is happening to Python bug reports, like > > [ python-Bugs-793822 ] gc.get_referrers() isinherently dangerous It seems I was wrong about where the space is missing, too. It looked like it was between the second and third words after the [] bit, but this: [spambayes-bugs] [ spambayes-Bugs-798029 ] Outlook missing HTML inall pst mail has the space missing later. I guess this is happening to all of sf's messages, so they'll presumably fix it at some point. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Sep 1 14:14:15 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 31 21:15:02 2003 Subject: [spambayes-dev] RE: IMAP setup Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C706@its-xchg4.massey.ac.nz> > I hadn't gotten time to actually look at the source yet, but > I've been planning on setting up a mail server with > imapfilter. IMAP is a server-based process, so I assumed that > imapfilter would run on the server, but reading the top of > the source I'm not at all sure that's the case. Have I missed > the boat in a really, really big way here? The imapfilter is designed to run on the client's machine, in the same way that their mail program does. 
Various people have talked about integrating spambayes into an IMAP server, but I don't really recall anyone posting any details of actually doing so (you could google through the archives to check). > My plan was to run Sendmail, MailScanner, SpamAssassin, etc > in their normal fashion, but then let individual users have > their own specific SpamBayes IMAP filter to really clean things up. If it's to be individual, is there a reason to not just let them run imapfilter themselves? If you need to keep it on the server, you should be able to run imapfilter for each user; you'd probably want to optimise it for this if you have more than a couple of users. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Sep 1 14:15:42 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Aug 31 21:16:21 2003 Subject: [spambayes-dev] Bug messages Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C70A@its-xchg4.massey.ac.nz> > If there's an ongoing problem here, it has to be due to > something SF is doing. [...] Oh well, it's hardly important, anyway. Thanks for the clarification :) =Tony Meyer From barry at python.org Mon Sep 1 19:32:37 2003 From: barry at python.org (Barry Warsaw) Date: Mon Sep 1 14:32:38 2003 Subject: [spambayes-dev] IMAP setup In-Reply-To: <3F529685.BBDF094D@whidbey.com> References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAED9@its-xchg4.massey.ac.nz> <3F529685.BBDF094D@whidbey.com> Message-ID: <1062441123.11466.16.camel@anthem> On Sun, 2003-08-31 at 20:44, G. Armour Van Horn wrote: > I hadn't gotten time to actually look at the source yet, but I've been > planning on setting up a mail server with imapfilter. IMAP is a > server-based process, so I assumed that imapfilter would run on the > server, but reading the top of the source I'm not at all sure that's the > case. Have I missed the boat in a really, really big way here? No, I don't think that's how imapfilter actually runs. It's a client-side program that connects to the server to do the filtering and moving.
I'm using imap-based mail readers and I still haven't hit upon the best use of spambayes in my environment. I have some ideas, using Evolution's program based filters but I haven't had time to flesh out the details. -Barry From richie at entrian.com Tue Sep 2 00:04:08 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Sep 1 18:04:12 2003 Subject: [spambayes-dev] 1.0a5 Release Message-ID: Hello all, I think we may be ready for 1.0a5 - does anybody have anything they'd like to shoe-horn in before I go ahead? I'll build a 1.0a5rc1 first, and if I could have some volunteers to test it (or at least smoke-test it) that would be lovely. It looks as though enough last-minute jobs have cropped up to push this release out to Spambayes' birthday (4th September)... what kind of cake should we get? 8-) Is everyone in favour of immediately releasing 1.0a6, including the Grand Script Renaming, the Grand Removal Of Old Config Names, and Nothing Else? Do we have a plan about what the new script names should be?? (I think I was off-list when this was discussed...) -- Richie Hindle richie@entrian.com From T.A.Meyer at massey.ac.nz Tue Sep 2 12:50:54 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 1 19:51:43 2003 Subject: [spambayes-dev] 1.0a5 Release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308CA00@its-xchg4.massey.ac.nz> > I think we may be ready for 1.0a5 - does anybody have > anything they'd like to shoe-horn in before I go ahead? Not here. I even managed to get those bugs that I thought I was going to have to ignore. > I'll build a 1.0a5rc1 first, and if I could have some > volunteers to test it (or at least smoke-test it) that would > be lovely. You can send one this way. > It looks as though enough last-minute jobs have cropped up to > push this release out to Spambayes' birthday (4th > September)... what kind of cake should we get? 8-) You could share yours, couldn't you? 
> Is everyone in favour of immediately releasing 1.0a6, > including the Grand Script Renaming, the Grand Removal Of Old > Config Names, and Nothing Else? +1 here, obviously. I can do this if you like. > Do we have a plan about what > the new script names should be?? This is what was proposed - anyone want any changes?
Tony> i.e. we create a scripts directory and move into it:
Tony> dbExpImp.py renamed to sb-dbExpImp.py
Should this be sb-dbexpimp.py? All the rest are lower case.
Tony> hammiecli.py renamed to sb-client.py
Tony> hammiesrv.py renamed to sb-server.py
Tony> imapfilter.py renamed to sb-imapfilter.py
Tony> mboxtrain.py renamed to sb-mboxtrain.py
Tony> notesfilter.py renamed to sb-notesfilter.py
Tony> pop3proxy.py renamed to sb-pop3proxy.py
Tony> smtpproxy.py renamed to sb-smtpproxy.py
Tony> unheader.py renamed to sb-unheader.py
hammiefilter.py renamed to sb-? """hammiefilter reads a message from stdin and writes a message to stdout with the X-Spambayes-* header(s)."""
mailsort.py renamed to sb-?
proxy-tee.py renamed to sb-?
hammie.py deleted (it's been deprecated for a while, IIRC)
testtools/pop3proxytest.py renamed to spambayes/test/test_pop3proxy.py
Plus I'll rename the overkill :) script to sb-pop3dnd.py, because that's the best I've come up with. (dnd = drag'n'drop, as it allows drag'n'drop training with POP3 mail). =Tony Meyer From mhammond at keypoint.com.au Tue Sep 2 15:55:37 2003 From: mhammond at keypoint.com.au (Mark Hammond) Date: Tue Sep 2 00:55:41 2003 Subject: [spambayes-dev] New win32all versions Message-ID: <062b01c3710e$73ff2e10$f502a8c0@eden> It seems the starship is down, so I have put my new win32all versions at http://www.keypoint.com.au/~mhammond/win32all-158.exe for python 2.2 and http://www.keypoint.com.au/~mhammond/win32all-159.exe for python 2.3. These will end up on starship as soon as it comes back up. Outlook CVS users who would like to get some HTML tags back in their clues should grab this.
See bug https://sourceforge.net/tracker/index.php?func=detail&aid=798029&group_id=61702&atid=498103 for more details. Mark From T.A.Meyer at massey.ac.nz Tue Sep 2 18:13:50 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 2 01:14:32 2003 Subject: [spambayes-dev] New win32all versions Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E4ECE@its-xchg4.massey.ac.nz> > It seems the starship is down, so I have put my new win32all > versions at > http://www.keypoint.com.au/~mhammond/win32all-158.exe for > python 2.2 and > http://www.keypoint.com.au/~mhammond/win32all-159.exe for python 2.3. > > These will end up on starship as soon as it comes back up. What a perfect opportunity to release these on sourceforge (under pywin32, obviously)... =Tony Meyer From richie at entrian.com Tue Sep 2 08:01:41 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Sep 2 02:01:47 2003 Subject: [spambayes-dev] 1.0a5 Release In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130308CA00@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130308CA00@its-xchg4.massey.ac.nz> Message-ID: > Should this be sb-dbexpimp.py? All the rest are lower case. Yes, I think so, other than the fact that it's pronounced db-ex-pimp. 8-) > hammiesrv.py renamed to sb-server.py > pop3proxy.py renamed to sb-pop3proxy.py Um. I've been thinking for a long time that pop3proxy should be renamed to sb-server, because it hosts a lot more than just the POP3 proxy (it includes the SMTP proxy and the web interface, and possibly more in future). And you can quite sensibly run it with no POP3 proxies configured. In an ideal world I'd incorporate hammiesrv into pop3proxy and rename the whole thing to sb-server, but there may be no time for that (it would mean rewriting hammiesrv to work with asyncore). I'll try to find time for it. If I can't combine the two in time, how about renaming pop3proxy to sb-server, and hammiesrv to sb-xmlrpcserver? It's clumsy, but it will go away eventually.
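As a rough sketch (not part of SpamBayes), the renames proposed so far could be kept in a single Python table, so that the same data drives both a transition document and a check for stale copies of the old scripts left behind after an install. The table reflects the thread at this point: hammiesrv.py uses Richie's fallback name, and hammiefilter.py, mailsort.py and proxy-tee.py are left out because their new names were still undecided. The names PROPOSED_RENAMES, transition_table and stale_scripts are hypothetical.

```python
# Hypothetical helpers, not part of SpamBayes: old -> new script names as
# proposed in this thread. hammiesrv.py uses Richie's fallback name; scripts
# whose new names were still undecided are left out.
PROPOSED_RENAMES = {
    "dbExpImp.py": "sb-dbexpimp.py",
    "hammiecli.py": "sb-client.py",
    "hammiesrv.py": "sb-xmlrpcserver.py",
    "imapfilter.py": "sb-imapfilter.py",
    "mboxtrain.py": "sb-mboxtrain.py",
    "notesfilter.py": "sb-notesfilter.py",
    "pop3proxy.py": "sb-server.py",
    "smtpproxy.py": "sb-smtpproxy.py",
    "unheader.py": "sb-unheader.py",
}


def transition_table(renames):
    """Return one 'old -> new' line per script, sorted case-insensitively."""
    return [
        f"{old} -> {new}"
        for old, new in sorted(renames.items(), key=lambda item: item[0].lower())
    ]


def stale_scripts(installed_names, renames):
    """Return the old script names that are still present.

    installed_names is any iterable of file names, for example the result
    of os.listdir() on the directory the install step writes scripts to.
    """
    present = set(installed_names)
    return sorted(old for old in renames if old in present)


if __name__ == "__main__":
    print("\n".join(transition_table(PROPOSED_RENAMES)))
```

For example, stale_scripts(os.listdir("/usr/local/bin"), PROPOSED_RENAMES) would list any old-named scripts still sitting next to their replacements; the path is only an illustration of a typical install location.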
> hammiefilter.py renamed to sb-? sb-filter.py is my vote. > mailsort.py renamed to sb-? sb-mailsort.py? I don't use it - does anyone that uses it have a better name? > proxy-tee.py renamed to sb-? Difficult. For those not familiar with proxytee, it acts as a procmail filter that uploads a message to the web interface for later training (and doesn't change the message in any way). It has nothing to do with proxying - the only reason the word 'proxy' appears is that you have to run pop3proxy.py to get the web interface... sb-upload.py? sb-web-upload.py? sb-web-train.py? sb-web-store.py? sb-add-review.py? sb-train-later.py? Skip? Any preferences? (Or are you still trying to find the kettle? 8-) -- Richie Hindle richie@entrian.com From T.A.Meyer at massey.ac.nz Tue Sep 2 19:05:40 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 2 02:06:26 2003 Subject: [spambayes-dev] 1.0a5 Release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E4EE8@its-xchg4.massey.ac.nz> > > Should this be sb-dbexpimp.py? All the rest are lower case. > Yes, I think so, other than the fact that it's pronounced > db-ex-pimp. 8-) lol. > If I can't combine the two in time, how about renaming > pop3proxy to sb-server, and hammiesrv to sb-xmlrpcserver? > It's clumsy, but it will go away eventually. +1 (Hmm...we're going to need quite a comprehensive transition document, huh!) > > hammiefilter.py renamed to sb-? > sb-filter.py is my vote. +1 > > proxy-tee.py renamed to sb-? > Difficult. [...] > sb-upload.py? sb-web-upload.py? > sb-web-train.py? sb-web-store.py? sb-add-review.py? > sb-train-later.py? 
I like sb-web-upload.py best, but am happy to defer to Skip :) =Tony Meyer From aleem.bawany at utoronto.ca Tue Sep 2 04:20:02 2003 From: aleem.bawany at utoronto.ca (Aleem B) Date: Tue Sep 2 03:20:33 2003 Subject: [spambayes-dev] Regarding Whitelisting Message-ID: <000301c3711a$3e5cbfa0$0a00a8c0@Aleem> I am a new user of Spambayes and not on the dev list, so it is possible this topic has come up many times before (and for the same reason, please reply-all to this mail). As posted in the FAQ: > The main reason is that for the most part SpamBayes doesn't > need it! The users need it. I can know with certainty that the mail from my potential employer will end up in my inbox and not get lost with spam or overlooked in spam box, eventually costing me my job. There is comfort in knowing that the mail will show up in my inbox and I won't end up missing something important. > As long as you keep training on unsure or mis-classified mail, > SpamBayes will learn what you consider good mail without > needing any specific lists. With whitelists mail would not get "mis-classified" in the first place. > In addition, tokens are generated from email addresses, so an > automatic 'whitelist' (of sorts) is generated, as is a similar > blacklist. This can still generate false positives. > Whitelists and blacklists are problematic anyway, because > 'spoofing' (pretending you are someone else) is reasonably > simple, and also very common. Whitelisting an entire domain such as *@hotmail.com may be problematic. For normal email addresses, I doubt anyone will know the email address of the person I am whitelisting. I do not recall ever receiving spam mail from someone I know, except in the case of self-propagating worms, which fall under the antivirus department. > So, more often than not, they'll lead to incorrect results. If I am whitelisting the email of my potential employer e.g. john.doe@acme.net then it is very unlikely that a spammer will be able to guess this or use it coincidentally.
So the "more often than not" argument does not follow. Besides, the decision to whitelist an email address (and risk getting mail from a spammer forging that very address), should be left to the user. Furthermore, going through daily spam, finding the false positives and resurrecting them is more troublesome than going through the inbox and marking an email as spam. > However, there are some commercial products based on SpamBayes > that offer whitelisting - see the related page for more > information. In fact, most spam filters offer whitelisting. SpamBayes just happens to be an excellent mail filter and probably my favorite, so it would be great to see whitelisting implemented in future versions. aleem [ http://aleembawany.com/ ] From T.A.Meyer at massey.ac.nz Tue Sep 2 20:32:56 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 2 03:33:40 2003 Subject: [spambayes-dev] Regarding Whitelisting Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E4EF8@its-xchg4.massey.ac.nz> > The users need it. I can know with certainty that the mail > from my potential employer will end up in my inbox and not > get lost with spam or overlooked in spam box, eventually > costing me my job. There is comfort in knowing that the mail > will show up in my inbox and I won't end up missing something > important. You say this without any evidence, of course. If you can trust a whitelist, why can't you trust the classifier? > With whitelists mail would not get "mis-classified" in the > first place. Not true. Thanks to spoofing, you'd end up with lots of false-negatives. Or if you personally don't, many other spambayes users would. > Besides, the decision to whitelist an email address (and risk > getting mail from a spammer forging that very address), > should be left to the user. We're not stopping you whitelisting; we're simply not adding it to spambayes.
> Furthermore, going through daily spam, finding the false > positives and resurrecting them is more troublesome than > going through the inbox and marking an email as spam. False positives are much worse than false negatives, yes. But you're still basing this on no evidence that there will be these false positives. > In fact, most spam filters offer whitelisting. SpamBayes just > happens to be an excellent mail filter and probably my > favorite, so it would be great to see whitelisting > implemented in future versions. It really is extraordinarily unlikely that this will occur. You (or anyone else) are most welcome to patch the code to create whitelisting yourself, of course. You're also free to use InBoxer or some other spambayes-based project that does offer whitelisting. In any case, why not just use the rules of whatever mail client you use to whitelist? There's no rule saying that you have to classify every message that you get via spambayes. =Tony Meyer From ta-meyer at ihug.co.nz Tue Sep 2 21:34:40 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 2 04:35:20 2003 Subject: [spambayes-dev] Automatic configuration Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212ADFB@its-xchg4.massey.ac.nz> I've just checked in a module containing various functions to automatically set up spambayes to work with various mail clients (so far: Eudora, Mozilla Mail, Opera (M2), Outlook Express). It will set up both the mail client and spambayes with the required server/port information, and in Eudora's case, will also add a filter. [It's windows/autoconfigure.py] I've tested this with the latest (I think) versions of all of these programs under Windows XP. If people using other flavours could test them as well, that would be great. It's fairly likely that the functions will work with non-Windows clients (apart from Outlook Express); it all depends on whether the config file changes format or not. (Maybe it shouldn't be in the windows directory, and should be elsewhere?)
One bit that isn't done yet is figuring out where the configuration file is located, so you have to provide this to the appropriate function. I'm sure you've already guessed this, but it would be good to back up your configuration file (or, in OE's case, the registry) before you do this. If you blow up the world, don't blame it on me :) =Tony Meyer From romain.guy at jext.org Tue Sep 2 14:35:11 2003 From: romain.guy at jext.org (Romain GUY) Date: Tue Sep 2 07:40:00 2003 Subject: [spambayes-dev] SpamBayes service monitor for Windows Message-ID: <200392133511.353536@Thinthalion> Hello, I've just begun to work on a service monitor for SpamBayes. This tool will allow the user to graphically install, remove, start, stop and restart the SpamBayes service for Windows. It will appear as a single traybar icon and would also allow the user to open a web browser at the web configuration page. If you have ever tried the Apache2 monitor for Windows, it will look similar to that. If you don't want such a tool, just tell me to stop now :-) Please note this won't be ready before the feature freeze anyway :) -- Romain GUY romain.guy@jext.org http://www.jext.org http://progx.jext.org From adam.walker at rbwconsulting.com Tue Sep 2 11:23:26 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Tue Sep 2 10:23:34 2003 Subject: [spambayes-dev] SpamBayes service monitor for Windows In-Reply-To: <200392133511.353536@Thinthalion> Message-ID: <20030902142329.78A0E862CE@plunder.dreamhost.com> This should probably be rolled into the (new) windows/pop3proxy_tray.py script. > I've just begun to work on a service monitor for SpamBayes. This > tool will allow the user to graphically install, remove, start, stop and > restart the SpamBayes service for Windows. It will appear as a single > traybar icon and would also allow the user to open a web browser at the web > configuration page. If you have ever tried the Apache2 > monitor for Windows, it will look similar to that.
> > If you don't want such a tool, just tell me to stop now :-) Please > note this won't be ready before the features freeze anyway :) From tim.one at comcast.net Tue Sep 2 11:47:17 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Sep 2 10:47:53 2003 Subject: [spambayes-dev] Re: Bug messages Message-ID: [Tony] > Recently (I think), the messages to the bug list are missing a space. > Specifically, the space after the second word of the (bug) subject. > For example: > > [spambayes-bugs] [ spambayes-Feature Requests-791246 ] IMAP: keepnew > messages unread > Summary: IMAP: keep new messages unread > > Is this some weird sourceforge thing, or something we have setup > wrong? > (Or somehow something I've got wrong? ;) On my box, it looks like it's an Outlook bug! Barry Warsaw stared at some examples with me, and in the "missing space" cases I see, the actual message received has a Subject header spanning multiple lines, like: Subject: [spambayes-bugs] [spambayes-Feature Requests-791246] IMAP: keep new messages unread Outlook apparently neglects to insert a space when it pastes the lines together for display. spambayes sees the correct Subject line, though, as that's obtained from Python's email pkg, which does insert the appropriate space(s) when pasting continued header lines together. From skip at pobox.com Tue Sep 2 11:21:21 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 2 11:21:36 2003 Subject: [spambayes-dev] 1.0a5 Release In-Reply-To: References: <1ED4ECF91CDED24C8D012BCF2B034F130308CA00@its-xchg4.massey.ac.nz> Message-ID: <16212.46449.852665.609916@montanaro.dyndns.org> Richie> Skip? Any preferences? (Or are you still trying to find the Richie> kettle? 8-) Between all the recent virus crap and a move out of our old house last week (but not yet into our new house) I've been pretty swamped. sb-upload sounds like a decent replacement for proxytee to me. One thing to be aware of is that old versions won't magically go away. 
That means after 1.0a6 is released, if I execute "cvs up" followed by "python setup.py install", I will still have a /usr/local/bin/hammiefilter.py as well as /usr/local/bin/sb-filter.py. Consequently, anything I have that calls hammiefilter.py will still continue to "work", but over time will probably result in some subtle errors as sb-filter and the modules it uses diverge from the old hammiefilter.py script. Accordingly, it would be nice if the setup script knew about the old names and warned if they still exist in the bin directory when executing the install action. Skip From skip at pobox.com Tue Sep 2 11:27:45 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 2 11:27:53 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E4EF8@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13031E4EF8@its-xchg4.massey.ac.nz> Message-ID: <16212.46833.894787.304727@montanaro.dyndns.org> >>>>> "Tony" == Tony Meyer writes: >> The users need it. I can know with certainty that the mail from my >> potential employer will end up in my inbox and not get lost with spam >> or overlooked in spam box, eventually costing me my job. There is >> comfort in knowing that the mail will show up in my inbox and I won't >> end up missing something important. Tony> You say this without any evidence, of course. If you can trust a Tony> whitelist, why can't you trust the classifier? I was thinking about this last week. The problem is that a job offer from a potential employer is a fairly rare sort of email. Consequently, it's quite possible there is not enough evidence in the message to tip it toward the ham side. If the employer uses a lot of terms which appear spammy, it might not even make it to "unsure". Still, a potential employer's email address can be forged just as easily as any other address.
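For readers who want a whitelist anyway, here is a minimal Python sketch of the approach Tony suggests in the thread: decide on whitelisted senders before the classifier runs, so SpamBayes never scores those messages. It is not a SpamBayes feature; the whitelist contents, the route function and the classify callback are hypothetical stand-ins. As Skip points out, the check trusts the From header, so a forged sender address passes straight through.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical whitelist entries: exact addresses and whole domains.
WHITELIST_ADDRESSES = {"john.doe@acme.net"}
WHITELIST_DOMAINS = {"example.com"}


def is_whitelisted(msg):
    """Return True if the message's From address is on the whitelist.

    Caution: this only looks at the From header, which is trivial to forge.
    """
    _name, addr = parseaddr(msg.get("From", ""))
    addr = addr.lower()
    if "@" not in addr:
        return False
    domain = addr.rsplit("@", 1)[1]
    return addr in WHITELIST_ADDRESSES or domain in WHITELIST_DOMAINS


def route(raw_message, classify):
    """Return a folder name for raw_message.

    classify is a stand-in for whatever SpamBayes front end is in use; it
    is called only for messages that are not whitelisted.
    """
    msg = message_from_string(raw_message)
    if is_whitelisted(msg):
        return "inbox"
    return classify(msg)
```

Any message whose From header names john.doe@acme.net lands in the inbox without being scored, including a spammer's forgery of that address, which is exactly the trade-off being argued over here. The same check can equally be done with mail-client rules, as Tony suggests, rather than in code.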
Skip From tim.one at comcast.net Tue Sep 2 13:05:00 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Sep 2 12:05:38 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <16212.46833.894787.304727@montanaro.dyndns.org> Message-ID: [Skip] > I was thinking about this last week. The problem is that a job offer > from a potential employer is a fairly rare sort of email. > Consequently, it's quite possible there is not enough evidence in the > message to tip it toward the ham side. If the employer uses a lot of > terms which appear spammy, it might not even make it to "unsure". > > Still, a potential employer's email address can be forged just as > easily as any other address. OTOH, before you get it, how could you predict which email address the employment offer may appear to come from? It may come from the company or from someone at home or on the road, from someone you've talked with before or from someone you've never heard of in the HR department, and you're never going to guess in advance the email address of the private investigator firm hired to check you out . Something every email user secretly knows, but doesn't want to believe, is that email is an inappropriate medium for truly important communication. So it goes. From aleem.bawany at utoronto.ca Tue Sep 2 14:44:27 2003 From: aleem.bawany at utoronto.ca (Aleem B) Date: Tue Sep 2 13:44:54 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E4EF8@its-xchg4.massey.ac.nz> Message-ID: <001b01c37171$795e9270$0a00a8c0@Aleem> [please reply-all when responding to this mail] Meyer, Tony wrote: >> The users need it. I can know with certainty that the mail >> from my potential employer will end up in my inbox and not >> get lost with spam or overlooked in spam box, eventually >> costing me my job. There is comfort in knowing that the mail >> will show up in my inbox and I won't end up missing something >> important. 
> > You say this without any evidence, of course. If you can trust a > whitelist, why can't you trust the classifier? > Because the classifier can generate false positives - a whitelist would take precedence over the classifier. >> With whitelists mail would not get "mis-classified" in the >> first place. > > Not true. Thanks to spoofing, you'd end up with lots of > false-negatives. Or if you personally don't, many other > spambayes users > would. > This is the part that I don't understand. How often do you receive spam forged from people in your address book? I have never received any spam from a personal address I recognize. I don't expect I would ever receive spam from t.a.meyer(at)massey(dot)ac(dot)nz. And if I did, it would be because there is a spammer on this list or you are a spammer (obviously not). In either case, the spammer would have to be extremely lucky and manage to match an address in my whitelist. You do the stats. >> Besides, the decision to whitelist an email address (and risk >> getting mail from a spammer forging that very address), >> should be left to the user. > > We're not stopping you whitelisting; we're simply not adding it to > spambayes. > I'm trying to make a case for it, because the case against it is weak. >> Furthermore, going through daily spam, finding the false >> positives and resurrecting them is more troublesome than >> going through the inbox and marking an email as spam. > > False positives are much worse than false negatives, yes. But you're > still basing this on no evidence that there will be these false > positives. > The classifier can generate false positives - what evidence do I need? >> In fact, most spam filters offer whitelisting. SpamBayes just >> happens to be an excellent mail filter and probably my >> favorite, so it would be great to see whitelisting >> implemented in future versions. > > It really is extraordinarily unlikely that this will occur.
You (or > anyone else) are most welcome to patch the code to create whitelisting > yourself, of course. You're also free to use InBoxer or some other > spambayes-based project that does offer whitelisting. > > In any case, why not just use the rules of whatever mail > client you use > to whitelist? There's no rule saying that you have to classify every > message that you get via spambayes. > Thanks, that's what I have done now - created a new folder and set up rules to move whitelisted mail into it so SpamBayes doesn't get to it. aleem [ http://aleembawany.com/ ] From aleem.bawany at utoronto.ca Tue Sep 2 15:01:07 2003 From: aleem.bawany at utoronto.ca (Aleem B) Date: Tue Sep 2 14:01:39 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: Message-ID: <001e01c37173$cd3dc120$0a00a8c0@Aleem> Tim Peters wrote: > [Skip] > OTOH, before you get it, how could you predict which email address the > employment offer may appear to come from? It may come from > the company or > from someone at home or on the road, from someone you've > talked with before > or from someone you've never heard of in the HR department, > and you're never > going to guess in advance the email address of the private > investigator firm hired to check you out . > This was not a contrived example. I am (or was) in this situation: while looking for jobs, I have been whitelisting entire company domains as well, which I will remove from the whitelist in a week or a month, and I am quite certain I will not get any spam forged from those addresses. > Something every email user secretly knows, but doesn't want > to believe, is that email is an inappropriate medium for truly > important communication. So it goes. Why so? I think you are mixing two different issues here: reliability and security. For the latter you can use encryption and for the former, well, it is totally dependent on the reliability of your ISP's mail servers (backup server, backup power etc).
I wasn't getting phone calls during the recent blackout (telephone networks were too busy), but the mails got through when I did eventually get a chance to check. It is also the cheapest communication medium and hence often preferred for communicating over longer distances. The only reason it is not *preferred* is the lack of expression and context, which can result in miscommunication or misunderstanding. Aleem [ http://aleembawany.com/ ] From popiel at wolfskeep.com Tue Sep 2 12:40:33 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Tue Sep 2 14:40:38 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: Message from "Aleem B" of "Tue, 02 Sep 2003 13:44:27 -0300." <001b01c37171$795e9270$0a00a8c0@Aleem> References: <001b01c37171$795e9270$0a00a8c0@Aleem> Message-ID: <20030902184033.DEB5F2DE90@cashew.wolfskeep.com> In message: <001b01c37171$795e9270$0a00a8c0@Aleem> "Aleem B" writes: > >>> With whitelists mail would not get "mis-classified" in the >>> first place. >> >> Not true. Thanks to spoofing, you'd end up with lots of >> false-negatives. Or if you personally don't, many other >> spambayes users >> would. > >This is the part that I don't understand. How often do >you receive spam forged from people in your address book? I get about six a day, presumably because one of the spammers that raped a mailing list got a clue and uses other members of that mailing list as from addresses when sending to addresses culled from that source. >>> Besides, the decision to whitelist an email address (and risk >>> getting mail from a spammer forging that very address), >>> should be left to the user. >> >> We're not stopping you whitelisting; we're simply not adding it to >> spambayes. >> >I'm trying to make a case for it, because the case against it is >weak. What I don't understand is why people want one tool to do everything. I have multiple MTAs which are separate from my MDA which is separate from my MUA, with several filters in between...
why should whitelisting be added to spambayes, when spambayes does what it does very well, and other tools (like procmail) can trivially do whitelisting very well, and they can be easily used in conjunction? Is this another case of the unix mentality (use multiple tools which each do their own thing well) getting in the way of general acceptance by the masses? >> False positives are much worse than false negatives, yes. But you're >> still basing this on no evidence that there will be these false >> positives. > >The classifier can generate false positives - what evidence do I need? The same evidence that you're demanding for false negatives from whitelists (which I provide anecdotally above). - Alex From skip at pobox.com Tue Sep 2 14:43:02 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 2 14:43:07 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <001b01c37171$795e9270$0a00a8c0@Aleem> References: <1ED4ECF91CDED24C8D012BCF2B034F13031E4EF8@its-xchg4.massey.ac.nz> <001b01c37171$795e9270$0a00a8c0@Aleem> Message-ID: <16212.58550.775987.1184@montanaro.dyndns.org> >> Not true. Thanks to spoofing, you'd end up with lots of >> false-negatives. Or if you personally don't, many other spambayes >> users would. Aleem> This is the part that I don't understand. How often do you Aleem> receive spam forged from people in your address book? I have Aleem> never received any spam from a personal address I recognize. Then you have a very small circle of friends. ;-) I've received spam purporting to be from Guido. During this rash of SoBig and Blaster crap I received lots of forged mail purporting to have been sent by Barry Warsaw. I have met both Guido and Barry in person and can report with some confidence that neither of them looks like a spammer.
;-) Skip From skip at pobox.com Tue Sep 2 14:47:34 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 2 14:47:38 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <001e01c37173$cd3dc120$0a00a8c0@Aleem> References: <001e01c37173$cd3dc120$0a00a8c0@Aleem> Message-ID: <16212.58822.187748.691646@montanaro.dyndns.org> Tim> Something every email user secretly knows, but doesn't want to Tim> believe, is that email is an inappropriate medium for truly Tim> important communication. So it goes. Aleem> Why so? Because email can be forged so easily. Aleem> For [security] you can use encryption ... How many Outlook or Outlook Express users do you think really know how to digitally sign their outgoing emails? I'm assuming here that most HR types use Windows and that OL and OLex are the two most popular Windows email clients. Both assumptions may be false, but even if they are, I suspect my assertion still covers a large fraction of that population. Skip From tim.one at comcast.net Tue Sep 2 16:04:25 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Sep 2 15:04:59 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <16212.58550.775987.1184@montanaro.dyndns.org> Message-ID: [Skip] > Then you have a very small circle of friends. ;-) I've received spam > purporting to be from Guido. During this rash of SoBig and Blaster > crap I received lots of forged mail purporting to have been sent by > Barry Warsaw. I have met both Guido and Barry in person and can > report with some confidence that neither of them looks like a > spammer. ;-) Same here. And, in fact, the last employment offer I got in email came from Guido -- that doesn't mean I want to see 500 penis enlargement ads per week too just because they purport to come from Guido. Or maybe they really do . In any case, spambayes is doing a perfect job so far of distinguishing real email from Guido and Barry (etc -- they're not the only ones) and spam claiming to come from them.
If someone adds a whitelist option to spambayes, my only demand is that there be a big red button that promises to disable it. From aleem.bawany at utoronto.ca Tue Sep 2 16:37:38 2003 From: aleem.bawany at utoronto.ca (Aleem B) Date: Tue Sep 2 15:38:04 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <20030902184033.DEB5F2DE90@cashew.wolfskeep.com> Message-ID: <002d01c37181$49181220$0a00a8c0@Aleem> T. Alexander Popiel wrote: > In message: <001b01c37171$795e9270$0a00a8c0@Aleem> > "Aleem B" writes: >> >>>> With whitelists mail would not get "mis-classified" in the >>>> first place. >>> >>> Not true. Thanks to spoofing, you'd end up with lots of >>> false-negatives. Or if you personally don't, many other spambayes >>> users would. >> >> This is the part that I don't understand. How often do >> you receive spam forged from people in your address book? > > I get about six a day, presumably because one of the spammers > that raped a mailing list got a clue and uses other members > of that mailing list as from addresses when sending to > addresses culled from that source. > Whitelisting is merely a way of providing more control/power to the user. It is an option, and users can choose to have the comfort of knowing that mails from a certain address won't be marked as spam. In your specific scenario, you would probably opt against whitelisting those specific addresses from which you receive spam. So you simply let the user decide what he thinks best. >>>> Besides, the decision to whitelist an email address (and risk >>>> getting mail from a spammer forging that very address), >>>> should be left to the user. >>> >>> We're not stopping you whitelisting; we're simply not adding it to >>> spambayes. >>> >> I'm trying to make a case for it, because the case against it is >> weak. > > What I don't understand is why people want one tool to do everything.
> I have multiple MTAs which are separate from my MDA which is separate > from my MUA, with several filters in between... why should > whitelisting be added to spambayes, when spambayes does what it does > very well, and other tools (like procmail) can trivially do > whitelisting very well, and they can be easily used in conjunction? > > Is this another case of the unix mentality (use multiple tools which each > do their own thing well) getting in the way of general acceptance > by the masses? > Whitelisting is a concept well ingrained in spam detection. I don't see why they should be two different tools. Whitelisting lends itself to the spamming vocabulary for a reason. Besides, "whitelist" here is being used in the context of spam, so effectively I am only requesting you consider having a "whitelist for spam(mers)". >>> False positives are much worse than false negatives, yes. But you're >>> still basing this on no evidence that there will be these false >>> positives. >> >> The classifier can generate false positives - what evidence do I >> need? > > The same evidence that you're demanding for false negatives from > whitelists (which I provide anecdotally above). > I think evidence and argument are being used interchangeably here. I still don't see what kind of evidence you are demanding. Whitelists circumvent spam detectors and don't generate false positives. Whitelists can generate false negatives (as you argue) but the user has control over his whitelist (as I mentioned above). It is a concept analogous to VIP lists. If there are VIP impersonators you can take them off the VIP list and let the filter handle them.
aleem [ http://aleembawany.com/ ] From aleem.bawany at utoronto.ca Tue Sep 2 16:53:39 2003 From: aleem.bawany at utoronto.ca (Aleem B) Date: Tue Sep 2 15:54:04 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <16212.58822.187748.691646@montanaro.dyndns.org> Message-ID: <002e01c37183$85c6ab30$0a00a8c0@Aleem> Skip Montanaro wrote: > Tim> Something every email user secretly knows, but doesn't want to > Tim> believe, is that email is an inappropriate medium for truly > Tim> important communication. So it goes. > > Aleem> Why so? > > Because email can be forged so easily. > > Aleem> For [security] you can use encryption ... > > How many Outlook or Outlook Express users do you think really know how to > digitally sign their outgoing emails? I'm assuming here that most HR types > use Windows and that OL and OLex are the two most popular Windows email > clients. Both assumptions may be false, but even if they are, I suspect my > assertion still covers a large fraction of that population. > > Skip If users don't lock their door despite having a doorlock, that is their problem and not the problem of the people who make doorlocks. What more can you do? aleem [ http://aleembawany.com/ ] From seant at iname.com Tue Sep 2 16:55:56 2003 From: seant at iname.com (Sean True) Date: Tue Sep 2 15:56:23 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: Message-ID: <000401c3718c$39c863a0$0201a8c0@swapwizard.com> > > [Skip] > > Then you have a very small circle of friends. ;-) I've received spam > > purporting to be from Guido. During this rash of SoBig and Blaster > > crap I received lots of forged mail purporting to have been sent by > > Barry Warsaw. I have met both Guido and Barry in person and can > > report with some confidence that neither of them looks like a > > spammer. ;-) > > Same here.
And, in fact, the last employment offer I got in email came from > Guido -- that doesn't mean I want to see 500 penis enlargement ads per week > too just because they purport to come from Guido. Or maybe they really do > . In any case, spambayes is doing a perfect job so far of > distinguishing real email from Guido and Barry (etc -- they're not the only > ones) and spam claiming to come from them. If someone adds a whitelist > option to spambayes, my only demand is that there be a big red button that > promises to disable it. > And then there are customers who really, really want to have it. We added it to SpamAtBay/InBoxer, and I've never been hurt by it, but it doesn't help me (personally) much, either. But managing it is yet another pile of not very stimulating work. -- Sean From popiel at wolfskeep.com Tue Sep 2 14:06:28 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Tue Sep 2 16:06:47 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: Message from "Aleem B" of "Tue, 02 Sep 2003 15:37:38 -0300." <002d01c37181$49181220$0a00a8c0@Aleem> References: <002d01c37181$49181220$0a00a8c0@Aleem> Message-ID: <20030902200628.F36A92DE90@cashew.wolfskeep.com> In message: <002d01c37181$49181220$0a00a8c0@Aleem> "Aleem B" writes: >T. Alexander Popiel wrote: >> >> What I don't understand is why people want one tool to do everything. >> [...] why should whitelisting be added to spambayes, when spambayes >> does what it does very well, and other tools (like procmail) can >> trivially do whitelisting very well, and they can be easily used in >> conjunction? > >Whitelisting is a concept well ingrained in spam detection. I >don't see why they should be two different tools. I think they should be different tools because they do intrinsically different things: spambayes is a statistical analysis tool, and procmail (or anything else used for whitelisting) is an explicit pattern-match tool.
The fact that both are being applied to incoming email to weed out spam (or keep in ham) is irrelevant to the fact that they're intrinsically different operations. It may be that people don't want to be exposed to the idea that there are multiple ways of filtering email, and that multiple methods may be simultaneously useful. Alternately, this may be a case where people think that getting it all from one program brings efficiency akin to having a single vendor for office supplies. Or it may be something else... but in any case, I don't understand it. Frankly, I don't see any utility to combining the two tasks. How would it improve my life? >I think evidence and argument are being used interchangeably here. Sloppy use of language is a bane to communication. - Alex From vanhorn at whidbey.com Tue Sep 2 14:41:14 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Tue Sep 2 16:41:23 2003 Subject: [spambayes-dev] Correcting training References: <002d01c37181$49181220$0a00a8c0@Aleem> <20030902200628.F36A92DE90@cashew.wolfskeep.com> Message-ID: <3F55006A.10D21A39@whidbey.com> Greetings: I've been using the soon-to-be-renamed pop3proxy for about 20 days now. I normally review the results several times a day, and try hard to correctly classify every incoming message. As a result, I now have trained on Spam: 8197, Ham: 9537 on my main installation. Let's just say that the results are mildly awesome. However, I don't think the filter has ever caught a "Nigerian" spam or other "business proposal" sorts of messages. Sometimes I hit the "Train" button too soon, and later find one of these in my inbox. Sometimes I will copy the source and go "Train as Spam" with it, but as far as I know, that just means that the message has been trained as both Ham and Spam. Would it be possible to add an "Untrain as Ham, Train as Spam" function? Or am I the only user stupid enough to train before reading all the mail?
Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From richie at entrian.com Wed Sep 3 00:08:18 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Sep 2 18:08:24 2003 Subject: [spambayes-dev] 1.0a5rc1 available Message-ID: <0g4alvsncpn05ip0mcsv1v8csl23r66ttd@4ax.com> Spambayes 1.0a5rc1 is available for download at: http://www.sundog.demon.co.uk/spambayes-1.0a5rc1.zip http://www.sundog.demon.co.uk/spambayes-1.0a5rc1.tar.gz Please give it a quick bash - if there are no (new!) problems, it will become 1.0a5 on Thursday morning. Apologies for the unusual address but entrian.com seems to have some, er, issues, at the moment - like not existing. Please email me at richie@sundog.demon.co.uk if you need to contact me. -- Richie Hindle richie@entrian.com, but richie@sundog.demon.co.uk for the time being. From T.A.Meyer at massey.ac.nz Wed Sep 3 13:21:01 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 2 20:21:50 2003 Subject: [spambayes-dev] Re: Bug messages Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E50C0@its-xchg4.massey.ac.nz> > On my box, it looks like it's an Outlook bug! That would explain why you, me and Mark are the ones that see it ;) > Barry Warsaw > stared at some examples with me [explanation snipped] Ah, that all makes sense. Foolish of me to blame the nice open-source folks instead of the nasty Microsoft ones, wasn't it? 
Time to get on with life and these little disappointments ;) Cheers, Tony From T.A.Meyer at massey.ac.nz Wed Sep 3 13:22:18 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 2 20:23:06 2003 Subject: [spambayes-dev] 1.0a5 Release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E50C4@its-xchg4.massey.ac.nz> > One thing to be aware of is that old versions won't magically > go away. I was wondering about this (in a non-Spambayes context) the other day. Is there a "setup.py uninstall" type command? Or once you've installed something into your Python directory, are you just stuck with it, or stuck removing it manually? =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 3 13:39:14 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 2 20:50:01 2003 Subject: [spambayes-dev] 1.0a5rc1 available Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E50D3@its-xchg4.massey.ac.nz> > Spambayes 1.0a5rc1 is available for download at: > > http://www.sundog.demon.co.uk/spambayes-1.0a5rc1.zip > http://www.sundog.demon.co.uk/spambayes-1.0a5rc1.tar.gz > > Please give it a quick bash - if there are no (new!) > problems, it will become 1.0a5 on Thursday morning. I've only looked at the zip, but it seems fine here, packaging-wise. There was one little bug with imapfilter that made it difficult to start up before configuring (I could swear I've fixed this exact bug before!). I've checked in a fix for this. If you want to replace imapfilter.py with the new one (it's a one-line fix), that would be great; but if not, I don't mind if it only makes it into the next release. Looks to me like all is go :) Thanks for doing the packaging!
=Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 3 13:50:19 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 2 20:53:38 2003 Subject: [spambayes-dev] Correcting training Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E50E1@its-xchg4.massey.ac.nz> > However, I don't think the filter has ever caught a > "Nigerian" spam or other "business proposal" sorts of > messages. It did take me quite a lot of training to get these correctly classified, and even now they sit at about 90% rather than the 100% that just about everything else gets. I imagine that just a few mistrained messages would make it extremely difficult to correctly classify them. > Sometimes I will copy > the source and go "Train as Spam" with it, but as far as I > know, that just means that the message has been trained as > both Ham and Spam. That's correct. However, if you retrain it from the 'review' page, then it will be untrained and retrained (assuming that the messageinfo.db file is all happy and running correctly), as spambayes will recognise the id and take the correct action. How do I get the message back on the review page once I've already trained, you ask? Use the "Find message" query, I answer :) If you set spambayes to add the id header, then you can look at the message's headers, find the id, paste it into the box on the web interface, and it will present that message in a 'review' page, which lets you retrain. smtpproxy, assuming it's able to find the id, should also correctly untrain/train, as should the (also-soon-to-be-renamed) overkill.py script, once it's complete. > Would it be possible to add an "Untrain as Ham, Train as > Spam" function? This sounds like a reasonable request - could you open a tracker item on sourceforge for it? Given that we're only a day off a brief feature-freeze, I can't see this happening before then. It should be easy enough to get into 1.1a1, though (is that how the version numbering works?). Otherwise it might get forgotten.
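The retrain behaviour described here (recognise the message id, untrain the old label, then train the new one) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SpamBayes' own code: `ToyClassifier` and `Trainer` are hypothetical names, the in-memory `trained_as` dict stands in for something like the messageinfo.db file, and the caller is assumed to supply the same tokens the message produced when it was first trained.

```python
from collections import Counter


class ToyClassifier:
    """Hypothetical token-count classifier; not the SpamBayes API."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.nspam = 0
        self.nham = 0

    def _counts(self, is_spam):
        return self.spam_counts if is_spam else self.ham_counts

    def learn(self, tokens, is_spam):
        # Each distinct token counts once per message.
        self._counts(is_spam).update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def unlearn(self, tokens, is_spam):
        # Exact inverse of learn(): decrement, dropping tokens that reach zero.
        counts = self._counts(is_spam)
        for tok in set(tokens):
            counts[tok] -= 1
            if counts[tok] <= 0:
                del counts[tok]
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1


class Trainer:
    """Remembers how each message id was trained (hypothetical stand-in for messageinfo.db)."""

    def __init__(self, classifier):
        self.classifier = classifier
        self.trained_as = {}  # msg_id -> True (spam) or False (ham)

    def train(self, msg_id, tokens, is_spam):
        previous = self.trained_as.get(msg_id)
        if previous == is_spam:
            return  # already trained this way; don't count it twice
        if previous is not None:
            # "Untrain as ham, train as spam" (or the reverse).
            self.classifier.unlearn(tokens, previous)
        self.classifier.learn(tokens, is_spam)
        self.trained_as[msg_id] = is_spam


clf = ToyClassifier()
trainer = Trainer(clf)
tokens = ["business", "proposal", "transfer"]
trainer.train("1062571784", tokens, is_spam=False)  # "Train" hit too soon
trainer.train("1062571784", tokens, is_spam=True)   # corrected later
print(clf.nham, clf.nspam)  # 0 1
```

Without the id lookup, the second call would simply add a spam count on top of the existing ham count, which is the "trained as both Ham and Spam" situation described above.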
Feel free to assign it to me if you like (anadelonbrin). =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 3 13:52:58 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 2 20:53:53 2003 Subject: [spambayes-dev] New win32all versions Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E50E5@its-xchg4.massey.ac.nz> > It seems the starship is down, so I have put my new win32all > versions at > http://www.keypoint.com.au/~mhammond/win32all-158.exe for > python 2.2 and > http://www.keypoint.com.au/~mhammond/win32all-159.exe for python 2.3. > > These will end up on starship as soon as it comes back up. Has anyone else successfully grabbed these? I've tried to download them three times, and every time I try to launch them (once the download is complete), I get an error about the file size not being what it should have been. Is this just me? BTW, I gather from c.l.p that it's only starship's DNS that's down, not the site itself. Does this mean that if we go to the correct IP, versions 158 and 159 will be there? =Tony Meyer From mhammond at keypoint.com.au Wed Sep 3 11:55:02 2003 From: mhammond at keypoint.com.au (Mark Hammond) Date: Tue Sep 2 20:55:18 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <000301c3711a$3e5cbfa0$0a00a8c0@Aleem> Message-ID: <027701c371b6$0304f930$f502a8c0@eden> Replying to a few mails at once here: [Aleem] > The users need it. I can know with certainty that the mail > from my potential employer will end up in my inbox and not > get lost with spam or overlooked in spam box, eventually > costing me my job. There is comfort in knowing that the mail > will show up in my inbox and I won't end up missing something > important. I have to admit that I do find this argument fairly compelling. I can perfectly understand the comfort level this would provide to users, given they didn't write the damn thing .
I fully understand a user choosing to risk forged spam etc for the sake of being absolutely sure the mail will not be filtered by us. It is also an oft-requested feature, which we should not ignore. However, the crux of the issue is as Sean said: [Sean] > But managing it is yet another pile of not very stimulating work. But for SpamBayes, you have to add "implementing" too. The key is, I believe, in what Alex said: [Alex] > What I don't understand is why people want one tool to do everything. > I have multiple MTAs which are separate from my MDA which is separate > from my MUA, with several filters in between... why should whitelisting > be added to spambayes, when spambayes does what it does very well, and > other tools (like procmail) can trivially do whitelisting very well, > and they can be easily used in conjunction? Because they can't for the vast majority of users. This includes me, and I believe most of the spambayes-dev crowd that use the Outlook addin. The "windows style" solution is to link this toolchain via components, which means it is effectively *inside* Outlook. While this may grate against your way of thinking, just remember the level of functionality we have achieved with the Outlook addin is not available using a discrete tool chain. This is getting back to the "filter hooks" idea Sean and I have been mulling over - in that world, a whitelist would be a new filter external to SpamBayes, but still working in conjunction with it. > Is this another case of the unix mentality (use multiple tools which each > do their own thing well) getting in the way of general acceptance More a case of the unix mentality getting in the way of "tools for the rest of us" being available on unix . I think the simple answer for the Outlook addin is to direct users to do the following: * Enable the timer (this will be in the UI in the next version) * Create a standard Outlook rule to move "whitelisted" mail to a special folder, possibly a sub-folder of the inbox.
This will prevent SpamBayes from filtering the mail. The longer answer for the Outlook addin is for someone to implement it in a reasonable way and provide a patch. As you can see from the other responses, it doesn't appear likely it will come from any of the existing developers (even if they did have the time, which they don't!) Regards, Mark. From T.A.Meyer at massey.ac.nz Wed Sep 3 13:57:36 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 2 20:58:22 2003 Subject: [spambayes-dev] Regarding Whitelisting Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E50EC@its-xchg4.massey.ac.nz> > I was thinking about this last week. The problem is that a > job offer from a potential employer is a fairly rare sort of > email. Consequently, it's quite possible there is not enough > evidence in the message to tip it toward the ham side. If > the employer uses a lot of terms which appear spammy, it > might not even make it to "unsure". I would have thought that potential employers would use lots of words that are hammy to me, since the job is probably doing something that I have corresponded with people about in the past. It's only anecdotal (and I've only had 4 or 5 since using spambayes), but this is how job offers to me have worked (and I've never seen spambayes classify them over 0.0 (rounded), even when I'd never heard of the person before). To me, it seems that if you know enough about the potential employer to whitelist them, then you know enough to get your mail program to ensure that the mail isn't filtered by spambayes or anything else. If you don't know that much, then whitelisting isn't going to help, either.
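The point about a rare message lacking enough evidence can be made concrete with a scoring sketch. To my understanding, SpamBayes' default scheme combines per-token spam probabilities using Gary Robinson's chi-squared (Fisher) method; what follows is a simplified, self-contained Python illustration of that technique, not SpamBayes' own implementation. The per-token probabilities are made up, and the 0.20 and 0.90 cutoffs are assumed values chosen for illustration.

```python
import math
from typing import Sequence

# Assumed cutoffs, for illustration only; SpamBayes makes these configurable.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def chi2q(x2: float, dof: int) -> float:
    """P(X >= x2) for X chi-squared with an even number of degrees of freedom."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed form for even dof: a Poisson(m) cumulative sum up to dof/2 - 1.
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs: Sequence[float]) -> float:
    """Chi-squared (Fisher) combining of per-token spam probabilities in (0, 1)."""
    n = len(probs)
    if n == 0:
        return 0.5  # no clues at all: no evidence either way
    # Sum logarithms instead of multiplying, to avoid floating-point underflow.
    log_not_spam = sum(math.log(1.0 - p) for p in probs)  # very negative if spammy
    log_spam = sum(math.log(p) for p in probs)            # very negative if hammy
    spamminess = 1.0 - chi2q(-2.0 * log_not_spam, 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * log_spam, 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0


def verdict(score: float) -> str:
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


print(verdict(combine([0.01] * 20)))  # many strong hammy clues -> ham
print(verdict(combine([0.99] * 20)))  # many strong spammy clues -> spam
print(verdict(combine([0.3, 0.4])))   # two weak clues -> unsure
```

With many strongly hammy clues the score collapses to about 0 (consistent with the 0.0 reported above for job offers), while two weakly hammy clues only reach about 0.30, which falls between the assumed cutoffs.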
=Tony Meyer From anthony at interlink.com.au Wed Sep 3 16:07:36 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Sep 4 13:59:14 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <16212.58550.775987.1184@montanaro.dyndns.org> Message-ID: <200309030507.h8357bdn016133@localhost.localdomain> >>> Skip Montanaro wrote > Then you have a very small circle of friends. ;-) I've received spam > purporting to be from Guido. During this rash of SoBig and Blaster crap I > received lots of forged mail purporting to have been sent by Barry Warsaw. > I have met both Guido and Barry in person and can report with some > confidence that neither of them looks like a spammer. ;-) It's only me that Barry's spamming, then? never-trust-a-musician, Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Wed Sep 3 16:14:43 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Sep 4 13:59:23 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <027701c371b6$0304f930$f502a8c0@eden> Message-ID: <200309030514.h835Ehap016223@localhost.localdomain> >>> "Mark Hammond" wrote > [Aleem] > > The users need it. I can know with certainty that the mail > > from my potential employer will end up in my inbox and not > > get lost with spam or overlooked in spam box, eventually > > costing me my job. There is comfort in knowing that the mail > > will show up in my inbox and I won't end up missing something > > important. > > I have to admit that I do find this argument fairly compelling. I can > perfectly understand the comfort level this would provide to users, given > they didn't write the damn thing . I fully understand a user choosing > to risk forged spam etc for the sake of being absolutely sure the mail will > not be filtered by us. It is also an oft-requested feature, which we should > not ignore.
Weighed against this is the added complexity of the configuration and then having to deal with the increased user feedback. At the moment, spambayes does one thing, and does it well. If adding whitelisting is so vital, rather than add it to the core tools, add an interface for plugins to get called before and after the scoring. > However, the crux of the issue is as Sean said: > [Sean] > > But managing it is yet another pile of not very stimulating work. > > But for SpamBayes, you have to add "implementing" too. Implementing the user interface is what's going to truly suck. > The longer answer for > the Outlook addin is for someone to implement it in a reasonable way and > provide a patch. As you can see from the other responses, it doesn't appear > likely it will come from any of the existing developers (even if they did > have the time, which they don't!) Yep. I think the FAQ answer for whitelisting should possibly feature the phrase "send code". -- Anthony Baxter It's never too late to have a happy childhood. From T.A.Meyer at massey.ac.nz Wed Sep 3 14:22:18 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 4 14:27:37 2003 Subject: [spambayes-dev] Regarding Whitelisting Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E510E@its-xchg4.massey.ac.nz> > I don't suspect I would ever be receiving spam > from t.a.meyer@massey.ac.nz. And if I did it would be > because there is a spammer on this list or you are a spammer > (obviously not). Heh. If I was a spammer, I think working on a spam filter would be an excellent hobby, since it would provide an insight into what would work best . Actually, you could get spam from t.a.meyer@massey.ac.nz in other ways, since I use the address elsewhere. A spammer could collect addresses and then send out spam that is 'from' hundreds of different addresses. How will a whitelist fight that? > I'm trying to make a case for it, because the case against it is weak.
Realistically, you'd get better results by convincing someone who knows Python to write the code to do this, and submitting it as a patch. > Thanks, that's what I have done now - created a new folder > and enforced rules to move whitelist type mails in there so > SpamBayes doesn't get to them. So why did you need whitelisting again? <0.5 wink> =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 3 14:09:33 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 4 14:28:37 2003 Subject: [spambayes-dev] Regarding Whitelisting Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E50F9@its-xchg4.massey.ac.nz> > Something every email user secretly knows, but doesn't want > to believe, is that email is an inappropriate medium for > truly important communication. So it goes. Absolutely. I'm not sure that every email user does know this, but they should (heh - they teach this in the business courses here). Email isn't reliable, even if you take all the filters out. There's some weird problem at the moment where email between me and a friend who works about a 20-minute drive from here takes about 4 days to be delivered. Sure, this is just anecdotal, and not all that common, but it does happen. I read an article about mail that was delayed *months* because someone mucked up a server. Let's face it - truly important communication should be face-to-face. At the very least it should involve some sort of receipt (even if just "here is the coffee you wanted" "thanks"). I wouldn't trust regular old postal mail for anything important either; I've had things lost far too many times.
=Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 3 14:16:40 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 4 14:28:43 2003 Subject: [spambayes-dev] Regarding Whitelisting Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E5102@its-xchg4.massey.ac.nz> [ Too lazy to be tidy and reply to many at once like Mark ;) ] > I think the simple answer for the Outlook addin is to direct > users to do the following: > * Enable the timer (this will be in the UI in the next version) > * Create a standard Outlook rule to move "whitelisted" mail > to a special folder, possibly a sub-folder of the inbox. > > This will prevent SpamBayes from filtering the mail. Or they could even have a rule that moves all 'non-whitelisted' mail (i.e. everything not from certain addresses) into an 'Other Inbox' folder, and have spambayes filter that one. > The longer answer for the Outlook addin is for someone to > implement it in a reasonable way and provide a patch. Agreed. I'm sure no-one would object to a default-to-off whitelisting addition to the plug-in, if someone was providing all the implementation/maintenance work... For clarity, there are two reasons (other than a lack of time) that I can't see myself doing this: 1. It's a piece of cake to do this with Outlook (or OE, or any other client) without building it into spambayes. (In the same way that it's simple to get Outlook to auto-delete mail from the spam folder, without adding that as a spambayes feature). 2. I think it would end up hurting results. #1 is really the killer, for me. Even Aleem said that he managed to do this. Where, then, is the problem?
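The "filter hooks" idea raised earlier in the thread (a whitelist as a filter external to SpamBayes, or plugins called before scoring) amounts to running cheap explicit checks first and only falling back to the statistical score when none of them has an opinion. A minimal Python sketch under that assumption follows; `make_whitelist_hook`, `filter_message` and `fake_score` are hypothetical names, not part of SpamBayes or Outlook. It matches only the From address, which, as discussed above, spammers can forge.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Callable, Iterable, Optional

# A hook returns a verdict ("ham", "spam", ...) or None for "no opinion".
Hook = Callable[[Message], Optional[str]]


def make_whitelist_hook(addresses: Iterable[str]) -> Hook:
    """Build a hypothetical pre-scoring hook that passes listed senders as ham."""
    allowed = {a.lower() for a in addresses}

    def hook(msg: Message) -> Optional[str]:
        # Only the From header is checked, and it is trivially forged.
        _, addr = parseaddr(msg.get("From", ""))
        return "ham" if addr.lower() in allowed else None

    return hook


def filter_message(msg: Message, pre_hooks: Iterable[Hook],
                   score: Callable[[Message], str]) -> str:
    """Run pre-scoring hooks in order; fall back to the classifier if none decides."""
    for hook in pre_hooks:
        verdict = hook(msg)
        if verdict is not None:
            return verdict
    return score(msg)


def fake_score(msg: Message) -> str:
    return "unsure"  # stand-in for the statistical classifier


hooks = [make_whitelist_hook(["boss@example.com"])]
friend = message_from_string("From: Boss <Boss@Example.com>\nSubject: offer\n\nHi")
stranger = message_from_string("From: promo@spam.example\nSubject: offer\n\nHi")
print(filter_message(friend, hooks, fake_score))    # -> ham
print(filter_message(stranger, hooks, fake_score))  # -> unsure
```

Keeping the whitelist as a separate hook means it defaults to off (an empty hook list) and leaves the classifier and its training data untouched.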
=Tony Meyer From sourceforge at rodland.no Thu Sep 4 10:14:45 2003 From: sourceforge at rodland.no (Fredrik Rodland) Date: Thu Sep 4 14:40:24 2003 Subject: [spambayes-dev] New win32all versions In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E50E5@its-xchg4.massey.ac.nz> Message-ID: > -----Original Message----- > From: spambayes-dev-bounces@python.org > [mailto:spambayes-dev-bounces@python.org]On Behalf Of Meyer, Tony > Sent: 3. september 2003 02:53 > To: Mark Hammond; spambayes-dev@python.org > Subject: RE: [spambayes-dev] New win32all versions > > > > It seems the starship is down, so I have put my new win32all > > versions at > > http://www.keypoint.com.au/~mhammond/win32all-158.exe for > > python 2.2 and > > http://www.keypoint.com.au/~mhammond/win32all-159.exe for python 2.3. > > > > These will end up on starship as soon as it comes back up. > > Has anyone else successfully grabbed these? I've tried to download them > three times and every time when I try to launch them (once the download > is complete), I get an error about the file size not being what it > should have been. Is this just me? I downloaded 159 right after Mark's post arrived, and it installed just fine. F -- Fredrik Rodland Technical Architect, Stocknet, Oslo, Norway Stocknet: http://www.stocknet.com phone: +47 23 28 40 17 Private: http://rodland.no phone: +47 99 21 98 17 From skip at pobox.com Wed Sep 3 10:05:42 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 4 15:02:16 2003 Subject: [spambayes-dev] 1.0a5 Release In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E50C4@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13031E50C4@its-xchg4.massey.ac.nz> Message-ID: <16213.62774.948493.281707@montanaro.dyndns.org> Skip> One thing to be aware of is that old versions won't magically go Skip> away. Tony> I was wondering about this (in a non-Spambayes context) the other Tony> day. Is there a "setup.py uninstall" type command? 
Or once Tony> you've install something into your Python directory, are you just Tony> stuck with it, or stuck removing it manually? I think right now you have to remove it manually. There is no distutils uninstall command that I'm aware of. Skip From vanhorn at whidbey.com Wed Sep 3 01:11:00 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Thu Sep 4 15:11:21 2003 Subject: [spambayes-dev] Business opportunity References: Message-ID: <3F559404.262CE153@whidbey.com> Okay, here's a wordy spam that sailed right through the filter. It had a funny look, showing as a white page with a grey pinstripe border inset an inch or so, with the text set inside the border, and all the tags seem to have been escaped. I'm going to paste in the source of the message first, then the clues from the web interface below. Van Return-path: Envelope-to: twisted@whidbey.com Delivery-date: Tue, 02 Sep 2003 23:42:46 -0700 Received: from [61.11.10.79] (helo=qfgf94101.com) by mail7.whidbey.net with smtp (Exim 4.20) id 19uRLo-0004gg-Eo; Tue, 02 Sep 2003 23:42:45 -0700 From: "Make$MoneyGivingAwayProducts" Reply-To: "Make$MoneyGivingAwayProducts" Date: Wed, 3 Sep 2003 02:18:49 -0400 Subject: New MLM Opportunity - Make Money Giving Away Samples 9/3/2003 2:18:49 AM X-Mailer: Microsoft Outlook Express 6.00.2600.0000 MIME-Version: 1.0 X-Precedence-Ref: 1234056789zxcvbnmlkjhgfqwrtyuo Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Message-Id: Bcc: X-Spambayes-Classification: ham X-Spambayes-Spam-Probability: 0.0690495477564 X-Spambayes-MailId: 1062571784 X-Mozilla-Status: 8001 X-Mozilla-Status2: 00000000 X-UIDL: f52fde2f46bc61f8d0466b2437c15b2d Hello hswo =2C =3Chtml=3E =3Cbody=3E =3Ccenter=3E=3Cfont color=3Dred=3E=3Ch2=3E=3C=2Fh2=3E=3C=2Ffont=3E =3Ctable width=3D=2275%=22 cellspacing=3D=220=22 cellpadding=3D=220=22 border=3D=221px=22=3E=3Ctr=3E =3Ctd width=3D=22133%=22 align=3D=22left=22 valign=3D=22top=22 =3E=3Cfont size=3D=222=22=3E =3Cb=3E 
Greetings!=3Cbr=3E =3Cbr=3E Some businesses are simple to build while others are more difficult=2E=3Cbr=3E =3Cbr=3E The one I'm about to present to you is one of those incredibly simple=3Cbr=3E ones to build=2E It doesn't matter if you are a heavy hitter or someone=3Cbr=3E who have not recruited anybody in the past=2E The system we have in=3Cbr=3E place make your recruiting efforts much easier and simpler=2E=3Cbr=3E =3Cbr=3E How about building a fantastic long term residual income simply by=3Cbr=3E giving away free samples at zero cost to you=3F=3Cbr=3E =3Cbr=3E We are not talking about any products=2E We are talking about unique=3Cbr=3E sizzle products of the highest quality and exclusive to our company=2E=3Cbr=3E In fact=2C all the products we market=2C our company has the worldwide =3Cbr=3E exclusive marketing rights=2E=3Cbr=3E =3Cbr=3E Our company is so confident in our product's effectiveness that they=3Cbr=3E pay all costs for the products you give away for free! The more you=3Cbr=3E give away=2C the more you make!=3Cbr=3E =3Cbr=3E One of our giveaway products is the coral calcium product=2E We know=3Cbr=3E of another coral calcium product out in the market=2E It was first=3Cbr=3E marketed thru TV and then thru another MLM and later it's also sold=2E=3Cbr=3E in Wall Mart=2E Nobody but our company has our coral calcium product=2E=3Cbr=3E Ours is not from some industrial polluted islands=2E I's hand picked=3Cbr=3E coral from the reefs of the islands of Tonga in the South Pacific=2E=3Cbr=3E The beautiful Island Kingdom of Tonga is over a thousand miles from=3Cbr=3E any significant populations or industries which might potentially=3Cbr=3E pollute the ocean's waters=2E=3Cbr=3E =3Cbr=3E Giving away free products is a proven customer acquisition technique=3Cbr=3E that has worked very well over the years=2E The people behind this =3Cbr=3E program are brilliant direct response marketers and they know how to=3Cbr=3E sell massive quantities of stuff=2E We are using the 
best techniques=3Cbr=3E from some of the best direct response marketers in the world with=3Cbr=3E this program=2E =3Cbr=3E =3Cbr=3E It doesn't stop here=2E The company also provides you with several=3Cbr=3E free websites to promote your business=2E Different website for =3Cbr=3E different product=2E There are couple of recruiting websites too=2E One of=3Cbr=3E them is the state of the art =22tap root=22 automated recruiting system=2E=3Cbr=3E =3Cbr=3E We use a non-flushing binary compensation plan=2E In addition=2C It pays=3Cbr=3E fast start bonus and 7 generations of matching bonus=2E The start-up =3Cbr=3E cost is under $100=2E=3Cbr=3E =3Cbr=3E Our company is backed by an 5 year old debt free financial and =3Cbr=3E marketing company=2E=3Cbr=3E =3Cbr=3E To request to take a free tour please click the following link and =3Cbr=3E send a blank message=2E We will reply you promptly with the link=2E=3Cbr=3E =3Ca href=3D=22mailto=3Afreeprodctoffer=40netzero=2Enet=3Fsubject=3DSend-link-FreeTour=22=3Cb=3E Please click here to request link to take free tour=3C=2Fb=3E=3C=2Fa=3E =3Cbr=3E Good Health and Good Fortune!=3Cbr=3E =3Cbr=3E =3Cbr=3E =3Cbr=3E =3Cbr=3E =3Cbr=3E =3Ca href=3D=22mailto=3Astrugglenomore=40netzero=2Enet=3Fsubject=3DREMOVE=22=3Cb=3E Remove from List=3C=2Fb=3E=3C=2Fa=3E =3Cbr=3E =3Cbr=3E =3C=2Fbody=3E =3C=2Fhtml=3E Clues: *H* 0.922604249404 *S* 0.0603422472645 binary 0.00185261424455 build. 0.0196506550218 generations 0.0505617977528 message. 0.06072492226 simpler. 0.0652173913043 islands. 0.0652173913043 anybody 0.0693000039519 couple 0.078869778667 polluted 0.0918367346939 pacific. 0.0918367346939 subject:Giving 0.0918367346939 pollute 0.0918367346939 several 0.0988999689764 i'm 0.102313651058 away, 0.106004769542 stuff. 0.106648980289 thru 0.113619986459 might 0.127547641848 subject:2003 0.12777250068 doesn't 0.140146657525 significant 0.141651288307 worked 0.153371520696 product's 0.155172413793 recruited 0.155172413793 populations 0.155172413793 mart. 
0.155172413793 i's 0.155172413793 using 0.159699622461 which 0.163461145591 talking 0.163745342828 efforts 0.171769710997 subject:/ 0.180654691332 picked 0.188263519557 but 0.188274221916 there 0.189478683141 fantastic 0.1928199812 automated 0.194815884931 someone 0.195772238664 hello 0.197308545734 list 0.199201326964 marketers 0.199972057097 ones 0.200392104737 then 0.203894270143 later 0.212319387812 another 0.212505517858 was 0.217747381952 technique 0.222960377615 old 0.230006935554 kingdom 0.231607140623 start-up 0.237385934246 potentially 0.238982943648 others 0.240040455204 some 0.240471542102 businesses 0.242428721592 art 0.244646684435 different 0.246133525616 well 0.252621601356 them 0.255080354695 very 0.257780849416 under 0.259915888958 hand 0.260073237877 know 0.261243378684 good 0.261792160739 building 0.262204220366 they 0.265847472014 nobody 0.275081539628 charset:us-ascii 0.281523514104 too. 0.289860637394 market. 0.295388467964 use 0.299896786962 that 0.300156074584 marketed 0.302000309904 system 0.302757428559 ours 0.303482912548 also 0.308749719061 past. 0.310930351937 has 0.311185146002 give 0.311596351207 present 0.31652150803 about 0.318953828161 first 0.321695425133 long 0.322694842406 response 0.323822780311 reply 0.324192438543 tour 0.324834328857 it's 0.325501650415 those 0.325704854156 fact, 0.328357828138 program 0.338444137648 how 0.342165501562 following 0.34506065388 incredibly 0.657851855622 fast 0.663010749915 islands 0.665556219421 worldwide 0.668951612557 waters. 0.673664119131 stop 0.675372167664 link 0.681472977724 product. 0.686303648506 please 0.693500988903 zero 0.706988691962 sold. 
0.711255141458 promote 0.726229561673 beautiful 0.730987105157 highest 0.732530984987 products 0.735603921068 subject:: 0.738932851662 marketing 0.743290370292 websites 0.745411451619 subject: 0.756728856617 cost 0.758562183038 pay 0.763624045674 income 0.766304243533 calcium 0.767510501579 giveaway 0.769949807064 quality 0.773903980006 financial 0.775330798724 our 0.780709101593 rights. 0.795417176078 health 0.804075242368 here 0.804906025917 quantities 0.806335142231 free 0.819359455644 proven 0.820983776358 remove 0.821809228792 subject: - 0.826461512996 exclusive 0.828702702245 header:Received:1 0.843613020039 blank 0.843960068237 greetings! 0.844827586207 reefs 0.844827586207 make! 0.844827586207 message-id:@mail7.whidbey.net 0.879086642736 bonus 0.894009319675 $100. 0.89883733596 content-type:text/html 0.903714657328 here. 0.904390469484 residual 0.904409456927 debt 0.905282183009 x-mailer:microsoft outlook express 6.00.2600.0000 0.912193270956 click 0.917860037829 bonus. 0.918753185118 subject:MLM 0.934782608696 sizzle 0.934782608696 coral 0.937956665433 subject:Opportunity 0.95871559633 subject:Away 0.966498334235 subject:Money 0.967503351882 subject:Samples 0.969798657718 subject:Make 0.989309832104 -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From vanhorn at whidbey.com Tue Sep 2 20:56:09 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Thu Sep 4 17:08:35 2003 Subject: [spambayes-dev] Correcting training References: Message-ID: <3F555849.11366974@whidbey.com> Okay, I will watch for these closely over the next few days, and save the clues and the full source of the messages for review. 
I'm guessing you won't want them mailed to the list, right? Van Tim Peters wrote: > [G. Armour Van Horn] > >> However, I don't think the filter has ever caught a > >> "Nigerian" spam or other "business proposal" sorts of > >> messages. > > [Tony Meyer] > > It did take me quite a lot of training to get these correctly > > classified, and even now they sit at about 90% rather than the 100% > > that just about everything else gets. > > I really wonder what's going on here! Since the very first tests I ran last > year, Nigerian scams have been absolutely nailed for me. Indeed, the very > worst false positive I had out of tens of thousands of test msgs was a > forward of a Nigerian scam, where the scam content so overwhelmed the "but > it was sent a real person commenting about it" clues that it appeared flatly > impossible to get the system ever to call that one ham (well, not short of > boosting the ham cutoff above 95!). > > I see that, in my home classifier, I've trained on 4 of these, the most > recent one from last December. Their internal *H* and *S* scores are > stellar: > > '*H*' 1.22125e-015 > '*S*' 1 > > '*H*' 0 > '*S*' 1 > > '*H*' 3.33067e-016 > '*S*' 0.999992 > > '*H*' 4.44089e-016 > '*S*' 0.999793 > > Indeed, 'petroleum' appears in exactly 4 of my total training msgs (namely > these 4). > > > I imagine that just a few mistrained would make it extremely difficult > > to correctly classify them. > > Me too -- although the stubborn false positive I mentioned above didn't > really seem to stop the tests from correctly classifying other Nigerian > scams in the runs where that FP was in the ham training data. > > There's still A Mystery here! -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! 
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From mhammond at keypoint.com.au Wed Sep 3 13:12:26 2003 From: mhammond at keypoint.com.au (Mark Hammond) Date: Thu Sep 4 17:46:27 2003 Subject: [spambayes-dev] New win32all versions In-Reply-To: Message-ID: <050701c371c0$d1e52220$f502a8c0@eden> > Mark, at work (Win2K) and at home (Win98SE), 159 ended up > compiling stuff > into my TEMP directory, meaning that gen_py\2.3\ is hanging > off my TEMP > directory. That directory gets purged often, so isn't a > great place to > store stuff . Damn. It will all be re-created when necessary, but to get the old behaviour back, create a "gen_py" under win32com. I better add that to the readme, and fix the install script to default to the old behaviour. (I manage to share the same .py tree with multiple Python versions, so it makes sense for me Mark. From richie at entrian.com Thu Sep 4 23:50:16 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Sep 4 17:50:56 2003 Subject: [spambayes-dev] Spambayes 1.0a5 released Message-ID: [Resending - apologies if this arrives twice, but I've been having email troubles] Spambayes 1.0a5 has been released. Here's WHAT_IS_NEW.txt: This file covers the major changes between each release. For more details, the reader is referred to the changelog (changelog.txt in the main directory of the archive), or for extreme details, to the check-ins archive (please see ) Changes are broken into sections for each application, plus one that will probably only interest developers, and one for everything else. Any actions necessary to move to this release from the previous release are noted in the "Transition" section. 
New in Alpha Release 5 ====================== -------------------------- ** Incompatible changes ** -------------------------- The values taken by some options have changed, so if you're upgrading from a previous version, you may need to update your configuration file (.spambayesrc or bayescustomize.ini) o allow_remote_connections now takes a list of allowed IP addresses, or the word 'localhost', or an asterisk (meaning all connections are accepted). o notate_to and notate_subject now take a comma-separated list of one or more of 'spam', 'ham' and 'unsure', allowing you to control which classes of message are notated. Outlook Plugin -------------- o Added a diagnostics dialog with functions to make it easier for users to help developers track down and fix bugs. o Added a 'timer' method of determining when to filter mail that should work better with Outlook's rule system. o Added a button on the Advanced tab of the dialog to display the SpamBayes data folder. o Moved "Filter Now" to an item on the drop down menu on the toolbar. o Items that can be filtered and trained include "IPM.Note" (normal messages) and "IPM.Anti-Virus*" (virus alerts by some software). o Changed the default filter action to "move" (instead of "untouched"). o Added a Wizard to assist with initial configuration (this will present itself when necessary). o Changed to allow filtering to be enabled, even if no training has been done. o Added a "New Folder" button to the folder selector dialog. o Massive changes to the dialog system (which should fix some problems), including changing the configuration dialog to a tabbed interface. o "Show Clues" now shows the percentage, as well as the raw score. o Added a "Help" menu to the drop down menu, with various information. o Added the ability to check for the latest version via an item on the drop down menu. o Hopefully, the "unread flag" issue is now fixed. 
o Fixed many problems with working on systems where English is not the default language, or where profile names have non-English characters. POP3 Proxy / SMTP Proxy / POP3 Proxy Service -------------------------------------------- o Fixed "assert hamcount <= nham" problem. o Starting and stopping the POP3 Proxy service (for Windows NT, Windows 2000 and Windows XP users) has been improved. Most notably, this means that the SMTP Proxy will start (if it is needed) as well. o Improved the "notate to" and "notate subject" options, so that ham and unsure messages can also be (optionally) notated in these fields. o Added the ability to skip caching messages that are over a (user configurable) size, so that you can keep the size of the cache directories smaller, once these messages are correctly classified. o Added the ability to skip caching messages that have a precedence of "bulk" (most mailing list messages), so that you can keep the size of the cache directories (and review list) smaller, once these messages are correctly classified. o Fixed the "ASCII decoding error" problem. o The SMTP proxy tries harder to pass on the command formatted exactly as it was given. This should make it more reliable. o Added the ability to have the SMTP proxy train on the message sent to it, rather than looking up the id in the cache (which is still possible, and generally the better option). o Removed the ability to add the SpamBayes identification number to the body of messages (it can still be added as a header). o The review messages page now puts unsure messages at the top. o The POP3 proxy should now work with fetchmail. o You can once again specify local addresses as well as ports for the POP3 proxy to listen on (was broken in 1.0a3 and 1.0a4). o A bug with the SMTP proxy that would show up in some cases as an "unrecognised command" error in the mail client (particularly Eudora) was fixed.
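The incompatible change to allow_remote_connections listed at the top of these notes (a list of allowed IP addresses, the word 'localhost', or an asterisk) is easiest to see with a small example. The following is a minimal Python sketch of how such a value could be checked against a client address; the function name connection_allowed, the comma separator and the mapping of 'localhost' to 127.0.0.1 are assumptions for illustration, not SpamBayes' own code.

```python
# Hypothetical sketch, not SpamBayes' actual implementation.
# Assumed format: comma-separated entries, each an IP address,
# the word 'localhost', or '*' (meaning all connections are accepted).

def connection_allowed(client_ip, setting):
    """Return True if client_ip is permitted by an allow_remote_connections-style value."""
    entries = [entry.strip() for entry in setting.split(",") if entry.strip()]
    for entry in entries:
        if entry == "*":
            # An asterisk accepts every connection.
            return True
        if entry.lower() == "localhost":
            # Assumption: 'localhost' stands for the IPv4 loopback address only.
            if client_ip == "127.0.0.1":
                return True
        elif entry == client_ip:
            return True
    return False


print(connection_allowed("127.0.0.1", "localhost"))                 # True
print(connection_allowed("192.168.1.5", "localhost, 192.168.1.5"))  # True
print(connection_allowed("10.0.0.7", "localhost"))                  # False
print(connection_allowed("10.0.0.7", "*"))                          # True
```

An empty value matches nothing, so in this sketch it rejects every remote connection.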
IMAP Filter ----------- o If you didn't use the -p switch to enter your password interactively, imapfilter would try to get it from the options, but if it wasn't there yet (because you hadn't done the setup yet), it would crash. This is now fixed. General ------- o Added the ability to store the SpamBayes database in a MySQL or PostgreSQL database table (currently supported by hammiefilter and the POP3 proxy). o Removed the ability to use 'dumbdbm' as the storage method (see the FAQ for reasons why). o We now allow the '@' and '=' characters in paths. o Added a simple n-way classifier using a cascade of binary SpamBayes classifiers. o Added version information to the web interface. o Fixed the yellow colour of the header boxes in the web interface. o Fixed restoring defaults from the web interface. o Added a missing line break in the status pane on the web interface when there are no proxies configured. o Prevented the "Show clues" links on the web interface's training page from word-wrapping and making all the table rows two lines high. o You can now put "*" at the end of a word in the "Word Query" box on the web interface, and have it show you the first ten words, and how many words there are in total, in the database that start with that word. o The web interface now supports HTTP-Auth. o Added a new script (code-named 'overkill.py') which enables 'drag and drop' training for POP3 users. This is currently still in the experimental stage, and anyone interested in trying it out should enquire on the SpamBayes mailing list (). Developer --------- o Created a directory for test suites, including a storage.py test. o An empty 'allowed values' now allows an empty string. o Added a get_option method, so an option instance itself can be fetched. o Added support for fetching the "latest" set of version data from the spambayes web site.
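The trailing "*" behaviour of the "Word Query" box (show the first ten matching words plus the total count) can be sketched in a few lines of Python. This is an illustration only: it assumes the database's words are available as a collection of strings (such as a set, or hypothetically the keys of a word-to-counts mapping) and that "first" means alphabetical order; the name word_query is made up.

```python
# Hypothetical sketch of a trailing-"*" prefix query; not the web interface's code.

def word_query(words, query, limit=10):
    """Return (first `limit` matching words, total number of matches).

    A query ending in "*" matches every word starting with the text before
    the "*"; otherwise the query must match a word exactly.
    """
    if query.endswith("*"):
        prefix = query[:-1]
        # Assumption: results are listed in alphabetical order.
        matches = sorted(word for word in words if word.startswith(prefix))
        return matches[:limit], len(matches)
    return ([query], 1) if query in words else ([], 0)


vocabulary = {"free", "freebie", "freedom", "fresh", "coral"}
print(word_query(vocabulary, "free*"))   # (['free', 'freebie', 'freedom'], 3)
print(word_query(vocabulary, "coral"))   # (['coral'], 1)
```

Returning the total separately from the truncated list is what lets a page say "showing 10 of N" without listing every match.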
Transition ========== If you are transitioning from a version older than 1.0a4, please also read the notes in the previous release notes (accessible from ). o If you were previously using the 'dumbdbm' storage method (you will have files called "hammie.db.dat", "hammie.db.dir" and "hammie.db.bak", rather than one file called "hammie.db"), then you will need to change to using either a pickle (please see the FAQ: ), bsddb, gdbm, or one of the new SQL-based storage methods. The 'dumbdbm' storage method resulted in many databases being corrupted, and was never the best choice for storage, in any case. Although you can use the dbExpImp.py script to convert your database to your new storage system, we recommend that you retrain from scratch, as it is most likely that your database has been corrupted. o If you were using the options to notate the "To" or "Subject" headers with the message's classification, you will need to update your configuration file, as the format for these options has changed. o The ability to add the SpamBayes id to the message body has been removed, which means that Outlook Express users can no longer use the SMTP proxy and have it retrieve messages from the cache. These users can use the SMTP proxy by training on the forwarded message itself, but this is not recommended, as clues in the message will have changed (the "From" address will be yours, for example). At this time, you will have to use the web interface for training, although there is the possibility of 'drag and drop' training being added in a release in the near future.
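Since notate_to and notate_subject now take a comma-separated list of one or more of 'spam', 'ham' and 'unsure', a quick check can catch a configuration file line that still needs updating. The following Python sketch is illustrative only: the helper names parse_notate_option and should_notate are hypothetical, and SpamBayes' own option parsing may differ (for example in how it treats case or whitespace).

```python
# Hypothetical sketch; SpamBayes' own option handling may differ.
VALID_CLASSES = ("spam", "ham", "unsure")


def parse_notate_option(value):
    """Split a notate_to / notate_subject value into a tuple of classes.

    Raises ValueError if the value is empty or names an unknown class,
    since the option expects one or more of 'spam', 'ham' and 'unsure'.
    """
    classes = tuple(part.strip() for part in value.split(",") if part.strip())
    if not classes:
        raise ValueError("expected one or more of: " + ", ".join(VALID_CLASSES))
    for cls in classes:
        if cls not in VALID_CLASSES:
            raise ValueError("unknown class %r in %r" % (cls, value))
    return classes


def should_notate(classification, value):
    """Return True if messages of this classification should be notated."""
    return classification in parse_notate_option(value)


print(parse_notate_option("spam, unsure"))   # ('spam', 'unsure')
print(should_notate("ham", "spam,unsure"))   # False
```

Any value naming something other than those three classes raises ValueError, which flags the lines that need editing.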
Reported Bugs Fixed =================== The following bugs tracked via the Sourceforge system were fixed: 776808, 795749, 787251, 790051, 743131, 779319, 785389, 786952, 788495, 790406, 788008, 787296, 788002, 780612, 784323, 784296, 780819, 780801, 779049, 765912, 777026, 777165, 693387, 690418, 719586, 769346, 761499, 769346, 773452, 765042, 760062, 768162, 768221, 797776, 797316, 796996 A URL containing the details of these bugs can be made by appending the bug number to this URL: http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498103&aid= Feature Requests Added ====================== The following feature requests tracked via the Sourceforge system were added: 789916, 698036, 796832, 791319 A URL containing the details of these feature requests can be made by appending the request number to this URL: http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498106&aid= Patches integrated =================== The following patches tracked via the Sourceforge system were integrated: 791254, 790615, 788001, 769981, 791393 A URL containing the details of these patches can be made by appending the patch number to this URL: http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498105&aid= -- Richie Hindle richie@entrian.com From barry at python.org Thu Sep 4 22:54:25 2003 From: barry at python.org (Barry Warsaw) Date: Thu Sep 4 17:54:26 2003 Subject: [spambayes-dev] Regarding Whitelisting In-Reply-To: <200309030507.h8357bdn016133@localhost.localdomain> References: <200309030507.h8357bdn016133@localhost.localdomain> Message-ID: <1062712461.2063.42.camel@yyz> On Wed, 2003-09-03 at 01:07, Anthony Baxter wrote: > It's only me that Barry's spamming, then? > > never-trust-a-musician, > Anthony So, buy a CD (or 7) and I'll stop .
ain't-gettin'-rich-off-python-so-might-as-well-be-a-rock-star-ly y'rs, -Barry From skip at pobox.com Wed Sep 3 17:21:47 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 4 17:59:30 2003 Subject: [spambayes-dev] Speedup for full retrain when using DB dict Message-ID: <16214.23403.923659.941454@montanaro.dyndns.org> My earlier message (which, because of the mail load on mail.python.org, you will probably get after this one) indicated that I had a patch which might speed up full retrains when using a shelve database. I'm happy to say it works well for me. The test I ran essentially executed

    rm hammie.db
    hammie.py -d -p hammie.db -g newham.clean -s newspam.clean

between calls to the Unix date(1) program. The above two files contained a total of 15720 messages. The full retrain time dropped from about 33 minutes to about 20 minutes. The speedup comes from not writing to the shelve until the training is completed. The context diff is attached. Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: storage.diff Type: application/octet-stream Size: 1428 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030903/ff5397fa/storage.obj From mhammond at keypoint.com.au Wed Sep 3 12:21:15 2003 From: mhammond at keypoint.com.au (Mark Hammond) Date: Thu Sep 4 19:17:50 2003 Subject: [spambayes-dev] pop3proxy execution "rules" on Windows Message-ID: <04b601c371b9$ad5fa1c0$f502a8c0@eden> This is going back to the discussion last week about "per-user" data. It really only applies to the "front ends" we will provide for the proxies running on Windows. Currently, we provide only a "per user" proxy (ie, the data cannot be reasonably shared among multiple users). We should make an effort so "profile roaming" works such that when a user logs on to another machine, if SpamBayes is installed on that machine, it all just works. This is already true for the Outlook addin, and users appreciate it.
This is a particular problem for the service, which by default is installed to run as the "local system" user. Without going into the meandering reasoning behind it, I propose the following:
* pop3proxy (and related) data is stored in the per-user "{Application Data}\SpamBayesProxy" directory (ie, next to the "SpamBayes" directory used by Outlook). It could be "SpamBayes\Proxy", but I see no good reason to mix the data up. I think we all agreed on this (bar the specific name!)
* pop3proxy_service will refuse to start unless it is configured to use a specific user name.
* The application will create a mutex named something like "SpamBayesProxy-{username}". If that mutex exists, it refuses to start (as it must already be running for that user on the machine).
* For the sake of simplicity all round, pop3proxy_normal (ie, the "normal" executable, whatever it is) will refuse to start if the service is already running on the current machine, even if it is running as a different user. If you want one machine to support multiple users concurrently running pop3proxy, you can't use the service. I suspect this will not be a problem, as people who want to run the service will tend to have exclusive access to the machine. (Note the tray icon app could still start in this case, which could control the service - just never a proxy.)
Later we can obviously relax any of these restrictions. Does that make sense? Mark. From mhammond at keypoint.com.au Wed Sep 3 13:06:21 2003 From: mhammond at keypoint.com.au (Mark Hammond) Date: Thu Sep 4 20:38:03 2003 Subject: [spambayes-dev] New win32all versions In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E50E5@its-xchg4.massey.ac.nz> Message-ID: <049f01c371bf$f98c7590$f502a8c0@eden> > Has anyone else successfully grabbed these? I've tried to > download them > three times and every time when I try to launch them (once > the download > is complete), I get an error about the file size not being what it > should have been. Is this just me?
I just downloaded it and it worked. > BTW, I gather from c.l.p that it's only starship's dns that's > down, not > the site itself. Does this mean that if we go to the correct ip > versions 158 and 159 will be there? It does now :) http://217.160.219.194/crew/mhammond/win32/ Mark -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 1832 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030903/0a0a48ad/winmail.bin From ta-meyer at ihug.co.nz Fri Sep 5 14:05:58 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Thu Sep 4 21:06:25 2003 Subject: [spambayes-dev] Big changes to cvs Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AE1B@its-xchg4.massey.ac.nz> If you don't use spambayes from cvs, you can ignore this. The major changes discussed on spambayes-dev for 1.0a6 have been started. For those who haven't read that thread, this includes renaming and moving the majority of the scripts (just about everything in the root directory), and removing backwards compatibility code from the configuration file reading. This means that your system *will* break when you upgrade from cvs. At the very least, you'll need to update your system to use the renamed scripts. At the moment, if you use "setup.py install", all the old scripts will still be there. We'll get something together soonish to remove them. Hopefully, removing the backwards compatibility code from the options won't break any of the scripts. I can't test everything, though, so I'll only be 'certain' about sb-imapfilter, sb-server (ex pop3proxy), sb-smtpproxy, timtest and timcv working. If people could check the scripts they use, that would be great. If you are using old names for options in your configuration file, they won't work any more (they have been deprecated for some time now). There is a script in 1.0a5 that will convert them for you, or you can do it by hand.
If things break too bad for you, please use the 1.0a5 release (or cvs with the 1.0a5 release tag). That's why the releases are so close together. Thanks for the help! =Tony Meyer From vanhorn at whidbey.com Thu Sep 4 16:26:38 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Thu Sep 4 21:21:42 2003 Subject: [spambayes-dev] Correcting training References: Message-ID: <3F57BC1E.1B9E6D43@whidbey.com> I think I need to figure out how to uninstall completely before I start bitching again. I had simply expanded the archive and dragged it to C:\Program Files\Spambayes and ran from there, then dragged some 1.0a5 files on top of that which barfed, and only then ran setup.py. As a result, I seem to be running Proxy Beta 2 and Web Interface Alpha 3, at least according to the web interface. As soon as I get that cleanup done I'll report the next one that arrives. I'm sure it won't take long! Van Tim Peters wrote: > [sorry for any repeats; resending stuff python.org thought may have > been dropped in the worm-turd storm > ] > > [G. Armour Van Horn] > > Okay, I will watch for these closely over the next few days, and > > save the clues and the full source of the messages for review. I'm > > guessing you won't want them mailed to the list, right? > > I do want them mailed to the list: I can't promise to make time to look > into it, and, even if I can, it's highly educational for the developers to > collaborate on debugging a problem (note that this is the spambayes-dev > list! it might be questionable on the spambayes list, but on *this* list > it's the reason for the list's existence ). -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! 
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From skip at pobox.com Wed Sep 3 18:15:13 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 4 22:03:27 2003 Subject: [spambayes-dev] trivial speedup? Message-ID: <16214.26609.867200.263742@montanaro.dyndns.org> I didn't want to simply check this in in case we're in feature freeze until after 1.0a6 is released. I found that when doing a full retrain (about 20k messages, so it does run for a while), if I emitted the msg count every ten messages instead of every message, hammie.py's CPU utilization went from around 75% to 85% and the Window Manager's utilization went from about 5% to 2%. That suggests to me that hammie.py is waiting around for i/o to complete a fair amount of the time. I'm also playing around with temporarily substituting a dict for the shelve object when training from scratch, though the verdict's still out on that. Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: hammiebulk.diff Type: application/octet-stream Size: 959 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030903/e0d73dd9/hammiebulk.obj From tim.one at comcast.net Thu Sep 4 22:09:20 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Sep 4 22:46:04 2003 Subject: [spambayes-dev] Speedup for full retrain when using DB dict In-Reply-To: <16214.23403.923659.941454@montanaro.dyndns.org> Message-ID: [Skip] > My earlier message (which because of the mail load on mail.python.org > you will probably get after this one) Heh -- I still haven't seen that one. > indicated that I had a patch which might speed up full retrains when > using a shelve database. I'm happy to say it works well for me. The
The > test I ran essentially executed > > rm hammie.db > hammie.py -d -p hammie.db -g newham.clean -s newspam.clean > > between calls to the Unix date(1) program. The above two files > contained a total of 15720 messages. The full retrain time dropped > from about 33 minutes to about 20 minutes. The speedup comes from > not writing to the shelve until until the training is completed. The > context diff is attached. Wouldn't it be simpler to do the full retrain using a PickledClassifier instance, then populate a DBDictClassifier from the result? That would also skip the extra layers of code (and time) to maintain the changed_words dict during the retrain. From ta-meyer at ihug.co.nz Fri Sep 5 16:21:07 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Thu Sep 4 23:21:18 2003 Subject: [spambayes-dev] Feature Freeze Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AE1F@its-xchg4.massey.ac.nz> I figure we should probably announce this ;) Apologies for cross-posting this and the previous message. As of now, we're not adding any new features to spambayes [1]. The reason for this is to iron out any remaining bugs, without possibly adding more, so that we can release 1.0b1. If you come up with feature you would like added before the freeze is over (a couple of weeks, probably), then the best idea is to open a patch or feature request on sourceforge (http://sourceforge.net/projects/spambayes) so that it doesn't get lost. It will be dealt with at some point, but please be patient. (Developers: I guess if we want we could open a 1.1a1 branch if anyone wants to add new features? You lot are more clued than I am about cvs) In addition, it would be great if everyone can test out 1.0a5 and 1.0a6 (due very soon) and let us know about any bugs that you find. It would be even better if you could check the existing bug trackers to see if it has already been reported! Thanks! 
=Tony Meyer [1] This excludes the Outlook plug-in (Mark is his own law ) and building the new Windows binary. Both of these are self-contained and don't affect anything else. From mhammond at keypoint.com.au Fri Sep 5 10:31:23 2003 From: mhammond at keypoint.com.au (Mark Hammond) Date: Thu Sep 4 23:26:26 2003 Subject: [spambayes-dev] Business opportunity In-Reply-To: <3F559404.262CE153@whidbey.com> Message-ID: <26f301c3733c$a8bc4a90$f502a8c0@eden> I think the simple fact is that this spam happens to use "hammy" words for you. The top ham clues for you are:
    binary 0.00185261424455
    build. 0.0196506550218
    generations 0.0505617977528
    message. 0.06072492226
    simpler. 0.0652173913043
    islands. 0.0652173913043
    anybody 0.0693000039519
    couple 0.078869778667
    polluted 0.0918367346939
    pacific. 0.0918367346939
    subject:Giving 0.0918367346939
    pollute 0.0918367346939
The first 3 would probably also be ham indicators for me (they are related to our job), but the rest would not. Similarly, looking at the strong "spam" clues, there are a few you would not expect, and that may indeed be very hammy for other users (eg "coral"). Looking at my clues for *your* message (I don't have the original), which should include all of the clues from the original (sans HTML), my strongest original indicators are:
    'binary' 0.00634697 35 0
    'morning.' 0.00819672 27 0
    'brilliant' 0.0100223 22 0
I guess that if the spammers get "chatty" enough and keep obvious spam signs down, we may find a number of these starting to slip through, but training will keep them fairly rare, and still very much hit-and-miss for the spammers - ie, there is no "magic" formula they can use to get past more than a small percentage of bayesian filters. Mark. > -----Original Message----- > From: spambayes-dev-bounces@python.org > [mailto:spambayes-dev-bounces@python.org]On Behalf Of G.
> Armour Van Horn > Sent: Wednesday, 3 September 2003 5:11 PM > To: spambayes-dev@python.org > Subject: [spambayes-dev] Business opportunity > > > Okay, here's a wordy spam that sailed right through the > filter. It had a funny > look, showing as a white page with a grey pinstripe border > inset an inch or so, > with the text set inside the border, and all the tags seem to > have been escaped. > I'm going to paste in the source of the message first, then > the clues from the > web interface below. > > Van > > > Return-path: > Envelope-to: twisted@whidbey.com > Delivery-date: Tue, 02 Sep 2003 23:42:46 -0700 > Received: from [61.11.10.79] (helo=qfgf94101.com) > by mail7.whidbey.net with smtp (Exim 4.20) > id 19uRLo-0004gg-Eo; Tue, 02 Sep 2003 23:42:45 -0700 > From: "Make$MoneyGivingAwayProducts" > Reply-To: "Make$MoneyGivingAwayProducts" > Date: Wed, 3 Sep 2003 02:18:49 -0400 > Subject: New MLM Opportunity - Make Money Giving Away > Samples 9/3/2003 > 2:18:49 AM > X-Mailer: Microsoft Outlook Express 6.00.2600.0000 > MIME-Version: 1.0 > X-Precedence-Ref: 1234056789zxcvbnmlkjhgfqwrtyuo > Content-Type: text/html; charset="us-ascii" > Content-Transfer-Encoding: quoted-printable > Message-Id: > Bcc: > X-Spambayes-Classification: ham > X-Spambayes-Spam-Probability: 0.0690495477564 > X-Spambayes-MailId: 1062571784 > X-Mozilla-Status: 8001 > X-Mozilla-Status2: 00000000 > X-UIDL: f52fde2f46bc61f8d0466b2437c15b2d > > Hello hswo =2C > > =3Chtml=3E > =3Cbody=3E > =3Ccenter=3E=3Cfont color=3Dred=3E=3Ch2=3E=3C=2Fh2=3E=3C=2Ffont=3E > =3Ctable width=3D=2275%=22 cellspacing=3D=220=22 cellpadding=3D=220=22 > border=3D=221px=22=3E=3Ctr=3E > =3Ctd width=3D=22133%=22 align=3D=22left=22 > valign=3D=22top=22 =3E=3Cfont > size=3D=222=22=3E > =3Cb=3E > Greetings!=3Cbr=3E > =3Cbr=3E > Some businesses are simple to build while others are more > difficult=2E=3Cbr=3E > =3Cbr=3E > The one I'm about to present to you is one of those > incredibly simple=3Cbr=3E > ones to build=2E It doesn't matter 
if you are a heavy hitter > or someone=3Cbr=3E > > who have not recruited anybody in the past=2E The system we > have in=3Cbr=3E > place make your recruiting efforts much easier and simpler=2E=3Cbr=3E > =3Cbr=3E > How about building a fantastic long term residual income > simply by=3Cbr=3E > giving away free samples at zero cost to you=3F=3Cbr=3E > =3Cbr=3E > We are not talking about any products=2E We are talking > about unique=3Cbr=3E > sizzle products of the highest quality and exclusive to our > company=2E=3Cbr=3E > In fact=2C all the products we market=2C our company has the > worldwide =3Cbr=3E > exclusive marketing rights=2E=3Cbr=3E > =3Cbr=3E > Our company is so confident in our product's effectiveness > that they=3Cbr=3E > pay all costs for the products you give away for free! The > more you=3Cbr=3E > give away=2C the more you make!=3Cbr=3E > =3Cbr=3E > One of our giveaway products is the coral calcium product=2E > We know=3Cbr=3E > of another coral calcium product out in the market=2E It was > first=3Cbr=3E > marketed thru TV and then thru another MLM and later it's > also sold=2E=3Cbr=3E > in Wall Mart=2E Nobody but our company has our coral calcium > product=2E=3Cbr=3E > > Ours is not from some industrial polluted islands=2E I's hand > picked=3Cbr=3E > coral from the reefs of the islands of Tonga in the South > Pacific=2E=3Cbr=3E > The beautiful Island Kingdom of Tonga is over a thousand > miles from=3Cbr=3E > any significant populations or industries which might > potentially=3Cbr=3E > pollute the ocean's waters=2E=3Cbr=3E > =3Cbr=3E > Giving away free products is a proven customer acquisition > technique=3Cbr=3E > that has worked very well over the years=2E The people > behind this =3Cbr=3E > program are brilliant direct response marketers and they know > how to=3Cbr=3E > sell massive quantities of stuff=2E We are using the best > techniques=3Cbr=3E > from some of the best direct response marketers in the world > with=3Cbr=3E > this program=2E 
=3Cbr=3E > =3Cbr=3E > It doesn't stop here=2E The company also provides you with > several=3Cbr=3E > free websites to promote your business=2E Different website > for =3Cbr=3E > different product=2E There are couple of recruiting websites > too=2E One > of=3Cbr=3E > them is the state of the art =22tap root=22 automated recruiting > system=2E=3Cbr=3E > =3Cbr=3E > We use a non-flushing binary compensation plan=2E In addition=2C It > pays=3Cbr=3E > fast start bonus and 7 generations of matching bonus=2E The > start-up =3Cbr=3E > cost is under $100=2E=3Cbr=3E > =3Cbr=3E > Our company is backed by an 5 year old debt free financial > and =3Cbr=3E > marketing company=2E=3Cbr=3E > =3Cbr=3E > To request to take a free tour please click the following > link and =3Cbr=3E > send a blank message=2E We will reply you promptly with the > link=2E=3Cbr=3E > =3Ca > href=3D=22mailto=3Afreeprodctoffer=40netzero=2Enet=3Fsubject=3 > DSend-link-FreeTour=22=3Cb=3E > > Please click here to request link to take free > tour=3C=2Fb=3E=3C=2Fa=3E > =3Cbr=3E > Good Health and Good Fortune!=3Cbr=3E > =3Cbr=3E > =3Cbr=3E > =3Cbr=3E > =3Cbr=3E > =3Cbr=3E > =3Ca > href=3D=22mailto=3Astrugglenomore=40netzero=2Enet=3Fsubject=3D > REMOVE=22=3Cb=3E > Remove from List=3C=2Fb=3E=3C=2Fa=3E > =3Cbr=3E > =3Cbr=3E > =3C=2Fbody=3E > =3C=2Fhtml=3E > > > Clues: > > *H* 0.922604249404 > *S* 0.0603422472645 > binary 0.00185261424455 > build. 0.0196506550218 > generations 0.0505617977528 > message. 0.06072492226 > simpler. 0.0652173913043 > islands. 0.0652173913043 > anybody 0.0693000039519 > couple 0.078869778667 > polluted 0.0918367346939 > pacific. 0.0918367346939 > subject:Giving 0.0918367346939 > pollute 0.0918367346939 > several 0.0988999689764 > i'm 0.102313651058 > away, 0.106004769542 > stuff. 
0.106648980289 > thru 0.113619986459 > might 0.127547641848 > subject:2003 0.12777250068 > doesn't 0.140146657525 > significant 0.141651288307 > worked 0.153371520696 > product's 0.155172413793 > recruited 0.155172413793 > populations 0.155172413793 > mart. 0.155172413793 > i's 0.155172413793 > using 0.159699622461 > which 0.163461145591 > talking 0.163745342828 > efforts 0.171769710997 > subject:/ 0.180654691332 > picked 0.188263519557 > but 0.188274221916 > there 0.189478683141 > fantastic 0.1928199812 > automated 0.194815884931 > someone 0.195772238664 > hello 0.197308545734 > list 0.199201326964 > marketers 0.199972057097 > ones 0.200392104737 > then 0.203894270143 > later 0.212319387812 > another 0.212505517858 > was 0.217747381952 > technique 0.222960377615 > old 0.230006935554 > kingdom 0.231607140623 > start-up 0.237385934246 > potentially 0.238982943648 > others 0.240040455204 > some 0.240471542102 > businesses 0.242428721592 > art 0.244646684435 > different 0.246133525616 > well 0.252621601356 > them 0.255080354695 > very 0.257780849416 > under 0.259915888958 > hand 0.260073237877 > know 0.261243378684 > good 0.261792160739 > building 0.262204220366 > they 0.265847472014 > nobody 0.275081539628 > charset:us-ascii 0.281523514104 > too. 0.289860637394 > market. 0.295388467964 > use 0.299896786962 > that 0.300156074584 > marketed 0.302000309904 > system 0.302757428559 > ours 0.303482912548 > also 0.308749719061 > past. 0.310930351937 > has 0.311185146002 > give 0.311596351207 > present 0.31652150803 > about 0.318953828161 > first 0.321695425133 > long 0.322694842406 > response 0.323822780311 > reply 0.324192438543 > tour 0.324834328857 > it's 0.325501650415 > those 0.325704854156 > fact, 0.328357828138 > program 0.338444137648 > how 0.342165501562 > following 0.34506065388 > incredibly 0.657851855622 > fast 0.663010749915 > islands 0.665556219421 > worldwide 0.668951612557 > waters. 0.673664119131 > stop 0.675372167664 > link 0.681472977724 > product. 
0.686303648506 > please 0.693500988903 > zero 0.706988691962 > sold. 0.711255141458 > promote 0.726229561673 > beautiful 0.730987105157 > highest 0.732530984987 > products 0.735603921068 > subject:: 0.738932851662 > marketing 0.743290370292 > websites 0.745411451619 > subject: 0.756728856617 > cost 0.758562183038 > pay 0.763624045674 > income 0.766304243533 > calcium 0.767510501579 > giveaway 0.769949807064 > quality 0.773903980006 > financial 0.775330798724 > our 0.780709101593 > rights. 0.795417176078 > health 0.804075242368 > here 0.804906025917 > quantities 0.806335142231 > free 0.819359455644 > proven 0.820983776358 > remove 0.821809228792 > subject: - 0.826461512996 > exclusive 0.828702702245 > header:Received:1 0.843613020039 > blank 0.843960068237 > greetings! 0.844827586207 > reefs 0.844827586207 > make! 0.844827586207 > message-id:@mail7.whidbey.net 0.879086642736 > bonus 0.894009319675 > $100. 0.89883733596 > content-type:text/html 0.903714657328 > here. 0.904390469484 > residual 0.904409456927 > debt 0.905282183009 > x-mailer:microsoft outlook express 6.00.2600.0000 0.912193270956 > click 0.917860037829 > bonus. 0.918753185118 > subject:MLM 0.934782608696 > sizzle 0.934782608696 > coral 0.937956665433 > subject:Opportunity 0.95871559633 > subject:Away 0.966498334235 > subject:Money 0.967503351882 > subject:Samples 0.969798657718 > subject:Make 0.989309832104 > > > > -- > ---------------------------------------------------------- > Sign up now for Quotes of the Day, a handful of quotations > on a theme delivered every morning. > Enlightenment! Daily, for free! 
> mailto:twisted@whidbey.com?subject=Subscribe_QOTD > > For web hosting and maintenance, > visit Van's home page: http://www.domainvanhorn.com/van/ > ---------------------------------------------------------- > > > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev From mhammond at keypoint.com.au Fri Sep 5 09:54:30 2003 From: mhammond at keypoint.com.au (Mark Hammond) Date: Fri Sep 5 00:10:51 2003 Subject: [spambayes-dev] 1.0a5 Release In-Reply-To: <16213.62774.948493.281707@montanaro.dyndns.org> Message-ID: <24b301c37337$7ff64f70$f502a8c0@eden> The 1.0a5 release doesn't work with Outlook - it appears neither the dialogs.rc/h files (which are in CVS) nor the generated .py file (which is not) made it into the archive (only one of them needs to make it - I suggest what is in CVS) See https://sourceforge.net/tracker/?func=detail&atid=498103&aid=800555&group_id=61702 Mark. From vanhorn at whidbey.com Fri Sep 5 00:21:01 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Fri Sep 5 02:21:16 2003 Subject: [spambayes-dev] 1.0a5 Release References: <24b301c37337$7ff64f70$f502a8c0@eden> Message-ID: <3F582B4D.898234A0@whidbey.com> I just wiped most traces of my earlier configurations off my local machine, saving only the database (I even screwed up and threw out my .ini file). I correctly ran setup (there's a first time for everything), and I was able to set everything I needed in the configuration and advanced configuration pages. (Good job on that, by the way.) However, I get a 404 every time I save the changes; there doesn't seem to be a changeopts page. Did something get left out, or have I just found a new way to screw up the installation?
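The three figures Mark lists for each of his own clues in the "Business opportunity" thread (spam probability, ham count, spam count) fit the Robinson adjustment SpamBayes applies to a word's raw probability: f(w) = (s*x + n*p(w)) / (s + n), where n is the number of training messages containing the word and p(w) is the raw ratio of spam frequency to total frequency. A minimal sketch, assuming the default settings unknown_word_strength s = 0.45 and unknown_word_prob x = 0.5; the nham/nspam totals in the usage loop are hypothetical placeholders, which is harmless here because none of these words was ever seen in spam, so the raw probability is 0 whatever the totals are.

```python
def word_spamprob(hamcount, spamcount, nham, nspam,
                  strength=0.45, unknown_prob=0.5):
    """Robinson-adjusted spam probability for a single word.

    strength and unknown_prob play the roles of the unknown_word_strength
    and unknown_word_prob options (default values assumed here).
    """
    # Fraction of each training class that contained the word.
    hamratio = hamcount / nham if nham else 0.0
    spamratio = spamcount / nspam if nspam else 0.0
    if hamratio + spamratio == 0:
        raw = unknown_prob  # never seen in training: no evidence either way
    else:
        raw = spamratio / (hamratio + spamratio)
    n = hamcount + spamcount
    # Pull rarely-seen words towards unknown_prob; the pull fades as n grows.
    return (strength * unknown_prob + n * raw) / (strength + n)


# Ham/spam counts from Mark's listing; nham/nspam are placeholders.
for word, ham, spam in [("binary", 35, 0), ("morning.", 27, 0),
                        ("brilliant", 22, 0)]:
    print(word, round(word_spamprob(ham, spam, nham=1000, nspam=1000), 7))
```

Under those assumptions the results agree with Mark's 0.00634697, 0.00819672 and 0.0100223 to the precision he printed. The same formula also explains some of the values in Van's clue list: 'polluted' at 0.0918367346939 is what a word seen in two hams and no spam gets, and 'reefs' at 0.844827586207 is a word seen in a single spam, which is why such rarely-seen words sit well away from 0 and 1.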
Van Mark Hammond wrote: > The 1.0a5 release doesn't work with Outlook - it appears neither the > dialogs.rc/h files (which is in CVS) nor the generated .py file (which is > not) made it into the archive (only one of them need make it - I suggest > what is in CVS) > > See > https://sourceforge.net/tracker/?func=detail&atid=498103&aid=800555&group_id > =61702 > > Mark. > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030904/7c7d2dcc/attachment-0001.htm From T.A.Meyer at massey.ac.nz Fri Sep 5 21:19:43 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 5 04:19:56 2003 Subject: [spambayes-dev] 1.0a5 Release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E5921@its-xchg4.massey.ac.nz> > However, I get 404 every time I save the changes, > there doesn't seem to be a changeopts page. > Did something get left out, or have I just found a > new way to screw up the installation? UserInterface.py should serve up the changeopts page - it makes all the changes and presents you with either a "save done" or "invalid settings" page. I presume your settings aren't being saved via the interface, then, right? (If they are, what happens if you try refreshing the 404 page?) Do all the other interface pages show up correctly? Is this pop3proxy or imapfilter? 
Everything works here, but I'm not doing this from a completely fresh install, so that may be why. =Tony Meyer From vanhorn at whidbey.com Fri Sep 5 04:07:05 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Fri Sep 5 06:07:10 2003 Subject: [spambayes-dev] 1.0a5 Release References: <1ED4ECF91CDED24C8D012BCF2B034F13031E5921@its-xchg4.massey.ac.nz> Message-ID: <3F586049.66FF4220@whidbey.com> My settings *are* saved when I hit the Save button in either the options page or the advanced options page; I just don't get the changeopts page to confirm that it has been done. I note that all the scripts in the /Python22/Scripts directory are dated on the 4th, but the UserInterface.py in /Python22/Lib/site-packages/spambayes is dated on the 1st, but that matches the time/date on the file in the 1.0a5 archive. (The main scripts are timestamped at the moment of installation.) Here's what comes up in the command window where the proxy is loaded: Loading database... error: uncaptured python exception, closing channel (socket.error:(10054, 'Connection reset by peer') [C:\Python22\lib\asynchat.py|initiate_send|213] [C:\Python22\lib\asyncore.py|send|343]) Loading database... error: uncaptured python exception, closing channel (socket.error:(10054, 'Connection reset by peer') [C:\Python22\lib\asynchat.py|initiate_send|213] [C:\Python22\lib\asyncore.py|send|343]) I think I hit save twice to get that. I've been changing my spam cutoff from .8 to .81 and back for testing purposes. Van "Meyer, Tony" wrote: > > However, I get 404 every time I save the changes, > > there doesn't seem to be a changeopts page. > > Did something get left out, or have I just found a > > new way to screw up the installation? > > UserInterface.py should serve up the changeopts page - it makes all the > changes and presents you with either a "save done" or "invalid settings" > page. > > I presume your settings aren't being saved via the interface, then, > right?
(If they are, what happens if you try refreshing the 404 page?) > > Do all the other interface pages show up correctly? Is this pop3proxy > or imapfilter? Everything works here, but I'm not doing this from a > completely fresh install, so that may be why. > > =Tony Meyer -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From vanhorn at whidbey.com Fri Sep 5 04:08:58 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Fri Sep 5 06:09:01 2003 Subject: [spambayes-dev] 1.0a5 Release References: <1ED4ECF91CDED24C8D012BCF2B034F13031E5921@its-xchg4.massey.ac.nz> Message-ID: <3F5860BA.D429CA72@whidbey.com> Sorry, forgot to include that this is a pop3proxy install only, no smtp proxy has been specified, no other functions have been run. Van "Meyer, Tony" wrote: > > However, I get 404 every time I save the changes, > > there doesn't seem to be a changeopts page. > > Did something get left out, or have I just found a > > new way to screw up the installation? > > UserInterface.py should serve up the changeopts page - it makes all the > changes and presents you with either a "save done" or "invalid settings" > page. > > I presume your settings aren't being saved via the interface, then, > right? (If they are, what happens if you try refreshing the 404 page?) > > Do all the other interface pages show up correctly? Is this pop3proxy > or imapfilter? Everything works here, but I'm not doing this from a > completely fresh install, so that may be why. > > =Tony Meyer -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! 
Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From richie at entrian.com Thu Sep 4 10:07:21 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Sep 5 08:11:16 2003 Subject: [spambayes-dev] Spambayes 1.0a5 released Message-ID: Spambayes 1.0a5 has been released. Here's WHAT_IS_NEW.txt: This file covers the major changes between each release. For more details, the reader is referred to the changelog (changelog.txt in the main directory of the archive), or for extreme details, to the check-ins archive (please see ) Changes are broken into sections for each application, plus one that will probably only interest developers, and one for everything else. Any actions necessary to move to this release from the previous release are noted in the "Transition" section. New in Alpha Release 5 ====================== -------------------------- ** Incompatible changes ** -------------------------- The values taken by some options have changed, so if you're upgrading from a previous version, you may need to update your configuration file (.spambayesrc or bayescustomize.ini) o allow_remote_connections now takes a list of allowed IP addresses, or the word 'localhost', or an asterisk (meaning all connections are accepted). o notate_to and notate_subject now take a comma-separated list of one or more of 'spam', 'ham' and 'unsure', allowing you to control which classes of message are notated. Outlook Plugin -------------- o Added a diagnostics dialog with functions to make it easier for users to help developers track down and fix bugs. o Added a 'timer' method of determining when to filter mail that should work better with Outlook's rule system. o Added a button on the Advanced tab of the dialog to display the SpamBayes data folder. o Moved "Filter Now" to an item on the drop down menu on the toolbar. 
o Items that can be filtered and trained include "IPM.Note" (normal messages) and "IPM.Anti-Virus*" (virus alerts generated by some software). o Changed the default filter action to "move" (instead of "untouched"). o Added a Wizard to assist with initial configuration (this will present itself when necessary). o Filtering can now be enabled even if no training has been done. o Added a "New Folder" button to the folder selector dialog. o Massive changes to the dialog system (which should fix some problems), including changing the configuration dialog to a tabbed interface. o "Show Clues" now shows the percentage, as well as the raw score. o Added a "Help" menu to the drop down menu, with various information. o Added the ability to check for the latest version via an item on the drop down menu. o Hopefully, the "unread flag" issue is now fixed. o Fixed many problems with working on systems where English is not the default language, or where profile names have non-English characters. POP3 Proxy / SMTP Proxy / POP3 Proxy Service -------------------------------------------- o Fixed the "assert hamcount <= nham" problem. o Starting and stopping the POP3 Proxy service (for Windows NT, Windows 2000 and Windows XP users) has been improved. Most notably, this means that the SMTP Proxy will also be started (if it is needed). o Improved the "notate to" and "notate subject" options, so that ham and unsure messages can also be (optionally) notated in these fields. o Added the ability to skip caching messages that are over a (user configurable) size, so that you can keep the size of the cache directories smaller, once these messages are correctly classified. o Added the ability to skip caching messages that have a precedence of "bulk" (most mailing list messages), so that you can keep the size of the cache directories (and review list) smaller, once these messages are correctly classified. o Fixed the "ASCII decoding error" problem.
o The SMTP proxy tries harder to pass on the command formatted exactly as it was given. This should make it more reliable. o Added the ability to have the SMTP proxy train on the message sent to it, rather than looking up the id in the cache (which is still possible, and generally the better option). o Removed the ability to add the SpamBayes identification number to the body of messages (it can still be added as a header). o The review messages page now puts unsure messages at the top. o The POP3 proxy should now work with fetchmail. o You can once again specify local addresses as well as ports for the POP3 proxy to listen on (was broken in 1.0a3 and 1.0a4). o A bug with the SMTP proxy that would show up in some cases as an "unrecognised command" error in the mail client (particularly Eudora) was fixed. IMAP Filter ----------- o If you didn't use the -p switch to enter your password interactively, imapfilter would try to get it from the options, but if it wasn't there yet (because you hadn't done the setup yet), it would crash. This is now fixed. General ------- o Added the ability to store the SpamBayes database in a MySQL or PostgreSQL database table (currently supported by hammiefilter and the POP3 proxy). o Removed the ability to use 'dumbdbm' as the storage method (see the FAQ for reasons why). o We now allow the '@' and '=' characters in paths. o Added a simple n-way classifier using a cascade of binary SpamBayes classifiers. o Added version information to the web interface. o Fixed the yellow colour of the header boxes in the web interface. o Fixed restoring defaults from the web interface. o Added a missing line break in the status pane on the web interface when there are no proxies configured. o Prevented the "Show clues" links on the web interface's training page from word-wrapping and making all the table rows two lines high.
o You can now put "*" at the end of a word in the "Word Query" box on the web interface, and have it show you the first ten words, and how many words there are in total, in the database that start with that word. o The web interface now supports HTTP-Auth. o Added a new script (code-named 'overkill.py') which enables 'drag and drop' training for POP3 users. This is currently still in the experimental stage, and anyone interested in trying it out should enquire on the SpamBayes mailing list (). Developer --------- o Created a directory for test suites, including a storage.py test. o An empty 'allowed values' now allows an empty string. o Added a get_option method, so an option instance itself can be fetched. o Added support for fetching the "latest" set of version data from the spambayes web site. Transition ========== If you are transitioning from a version older than 1.0a4, please also read the notes in the previous release notes (accessible from ). o If you were previously using the 'dumbdbm' storage method (you will have files called "hammie.db.dat", "hammie.db.dir" and "hammie.db.bak", rather than one file called "hammie.db"), then you will need to change to using either a pickle (please see the FAQ: ), bsddb, gdbm, or one of the new SQL based storage methods. The 'dumbdbm' storage method resulted in many databases being corrupted, and was never the best choice for storage, in any case. Although you can use the dbExpImp.py script to convert your database to your new storage system, we recommend that you retrain from scratch, as it is most likely that your database has been corrupted. o If you were using the options to notate the "To" or "Subject" headers with the message's classification, you will need to update your configuration file, as the format for these options has changed. o The ability to add the SpamBayes id to the message body has been removed, which means that Outlook Express users can no longer use the SMTP proxy and have it retrieve messages from the cache.
These users can use the SMTP proxy by training on the forwarded message itself, but this is not recommended, as clues in the message will have changed (the "From" address will be yours, for example). At this time, you will have to use the web interface for training, although there is the possibility of 'drag and drop' training being added in a release in the near future. Reported Bugs Fixed =================== The following bugs tracked via the Sourceforge system were fixed: 776808, 795749, 787251, 790051, 743131, 779319, 785389, 786952, 788495, 790406, 788008, 787296, 788002, 780612, 784323, 784296, 780819, 780801, 779049, 765912, 777026, 777165, 693387, 690418, 719586, 769346, 761499, 769346, 773452, 765042, 760062, 768162, 768221, 797776, 797316, 796996 A url containing the details of these bugs can be made by appending the bug number to this url: http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498103&aid= Feature Requests Added ====================== The following feature requests tracked via the Sourceforge system were added: 789916, 698036, 796832, 791319 A url containing the details of these feature requests can be made by appending the request number to this url: http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498106&aid= Patches integrated =================== The following patches tracked via the Sourceforge system were integrated: 791254, 790615, 788001, 769981, 791393 A url containing the details of these patches can be made by appending the patch number to this url: http://sourceforge.net/tracker/index.php?func=detail&group_id=61702&atid=498105&aid= -- Richie Hindle richie@entrian.com From skip at pobox.com Fri Sep 5 09:54:54 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 5 09:55:09 2003 Subject: [spambayes-dev] Re: [Spambayes] Big changes to cvs In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AE1B@its-xchg4.massey.ac.nz> References:
<1ED4ECF91CDED24C8D012BCF2B034F130212AE1B@its-xchg4.massey.ac.nz> Message-ID: <16216.38318.586531.912642@montanaro.dyndns.org> Tony> If you don't use spambayes from cvs, you can ignore this. The Tony> major changes discussed on spambayes-dev for the 1.0a6 have been Tony> started. For those who haven't read that thread, this includes Tony> renaming and moving the majority of the scripts (just about Tony> everything in the root directory), and removing backwards Tony> compatibility code from the configuration file reading. One nit. The indication in the checkin message and I thought in the discussion was that the script prefix was going to be "sb-". Instead the prefix is "sb_". Skip From skip at pobox.com Fri Sep 5 10:08:25 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 5 10:08:37 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] website faq.txt,1.39,1.40 In-Reply-To: References: Message-ID: <16216.39129.25928.700866@montanaro.dyndns.org> Tony> Add a FAQ about the 'access denied' problem that a lot of Tony> pop3proxy people seem to run into. We might want to special case Tony> this error and tell them what to do instead, or maybe even just Tony> automatically try a range of different ports. How about starting at 8880 and incrementing the port number until a free one is found, then displaying that port? The only situation where you'd die with an error is if the user specified an explicit port which was in use. Skip From kennypitt at hotmail.com Fri Sep 5 11:41:59 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Fri Sep 5 10:42:29 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] website faq.txt, 1.39, 1.40 In-Reply-To: <16216.39129.25928.700866@montanaro.dyndns.org> Message-ID: <000401c373bb$e0c0dbc0$300a10ac@spidynamics.com> Skip Montanaro wrote: > Tony> Add a FAQ about the 'access denied' problem that a lot of > Tony> pop3proxy people seem to run into. 
We might want to special case > Tony> this error and tell them what to do instead, or maybe even just > Tony> automatically try a range of different ports. > > How about starting at 8880 and incrementing the port number until a > free one is found, then displaying that port? The only situation > where you'd die with an error is if the user specified an explicit > port which was in use. > Only potential problem I see is that changing the port number dynamically instead of under user control will make it difficult for the user to create a Favorites shortcut to access the user interface. For the tray app, this problem could probably be solved by having the proxy share the port number used in some way that is accessible to the tray app. -- Kenny Pitt From skip at pobox.com Fri Sep 5 10:53:48 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 5 10:54:02 2003 Subject: [spambayes-dev] Speedup for full retrain when using DB dict In-Reply-To: References: <16214.23403.923659.941454@montanaro.dyndns.org> Message-ID: <16216.41852.346684.381899@montanaro.dyndns.org> >> indicated that I had a patch which might speed up full retrains when >> using a shelve database. I'm happy to say it works well for me. The >> test I ran essentially executed Tim> Wouldn't it be simpler to do the full retrain using a Tim> PickledClassifier instance, then populate a DBDictClassifier from Tim> the result? That would also skip the extra layers of code (and Tim> time) to maintain the changed_words dict during the retrain. Perhaps. Are you suggesting I detect the zero-length shelve object before instantiating a classifier or instantiating a PickledClassifier from within the DBDictClassifier code? Skip From skip at pobox.com Fri Sep 5 11:16:23 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 5 11:16:28 2003 Subject: [spambayes-dev] Does this seem too brutal? 
Message-ID: <16216.43207.661537.217810@montanaro.dyndns.org> Here's a crude hack to setup.py which complains if the user tries installing while the old files are still in place. Does it seem too extreme to error out of the install, or should it print the warnings and continue the install? Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: setup.diff Type: application/octet-stream Size: 2622 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030905/41bf5804/setup.obj From skip at pobox.com Fri Sep 5 10:54:48 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 5 11:22:57 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] spambayes/testtools pop3proxytest.py, 1.1, NONE In-Reply-To: References: Message-ID: <16216.41912.695688.735925@montanaro.dyndns.org> Tony> Crap! We can't use "sb-" as a prefix, because then we can't Tony> import the scripts. I guess that all the importable code could be Tony> moved into modules, but that seems like a huge hassle. Let's use Tony> "sb_" as a prefix instead. Then ignore my previous nit... :-) Skip From skip at pobox.com Fri Sep 5 10:24:33 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 5 11:24:28 2003 Subject: [spambayes-dev] what did hammie.py and hammiesrv.py become? Message-ID: <16216.40097.963548.245243@montanaro.dyndns.org> I'm trying to update setup.py to reference the new script names. I can't seem to figure out what hammie.py and hammiesrv.py became. Clues appreciated. Thanks, Skip From skip at pobox.com Fri Sep 5 11:28:01 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 5 11:28:01 2003 Subject: [spambayes-dev] trivial options checker Message-ID: <16216.43905.747082.915704@montanaro.dyndns.org> I suspect many people (like myself) will have to make some changes to their options file.
I checked in a trivial script (one-liner) as scripts/sb_chkopts.py to perform this task so non-developers don't have to figure out how to do the import. Skip From tim.one at comcast.net Fri Sep 5 12:55:47 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Sep 5 11:55:56 2003 Subject: [spambayes-dev] Re: [Spambayes] Big changes to cvs In-Reply-To: <16216.38318.586531.912642@montanaro.dyndns.org> Message-ID: [Skip] > One nit. The indication in the checkin message and I thought in the > discussion was that the script prefix was going to be "sb-". Instead > the prefix is "sb_". Tony quickly discovered that a Python module whose name starts with "sb-" can't be used in an import statement from any other module. I'm unclear on why it was thought desirable to uglify all the names, but making modules unimportable should have an obvious downside to everyone. From yrxlnc at bresnan.net Fri Sep 5 11:15:54 2003 From: yrxlnc at bresnan.net (Chas. E. Lehnert) Date: Fri Sep 5 12:18:18 2003 Subject: [spambayes-dev] encountered error on outlook 2000 filter Message-ID: Dear Spambayes Developers, Please refer to the attached Word document for pictures of the error messages described below. I use Outlook 2000, and installed the Spambayes plugin for it. It has been working well for about 60 days now. But this morning when I started Outlook, I got an error dialog box (the top one in the attachment). I dismiss the box and another pops up (the 2nd one in the attachment). Restarting Outlook/rebooting/etc. does not help. Running addin.py again did not help. The FAQ on the spambayes.org site does not address this problem. Is there a way I can repair this error on-site? cheers Chas. E. Lehnert --- "Most people would sooner die than think; in fact, they do so." -- Bertrand Russell, (1872 - 1970), British philosopher, mathematician, and writer -------------- next part -------------- A non-text attachment was scrubbed...
Name: SpamBayesErrs.doc Type: application/msword Size: 30208 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030905/af1c259e/SpamBayesErrs-0001.doc From tim.one at comcast.net Fri Sep 5 16:40:37 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Sep 5 15:40:41 2003 Subject: [spambayes-dev] Speedup for full retrain when using DB dict In-Reply-To: <16216.41852.346684.381899@montanaro.dyndns.org> Message-ID: [Tim] > Wouldn't it be simpler to do the full retrain using a > PickledClassifier instance, then populate a DBDictClassifier > from the result? That would also skip the extra layers of > code (and time) to maintain the changed_words dict during > the retrain. [Skip Montanaro] > Perhaps. Are you suggesting I detect the zero-length shelve object > before instantiating a classifier or instantiating a > PickledClassifier from within the DBDictClassifier code? They should both be simple enough to try and time, and then let your sense of beauty guide you. What I'd like to avoid is stuff trying to make one class's implementation act like a different class's implementation, especially when that latter class already exists and does a fine job as-is. From richie at entrian.com Fri Sep 5 22:30:51 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Sep 5 16:47:51 2003 Subject: [spambayes-dev] what did hammie.py and hammiesrv.py become? In-Reply-To: <16216.40097.963548.245243@montanaro.dyndns.org> References: <16216.40097.963548.245243@montanaro.dyndns.org> Message-ID: [Skip] > I'm trying to update setup.py to reference the new script names. I can't > seem to figure out what hammie.py and hammiesrv.py became. Clues > appreciated. hammie.py has gone away, replaced with sb_filter.py, which is the new name for hammiefilter.py.
hammiesrv.py became sb_xmlrpcserver.py - that's a clumsy name, but the long-term plan is to fold its functionality into sb_server.py (which is the new name for pop3proxy.py, on the grounds that it does more than POP3 already, and should become our only server script eventually). I-am-the-new-number-two-ly yrs, -- Richie Hindle richie@entrian.com From richie at entrian.com Fri Sep 5 23:04:12 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Sep 5 17:04:20 2003 Subject: [spambayes-dev] pop3proxy execution "rules" on Windows In-Reply-To: <04b601c371b9$ad5fa1c0$f502a8c0@eden> References: <04b601c371b9$ad5fa1c0$f502a8c0@eden> Message-ID: [Mark] > * pop3proxy etc data is stored in the per-user [..] > > * pop3proxy_service will refuse to start unless it is configured to use a > specific user name. > > * The application will create a mutex named something like > "SpamBayesProxy-{username}". With you so far. Looks like you're shaping up to allow multiple instances of the server, all running as different users, with optionally one of them being a service. But then: > * For the sake of simplicity all round, pop3proxy_normal (ie, the "normal" > executable, whatever it is) will refuse to start if the service is already > running on the current machine, even if it is running as a different user. ...which I don't understand - what makes the "normal" executable so different from the service, if the service is running as a user...? And then: > If you want one machine to support multiple users concurrently running > pop3proxy, you can't use the service. That sounds like it contradicts the first three points, which seemed to be carefully insulating different users from each other. Why should the fact that someone else is using it one way preclude me from using it another way? (I assume only one user can run a Windows service at once, and if that's a restriction of Windows then fair enough.)
> I suspect this will not be a problem, > as people who want to run the service will tend to have exclusive access to > the machine. Yes, but can they be sure that all other users have logged off? I've never worked anywhere that implemented hot desking, but I imagine XP's "fast user switching" capability leads to people being lazy about logging out - just hit Standby and then whoever comes to the machine next will log on and use it. It's not great for system resource usage, but when you next come along, all your applications (and services, I assume) are still running - very convenient. > (Note the tray icon app could still start in this case, which > could control the service - just never a proxy) [Where "in this case" means "where a service is running as any user", I assume] Again, I don't see why you're differentiating between services and normal user processes. Say I share a machine with Johnny Lazy, who never bothers logging out, and I download Spambayes (possibly even through his recommendation) and try to run it, I'll be annoyed if it says "Someone else is running the Spambayes service, so you can't run it as a normal application, even with a different data directory and on different ports." I'm not saying that your proposed restrictions are outrageously unacceptable, just that I can't figure out the reasons behind them. If every user has their own data area and uses their own ports, what difference does it make whether they're running as a service or an application? > Does that make sense? Mostly yes, mostly. 
8-) -- Richie Hindle richie@entrian.com From richie at entrian.com Fri Sep 5 23:04:16 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Sep 5 17:04:24 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] website faq.txt, 1.39, 1.40 In-Reply-To: <000401c373bb$e0c0dbc0$300a10ac@spidynamics.com> References: <16216.39129.25928.700866@montanaro.dyndns.org> <000401c373bb$e0c0dbc0$300a10ac@spidynamics.com> Message-ID: [Tony] > ...the 'access denied' problem that a lot of pop3proxy people > seem to run into. We might want to special case this error and > tell them what to do instead, or maybe even just automatically > try a range of different ports. [Skip] > How about starting at 8880 and incrementing the port number until a > free one is found, then displaying that port? The only situation > where you'd die with an error is if the user specified an explicit > port which was in use. [Kenny] > Only potential problem I see is that changing the port number dynamically > instead of under user control will make it difficult for the user to create > a Favorites shortcut to access the user interface. For the tray app, this > problem could probably be solved by having the proxy share the port number > used in some way that is accessible to the tray app. I think we should prevent the thing from starting up if the configured port (or the default port if none is configured) is already in use. We should give an error saying that the port is already in use, and how to configure a different one. Automatically choosing a different one is too much like black magic IMHO. The tray app will need to know which port to connect to, either by directly configuring the port number or by following the same rules to find the ini file. -- Richie Hindle richie@entrian.com From richie at entrian.com Fri Sep 5 23:04:17 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Sep 5 17:04:27 2003 Subject: [spambayes-dev] Does this seem too brutal?
In-Reply-To: <16216.43207.661537.217810@montanaro.dyndns.org> References: <16216.43207.661537.217810@montanaro.dyndns.org> Message-ID: [Skip] > Here's a crude hack to setup.py which complains if the user tries installing > while the old files are still in place. Does it seem too extreme to error > out of the install or should it print the warnings and continue the install? I think it's extreme if the user has to go and delete the scripts himself, but how about shipping a script that prints a list of all the offending files and offers to delete them? Either setup.py could quit with a message suggesting you run the deletion script, or setup.py itself could give the message and do the deletion. -- Richie Hindle richie@entrian.com From richie at entrian.com Fri Sep 5 23:04:21 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Sep 5 17:04:30 2003 Subject: [spambayes-dev] Automatic configuration In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212ADFB@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130212ADFB@its-xchg4.massey.ac.nz> Message-ID: [Tony] > I'm sure you've already guessed this, but it would be good to back up your > configuration file (or, in OE's case, the registry), before you do this. If > you blow up the world, don't blame it on me :) A good option would be for the script to create a clone of an existing account with the new settings. Then people could test the cloned account, and if they're happy with it they can either delete their old account or delete the new one and run the script again in "modify" rather than "clone" mode.
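Richie's suggestion in the "Does this seem too brutal?" reply above, a script that lists the leftover pre-rename scripts and offers to delete them, could look something like the minimal sketch below. It is an illustration under stated assumptions, not code from the SpamBayes tree: the function names are hypothetical, the old-to-new table only covers the renames mentioned in this thread (hammie.py, hammiefilter.py, hammiesrv.py, pop3proxy.py), and it assumes the old scripts were installed into Python's own scripts directory (`sysconfig.get_path("scripts")`).

```python
import os
import sysconfig
from typing import Callable, Dict, Iterable, List

# Renames mentioned on the list; hypothetical and possibly incomplete.
OLD_TO_NEW: Dict[str, str] = {
    "hammie.py": "sb_filter.py",          # hammie.py went away; sb_filter.py replaces it
    "hammiefilter.py": "sb_filter.py",
    "hammiesrv.py": "sb_xmlrpcserver.py",
    "pop3proxy.py": "sb_server.py",
}


def find_stale_scripts(dirs: Iterable[str]) -> List[str]:
    """Return paths of pre-rename scripts that still exist in any of dirs."""
    stale = []
    for d in dirs:
        for old in sorted(OLD_TO_NEW):
            path = os.path.join(d, old)
            if os.path.isfile(path):
                stale.append(path)
    return stale


def offer_to_delete(paths: List[str],
                    ask: Callable[[str], str] = input) -> List[str]:
    """List the stale files, ask once, and delete them if the user agrees.

    Returns the paths that were actually removed.
    """
    if not paths:
        return []
    print("These scripts were renamed and are now obsolete:")
    for p in paths:
        print(f"  {p}  (replaced by {OLD_TO_NEW[os.path.basename(p)]})")
    if ask("Delete them? [y/N] ").strip().lower() not in ("y", "yes"):
        return []
    removed = []
    for p in paths:
        os.remove(p)
        removed.append(p)
    return removed


def main() -> None:
    """Hypothetical entry point that setup.py could call before installing."""
    # Assumption: the old scripts were installed alongside Python's own scripts.
    offer_to_delete(find_stale_scripts([sysconfig.get_path("scripts")]))
```

Passing the prompt function in as `ask` keeps the deletion step testable; setup.py could either call `main()` itself or just print the list from `find_stale_scripts()` and exit, which are the two options Richie describes.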
-- Richie Hindle richie@entrian.com From mhammond at skippinet.com.au Sat Sep 6 09:51:33 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Sep 5 18:51:35 2003 Subject: [spambayes-dev] encountered error on outlook 2000 filter In-Reply-To: Message-ID: <38fc01c37400$42ba6640$f502a8c0@eden> Please see the troubleshooting guide - this is installed locally, but is also available online at http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html You will find information on how to extract a log, and also the process for opening a bug - please follow that procedure. Thanks, Mark. > -----Original Message----- > From: spambayes-dev-bounces+mhammond=keypoint.com.au@python.org > [mailto:spambayes-dev-bounces+mhammond=keypoint.com.au@python.org]On > Behalf Of Chas. E. Lehnert > Sent: Saturday, 6 September 2003 2:16 AM > To: spambayes-dev@python.org > Subject: [spambayes-dev] encountered error on outlook 2000 filter > > > Dear Spambayes Developers, > > Please refer to the attached word document for pictures of err msgs > described below. > > I use Outlook 2000, and installed the Spambayes plugin for > it. It has been > working well for about 60 days now. But this am when I > started outlook, I > get error dialog box (the top one in the attachment). I > dismiss the box and > another pops up (the 2nd one in the attachment). > > Restarting outlook/rebooting/etc. does not help. Running > addin.py again did > not help. The FAQ on the spambayes.org site does not address > this problem. > Is there a way I can repair this error on-site? > > cheers > Chas. E. Lehnert > --- > "Most people would sooner die than think; in fact, they do > so."
-- Bertrand > Russell, (1872 - 1970), British philosopher, mathematician, and writer > From mhammond at skippinet.com.au Sat Sep 6 10:55:55 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Sep 5 19:55:57 2003 Subject: [spambayes-dev] pop3proxy execution "rules" on Windows In-Reply-To: Message-ID: <3cf801c37409$403b1f50$f502a8c0@eden> [Richie] > With you so far. Looks like you're shaping up to allow > multiple instances > of the server, all running as different users, with > optionally one of them > being a service. Yes, that was my starting position. However, it seems to lead to complex logic for a simple "start proxy" command. It must check if the service is configured for the current user, rather than simply installed for any user. Not brain surgery, but it seems to be moving into unnecessary complexity. > > * For the sake of simplicity all round, pop3proxy_normal > (ie, the "normal" > > executable, whatever it is) will refuse to start if the > service is already > > running on the current machine, even if it is running as a > different user. > > ...which I don't understand - what makes the "normal" executable so > different from the service, if the service is running as a > user...? The difference is that only one service instance can be running on a machine at one time. However, multiple executables can. So as you say: > way? (I assume only one user can run a Windows service at > once, and if > that's a restriction of Windows then fair enough.) Is exactly the problem. Further, as we are using sockets, there is no way we can determine what user is at the other end of the connection. Hence a single service instance cannot serve multiple clients (as it won't know what user database to use for the request). > > > I suspect this will not be a problem, > > as people who want to run the service will tend to have > exclusive access to > > the machine. > > Yes, but can they be sure that all other users have logged off?
I've > never worked anywhere that implemented hot desking, but I imagine XP's > "fast user switching" capability leads to people being lazy > about logging > out - just hit Standby and then whoever comes to the machine > next will log > on and use it. It's not great for system resource usage, but when you > next come along, all your applications (and services, I > assume) are still > running - very convenient. As I mentioned, I believe users who are interested in the service will have exclusive access to the machine. They will be implicitly choosing to have a single service running for all users but using a single database. If they need to share the machine, then the service is not appropriate - as the service does not know what user is at the other end of the connection. > > (Note the tray icon app could still start in this case, which > > could control the service - just never a proxy) > > [Where "in this case" means "where a service is running as > any user", I > assume] Again, I don't see why you're differentiating > between services > and normal user processes. Say I share a machine with Johnny > Lazy, who > never bothers logging out, and I download Spambayes (possibly > even through > his recommendation) and try to run it, I'll be annoyed if it > says "Someone > else is running the Spambayes service, so you can't run it as a normal > application, even with a different data directory and on > different ports." :) This was part of the meandering reasoning I left out. This makes the startup logic horrible. Assuming the pop3tray program will just "do the right thing", then it must not only check if a service is installed on the current machine, but installed with a specific user name. It all just seemed too hard, for not enough gain. If Johnny truly understands that SpamBayes is a per-user solution and knows that other users may log onto his machine, he would be foolish to run the service. 
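Mark's objection that a socket-based service cannot tell which user is on the other end is what Tony's reply later in this thread tries to get around: configure the POP3 accounts each user reads, and let the proxy pick a database from the USER command it relays. A minimal sketch of just that lookup step follows, assuming a hypothetical table keyed by (server, account name); the names and database paths are invented for illustration, and none of this is existing SpamBayes code. POP3 command keywords are case-insensitive (RFC 1939), so the keyword is compared in upper case, and APOP is handled too because it also carries the account name as its first argument.

```python
from typing import Dict, Optional, Tuple

# Hypothetical configuration: (POP3 server, account name) -> database file.
AccountTable = Dict[Tuple[str, str], str]


def account_from_command(line: str) -> Optional[str]:
    """Return the account name if a client->server line is USER or APOP."""
    parts = line.strip().split()
    if len(parts) >= 2 and parts[0].upper() in ("USER", "APOP"):
        return parts[1]
    return None


def database_for(server: str, account: str, table: AccountTable,
                 default: Optional[str] = None) -> Optional[str]:
    """Pick the per-user database, falling back to a shared default."""
    # Host names are case-insensitive; account names are left as typed.
    return table.get((server.lower(), account), default)


# Example table using the accounts from Tony's message (paths are invented).
table: AccountTable = {
    ("pop.ihug.co.nz", "ta-meyer"): "tony.db",
    ("pop.ihug.co.nz", "t-meyer"): "tony.db",
    ("pop.ihug.co.nz", "tameyer"): "tony.db",
}
```

The proxy would pass each line the client sends through `account_from_command()` until it sees a name, then open `database_for(...)` for the rest of the session. A client that authenticates with AUTH (RFC 1734) never sends USER, so under this sketch it would simply get the default database.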
> I'm not saying that your proposed restrictions are outrageously > unacceptable, just that I can't figure out the reasons behind > them. It all boils down to the logic needed in "start proxy" in the tray icon. I don't think it is acceptable that we allow multiple instances of the "same" proxy to be started - which made me start thinking about exactly how to guard against that, while still allowing "different" proxies to be supported on the same machine. I am now of the opinion that "pop3proxy_service" is a dumb idea - it should not be a service at all unless it becomes capable of a single instance serving and storing data for multiple users. Until it is capable of that, it really doesn't qualify as a reasonable service by any measure, other than being a "background" task. However, having it as a service *is* convenient. Therefore, people who want it to run as a service do so purely from a convenience point of view - for the "background" qualities of services rather than the true "service" qualities. I came to the conclusion that these people are likely to use their machine exclusively. Administrators etc. looking to set up the service as a true service, serving multiple users, will find it unsuitable and so fall back to the "per-user process" option. We need to come up with something that is easily understood by our users. My idea was basically to document: * The SpamBayes proxy is a per-user program - therefore, it doesn't run as a service. * People want a service, even though they shouldn't. So we have provided one - but if you use it, you must configure it yourself and the non-service version of the proxy won't work (but the tray-bar icon will - in that case it will be controlling the service rather than running the proxy). Mark. From vanhorn at whidbey.com Fri Sep 5 17:58:08 2003 From: vanhorn at whidbey.com (G.
Armour Van Horn) Date: Fri Sep 5 19:58:13 2003 Subject: [spambayes-dev] Nigerian mystery References: <1ED4ECF91CDED24C8D012BCF2B034F13031E50E1@its-xchg4.massey.ac.nz> Message-ID: <3F592310.F7FEA123@whidbey.com> Greetings: Okay, I now have everything cleaned out, and brand new fresh 1.0a5 files installed properly and running. I just got my second Nigerian scam of the day, still rated solidly as Ham. I started looking over the thing and figured that Nigeria, petroleum, and million had to all be pretty spammy terms for me, so I looked at the clues for this message. Unless I'm losing what little mind I have left, those three terms are not listed in the clues. (And Million appears five times, petroleum and Nigeria twice each.) I don't see any sign of errors anywhere. Below I'm pasting first the source and then the clues. I just used the nifty new feature for looking up those words, and to my great surprise, those three terms really aren't all that spammy (or hammy, either) for me. Am I cursed forever because the number of Nigerian spams we discuss here is making this obvious clue very neutral? This is exactly the kind of distinction that has been associated with "Bayesian" analysis back to the original Graham column, that it could pass actual medical messages with penis in the body to a urologist while stopping the larger/longer/harder crap to the same inbox. 
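For reference, the *H* and *S* entries at the top of the clue list below are the two combined indicators the score is built from. A minimal sketch of the chi-squared (Fisher's method) combining, assuming SpamBayes combines the selected clue probabilities this way; the function names are illustrative, not the classifier module's API. The last step, score = (S - H + 1) / 2, can be checked against this thread: (0.000534238393173 - 0.996261495821 + 1) / 2 ≈ 0.002136371286, which is the X-Spambayes-Spam-Probability header on this message, and (1 - 0.941562 + 1) / 2 = 0.529219 is the "Spam Score: 53%" in Tony's "Correcting training" example.

```python
import math
from typing import Sequence, Tuple


def chi2Q(x2: float, v: int) -> float:
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    assert v % 2 == 0, "this series is only valid for even degrees of freedom"
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def score_from_h_s(h: float, s: float) -> float:
    """Map the ham and spam indicators onto one 0..1 score (0.5 = unsure)."""
    return (s - h + 1.0) / 2.0


def chi2_combine(probs: Sequence[float]) -> Tuple[float, float, float]:
    """Combine per-clue spam probabilities; returns (score, H, S)."""
    n = len(probs)
    if n == 0:
        return 0.5, 0.0, 0.0
    # Fisher's method: -2 * sum(ln p) is chi-squared with 2n degrees of
    # freedom when the p values are uniformly random.
    s_stat = -2.0 * sum(math.log(1.0 - p) for p in probs)
    h_stat = -2.0 * sum(math.log(p) for p in probs)
    s = 1.0 - chi2Q(s_stat, 2 * n)  # near 1 when many clues are spammy
    h = 1.0 - chi2Q(h_stat, 2 * n)  # near 1 when many clues are hammy
    return score_from_h_s(h, s), h, s
```

Because the score only approaches 1 when S is high and H is low at the same time, a message carrying dozens of strong ham clues can land firmly on the ham side even with a block of 0.84 to 0.93 spam clues, which is what *H* = 0.996 and *S* = 0.0005 show here.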
Van Return-path: Envelope-to: twisted@whidbey.com Delivery-date: Fri, 05 Sep 2003 15:12:29 -0700 Received: from [80.88.142.93] (helo=bba230.com) by mail8.whidbey.net with smtp (Exim 4.22) id 19vOoY-0007oM-Rr for twisted@whidbey.com; Fri, 05 Sep 2003 15:12:23 -0700 From: Alhaji Mantu To: twisted@whidbey.com Reply-To: alhajimantu@primposta.com Subject: IMP Mssg From Ibrahim Mantu Date: Fri, 05 Sep 2003 15:12:23 -0700 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="4f9a63f0-86df-4e78-9c93-1eb2f0710a6c" Message-Id: X-Spambayes-Classification: ham X-Spambayes-Spam-Probability: 0.002136371286 X-Spambayes-MailId: 1062800158-3 X-Mozilla-Status: 8001 X-Mozilla-Status2: 00000000 X-UIDL: 302cd221204c90d838b973aecad05f5f This is a multi-part message in MIME format --4f9a63f0-86df-4e78-9c93-1eb2f0710a6c Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Dear Sir, I m ALHAJI IBRAHIM MANTU of the Accounts Department of Petroleum (Special) Trust Fund (PTF), an institution established by the Federal Government of Nigeria to manage the proceeds from the sale of Petroleum products as well as the application of the fu nds in contract execution. My colleagues and I are inclined to contact you in view of a high-level business (deal) that we want to strike with you. It is quite natural to be surprised when somebody you do not know before contacts y ou, especially when it comes to a business of this type. The choice of picking you for the deal is based on satisfactory enquiries and reports that = we gathered about your person. The source of our information shall be disclosed to you in the course of this transaction. The deal stems from an over-valued invoice of a contract executed in our institutio n, PTF for the upgrading of facilities at the Kaduna refineries. The actual contract value was USD$40.5 Million but at the point of approval, it was deliberately inflated to USD$70.8 Million, which was approved as such. 
The contract has since been executed and the original German contractor, Dresser Gmbh has been paid the actual contract va lue of USD$40.5 Million which is the only money due to him, leaving the over-valued s um of USD$30.3 Million still hanging in our system unclaimed. This is the deal and th is is where we want you to come in. As employees within the system, we have since perfected arrangement through which t he fund (USD$30.3 Million) can be transferred out. However, the only way this can be done successfully is with the collaboration of an oversea partner. It has to be so be cause since it is a foreign denominated contract the destination of the fund must be an oversea account, to avoid any suspicion of any sort. Also we have perfected arrang ements such that the routing of the transfer will be erased from the systems so that there would be no trace of the fund to you or to any of the parties involved, as soon as you confirm the money in your account. However, it is also very important to say that as somebody from a Royal family with integrity and reputation, secrecy and absolute confidentiality is uttermost to me, as I will not want to do anything that would tarnish the name of my family. Thus, the on ly condition under which the deal will be prosecuted is for it to be kept in absolute secrecy and confidentiality. I just hope that you will give me this all-important gua rantee; otherwise there will be no need to start in the first place. If this secrecy can be kept by you, I require you to provide us with the following information to ena ble us obtain necessary approvals and authorizations for the transfer of the fund. 1. Bank Co-ordinates, including the name of the bank, address, acc ount number etc. 2. Name of the company that will serve as beneficiary of the fund. 3. Your private telephone and fax numbers that will be used to fac ilitate the transaction. On my part, I can be reached on the above stated telephone and fax numbers. 
Since it is a deal, we intend all the parties involved to have commensurate = share of the fun d when it gets to your account. While the sharing pattern cannot be reached now, it i s our hope that it is something that would be finally agreed upon as we get started. For now we propose as follows: 1. The Nigerian group 60% 2 Account owner (Yourself) 30% 3 Contingencies 10% Contingencies in this context are defined as any cost that any of the parties may i ncur in the course of the transaction; viz: Air tickets, accommodation, bills etc. In conclusion, as we go on in the transaction, you will also have to advise us on t he type of investment in your country that we could put our own share into, as we do not intend to bring back our entire share to our country. Your early response will be highly appreciated. ALHAJI IBRAHIM MANTU. --4f9a63f0-86df-4e78-9c93-1eb2f0710a6c-- Clues: *H* 0.996261495821 *S* 0.000534238393173 pattern 0.0201485265933 manage 0.0229719513109 transaction, 0.0365569363139 picking 0.0399569524171 routing 0.0432200568777 quite 0.046792802508 otherwise 0.0486050760563 appreciated. 0.0556630234103 in. 0.063759303312 nds 0.0652173913043 part, 0.0661145138814 context 0.068567797698 him, 0.068567797698 address, 0.0731750361611 follows: 0.0794778320087 executed 0.0914615202634 something 0.0925545733662 upgrading 0.101456860385 invoice 0.101712950517 advise 0.106371763268 involved, 0.119263517746 systems 0.124138758634 into, 0.130523087494 suspicion 0.135448841519 proceeds 0.151621466596 original 0.153425513607 anything 0.15841844752 stems 0.160897818077 which 0.161378244829 source 0.164540127005 done 0.167286446676 gets 0.168556053582 gathered 0.168738289529 say 0.170445641779 since 0.173616466415 account, 0.176023641014 especially 0.180857214817 sort. 0.182077687266 such. 0.182077687266 kept 0.182702051301 german 0.183513408317 there 0.187161104468 serve 0.187355892561 but 0.188737619275 still 0.191289424937 etc. 
0.195286220354 used 0.201306565617 bank, 0.20168260521 require 0.20525486008 could 0.206788896097 was 0.218210598988 disclosed 0.218623088379 somebody 0.219713321122 however, 0.224835861339 when 0.226283017645 institution 0.232358689334 cannot 0.232706409742 owner 0.235046208068 condition 0.235857119414 reports 0.23848640643 propose 0.248203662097 very 0.255587393867 before 0.256450309192 well 0.256955153128 number 0.258624804022 inclined 0.258800649492 under 0.260132001439 point 0.26179000574 know 0.264182757639 royal 0.26441042664 comes 0.265526588549 come 0.265951380585 type 0.266736370465 contact 0.268189139491 deliberately 0.26863260471 leaving 0.269522459534 view 0.273203060766 fax 0.276467483306 content-type:text/plain 0.27699392067 where 0.278358122842 course 0.289518412816 numbers 0.29099602902 hanging 0.291672610455 intend 0.291955414849 put 0.2924987419 person. 0.293584753359 way 0.29615128644 place. 0.297939000225 group 0.298026084686 system 0.298535261679 hope 0.29878771185 that 0.299918548882 telephone 0.299967181754 alhaji 0.300715453049 defined 0.300913132367 also 0.306714713032 me, 0.309490381769 content-type:multipart/mixed 0.310109922688 information 0.310894990849 necessary 0.312678180609 stated 0.687798798905 family 0.705146245915 sale 0.724197884822 through 0.729786966843 products 0.735549848691 satisfactory 0.73796808056 perfected 0.748369122939 tickets, 0.748369122939 federal 0.748834766015 cost 0.759447768465 money 0.768861651418 our 0.779984665857 erased 0.795683785481 reached 0.811465044826 header:Received:1 0.843313918565 refineries. 0.844827586207 subject:Mantu 0.844827586207 (yourself) 0.844827586207 (ptf), 0.844827586207 subject:Ibrahim 0.844827586207 mantu 0.844827586207 usd$40.5 0.844827586207 usd$30.3 0.844827586207 transaction; 0.844827586207 ptf 0.844827586207 prosecuted 0.844827586207 oversea 0.844827586207 over-valued 0.844827586207 mantu. 
0.844827586207 kaduna 0.844827586207 from:addr:alhajimantu 0.844827586207 ena 0.844827586207 denominated 0.844827586207 acc 0.844827586207 (usd$30.3 0.844827586207 (special) 0.844827586207 (deal) 0.844827586207 viz: 0.844827586207 uttermost 0.844827586207 usd$70.8 0.844827586207 60% 0.852489790959 natural 0.858916407805 approval, 0.873278618217 message-id:@mail8.whidbey.net 0.875148537429 partner. 0.879412338906 bills 0.899483819409 dresser 0.908163265306 contractor, 0.908163265306 approvals 0.911702031437 gua 0.934782608696 -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From T.A.Meyer at massey.ac.nz Sat Sep 6 15:10:30 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 5 22:12:12 2003 Subject: [spambayes-dev] pop3proxy execution "rules" on Windows Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E5942@its-xchg4.massey.ac.nz> > * pop3proxy etc data is stored in the per-user "{Application > Data}\SpamBayesProxy" (ie, next to the "SpamBayes" directory > used by Outlook. It could be "SpamBayes\Proxy", but I see no > good reason to mix the data up. I think we all agreed this > (bar the specific name!) Is there any reason not to just put it in "{Application Data}\SpamBayes" along with the Outlook stuff? > Further, as we are using sockets, there is no way we can > determine what user is at the other end of the connection. > Hence a single service instance can not service multiple clients > (as it won't know what user database to use for the request). Well, we could change things so that it can determine who is at the other end, although this might get too complicated. 
The user could specify not only the server to proxy, but also the accounts that will be used on that server, and then the proxy could eavesdrop on the authentication and select the appropriate database. For example, I set up the proxy to proxy to pop.ihug.co.nz, and tell it that I'll be using ["ta-meyer", "t-meyer", and "tameyer"], and when a connection is made (through the proxy) to pop.ihug.co.nz, the proxy sees "USER tameyer" and uses my database. When it sees "USER libby", it uses someone else's, or none. Although this adds to the complication of setup, this information could be determined by the autoconfigure script very easily. It could also have a default database that is used when it doesn't recognise the username (which would also allow people to share databases if that was what they wanted). All the rest of the stuff that the two of you have said makes sense (more-or-less ;). I'll leave you to figure it out :p =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Sep 6 15:25:13 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 5 22:25:24 2003 Subject: [spambayes-dev] Nigerian mystery Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E5945@its-xchg4.massey.ac.nz> > Okay, I now have everything cleaned out, and brand new fresh > 1.0a5 files installed properly and running. I just got my > second Nigerian scam of the day, still rated solidly as Ham. I pasted this into a message and sent it to myself and it scored 55%, which isn't bad considering that I would have polluted it with lots of clues about it being from me. > I started looking over the thing and figured that Nigeria, > petroleum, and million had to all be pretty spammy terms for > me, so I looked at the clues for this message. Unless I'm > losing what little mind I have left, those three terms are > not listed in the clues. (And Million appears five times, > petroleum and Nigeria twice each.) 
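Tony's explanation comes down to two filters applied before the clues are combined. A minimal sketch of that selection follows, assuming the default settings of 0.1 for minimum_prob_strength (so probabilities within 0.1 of 0.5 are ignored, which matches the jump from 0.3127 to 0.6878 in Van's clue list above) and 150 for max_discriminators. It also assumes each distinct token is considered once per message, however often it appears, and that unseen tokens score 0.5. The function and its parameters are illustrative, not the classifier's real interface.

```python
from typing import Dict, Iterable, List, Tuple


def select_clues(tokens: Iterable[str], spamprob: Dict[str, float],
                 min_strength: float = 0.1, max_discriminators: int = 150,
                 unknown_prob: float = 0.5) -> List[Tuple[str, float]]:
    """Return the (token, prob) pairs that would feed the score, lowest prob first."""
    candidates = []
    for tok in set(tokens):  # repeated tokens count only once
        p = spamprob.get(tok, unknown_prob)
        strength = abs(p - 0.5)
        if strength >= min_strength:  # roughly 0.4 < p < 0.6 is dropped
            candidates.append((strength, tok, p))
    # Keep only the strongest clues, then present them in probability order.
    candidates.sort(reverse=True)
    kept = candidates[:max_discriminators]
    return sorted(((tok, p) for _, tok, p in kept), key=lambda tp: tp[1])
```

With hypothetical near-neutral values, say 0.52 for "million", such a token never reaches the clue list, no matter how many times it appears in the message.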
The web interface 'show clues' only shows those clues that were used in determining the classification of the message, not all the tokens in the message (there is a request to change this, although I'm not sure how to squeeze it into a single page). In particular, there's a limit to how many tokens are used ("Classifier":"max_discriminators"), and a range of probabilities that aren't used ("Classifier":"minimum_prob_strength"). > I just used the nifty new feature for looking up those words, > and to my great surprise, those three terms really aren't all > that spammy (or hammy, either) for me. If those words are in the 0.4 to 0.6 range, then they're not used, which would explain why they weren't in the clue list. I think the theory goes that you should get *more* of these words in spam than in ham, so they should still be slightly spammy. Or that if you do see them equally, then there should be other words in the message that give it a spam classification. For example, "urgent" has a prob of 0.990798 for me. =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Sep 6 15:28:09 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 5 22:28:17 2003 Subject: [spambayes-dev] Re: [Spambayes] Big changes to cvs Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E5946@its-xchg4.massey.ac.nz> > Tony quickly discovered that a Python module with name > starting with "sb-" can't be used in an import statement from > any other module. Not quickly enough to avoid checking them all in with the wrong name, though... (you'd have thought with all these Python experts on the list, one of them would have pointed out the flaw during the discussions). > I'm unclear on why it was thought > desirable to uglify all the names, Blame Greg Ward, it was his idea ;) The reason is (quoted from Alex): "the issue is namespace pollution. The gist of the argument is that if we use such generic names, then future people will be barred from using equally generic names...
but we have no more right to those names than those other people, so we should not pre-emptively take the names." =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Sep 6 15:38:12 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 5 22:38:27 2003 Subject: [spambayes-dev] Correcting training Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E5948@its-xchg4.massey.ac.nz> > I really wonder what's going on here! Since the very first > tests I ran last year, Nigerian scams have been absolutely > nailed for me. Actually, I've looked more closely at my current results, and I wasn't quite right. Some are nailed for me - 100% (rounded), but others are solidly unsure (50%ish). An example of an unsure is below. It looks like it just has too many clues of each type; although there are almost twice as many spam clues as ham, the ham ones are lower. (The astute will notice that this is with the uni/bigrams classifier change discussed a couple of weeks ago, but I'm pretty sure I got these results without those changes, too). =Tony Meyer Spam Score: 53% (0.529219) word spamprob #ham #spam '*H*' 0.941562 - - '*S*' 1 - - 'problem,' 0.00846406 36 1 'support for' 0.0207407 47 9 'introduced' 0.0230148 16 2 'header:Received:7 header:From:1' 0.023097 148 37 'header:Received:7' 0.0258462 152 43 'resulted' 0.025986 14 2 'the necessary' 0.029065 21 5 'but for' 0.0318429 19 5 'especially' 0.0334112 36 12 '(for' 0.0391162 26 10 'note that' 0.0402315 38 16 'without the' 0.0420951 24 10 'problem' 0.0438073 64 31 'personally,' 0.0440438 6 1 'since most' 0.0440438 6 1 'exchange' 0.0444059 15 6 'machines' 0.0447368 13 5 'similar' 0.0514281 44 25 'land' 0.052521 14 7 'fact,' 0.0529859 33 19 'and some' 0.0535049 20 11 'faced with' 0.0541178 6 2 'fear' 0.0562448 10 5 'governments' 0.0569628 7 3 'necessary' 0.0611149 42 29 'me, and' 0.0614101 4 1 'along with' 0.0633712 47 34 'affected' 0.0646973 6 3 'with them.' 
0.0665003 7 4 'development' 0.0680235 30 23 'faced' 0.0693546 9 6 'purpose' 0.0704604 10 7 'action' 0.0712681 32 26 'kept' 0.0718108 19 15 'since' 0.0732255 81 70 'along' 0.0743269 54 47 'president and' 0.0748206 4 2 'skip:d 10 you' 0.0755361 6 4 'highly' 0.0759678 32 28 'the information' 0.0762971 21 18 'acceptance' 0.0766624 3 1 'the nature' 0.0766624 3 1 'and where' 0.0771211 9 7 'properties' 0.0786999 14 12 'let you' 0.0798047 21 19 'and that' 0.0808079 36 34 'before the' 0.0837246 15 14 'george' 0.0848469 9 8 'right now,' 0.0857477 7 6 'should' 0.0858574 154 160 'acceptable' 0.0875871 5 4 'heads' 0.0875871 5 4 'search for' 0.0881218 16 16 'countries' 0.0894356 13 13 'honest' 0.0894382 4 3 'war' 0.0905854 20 21 'seem' 0.0909548 45 49 'mr. robert' 0.908163 0 2 'now because' 0.908163 0 2 'political history.' 0.908163 0 2 'president mr.' 0.908163 0 2 'reform act' 0.908163 0 2 'response and' 0.908163 0 2 'rich white' 0.908163 0 2 'safe keeping,' 0.908163 0 2 'same experience' 0.908163 0 2 'seeking genuine' 0.908163 0 2 'skip:r 10 seeking' 0.908163 0 2 'some few' 0.908163 0 2 'some lunatics' 0.908163 0 2 'stones avoid' 0.908163 0 2 'surprised receive' 0.908163 0 2 'taking everything' 0.908163 0 2 'the eldest' 0.908163 0 2 'the farms' 0.908163 0 2 'the looming' 0.908163 0 2 'the society.' 0.908163 0 2 'transferred without' 0.908163 0 2 'veterans and' 0.908163 0 2 'when zimbabwean' 0.908163 0 2 'while 10%' 0.908163 0 2 'wholly' 0.908163 0 2 'will mapped' 0.908163 0 2 'zimbabwean president' 0.908163 0 2 'are currently' 0.917228 3 373 'finance' 0.930836 1 153 'amount was' 0.934783 0 3 'bent' 0.934783 0 3 'could transferred' 0.934783 0 3 'dilemma' 0.934783 0 3 'few black' 0.934783 0 3 'foreign account.' 
0.934783 0 3 'full name' 0.934783 0 3 'header:Mime-version:1' 0.934783 0 3 'header:Return-Path:1 header:Mime-version:1' 0.934783 0 3 'money could' 0.934783 0 3 'money was' 0.934783 0 3 'must let' 0.934783 0 3 'properties and' 0.934783 0 3 'purchase new' 0.934783 0 3 'residing' 0.934783 0 3 'seeking for' 0.934783 0 3 'subject:NEEDED' 0.934783 0 3 'that family' 0.934783 0 3 'this amount' 0.934783 0 3 'through personal' 0.934783 0 3 'reply this' 0.945495 6 1164 'account where' 0.949438 0 4 'for proper' 0.949438 0 4 'fund,' 0.949438 0 4 'had taken' 0.949438 0 4 'have got' 0.949438 0 4 'have lost' 0.949438 0 4 'investing' 0.949438 0 4 'murdered' 0.949438 0 4 'this private' 0.949438 0 4 'was made' 0.949438 0 4 'you accept' 0.949438 0 4 '30% the' 0.958716 0 5 'contact through' 0.958716 0 5 'content-type:text/plain subject:YOUR' 0.958716 0 5 'security company.' 0.958716 0 5 'skip:e 10 new' 0.958716 0 5 'therefore, you' 0.958716 0 5 'trustworthy' 0.958716 0 5 'your reply' 0.958716 0 5 'deposit the' 0.965116 0 6 'states dollars' 0.965116 0 6 'asylum' 0.969799 0 7 'this proposal' 0.969799 0 7 'your country.' 0.969799 0 7 'total sum' 0.973373 0 8 'your assistance' 0.980349 0 11 'family and' 0.981928 0 12 'this skip:r 10' 0.983271 0 13 'invest' 0.984429 0 14 'your full' 0.984429 0 14 'and me,' 0.985437 0 15 'got your' 0.986322 0 16 'risk-free' 0.987106 0 17 'subject: !' 
0.98951 0 21 'this transaction' 0.990405 0 23 'urgent' 0.990798 0 24 'amount money' 0.991803 0 27 'and safe' 0.99236 0 29 'confidential' 0.995258 0 47 'deposited' 0.99554 0 50 'this letter' 0.996562 0 65 'this money' 0.996614 0 66 'your business' 0.997057 0 76 Message Stream: Return-Path: Delivered-To: ta-meyer@backend.pop.ihug.co.nz Received: (qmail 9204 invoked from network); 5 Sep 2003 12:57:39 -0000 Received: from grunt1.ihug.co.nz (203.109.254.41) by edmund.ihug.co.nz with SMTP; 5 Sep 2003 12:57:39 -0000 Received: from (genamics.cenancestor.net) [205.214.85.40] by grunt1.ihug.co.nz with esmtp (Exim 3.35 #1 (Debian)) id 19vG9i-0003EE-00; Sat, 06 Sep 2003 00:57:38 +1200 Received: from [208.197.227.8] (helo=mail1.chek.com) by genamics.cenancestor.net with smtp (Exim 4.20) id 19vG9Y-0004nk-HP for tonym@madsods.gen.nz; Sat, 06 Sep 2003 00:57:28 +1200 Received: (qmail 31492 invoked by uid 0); 5 Sep 2003 12:57:33 -0000 Received: from machiavelli.synacor.com (10.10.6.30) by mailrelay2.synacor.com with SMTP; 5 Sep 2003 12:57:33 -0000 Received: (qmail 21318 invoked by uid 99); 5 Sep 2003 12:57:23 -0000 Date: 5 Sep 2003 12:57:23 -0000 Message-ID: <20030905125723.21317.qmail@machiavelli.synacor.com> From: "george p. anderson" To: anderson_family@roanokemail.com X-Originating-IP: [80.179.102.202] Subject: YOUR ATTENTION IS NEEDED ! Mime-version: 1.0 X-MASSMAIL: 1.0 X-MASSMAIL: precedence.bulk Content-Type: multipart/alternative; boundary="=====================_889472414==_" X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - genamics.cenancestor.net X-AntiAbuse: Original Domain - madsods.gen.nz X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - roanokemail.com Dear,Sir/Madam. You may be surprised to receive this letter from me,since you do not know me personally, but for purpose of introduction. I am MR. GEORGE PETER ANDERSON, the first son of DR. 
VANDERS ANDERSON, who was recently murdered in the land disputed in ZIMBABWE. I got your contact through my personal search for a honest and trustworthy individual,who can help my family in receiving this fund, for proper investment and safe keeping, And I decided to write to you, for assistance and business co-operation. Before the death of my father, he had taken me to oversea to deposit the total sum of(US$16,8Million United States Dollars ), with a security and finance company, as if he foresaw the looming danger in Zimbabwe. The money was deposited in a box as gem stones to avoid demurrage from the Security Company. This amount was made for the purchase of new machines and chemicals for the farms and establishment of a new farm in Switzerland. This land problem came when Zimbabwean President MR. ROBERT MUGABE, introduced a new Land Act which wholly affected the rich white farmers and some few black farmers. This resulted in killing and mob action by Zimbabwean war veterans and some lunatics in the society. In fact, more than thirteen(13) people have been killed because of this Land Reform act. Heads of governments from the west especially Britain have condemned Mugabe's new Land Reform Act and for this reason,SADC(Southern African Development Community) heads of states have met and voiced their support for Mr.Mugabe. South African President and the President of Malawi have been sent by SADC to Britain to support Mugabe's new Land Reform Act. It is against this background, that I and my family who are currently staying in europe right now because of this problem, have decided to transfer my father's money to a foreign account. As the eldest son of my father, I am saddled with this responsibility of seeking a genuine account where this money could be transferred without the knowledge of my government who are bent on taking everything we have got and other African Governments seem to be playing along with them. 
I am faced with the dilemma of investing this amount of money in Africa for fear of going through the same experience in future since most of the countries have lost similar political history. However, the europian union Foreign Exchange Policy where i am right now, does not allow such investment as I am seeking for Asylum. I don"t want to disclose all the information about me, and where i am residing right now for fear of having my asylum request cancelled by the authorities in my host country,but immediately you get back to me, I will give you the necessary information. As a businessman, whom I have to entrust my future and that of my family in his hands, I must let you know that this transaction is risk-free and the nature of your business does not necessarily matter. Therefore, if you accept to assist my family and me, we are willing to offer 30% of the total sum for your assistance needed,60% for me and my family while 10% will be mapped out for any pre-transfer charges or bank charges. I would wish to invest in your country on commercial properties and buying of rents based on your advice or from experts in your country. If this proposal is acceptable by you, please, send me your contacts; your full name and address,telephone and fax numbers. YOU SHOULD NOTE THAT THIS TRANSACTION IS HIGHLY CONFIDENTIAL AND SHOULD BE KEPT WITHIN YOU ALONE. YOUR URGENT RESPONSE AND ACCEPTANCE WILL BE HIGHLY APPRECIATED. PLEASE FORWARD YOUR REPLY TO THIS MY PRIVATE ADDRESS: georgeanderson38@indiatimes.com REGARDS, MR. 
GEORGE PETER ANDERSON (FOR THE FAMILY) From mhammond at skippinet.com.au Sat Sep 6 19:37:59 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat Sep 6 04:38:12 2003 Subject: [spambayes-dev] pop3proxy execution "rules" on Windows In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E5942@its-xchg4.massey.ac.nz> Message-ID: <00aa01c37452$2e340fd0$f502a8c0@eden> > > * pop3proxy etc data is stored in the per-user "{Application > > Data}\SpamBayesProxy" (ie, next to the "SpamBayes" directory > > used by Outlook. It could be "SpamBayes\Proxy", but I see no > > good reason to mix the data up. I think we all agreed this > > (bar the specific name!) > > Is there any reason not to just put it in "{Application > Data}\SpamBayes" > along with the Outlook stuff? Nothing is shared, so I see no reason to use the same directory, and only scope for confusion. A "flaw" in Outlook already is that we use the profile name for a filename - but in a directory with other files. If an Outlook profile was ever called "default_configuration" we would be in trouble. An advantage of using the same directory is that our FAQ etc can tell a user they only need to back up a single directory, and they get both/either app. Hence I don't mind the idea of a sub-directory for the proxy. I should have started with "SpamBayes\Outlook" :) > > Further, as we are using sockets, there is no way we can > > determine what user is at the other end of the connection. > > Hence a single service instance can not service multiple clients > > (as it won't know what user database to use for the request). > > Well, we could change things so that it can determine who is at the > other end, although this might get too complicated. The user could > specify not only the server to proxy, but also the accounts > that will be > used on that server, and then the proxy could eavesdrop on the > authentication and select the appropriate database. This won't work due to permissions.
The service is running as a different user, and will have no credentials to use to LogonUser() to get access to the user's database. We could solve this by having a "logon to the proxy" request, but that is getting beyond the short term :) Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2604 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030906/23e26009/winmail-0001.bin From tim.one at comcast.net Sun Sep 7 21:59:27 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Sep 7 20:59:31 2003 Subject: [spambayes-dev] Re: [Spambayes] Big changes to cvs In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E5946@its-xchg4.massey.ac.nz> Message-ID: [Tim] >> I'm unclear on why it was thought desirable to uglify all the names, [Tony] > Blame Greg Ward, it was his idea ;) The reason is (quoted from Alex): > > "the issue is namespace pollution. The gist of the argument is that > if we use such generic names, then future people will be barred from > using equally generic names... but we have no more right to those > names than those other people, so we should not pre-emptively take > the names." Is this some context where it's not allowed to access spambayes scripts from a spambayes directory? Or were Greg/Alex worried about preventing future spambayes developers from using "nice names" because former spambayes developers already used them? No, I don't really want to know the answer . From T.A.Meyer at massey.ac.nz Mon Sep 8 19:12:10 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 8 02:12:32 2003 Subject: [spambayes-dev] gdbm error (was Error running pop3proxy.py) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E5D98@its-xchg4.massey.ac.nz> [spambayes-dev people: could someone who knows more about the dbm modules confirm that I'm right here?] > There is another error. 
When I go to the configuration page > and try to set, say, the spamminess cutoff to 0.8 (or change > anything else there), I get the following error. After that, > I need to restart the proxy to get it to work again, although > the changes will have been saved. > > Error: [...] > File > "/usr/lib/python2.2/site-packages/spambayes/dbmstorage.py", > line 23, in > open_gdbm > return gdbm.open(*args) > > error: (11, 'Resource temporarily unavailable') I think this is an actual bug. When you change the options, everything is reloaded with the new settings, including the database. This looks to me like gdbm won't let anyone else (including spambayes) open the db while spambayes has it open, so we need to close it first. We should probably be doing that anyway. I suspect that this doesn't show up using bsddb (and wouldn't using a pickle). =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Sep 8 20:06:53 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 8 03:07:04 2003 Subject: [spambayes-dev] Re: [Spambayes] Big changes to cvs Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13031E5DA3@its-xchg4.massey.ac.nz> > [Tim] > I'm unclear on why it was thought desirable to uglify all the names, [...later...] > Is this some context where it's not allowed to access > spambayes scripts from a spambayes directory? Or were > Greg/Alex worried about preventing future spambayes > developers from using "nice names" because former spambayes > developers already used them? > > No, I don't really want to know the answer . In case someone else does ;), the context is where people have run "setup.py install" and therefore have had the spambayes scripts put into their python scripts directory, rather than those that run them from a separate spambayes directory. (Installing into a separate directory was discussed, but someone said that it would also have troubles. 
It all got over my head quickly ;) =Tony Meyer From skip at pobox.com Mon Sep 8 09:24:55 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Sep 8 09:25:06 2003 Subject: [spambayes-dev] Re: [Spambayes] Big changes to cvs In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E5DA3@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13031E5DA3@its-xchg4.massey.ac.nz> Message-ID: <16220.33575.679280.122091@montanaro.dyndns.org> Tony> In case someone else does ;), the context is where people have run Tony> "setup.py install" and therefore have had the spambayes scripts Tony> put into their python scripts directory, rather than those that Tony> run them from a separate spambayes directory. (Installing into a Tony> separate directory was discussed, but someone said that it would Tony> also have troubles. It all got over my head quickly ;) On Unix systems it's not uncommon for packages to install in /usr/local/, with corresponding bin, etc, include, lib directories underneath. The (minor) problem this creates is that people must either reference executables directly or add the directories of interest to PATH. It has the added benefits that people who don't want to use the package aren't bothered with more stuff cluttering their namespace, and assuming you pick reasonable package names, name clashes are avoided altogether. If we're concerned about name clashes, that approach would be preferable to me. Skip From skip at pobox.com Mon Sep 8 09:29:03 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Sep 8 09:29:18 2003 Subject: [spambayes-dev] gdbm error (was Error running pop3proxy.py) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E5D98@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13031E5D98@its-xchg4.massey.ac.nz> Message-ID: <16220.33823.743751.29403@montanaro.dyndns.org> >> return gdbm.open(*args) >> >> error: (11, 'Resource temporarily unavailable') Tony> I think this is an actual bug.
When you change the options, Tony> everything is reloaded with the new settings, including the Tony> database. This looks to me like gdbm won't let anyone else Tony> (including spambayes) open the db while spambayes has it open, ... I'm not a gdbm user, but that appears to be the case in the two tests I tried from the Python interpreter prompt. Multiple readers are allowed, but if anybody has the file open for write access, nobody else can open it. Skip From tim.one at comcast.net Mon Sep 8 13:09:56 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Sep 8 12:09:51 2003 Subject: [spambayes-dev] Correcting training In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13031E5948@its-xchg4.massey.ac.nz> Message-ID: [Tim] >> I really wonder what's going on here! Since the very first >> tests I ran last year, Nigerian scams have been absolutely >> nailed for me. [Tony Meyer] > Actually, I've looked more closely at my current results, and I wasn't > quite right. Some are nailed for me - 100% (rounded), but others are > solidly unsure (50%ish). > > An example of an unsure is below. Heh. This message of yours (the whole thing, including your commentary) was a false positive for me because of the Nigerian Scam content, and was indeed the worst false positive I've ever gotten: Spam Score: 98% (0.98087) '*H*' 0.038259 '*S*' 1 > It looks like it just has too many clues of each type; although there are > almost twice as many spam clues as ham, the ham ones are lower. (The > astute will notice that this is with the uni/bigrams classifier change > discussed a couple of weeks ago, but I'm pretty sure I got these results > without those changes, too). That's apples and oranges. 
Throwing bigrams into the mix at least doubles the number of distinct features spambayes finds in a message, and it found so many for you that you're suffering a form of the dreaded old "cancellation disease": spambayes found so many strong features in your message that it artificially cut off the clues it looked at to the 150 strongest. That's why your sorted-by-spamprob clue list leaps from the quite hammy 0.09 'seem': > 'seem' 0.0909548 45 49 to the quite spammy 0.91 "mr. robert": > 'mr. robert' 0.908163 0 2 with nothing between them. It's possible that the max-clues cutoff should be raised when using a mix of unigrams and bigrams. Or maybe you'd still suffer cancellation disease (== a lot of strong ham clues and a lot of strong spam clues; chi-combining at least rates msgs like that Unsure instead of (in effect) flipping a coin). From russf at topia.com Mon Sep 8 14:21:59 2003 From: russf at topia.com (Russ Ferriday) Date: Mon Sep 8 16:22:51 2003 Subject: [spambayes-dev] 404 - spambayes-1.0a5.zip not found error at sf and mirrors Message-ID: "The requested URL /sourceforge/spambayes/spambayes-1.0a5.zip was not found on this server." somewhere else I can get it, until this sf problem fixed? --r. From skip at pobox.com Mon Sep 8 17:05:49 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Sep 8 17:05:55 2003 Subject: [spambayes-dev] 404 - spambayes-1.0a5.zip not found error at sf and mirrors In-Reply-To: References: Message-ID: <16220.61229.54396.941951@montanaro.dyndns.org> Russ> "The requested URL /sourceforge/spambayes/spambayes-1.0a5.zip was Russ> not found on this server." Russ> somewhere else I can get it, until this sf problem fixed? This is the canonical location of the distributed files: https://sourceforge.net/project/showfiles.php?group_id=61702 What URL were you looking at which gave you the error?
Skip From russf at topia.com Mon Sep 8 15:27:42 2003 From: russf at topia.com (Russ Ferriday) Date: Mon Sep 8 17:28:42 2003 Subject: [spambayes-dev] 404 - spambayes-1.0a5.zip not found error at sf and mirrors In-Reply-To: <16220.61229.54396.941951@montanaro.dyndns.org> References: Message-ID: <5.2.0.9.0.20030908142304.00abc4b0@mail.topia.com> At 04:05 PM 9/8/2003 -0500, Skip Montanaro wrote: >This is the canonical location of the distributed files: > > https://sourceforge.net/project/showfiles.php?group_id=61702 > >What URL were you looking at which gave you the error? I followed the link above and tried the latest zip and tar.gz. Next page lets me select a mirror. I tried several. All gave 404 - example http://easynews.dl.sourceforge.net/sourceforge/spambayes/spambayes-1.0a5.zip And when I back up to the containing folder, I see the previous versions, so the rest of the path is correct. Seems 1.0a5 has not mirrored correctly. --r ---------- Russ Ferriday Chief Consultant iTec Solutions IT Strategies and Architectures (+1) (805) 748 1552 www.topia.com ---------- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030908/82f209b1/attachment.htm From skip at pobox.com Mon Sep 8 17:35:08 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Sep 8 17:35:08 2003 Subject: [spambayes-dev] 404 - spambayes-1.0a5.zip not found error at sf and mirrors In-Reply-To: <5.2.0.9.0.20030908142304.00abc4b0@mail.topia.com> References: <5.2.0.9.0.20030908142304.00abc4b0@mail.topia.com> Message-ID: <16220.62988.943704.823844@montanaro.dyndns.org> >> https://sourceforge.net/project/showfiles.php?group_id=61702 >> What URL were you looking at which gave you the error? Russ> I followed the link above and tried the latest zip and tar.gz Russ> . Next page lets me select a mirror. I tried several.
All gave 404 Russ> - example Russ> http://easynews.dl.sourceforge.net/sourceforge/spambayes/spambayes-1.0a5.zip Russ> And when I backup to the containing folder, I see the previous Russ> versions, so the reast of the path is correct. Seems 1.0a5 has not Russ> mirrored correctly. Okay, this appears to be a SourceForge problem. FWIW, I preselected umn.dl.sourceforge.net (University of Minnesota) as my preferred download source. I've never had a problem with it. It had both the tar.gz and zip versions of 1.0a5 when I tried just prior to sending the message to which you responded. Skip From vanhorn at whidbey.com Mon Sep 8 15:49:22 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Sep 8 17:49:26 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handles image-only spams References: <792DE28E91F6EA42B4663AE761C41C2AF3D3F4@cliff.bai.org> <16220.57875.543102.433193@montanaro.dyndns.org> Message-ID: <3F5CF962.BD9C9042@whidbey.com> Skip Montanaro wrote (on spambayes list): > Ryan> I'm going to figure out how to add these tokens to a customized > Ryan> parser on my own, and report on the results. I'll see if they help > Ryan> at all. > > Why do you need a customized parser? You'd probably reach your end goal > faster by reading and modifying tokenizer.py. Okay, I'm really green at this, although I occasionally am able to make some tiny changes to Perl scripts if I'm careful. I was thinking that the To: address is probably a really good clue to work with, so I'd like a couple of hints as to where in tokenizer.py I should be looking. Basically, messages that are sent to one of my actual e-mail addresses are mostly ham, messages that are sent to some other address and get to me by bcc are almost always spam, and those that are cc'd to me are probably a mix. Is there a section of the code that could be duplicated and diddled to do what I want to try? 
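Van's idea of treating the To: address as a clue can be pictured with the simplified Python sketch below. It is not spambayes' tokenizer.py code: the function name `address_tokens`, the `DEFAULT_ADDRESS_HEADERS` tuple and token spellings such as `to:addr:example.com` are assumptions made for illustration. It only shows how each configured address header might be broken into display-name words and user/domain pieces that a classifier could then learn like any other word. In spambayes itself, as Tim's reply explains, which headers get tokenized is controlled by the `address_headers` option rather than by editing code.

```python
import email
from email.utils import getaddresses
from typing import Iterable, Iterator

# Hypothetical default, mirroring the header list quoted in this thread.
DEFAULT_ADDRESS_HEADERS = ("from", "to", "cc", "sender", "reply-to")


def address_tokens(raw_message: str,
                   address_headers: Iterable[str] = DEFAULT_ADDRESS_HEADERS) -> Iterator[str]:
    """Yield illustrative clue tokens for the given address headers.

    Hypothetical helper; the token format is an assumption, not spambayes'.
    """
    msg = email.message_from_string(raw_message)
    for field in address_headers:
        values = msg.get_all(field, [])
        if not values:
            # A missing header can itself be a clue.
            yield f"{field}:none"
            continue
        for name, addr in getaddresses(values):
            # Display-name words, e.g. "Van" -> "from:name:van".
            for word in name.lower().split():
                yield f"{field}:name:{word}"
            if addr:
                # Split user and domain so each can become a clue on its own.
                for part in addr.lower().split("@"):
                    yield f"{field}:addr:{part}"
            else:
                yield f"{field}:addr:none"


if __name__ == "__main__":
    raw = ("From: Van <van@example.org>\n"
           "To: someone@example.com\n"
           "Subject: hi\n"
           "\n"
           "body\n")
    print(list(address_tokens(raw, ("to", "cc"))))
    # ['to:addr:someone', 'to:addr:example.com', 'cc:none']
```

Because the user part and the domain become separate tokens, mail addressed to one of your real addresses and mail that only reached you via Bcc would produce different To: clues, and training would decide how much each one counts.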
Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From tim.one at comcast.net Mon Sep 8 18:55:23 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Sep 8 17:55:19 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handles image-onlyspams In-Reply-To: <3F5CF962.BD9C9042@whidbey.com> Message-ID: [Van] > Okay, I'm really green at this, although I occasionally am able to > make some tiny changes to Perl scripts if I'm careful. I was thinking > that the To: address is probably a really good clue to work with, so > I'd like a couple of hints as to where in tokenizer.py I should be > looking. I recommend you just include this line in your .ini file (I'm not sure how you use spambayes, so can't tell you exactly where that is): address_headers: from to cc sender reply-to By default, all of those except the "From:" header are ignored. Including "to" on the line (as shown) will cause the "To:" header to get tokenized too. > Basically, messages that are sent to one of my actual e-mail > addresses are mostly ham, messages that are sent to some other > address and get to me by bcc are almost always spam, and those that > are cc'd to me are probably a mix. spambayes will figure out the truth about cc in your data, and "cc" is in the list above too. From vanhorn at whidbey.com Mon Sep 8 16:08:52 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Sep 8 18:08:56 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handles image-onlyspams References: Message-ID: <3F5CFDF4.485F207B@whidbey.com> Tim, I'm running pop3proxy.py on two machines, in each case the .ini file is in c:\Python22\Scripts.
Here is my modified file: [Categorization] spam_cutoff:0.8 address_headers: from to cc sender reply-to [Storage] persistent_storage_file:hammie.pkl [html_ui] display_to:True allow_remote_connections:192.168.0.* [pop3proxy] listen_ports:1110 no_cache_bulk_ham:True remote_servers:mail.whidbey.com [Headers] include_score:True That doesn't work, because the program claims the addition is an invalid option. I'm guessing it needs to be in a different section? Van Tim Peters wrote: > [Van] > > Okay, I'm really green at this, although I occasionally am able to > > make some tiny changes to Perl scripts if I'm careful. I was thinking > > that the To: address is probably a really good clue to work with, so > > I'd like a couple of hints as to where in tokenizer.py I should be > > looking. > > I recommend you just inculde this line in your .ini file (I'm not sure how > you use spambayes, so can't tell you exactly where that is): > > address_headers: from to cc sender reply-to > > By default, all of those except the "From:" header are ignored. Including > "to" on the line (as shown) will cause the "To:" header to get tokenized > too. > > > Basically, messages that are sent to one of my actual e-mail > > addresses are mostly ham, messages that are sent to some other > > address and get to me by bcc are almost always spam, and those that > > are cc'd to me are probably a mix. > > spambayes will figure out the truth about cc in your data, and "cc" is in > the list above too. -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030908/58d65d60/attachment.htm From russf at topia.com Mon Sep 8 16:08:51 2003 From: russf at topia.com (Russ Ferriday) Date: Mon Sep 8 18:09:42 2003 Subject: [spambayes-dev] 404 - spambayes-1.0a5.zip not found error at sf and mirrors In-Reply-To: <16220.62988.943704.823844@montanaro.dyndns.org> References: <5.2.0.9.0.20030908142304.00abc4b0@mail.topia.com> <5.2.0.9.0.20030908142304.00abc4b0@mail.topia.com> Message-ID: <5.2.0.9.0.20030908150639.064c2788@mail.topia.com> At 04:35 PM 9/8/2003 -0500, Skip Montanaro wrote: >Okay, this appears to be a SourceForge problem. FWIW, I preselected >umn.dl.sourceforge.net (University of Minnesota) as my preferred download >source. I've never had a problem with it. It had both the tar.gz and zip >versions of 1.0a5 when I tried just prior to sending the message to which >you responded. Thanks - tried several including one in AZ, but all offered servers failed. Yes, the UMN one works fine. Thanks. http://umn.dl.sourceforge.net/sourceforge/spambayes/spambayes-1.0a5.zip >Skip -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030908/3a27cadd/attachment.htm From kennypitt at hotmail.com Mon Sep 8 19:20:06 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Sep 8 18:20:26 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handles image-onlyspams In-Reply-To: <3F5CFDF4.485F207B@whidbey.com> Message-ID: <002a01c37657$5d52ddd0$300a10ac@spidynamics.com> I believe the "address_headers" option goes in the "[Tokenizer]" section. -----Original Message----- From: spambayes-dev-bounces@python.org [mailto:spambayes-dev-bounces@python.org] On Behalf Of G.
Armour Van Horn Sent: Monday, September 08, 2003 6:09 PM To: Tim Peters Cc: spambayes-dev@python.org Subject: Re: [spambayes-dev] Re: [Spambayes] how spambayes handles image-onlyspams Tim, I'm running pop3proxy.py on two machines, in each case the .ini file is in c:\Python22\Scripts. Here is my modified file: [Categorization] spam_cutoff:0.8 address_headers: from to cc sender reply-to [Storage] persistent_storage_file:hammie.pkl [html_ui] display_to:True allow_remote_connections:192.168.0.* [pop3proxy] listen_ports:1110 no_cache_bulk_ham:True remote_servers:mail.whidbey.com [Headers] include_score:True That doesn't work, because the program claims the addition is an invalid option. I'm guessing it needs to be in a different section? [snip] -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030908/5562ebe3/attachment.htm From tim.one at comcast.net Mon Sep 8 21:33:41 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Sep 9 01:02:00 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handles image-onlyspams In-Reply-To: <3F5CFDF4.485F207B@whidbey.com> Message-ID: [G. Armour Van Horn] > I'm running pop3proxy.py on two machines, in each case the .ini file > is in c:\Python22\Scripts. Here is my modified file: > > [Categorization] > spam_cutoff:0.8 > address_headers: from to cc sender reply-to > ... > That doesn't work, because the program claims the addition is an > invalid option. I'm guessing it needs to be in a different section? Yup, make it [Tokenizer] address_headers: from to cc sender reply-to BTW, everyone, I expect it's time to make that the default setting for address_headers. It *is* the default in the Outlook client (and has been for a long time). 
We looked only at "From" in the early days because some of the others were killer-strong clues for bogus reasons when people experimented on mixed-source ham and spam (for example, every one of my ham messages had a python.org Sender then, and none of my spam did; similarly, huge mounds of my spam had one of Bruce Guenter's honeypot To addresses but none of my ham did; so considering stuff like that made the classifier's job too easy). From ta-meyer at ihug.co.nz Tue Sep 9 15:34:55 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 9 01:09:11 2003 Subject: [spambayes-dev] Names & 1.0a6 release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> Is everyone happy with the scripts/sb_* names? I had thought that everyone was (well, with 'sb-'), but I might have misunderstood the messages. Would people rather that we dropped the 'sb_' prefix and installed into a spambayes scripts directory? Or didn't drop the prefix, but still installed into the scripts directory? [This is dependant on the above] Are we ready to go through with the 1.0a6 release? (Remembering that the aim was just to move/rename the scripts and to get rid of the options backwards compat code; unfortunately (?) it also fixes some bugs, particularly a nasty one with the messageinfo db that pop3proxy and imapfilter use). If no-one objects, I'll package this together on Thursday (NZ time). =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Sep 9 14:54:58 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 01:23:51 2003 Subject: [spambayes-dev] gdbm error (was Error running pop3proxy.py) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A655@its-xchg4.massey.ac.nz> > >> return gdbm.open(*args) > >> > >> error: (11, 'Resource temporarily unavailable') [Skip] > I'm not a gdbm user, but that appears to be the case in the > two tests I tried from the Python interpreter prompt.
> Multiple readers are allowed, but if anybody has the file > open for write access, nobody else can open it. Thanks. If I do this: """ state.bayes.store() del state.bayes """ where state.bayes is one of the *Classifier objects from storage.py, will that definitely free it up? Or do I need to add a close() method to the classifier objects? (which would do nothing for the pickle). =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Sep 9 14:43:33 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 01:24:02 2003 Subject: [spambayes-dev] Correcting training Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A641@its-xchg4.massey.ac.nz> > Heh. This message of yours (the whole thing, including your > commentary) was a false positive for me because of the Nigerian Scam > content, and was indeed the worst false positive I've ever gotten: > > Spam Score: 98% (0.98087) > > '*H*' 0.038259 > '*S*' 1 :) I wondered if it might cause troubles. Still, 98% isn't that bad - if you were automatically deleting all 100% messages, you'd still be safe. > That's apples and oranges. Throwing bigrams into the mix at least > doubles the number of distinct features spambayes finds in a message, > and it found so many for you that you're suffering a form of the > dreaded old "cancellation disease": [...] > It's possible that max-clues > cutoff should be raised when using a mix of unigrams and bigrams. Or > maybe you'd still suffer cancellation disease That explains a lot. I tried a very quick test with max_discriminators at 300 and got: Spam Score: 60% (0.600497) '*H*' 0.799006 '*S*' 1 But a lot of clues were still missing (apart from the 0.4 to 0.6 ones). So I upped it to 600, and got: Spam Score: 93% (0.931465) '*H*' 0.137069 '*S*' 1 Which is good enough for me. 
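The '*H*' and '*S*' lines above are the two chi-squared indicators. The reported score is (S - H + 1) / 2. For example, (1 - 0.038259 + 1) / 2 = 0.98087, and (1 - 0.799006 + 1) / 2 = 0.600497.

The sketch below is a simplified illustration, not the code in classifier.py. It shows why raising max_discriminators changes the result. Only clues at least 0.1 away from 0.5 are kept, which matches the ignored 0.4 to 0.6 band mentioned above, and only up to the cap. The combined score then depends on which clues survive. The default cap of 150 and the 0.1 cutoff are assumptions of this sketch. The tail probability is summed in log space so that hundreds of clues do not underflow.

```python
import math


def chi2Q(x2, v):
    """P(chi-squared >= x2) for even v degrees of freedom (v > 0)."""
    assert v > 0 and v % 2 == 0
    m = x2 / 2.0
    if m == 0.0:
        return 1.0
    # Q = exp(-m) * sum(m**i / i!) for i < v/2, with the terms summed in log space.
    logs = [-m + i * math.log(m) - math.lgamma(i + 1) for i in range(v // 2)]
    peak = max(logs)
    total = math.exp(peak) * sum(math.exp(t - peak) for t in logs)
    return min(total, 1.0)


def select_clues(probs, max_discriminators=150, min_strength=0.1):
    """Keep the strongest clues, ignoring those within min_strength of 0.5."""
    strong = [p for p in probs if abs(p - 0.5) >= min_strength]
    strong.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return strong[:max_discriminators]


def combine(H, S):
    """Final score from the ham (H) and spam (S) indicators."""
    return (S - H + 1.0) / 2.0


def chi2_score(probs):
    """Return (score, H, S) for a list of per-token spam probabilities."""
    n = len(probs)
    if n == 0:
        return 0.5, 0.0, 0.0  # no evidence either way
    # Each p is assumed to lie strictly between 0 and 1.
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return combine(H, S), H, S
```

For example, chi2_score(select_clues(token_probs, max_discriminators=300)) scores the same token probabilities with a larger cap. Here token_probs stands for the per-token probabilities of one message.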
With the unigrams only classifier, it gets: Spam Score: 39% (0.390671) '*H*' 1 '*S*' 0.781341 With the unigrams only classifier, and max_discriminators at 300 it does really badly: Spam Score: 0% (2.90264e-006) '*H*' 1 '*S*' 5.79636e-006 I'll play around with this next time I get a chance to do some testing. I've left my active copy of spambayes using the uni/bigram mix to see how it goes in 'real life'. It's definitely noticeably slower - if it was to be included I think I'd have to take a good look at the code I threw together and try and make it a lot more efficient. =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Sep 9 15:30:01 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 01:24:07 2003 Subject: [spambayes-dev] Does this seem too brutal? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A68A@its-xchg4.massey.ac.nz> [Skip] > Here's a crude hack to setup.py which complains if the user tries > installing while the old files are still in place. Does it seem too > extreme to error out of the install or should it print the warnings > and continue the install? [Richie] > I think it's extreme if the user has to go and delete the > scripts himself, but how about shipping a script that prints > a list of all the offending files and offers to delete them? > Either setup.py could quit with a message suggesting you run > the deletion script, or setup.py itself could give the > message and do the deletion. I googled about and there was discussion back in 2001 about an uninstall option for distutils, but nothing seems to have eventuated. What about the attached diff? It's basically Skip's patch but also with Richie's suggestion. (And also without the hack part, based on the c.l.p. response to Skip's query). =Tony Meyer -------------- next part -------------- A non-text attachment was scrubbed...
Name: setup.py.diff Type: application/octet-stream Size: 3615 bytes Desc: setup.py.diff Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030909/cfcd626b/setup.py-0001.obj From T.A.Meyer at massey.ac.nz Tue Sep 9 18:45:56 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 01:46:17 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handlesimage-onlyspams Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A7A0@its-xchg4.massey.ac.nz> > [Tokenizer] > address_headers: from to cc sender reply-to [...] > BTW, everyone, I expect it's time to make that the default > setting for address_headers. +1 here. =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Sep 9 14:00:05 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 01:48:42 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handles image-onlyspams Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A5FC@its-xchg4.massey.ac.nz> > Here is my modified file: > [Categorization] > spam_cutoff:0.8 > address_headers: from to cc sender reply-to [...] > That doesn't work, because the program claims the addition > is an invalid option. I'm guessing it needs to be in a > different section? Let's see if I can beat Tim to answering . It should be in a section "Tokenizer". =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Sep 9 14:02:36 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 01:48:45 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] website faq.txt, 1.39, 1.40 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A600@its-xchg4.massey.ac.nz> [Richie] > I think we should prevent the thing from starting up if the > configured port (or the default port if none is configured) > is already in use. We should give an error saying that the > port is already in use, and how to configure a different one. > Automatically choosing a different one is too much like > black magic IMHO. This sounds fine to me. Is adding this message a bug fix or a feature? <0.5 wink>. 
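Richie's proposal is to refuse to start if the configured port is already taken, and to say how to change it. It might look roughly like this. The function name and message wording are hypothetical. The function simply tries to bind the port before the proxy starts listening. listen_ports is the [pop3proxy] option from the ini file quoted earlier in this archive.

```python
import socket
import sys


def port_in_use_message(port, host=""):
    """Return an error string if (host, port) cannot be bound, else None."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        return ("Port %d is already in use (%s). Configure a different one "
                "with the listen_ports option in the [pop3proxy] section."
                % (port, exc))
    finally:
        # Release the probe so the real server can bind the port.
        probe.close()
    return None


if __name__ == "__main__":
    problem = port_in_use_message(1110)
    if problem:
        sys.exit(problem)  # prints the message to stderr, exit status 1
    print("Port 1110 is free; starting the proxy would go here.")
```

Another program could grab the port between this check and the real bind, so the proxy's own bind error should still be reported clearly. The check only makes the common case friendlier; it does not choose a port automatically.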
=Tony Meyer From T.A.Meyer at massey.ac.nz Tue Sep 9 13:17:41 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 01:48:54 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handles image-onlyspams Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A5BD@its-xchg4.massey.ac.nz> > > Why do you need a customized parser? You'd probably reach your end > > goal faster by reading and modifying tokenizer.py. > > Okay, I'm really green at this, although I occasionally am > able to make some tiny changes to Perl scripts if I'm > careful. I was thinking that the To: address is probably a > really good clue to work with, so I'd like a couple of hints > as to where in tokenizer.py I should be looking. If you want to add tokens based on the headers of the message, add something to tokenize_headers() in tokenizer.py. Tokens based on the body, add to tokenize_body(). HTML (etc) stuff, look at the various Stripper() classes. For To: addresses, look at the stuff regarding the "tokenizer":"address_headers" option - line 1151. =Tony Meyer From mhammond at skippinet.com.au Tue Sep 9 17:19:21 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Sep 9 02:19:26 2003 Subject: [spambayes-dev] gdbm error (was Error running pop3proxy.py) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332A655@its-xchg4.massey.ac.nz> Message-ID: <00ce01c3769a$4f9f0a90$f502a8c0@eden> > Thanks. If I do this: > > """ > state.bayes.store() > del state.bayes state.bayes = None is better style IMO. > will that definitely free it up? Nope. It might though > Or do I need to add a close() method > to the classifier objects? (which would do nothing for the pickle). It should grow one, yes. Outlook does similar hacks (and I apologize for not fixing it in the first place). The outlook source shows I do: bayes.db.close() bayes.dbm.close() So I just checked in a close method to storage.py - it works for Outlook (which I also changed to call it). Mark. 
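Below is a minimal sketch of the explicit close() being discussed. It uses a hypothetical dbm-backed class rather than the real storage.py classifiers. store() flushes pending counts. close() releases the database file deterministically (and, with gdbm, its writer lock) instead of relying on `del` and garbage collection. Setting the variable to None afterwards follows Mark's style suggestion.

```python
import dbm
import os
import pickle
import tempfile


class DBClassifierSketch:
    """Hypothetical dbm-backed word store; NOT SpamBayes' storage.py."""

    def __init__(self, path, mode="c"):
        self.db = dbm.open(path, mode)
        self.wordinfo = {}  # in-memory (spam, ham) counts not yet written

    def learn(self, word, is_spam):
        spam, ham = self.wordinfo.get(word, (0, 0))
        self.wordinfo[word] = (spam + 1, ham) if is_spam else (spam, ham + 1)

    def store(self):
        # Flush pending counts to the database file.
        for word, counts in self.wordinfo.items():
            self.db[word.encode("utf-8")] = pickle.dumps(counts)
        self.wordinfo.clear()

    def close(self):
        # Release the file explicitly; safe to call more than once.
        if self.db is not None:
            self.db.close()
            self.db = None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "words.db")
    bayes = DBClassifierSketch(path)
    bayes.learn("viagra", is_spam=True)
    bayes.store()
    bayes.close()
    bayes = None
    # Once closed, another opener can get at the file.
    reader = dbm.open(path, "r")
    print(pickle.loads(reader[b"viagra"]))
    reader.close()
```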
-------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 1884 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030909/17feb57b/winmail.bin From T.A.Meyer at massey.ac.nz Tue Sep 9 19:28:49 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 02:29:01 2003 Subject: [spambayes-dev] gdbm error (was Error running pop3proxy.py) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A7A9@its-xchg4.massey.ac.nz> [Tony] > Thanks. If I do this: > state.bayes.store() > del state.bayes [Mark] > state.bayes = None > > is better style IMO. Stored for future reference (since it's no use here). > > Or do I need to add a close() method > > to the classifier objects? (which would do nothing for the pickle). > > It should grow one, yes. Outlook does similar hacks (and I > apologize for not fixing it in the first place). So you should . [...] > So I just checked in a close method to storage.py - it works > for Outlook (which I also changed to call it). Thanks. I've checked in a change for sb_server.py and sb_imapfilter to do likewise. =Tony Meyer From ta-meyer at ihug.co.nz Tue Sep 9 20:46:19 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 9 03:46:30 2003 Subject: [spambayes-dev] Procmail documentation Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AE6C@its-xchg4.massey.ac.nz> I've updated the README.txt file to reflect the new script names, but I don't know how to finish off the procmail/"command line training" sections. At the moment the example procmail script and the instructions for command line training use hammie.py, but hammie.py is no longer around. I'm guessing that sb_filter.py and sb_mboxtrain.py are the correct replacements, but I don't really know what the correct parameters are. Could someone that uses this setup check in the appropriate changes? 
Thanks, Tony From T.A.Meyer at massey.ac.nz Tue Sep 9 22:01:22 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 05:01:41 2003 Subject: [spambayes-dev] pop3proxy execution "rules" on Windows Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A7B8@its-xchg4.massey.ac.nz> > > Is there any reason not to just put it in "{Application > > Data}\SpamBayes" along with the Outlook stuff? > > Nothing is shared, so I see no reason to use the same > directory, and only scope for confusion. I had thought that people might want to share the bayes database and possibly the general config file. Realistically, I suppose, few people will be using both Outlook and pop3proxy, so it's not that important. In that case, I like having a subdirectory of the existing one more than cluttering up the application data folder with another folder for spambayes. > I should have started with "SpamBayes\Outlook" :) True :) Oh well, first in, first served ;) > This wont work due to permissions. The service is running as > a different user, and will have no credentials to use to > LogonUser() to get access to the user's database. We could > solve this by having a "logon to the proxy" request, but that > is getting beyond the short term :) Ok then, you have me convinced ;) We stick with the plan you outlined, unless Richie has more comments. =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Sep 9 22:05:28 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 05:05:38 2003 Subject: [spambayes-dev] 1.0a5 Release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A7B9@its-xchg4.massey.ac.nz> > My settings *are* saved when I hit the Save button in either > the options page or the advanced options page, I just don't get the > changeopts page to confirm that it has been done. The only thing I can see causing this is re-reading the options causing a crash. (This would mean that your config file was updated, but that changes weren't made until you restarted the proxy). 
You're not using gdbm, are you? A bug was just fixed that might cause this to occur. Otherwise, I'm not sure what it is, although it would be good to clarify that what I've said above is the case (to check, add an extra server - doesn't matter if it really exists - and check if the front page says that you are proxying it). =Tony Meyer From sjoerd at acm.org Tue Sep 9 15:38:44 2003 From: sjoerd at acm.org (Sjoerd Mullender) Date: Tue Sep 9 08:38:50 2003 Subject: [spambayes-dev] race conditions in imap filtering? Message-ID: <3F5DC9D4.8010809@acm.org> The other day I started reading my mail from an IMAP server, so I started using sb_imapfilter.py to filter my mail. Looking at the code I have to wonder how safe it is from corrupting my database and my e-mail. The problem is this. There is an option -c to classify mail on the IMAP server, and there is an option -l to loop and classify every . When the filter starts up it opens the database, and it never closes it, and it never re-reads it. If I want to train on some messages I could do that while the imapfilter is still running, but then it won't pick up the new ham and spam count although I suppose it will pick up the new word counts--not everything is in-core, and so the results will be off. If, however, I stop the filter before training, I can only do this by interrupting it, and then there is the chance that it is in the middle of classifying my e-mail. I don't know whether that can corrupt the e-mail (it does make changes such as adding headers). The first problem could be resolved by opening the database at the beginning of each iteration. This doesn't seem too difficult. The second problem is harder, but a solution could be to use locking of the database, so that while training is in progress the filter doesn't classify. Opinions? Do the same problems occur in the pop proxy?
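One way to get the locking Sjoerd suggests, sketched under assumptions: a lock file created with os.open() and O_CREAT | O_EXCL. The OS creates such a file atomically, so only one process can hold the lock. The class name, default path and timeout are hypothetical. Stale lock files left behind by a crashed process are not handled.

```python
import os
import tempfile
import time


class LockFile:
    """Hypothetical inter-process lock based on exclusive file creation."""

    def __init__(self, path=None, timeout=60.0, poll=0.5):
        self.path = path or os.path.join(tempfile.gettempdir(), "spambayes.lck")
        self.timeout = timeout
        self.poll = poll
        self.fd = None

    def acquire(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # Atomic: raises FileExistsError if another process holds it.
                self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError("lock %s is still held" % self.path)
                time.sleep(self.poll)
            else:
                # Record the owner's pid to help diagnose stale locks.
                os.write(self.fd, str(os.getpid()).encode("ascii"))
                return

    def release(self):
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()
        return False
```

Both the classifying loop and the training script would wrap their database work in `with LockFile(): ...`. The loop could also reopen the database inside the lock on each iteration, which covers the first problem as well.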
From skip at pobox.com Tue Sep 9 13:12:59 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 9 13:13:13 2003 Subject: [spambayes-dev] sb_filter -n broke? Message-ID: <16222.2587.602773.859095@montanaro.dyndns.org> Given this ini file referred to in BAYESCUSTOMIZE: [Storage] persistent_use_database: False persistent_storage_file: ~/tmp/sa.db this command: sb_filter.py -n produces this traceback: Traceback (most recent call last): File "/Users/skip/local/bin/sb_filter.py", line 186, in ? main() File "/Users/skip/local/bin/sb_filter.py", line 174, in main h.newdb() File "/Users/skip/local/bin/sb_filter.py", line 113, in newdb h = hammie.open(self.dbname, self.usedb, 'n') File "/Users/skip/local/lib/python2.4/site-packages/spambayes/hammie.py", line 259, in open return Hammie(storage.open_storage((filename, mode), useDB)) File "/Users/skip/local/lib/python2.4/site-packages/spambayes/storage.py", line 664, in open_storage return klass(*data_source_name) TypeError: __init__() takes exactly 2 arguments (3 given) Printing out (klass, data_source_name) yields (wrapped): (, ('/Users/skip/tmp/sa.db', 'n')) Modifying the else: branch of the "if useDB" statement to klass = PickledClassifier data_source_name = data_source_name[0:1] solves the problem. I've never used the pickled classifier before though, so I don't know if I was somehow using things wrong or if there is a better fix. Skip From skip at pobox.com Tue Sep 9 13:49:32 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 9 13:49:48 2003 Subject: [spambayes-dev] Names & 1.0a6 release In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> Message-ID: <16222.4780.51279.470697@montanaro.dyndns.org> Tony> Is everyone happy with the scripts/sb_* names? I had thought that Tony> everyone was (well, with 'sb-'), but I might have misunderstood Tony> the messages. 
Would people rather that we dropped the 'sb_' Tony> prefix and installed into a spambayes scripts directory? Or Tony> didn't drop the prefix, but still installed into the scripts Tony> directory? I voted previously for a spambayes directory (e.g. /usr/local/spambayes/...) with the sb_ prefixes going away. I can live with the current formulation though. Tony> [This is dependant on the above] Tony> Are we ready to go through with the 1.0a6 release? I discovered what I thought to be a bug in the storage.py file open code. See my earlier message on the topic though. (Maybe I was simply calling sb_filter.py incorrectly.) Skip From skip at pobox.com Tue Sep 9 13:51:38 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 9 13:51:51 2003 Subject: [spambayes-dev] Names & 1.0a6 release In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> Message-ID: <16222.4906.251531.367635@montanaro.dyndns.org> Tony> If no-one objects, I'll package this together on Thursday (NZ time). One other thing. Most of the scripts still refer to themselves internally by their pre-sb_ names. I suspect some of the stuff in contrib and testtools which uses them needs to be changed as well. Skip From skip at pobox.com Tue Sep 9 14:04:42 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 9 14:04:59 2003 Subject: [spambayes-dev] Does this seem too brutal? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332A68A@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130332A68A@its-xchg4.massey.ac.nz> Message-ID: <16222.5690.318004.172906@montanaro.dyndns.org> Tony> What about the attached diff? It's basically Skip's patch but Tony> also with Richie's suggestion. (And also without the hack part, Tony> based on the c.l.p. response to Skip's query). A couple other suggestions: * Realign the indentation of the args in the setup() call. 
Everything else in the file has four-space indents. The args to setup() have a combination of two-space and eight-space indents. * Make sure the os.remove() calls in .run() don't bomb out in case any old scripts have already been deleted. * ??? Return parent.run(self) in all cases. In your code if there are old scripts but the user chooses not to remove them, parent.run(self) is not called. Attached is a(nother) proposed mod. Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: setup.py.diff Type: application/octet-stream Size: 4566 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030909/95b89dd2/setup.py.obj From skip at pobox.com Tue Sep 9 14:05:43 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 9 14:06:04 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] website faq.txt, 1.39, 1.40 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332A600@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130332A600@its-xchg4.massey.ac.nz> Message-ID: <16222.5751.490253.390363@montanaro.dyndns.org> >>>>> "Tony" == Tony Meyer writes: Tony> [Richie] >> I think we should prevent the thing from starting up if the >> configured port (or the default port if none is configured) is >> already in use. We should give an error saying that the port is >> already in use, and how to configure a different one. Automatically >> choosing a different one is too much like black magic IMHO. Fine by me too. S From skip at pobox.com Tue Sep 9 14:31:21 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 9 14:31:35 2003 Subject: [spambayes-dev] Procmail documentation In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AE6C@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130212AE6C@its-xchg4.massey.ac.nz> Message-ID: <16222.7289.623986.86789@montanaro.dyndns.org> Tony> [procmail scoring and command line training] ... 
Could someone Tony> that uses this setup check in the appropriate changes? Done. Skip From kennypitt at hotmail.com Tue Sep 9 16:02:23 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Sep 9 15:02:45 2003 Subject: [spambayes-dev] sb_filter -n broke? In-Reply-To: <16222.2587.602773.859095@montanaro.dyndns.org> Message-ID: <000e01c37704$e872fb90$300a10ac@spidynamics.com> Skip Montanaro wrote: > Given this ini file referred to in BAYESCUSTOMIZE: > > [Storage] > persistent_use_database: False > persistent_storage_file: ~/tmp/sa.db > > this command: > > sb_filter.py -n > > produces this traceback: > [snip (sorry, traceback just wouldn't quote properly in Outlook)] > > Printing out (klass, data_source_name) yields (wrapped): > > (, > ('/Users/skip/tmp/sa.db', 'n')) > > Modifying the else: branch of the "if useDB" statement to > > klass = PickledClassifier > data_source_name = data_source_name[0:1] > > solves the problem. I've never used the pickled classifier before > though, so I don't know if I was somehow using things wrong or if > there is a better fix. > I'm not overly familiar with this bit of code, but I think I can tell what's going on. Somebody correct me if I'm off base. In hammie.py (at line 259 in my slightly out-of-date copy), there is this call to the open_storage function: return Hammie(storage.open_storage((filename, mode), useDB)) This passes the tuple (filename, mode) to open_storage as the data_source_name param. Because you passed a tuple, you reach the following line in storage.py as seen in the traceback: return klass(*data_source_name) This turns the two elements of the tuple into two separate params to the constructor call, which is correct if klass is a DBDictClassifier but not if it is a PickledClassifier since PickledClassifier takes only a single filename parameter. 
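The mismatch can be reproduced with two hypothetical stand-in classes. They mimic only the constructor signatures, not the real storage.py classes. The dispatch function sketches one way to let useDB decide how the (filename, mode) tuple is unpacked, accepting either a plain filename or a tuple.

```python
class PickledStub:
    """Stand-in whose constructor takes only a filename."""

    def __init__(self, db_name):
        self.db_name = db_name


class DBDictStub:
    """Stand-in whose constructor takes a filename and a dbm mode."""

    def __init__(self, db_name, mode="c"):
        self.db_name = db_name
        self.mode = mode


def open_storage_sketch(data_source_name, useDB):
    """Accept "path" or ("path", mode) and build the matching stand-in."""
    if isinstance(data_source_name, tuple):
        args = data_source_name
    else:
        args = (data_source_name,)
    if useDB:
        return DBDictStub(*args)
    # A pickle has no open mode, so pass only the filename.
    return PickledStub(args[0])


# The failing pattern from the traceback: unpacking the tuple into a
# one-argument constructor raises TypeError.
try:
    PickledStub(*("/Users/skip/tmp/sa.db", "n"))
except TypeError as exc:
    print("reproduced:", exc)
```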
The problem seems to be that the value of the useDB parameter is ignored when testing to see which constructor call should be used, and so breaks your tuple into two parameters even when creating a PickledClassifier. Your solution fixes the problem for the case where a tuple is passed for data_source_name, but breaks if a string is passed for data_source_name. I think if you replace your "data_source_name = data_source_name[0:1]" in the else clause with the following it will work in both cases: if isinstance(data_source_name, type(())): # For PickledClassifier, use only the filename from the tuple. data_source_name = data_source_name[0] -- Kenny Pitt From arohter at nolar.com Tue Sep 9 15:36:56 2003 From: arohter at nolar.com (Alon) Date: Tue Sep 9 15:37:12 2003 Subject: [spambayes-dev] race conditions in imap filtering? In-Reply-To: <3F5DC9D4.8010809@acm.org> References: <3F5DC9D4.8010809@acm.org> Message-ID: <145274625.1063118216@[192.168.0.20]> I use imapfilter to classify and train my mail via cronjobs. I run classify every 15 min, and training once day. To solve the classifying vs. training issue, I wrote a wrapper script that creates a /var/lock/spambayes.lck before execution and removes it when done. That way, if a classification job starts while training is in progress, it'll pause until it sees the spambayes.lck is gone. This works brilliantly on my linux system, but might be a wee bit of trouble for Windows users. Perhaps you should think about incorporating a locking mechanism into sb_imapfilter (or all relevant scripts) itself? aLoN > The other day I started reading my mail from an IMAP server, so I started > using sb_imapfilter.py to filter my mail. Looking at the code I have to > wonder how safe it is from corrupting my database and my e-mail. > > The problem is this. There is an option -c to classify mail on the IMAP > server, and there is an option -l to loop and classify every > . 
When the filter starts up it opens the database, and it never > closes it, and it never re-reads it. If I want to train on some messages > I could do that while the imapfilter is still running, but then it won't > pick up the new ham and spam count although I suppose it will pick up the > new word counts--not everything is in-core, and so the results will be > off. From skip at pobox.com Tue Sep 9 16:06:27 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 9 16:06:42 2003 Subject: [spambayes-dev] pop3proxy suggestion Message-ID: <16222.12995.459947.653825@montanaro.dyndns.org> I'm starting to make the move from sb_filter to sb_server+sb_upload. I currently train on unsures, mistakes, and hams or spams which don't score very close to their endpoints. It would be helpful to have the message score displayed in the review pages so I can decide whether or not to score a message. I'd take a look at that myself, however I have no idea where that bit of code is. A quick glance in a couple candidate files didn't reveal its location. Pointers appreciated. Skip From matt at mondoinfo.com Tue Sep 9 16:07:35 2003 From: matt at mondoinfo.com (Matthew Dixon Cowles) Date: Tue Sep 9 16:08:27 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handles image-onlyspams In-Reply-To: References: <3F5CFDF4.485F207B@whidbey.com> Message-ID: <1063136297.05.541@sake.mondoinfo.com> [Tim] > BTW, everyone, I expect it's time to make that the default setting > for address_headers. It *is* the default in the Outlook client > (and has been for a long time). We looked only at "From" in the > early days because some of the others were killer-strong clues for > bogus reasons I expect that you're right about that. But I'd like to suggest that whoever implements the change make it pretty obvious in the documentation that that's the case. My reasoning is that I read various postmaster and webmaster addresses. 
Spammers love to spam those addresses and they'd quickly become strong spam clues. But it's important that I see what little legitimate mail is sent to them. Regards, Matt From skip at pobox.com Tue Sep 9 16:12:49 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 9 16:13:01 2003 Subject: [spambayes-dev] race conditions in imap filtering? In-Reply-To: <145274625.1063118216@[192.168.0.20]> References: <3F5DC9D4.8010809@acm.org> <145274625.1063118216@[192.168.0.20]> Message-ID: <16222.13377.808466.856730@montanaro.dyndns.org> aLoN> To solve the classifying vs. training issue, I wrote a wrapper aLoN> script that creates a /var/lock/spambayes.lck before execution and aLoN> removes it when done. Creating such a file in a naive fashion creates a race condition. You might never get bitten by it, but any such code which goes into SpamBayes would have to use the proper mechanisms to create that file, otherwise someone will eventually run into it. If you create it with the low-level os.open() function and the proper flags (some combination of O_EXCL, O_WRONLY and O_EXCLOCK) you should be good to go on Unix, Windows or Mac OS X. Skip From tim.one at comcast.net Tue Sep 9 18:23:24 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Sep 9 17:23:34 2003 Subject: [spambayes-dev] Re: [Spambayes] how spambayes handlesimage-onlyspams In-Reply-To: <1063136297.05.541@sake.mondoinfo.com> Message-ID: [Tim] > BTW, everyone, I expect it's time to make that the default setting > for address_headers. It *is* the default in the Outlook client > (and has been for a long time). We looked only at "From" in the > early days because some of the others were killer-strong clues for > bogus reasons [Matthew Dixon Cowles] > I expect that you're right about that. But I'd like to suggest that > whoever implements the change make it pretty obvious in the > documentation that that's the case. That's a reasonable request. 
> My reasoning is that I read various postmaster and webmaster addresses. > Spammers love to spam those addresses and they'd quickly become strong > spam clues. But it's important that I see what little legitimate mail > is sent to them. Indeed, things like webmaster@python.org and help@python.org *are* strong spam/worm/virus clues in my databases now, especially after the recent Sobig.F blitz. Happily it doesn't matter that much, as all clues have the same weight; this is the flip side of complaints that spambayes doesn't instantly treat all email to me@my.org as ham after training on one of 'em. Training will sort it out over time. For now, I think of my Unsure folder as my webmaster/python-help folder <0.6 wink>. From arohter at nolar.com Tue Sep 9 18:14:29 2003 From: arohter at nolar.com (Alon) Date: Tue Sep 9 18:14:38 2003 Subject: [spambayes-dev] race conditions in imap filtering? In-Reply-To: <16222.13377.808466.856730@montanaro.dyndns.org> References: <3F5DC9D4.8010809@acm.org> <145274625.1063118216@[192.168.0.20]> <16222.13377.808466.856730@montanaro.dyndns.org> Message-ID: <154742265.1063127669@[192.168.0.20]> Agreed. I only created the wrapper script after noticing two instances of imapfilter running late one night. It's a hack, but with proper cron timing it works. But this is much better done within spambayes itself, with more fine-grain control, so users don't accidentally corrupt their data. Opening an exclusive lock on a file when running should do the trick nicely. aLoN > wrapper script that creates a /var/lock/spambayes.lck before execution > > Creating such a file in a naive fashion creates a race condition. You > might never get bitten by it, but any such code which goes into SpamBayes > would have to use the proper mechanisms to create that file, otherwise > someone will eventually run into it. 
If you create it with the low-level > os.open() function and the proper flags (some combination of O_EXCL, > O_WRONLY and O_EXCLOCK) you should be good to go on Unix, Windows or Mac > OS X. > > Skip > From richie at entrian.com Wed Sep 10 00:36:43 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Sep 9 18:36:46 2003 Subject: [spambayes-dev] Names & 1.0a6 release In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> Message-ID: <9vkslvcqjqmkkiqq62ahte22cpt4a6posm@4ax.com> [Tony] > Is everyone happy with the scripts/sb_* names? I had thought that everyone > was (well, with 'sb-'), but I might have misunderstood the messages. Would > people rather that we dropped the 'sb_' prefix and installed into a > spambayes scripts directory? Or didn't drop the prefix, but still installed > into the scripts directory? I don't feel strongly either way, but if it came to a vote I'd vote for using the sb_ prefix and adding the scripts to a standard location on the PATH. I find it a pain in the bum to have to specify /usr/local/apache (or whatever it's called - I have to find it every time) to do stuff with Apache (for example). I know I could add it to my PATH, but having a big unwieldy PATH is a pain in the bum as well. [FWIW, I preferred sb- as a prefix, and we could have used it by stripping the scripts down to an "import spambayes.xxx; spambayes.xxx.main()", but I don't feel strongly about that either. Get off the fence, Richie!] -- Richie Hindle richie@entrian.com From romain.guy at jext.org Wed Sep 10 02:43:45 2003 From: romain.guy at jext.org (Romain GUY) Date: Tue Sep 9 19:48:14 2003 Subject: [spambayes-dev] Web site Message-ID: <200391014345.340277@Thinthalion> Hello everyone, Who is in charge of the web site please ? 
-- Romain GUY romain.guy@jext.org http://www.jext.org http://progx.jext.org From T.A.Meyer at massey.ac.nz Wed Sep 10 12:49:54 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 19:50:10 2003 Subject: [spambayes-dev] Web site Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A971@its-xchg4.massey.ac.nz> > Who is in charge of the web site please ? The people listed here: . (The website is in cvs just like anything else, so any developer can make changes. Sweeping changes would, one would expect, be discussed on spambayes-dev, just like sweeping code changes). =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 10 12:57:20 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 19:57:41 2003 Subject: [spambayes-dev] Names & 1.0a6 release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A97B@its-xchg4.massey.ac.nz> > I don't feel strongly either way, but if it came to a vote > I'd vote for using the sb_ prefix and adding the scripts to a > standard location on the PATH. I think it was only Skip that (owned up to) leaning towards the separate directory, and he's said he's not that bothered, so I suppose we go with this. > [FWIW, I preferred sb- as a prefix, and we could have used it > by stripping the scripts down to an "import spambayes.xxx; > spambayes.xxx.main()", but I don't feel strongly about that > either. Get off the fence, Richie!] It was my bumbling that caused this. I should have tested everything properly with the sb- prefix before I started making changes to cvs, and then I could have posted a message to the list asking which we preferred. But I didn't even consider that there might be a problem with the new names, and so checked in the changes. Once I did find the problem, I'd already removed the old ones, so I was left with either un-removing them, or making the switch. I also like the sb- prefix more, but stripping down the scripts was way more work. 
It would have had the additional advantage of the majority of the code being precompiled, though. Given that my initial checkin has cluttered up the scripts attic already, it wouldn't be that big a deal to make the change back to sb-, although we'd have to add a whole heap of scripts containing the bulk of the code (to the spambayes directory?). Would people prefer this? (Couldn't we just fix Python to allow the sb- prefix ?) =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 10 13:06:04 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 20:06:18 2003 Subject: [spambayes-dev] pop3proxy suggestion Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332A996@its-xchg4.massey.ac.nz> > I'm starting to make the move from sb_filter to > sb_server+sb_upload. Any particular reason why? (Just curious.) > It would be helpful to have the message score > displayed in the review pages so I can decide whether or not > to score a message. I'd take a look at that myself, however > I have no idea where that bit of code is. A quick glance in > a couple candidate files didn't reveal its location. In ProxyUI.py, in the _appendMessages() function. For example, line 268 is: """ row.classify.href="showclues?key=%s&subject=%s" % (key, subj) """ This modifies the "Show Clues" link at the end of each row to point to that particular message. You may also need to alter the resources/ui.html file to make a space for it. Where are you thinking of putting the score? In a column of its own? After the 'show clues' link? If you describe what you want, I can add it for you easily enough. =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 10 14:11:20 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 9 21:11:59 2003 Subject: [spambayes-dev] FW: Web Site Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332AA01@its-xchg4.massey.ac.nz> [Romain Guy] > In fact, I was thinking to propose to do a new design to make > a more attractive web site.
I've made several web sites already > (http://www.jext.org or http://progx.jext.org are some examples) > and came to create a lite "web site engine" (which is behind the > two URLs I quoted). > This PHP engine is small, offers standard features (news, > administrators and picture galleries) and is based upon XML/XSLT > for many things. SourceForge hosting does not allow the use of > XML/XSLT but there is a simple workaround : just visit the web > site on your localhost and every page will be generated in a cache. > You then simply have to upload the HTML files. That's what I've done for www.jext.org. > This allows to easily update the web site by modifying simple and > clear XML files rather than overloaded HTML docs. Thus, the menu, > the downloads page, etc. on progx.jext.org are XML generated. The > capacities of the engine would be downscaled to match SpamBayes > needs (no photo galleries I guess :-). > Using this lite engine would require a quite small amount of work : > only a new design (basically a new CSS and a few new pictures - > latest versions of the engines offer a skinning capacity). It could > be done in one day. I invite you to crawl on progx.jext.org a bit. > The source code of the web site is also accessible through a CVS server > if you want to look at it. > Thanks, > P.S : please feel free to forward this to any person which could be interested in this [Josh Gass] [plus an earlier email offering to help out with web design] > If you are going for a really simple more notepad style approach > for the site then the design you have now is great. I just thought > that it was more there just to be up. I'm completely on the fence with this one, but I imagine that other developers aren't, so a discussion on spambayes-dev is the best idea. Do people: (a) like the current design (b) want a new design, but not radically different (c) want a radically different design? 
I would suggest, though, that anyone interested in doing a redesign take a look at the existing tools. The website isn't done with html, it's done with .ht files (plus a text file for the faq), and converted to html via ht2html. I would imagine that a Python program would be preferable to a PHP one. =Tony Meyer From papaDoc at videotron.ca Tue Sep 9 22:24:20 2003 From: papaDoc at videotron.ca (Remi Ricard) Date: Tue Sep 9 21:22:50 2003 Subject: [spambayes-dev] Names & 1.0a6 release In-Reply-To: <9vkslvcqjqmkkiqq62ahte22cpt4a6posm@4ax.com> References: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> <9vkslvcqjqmkkiqq62ahte22cpt4a6posm@4ax.com> Message-ID: <1063157059.4531.9.camel@porsche.hq> Hi, > I don't feel strongly either way, but if it came to a vote I'd vote for > using the sb_ prefix and adding the scripts to a standard location on the > PATH. I find it a pain in the bum to have to specify /usr/local/apache > (or whatever it's called - I have to find it every time) to do stuff with > Apache (for example). I know I could add it to my PATH, but having a big > unwieldy PATH is a pain in the bum as well. What can be done under Unix (Linux) is to have everything in /usr/local/spambayes and then soft-link all the executables into /usr/local/bin. With this, your PATH stays under 2048 characters, and it is easy to remove all components of spambayes from /usr/local/bin (see below):

    cd /usr/local/spambayes
    foreach i (*)
        rm -f /usr/local/bin/$i
    end
    rm -rf /usr/local/spambayes

Now everything is clean.....
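Remi's layout can also be handled from Python, for example in an install script. Below is a minimal sketch. It assumes the scripts sit directly in an install directory such as /usr/local/spambayes and that the links go into a directory already on the PATH, such as /usr/local/bin. `link_scripts` and `unlink_scripts` are hypothetical helpers, not part of SpamBayes' setup.py. The removal step does not delete by name. It only deletes entries that are symlinks resolving back into the install directory, so an unrelated program that happens to share a name is left alone.

```python
import os


def link_scripts(install_dir, bin_dir, suffix=".py"):
    """Symlink each script in install_dir into bin_dir.

    Existing entries in bin_dir are never overwritten. Returns the names
    that were linked, in sorted order.
    """
    install_dir = os.path.abspath(install_dir)  # links must not be relative
    linked = []
    for name in sorted(os.listdir(install_dir)):
        src = os.path.join(install_dir, name)
        if not (name.endswith(suffix) and os.path.isfile(src)):
            continue
        dst = os.path.join(bin_dir, name)
        if os.path.lexists(dst):
            continue  # something with this name is already installed
        os.symlink(src, dst)
        linked.append(name)
    return linked


def unlink_scripts(install_dir, bin_dir):
    """Remove the symlinks in bin_dir that resolve into install_dir.

    Ordinary files and links pointing anywhere else are left untouched.
    Returns the names that were removed, in sorted order.
    """
    install_dir = os.path.realpath(install_dir)
    removed = []
    for name in sorted(os.listdir(bin_dir)):
        dst = os.path.join(bin_dir, name)
        if not os.path.islink(dst):
            continue
        if os.path.dirname(os.path.realpath(dst)) == install_dir:
            os.remove(dst)  # removes the link itself, not its target
            removed.append(name)
    return removed
```

Calling `unlink_scripts` and then deleting the install directory leaves /usr/local/bin in the same clean state as the shell loop.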
Remi Ricard From barry at python.org Wed Sep 10 02:35:13 2003 From: barry at python.org (Barry Warsaw) Date: Tue Sep 9 21:35:14 2003 Subject: [spambayes-dev] Names & 1.0a6 release In-Reply-To: <9vkslvcqjqmkkiqq62ahte22cpt4a6posm@4ax.com> References: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> <9vkslvcqjqmkkiqq62ahte22cpt4a6posm@4ax.com> Message-ID: <1063157707.16781.149.camel@anthem> On Tue, 2003-09-09 at 18:36, Richie Hindle wrote: > I don't feel strongly either way, but if it came to a vote I'd vote for > using the sb_ prefix and adding the scripts to a standard location on the > PATH. I find it a pain in the bum to have to specify /usr/local/apache > (or whatever it's called - I have to find it every time) to do stuff with > Apache (for example). I know I could add it to my PATH, but having a big > unwieldy PATH is a pain in the bum as well. I sympathize! Welcome to Unix. :) This brings back memories of the debates we had over 10 years ago when we were designing the Depot. Good to see how far we've come in all that time . -Barry From T.A.Meyer at massey.ac.nz Wed Sep 10 17:12:56 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 10 00:13:09 2003 Subject: [spambayes-dev] sb_filter -n broke? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332AB05@its-xchg4.massey.ac.nz> [Has anyone else noticed that viewcvs (diffs) seems to be broken on sf.net?] > this command: > > sb_filter.py -n > > produces this traceback: [...] > return klass(*data_source_name) > TypeError: __init__() takes exactly 2 arguments (3 given) [...] > Modifying the else: branch of the "if useDB" statement to > > klass = PickledClassifier > data_source_name = data_source_name[0:1] > > solves the problem. I've never used the pickled classifier > before though, so I don't know if I was somehow using things > wrong or if there is a better fix. 
This is the third time that's been fixed ;) The above fix won't work outside of hammie, though, because hammie's the only one that passes data_source_name as a tuple of (name, mode) - the others pass it just as name. The whole open_storage function was getting a bit messy from all the fixes, so I've tidied it up a bit, and fixed this. Could you confirm that it works for you? =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 10 17:18:47 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 10 00:19:16 2003 Subject: [spambayes-dev] Does this seem too brutal? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332AB10@its-xchg4.massey.ac.nz> > A couple other suggestions: > > * Realign the indentation of the args in the setup() call. :) I did this too, but did the diff ignoring whitespace changes. So +1. > * Make sure the os.remove() calls in .run() don't bomb out > in case any old scripts have already been deleted. Good point. > * ??? Return parent.run(self) in all cases. In your code if > there are old scripts but the user chooses not to remove them, > parent.run(self) is not called. Sounds sensible. > Attached is a(nother) proposed mod. This looks good to me. Anyone have any objections to this version being checked in? =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 10 17:31:14 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 10 00:31:27 2003 Subject: [spambayes-dev] Names & 1.0a6 release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332AB29@its-xchg4.massey.ac.nz> > One other thing. Most of the scripts still refer to > themselves internally by their pre-sb_ names. I suspect some > of the stuff in contrib and testtools which uses them needs > to be changed as well. I had a (admittedly quick) look and all seems to be ok. I've also run most of the testtools scripts, and they still work. I've updated a few names in comments/docstrings that were wrong, too. 
=Tony Meyer From anthony at interlink.com.au Wed Sep 10 15:47:53 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Sep 10 00:48:18 2003 Subject: [spambayes-dev] Web site In-Reply-To: <200391014345.340277@Thinthalion> Message-ID: <200309100447.h8A4lrJY030387@localhost.localdomain> >>> Romain GUY wrote > Hello everyone, > > Who is in charge of the web site please ? Pretty much the developer team, although I did the initial pile of work on it. I'd also appreciate it if people didn't just hack away at the background page without running it past me first - there's a whole lot of work that's gone into that one... -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Wed Sep 10 15:54:52 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Sep 10 00:55:30 2003 Subject: [spambayes-dev] FW: Web Site In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332AA01@its-xchg4.massey.ac.nz> Message-ID: <200309100454.h8A4sqde030462@localhost.localdomain> >>> "Meyer, Tony" wrote > > In fact, I was thinking to propose to do a new design to make > > a more attractive web site. I've made several web sites already > > (http://www.jext.org or http://progx.jext.org are some examples) > > and came to create a lite "web site engine" (which is behind the > > two URLs I quoted). > I'm completely on the fence with this one, but I imagine that other > developers aren't, so a discussion on spambayes-dev is the best idea. > Do people: > (a) like the current design > (b) want a new design, but not radically different > (c) want a radically different design? > > I would suggest, though, that anyone interested in doing a redesign take > a look at the existing tools. The website isn't done with html, it's > done with .ht files (plus a text file for the faq), and converted to > html via ht2html. I would imagine that a Python program would be > preferable to a PHP one. 
I'm quite happy for someone to spend the time making the website look pretty, but I'm completely unimpressed with the idea of using PHP. We have bits of it at work (along with large amounts of python) and it's a pig to maintain. All the pages are generated using ht2html, and this is easily tweakable and changeable to make the site "more pretty". I don't see much point to making the site have an all-singing, all-dancing web-accessible frontend for updating it - it's not like there's that many things that need to be updated on a regular basis. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From kennypitt at hotmail.com Wed Sep 10 10:38:42 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Sep 10 09:39:08 2003 Subject: [spambayes-dev] FW: Web Site In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332AA01@its-xchg4.massey.ac.nz> Message-ID: <002101c377a0$db976270$300a10ac@spidynamics.com> Meyer, Tony wrote: >[snip] > > I would suggest, though, that anyone interested in doing a redesign > take a look at the existing tools. The website isn't done with html, > it's done with .ht files (plus a text file for the faq), and > converted to html via ht2html. I would imagine that a Python program > would be preferable to a PHP one. +1 to using Python tools, this being a Python project after all. I'm not opposed to a little beautification of the SpamBayes site design, but I would be -1 on anything over-blown. I usually lean toward clean and simple because it gets the job done without making site maintenance too big a hassle. The site design of the ht2html site itself might not be a bad starting point, with some color and logo changes appropriate to SpamBayes.
From skip at pobox.com Wed Sep 10 09:51:59 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 10 09:52:12 2003 Subject: [spambayes-dev] pop3proxy suggestion In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332A996@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130332A996@its-xchg4.massey.ac.nz> Message-ID: <16223.11391.521632.905968@montanaro.dyndns.org> >> I'm starting to make the move from sb_filter to sb_server+sb_upload. Tony> Any particular reason why? (Just curious.) Mostly so I can run a configuration more people use. >> It would be helpful to have the message score displayed in the review >> pages so I can decide whether or not to score a message. Tony> In ProxyUI.py, in the _appendMessages() function. For example, Tony> line 268 is: ... Thanks for the pointer. Tony> Where are you thinking of putting the score? In a column of it's Tony> own? After the 'show clues' link? If you describe what you want, Tony> I can add it for you easily enough. I think it belongs in its own column. One reason for separating it out is that eventually it would be nice to allow users to sort their review page by From, Subject or score. (A few more changes to pop3proxy and you'll have a full-blown mail reader!) Skip From skip at pobox.com Wed Sep 10 09:58:57 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 10 09:59:25 2003 Subject: [spambayes-dev] FW: Web Site In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332AA01@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130332AA01@its-xchg4.massey.ac.nz> Message-ID: <16223.11809.909587.568039@montanaro.dyndns.org> Tony> I'm completely on the fence with this one, but I imagine that Tony> other developers aren't, so a discussion on spambayes-dev is the Tony> best idea. Do people: Tony> (a) like the current design Tony> (b) want a new design, but not radically different Tony> (c) want a radically different design? 
I think it would be fine if we tracked the Python site's design with slight modifications. Note that the ht2html used for the Python site now groks ReST, so many of the pages are no longer written in HTML directly. Take a look at

    http://www.python.org/idle/index.ht
    http://www.python.org/doc/faq/general.ht

for a couple of examples. Tony> I would suggest, though, that anyone interested in doing a Tony> redesign take a look at the existing tools. The website isn't Tony> done with html, it's done with .ht files (plus a text file for the Tony> faq), and converted to html via ht2html. I would imagine that a Tony> Python program would be preferable to a PHP one. You got that right. I suspect to test out a PHP design locally would require me to install PHP and reconfigure my Apache. It would be different if I used PHP for anything else, but I don't. Skip From skip at pobox.com Wed Sep 10 10:05:18 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 10 10:05:47 2003 Subject: [spambayes-dev] sb_filter -n broke? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332AB05@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130332AB05@its-xchg4.massey.ac.nz> Message-ID: <16223.12190.841595.23916@montanaro.dyndns.org> Tony> The whole open_storage function was getting a bit messy from all Tony> the fixes, so I've tidied it up a bit, and fixed this. Could you Tony> confirm that it works for you? Nope:

    % cat sa.opt
    [Storage]
    persistent_use_database: False
    persistent_storage_file: ~/tmp/sa.db
    % BAYESCUSTOMIZE=sa.opt sb_filter.py -n
    Traceback (most recent call last):
      File "/Users/skip/local/bin/sb_filter.py", line 186, in ?
        main()
      File "/Users/skip/local/bin/sb_filter.py", line 174, in main
        h.newdb()
      File "/Users/skip/local/bin/sb_filter.py", line 113, in newdb
        h = hammie.open(self.dbname, self.usedb, 'n')
      File "/Users/skip/local/lib/python2.4/site-packages/spambayes/hammie.py", line 259, in open
        return Hammie(storage.open_storage(filename, useDB, mode))
      File "/Users/skip/local/lib/python2.4/site-packages/spambayes/storage.py", line 663, in open_storage
        return klass(data_source_name, mode)
    TypeError: __init__() takes exactly 2 arguments (3 given)

I checked to make sure I hadn't forgotten to delete my changes:

    % cvs up -dP . 2>/dev/null
    ? INTEGRATION.html
    ? _pop3proxyham.mbox
    ? base.ini
    ? base.txt
    ? bases.txt
    ? chkhdrs.sh
    ? command.log
    ? failures
    ? ham.report
    ? hamhour.plt
    ? hf.pickle
    ? hf.py
    ? hour.png
    ? mboxtrain.diff
    ? pop3proxy-ham-cache
    ? pop3proxy-spam-cache
    ? pop3proxy-unknown-cache
    ? sb.diff
    ? contrib/hammiefilter.zip
    ? contrib/mkzip.py
    ? spambayes/PickleRPC.py
    ? spambayes/bytes-words.diff
    ? spambayes/psets.py
    ? spambayes/storage.py.save
    M setup.py
    M spambayes/hammiebulk.py

The hammiebulk changes just print the message count once every 10 messages. The setup.py changes are for the warnings about old installed scripts, so I don't think either one should affect this command. Skip From skip at pobox.com Wed Sep 10 10:11:18 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 10 10:11:34 2003 Subject: [spambayes-dev] sb_* in config files? Message-ID: <16223.12550.849888.147657@montanaro.dyndns.org> Should the (for example) pop3proxy section of the ini/config file be changed to sb_server? Ditto for other changed scripts who happen to have their own section. Skip From kennypitt at hotmail.com Wed Sep 10 11:13:42 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Sep 10 10:14:10 2003 Subject: [spambayes-dev] sb_filter -n broke?
In-Reply-To: <16223.12190.841595.23916@montanaro.dyndns.org> Message-ID: <002901c377a5$bed6c040$300a10ac@spidynamics.com> Skip Montanaro wrote: > Tony> The whole open_storage function was getting a bit messy from all > Tony> the fixes, so I've tidied it up a bit, and fixed this. Could you > Tony> confirm that it works for you? > > Nope: > [snip] I think the fatal flaw is in the "if mode is not None" check. Hammie.py always passes a mode value whether useDB is true or false, so the condition will always be true. The appropriate fix here may be to change __init__ for PickledClassifier to accept the mode parameter for consistency, and then just not use it. [From Tony Meyer in storage.py]

    ***************
    *** 661,666 ****
                  klass = PickledClassifier
          try:
    !         if isinstance(data_source_name, type(())):
    !             return klass(*data_source_name)
              return klass(data_source_name)
          except dbmstorage.error, e:
    --- 660,665 ----
                  klass = PickledClassifier
          try:
    !         if mode is not None:
    !             return klass(data_source_name, mode)
              return klass(data_source_name)
          except dbmstorage.error, e:

From skip at pobox.com Wed Sep 10 10:28:41 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 10 10:28:55 2003 Subject: [spambayes-dev] sb_filter -n broke? In-Reply-To: <16223.12190.841595.23916@montanaro.dyndns.org> References: <1ED4ECF91CDED24C8D012BCF2B034F130332AB05@its-xchg4.massey.ac.nz> <16223.12190.841595.23916@montanaro.dyndns.org> Message-ID: <16223.13593.84364.826506@montanaro.dyndns.org> Tony> The whole open_storage function was getting a bit messy from all Tony> the fixes, so I've tidied it up a bit, and fixed this. Could you Tony> confirm that it works for you? Skip> Nope: ... I checked in a change to storage.py. (I needed to get SB working again so I could read my mail. ;-) I'm not entirely satisfied with it though.
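Kenny's suggestion amounts to giving every storage class the same constructor signature, so that open_storage never has to guess whether to pass `mode`. Below is a minimal sketch of that shape using stand-in classes. `PickledStore`, `DBDictStore` and this `open_storage` are hypothetical and only mirror the signatures discussed in the thread. The `'c'` default follows the dbm convention of opening an existing database or creating one if needed. `'n'` (as in `sb_filter.py -n`) always starts with a new, empty database.

```python
class PickledStore(object):
    """Stand-in for a pickle-backed classifier: mode is accepted but unused."""

    def __init__(self, data_source_name, mode="c"):
        # A pickle is always read whole and rewritten whole, so the dbm-style
        # mode flag has no meaning here; it is accepted only so that every
        # storage class can be constructed the same way.
        self.name = data_source_name


class DBDictStore(object):
    """Stand-in for a dbm-backed classifier: mode is passed on to dbm."""

    def __init__(self, data_source_name, mode="c"):
        self.name = data_source_name
        self.mode = mode  # e.g. 'n' = always create a new, empty database


def open_storage(data_source_name, use_db, mode="c"):
    """Pick a storage class; all take (name, mode), so no special cases."""
    klass = DBDictStore if use_db else PickledStore
    return klass(data_source_name, mode)
```

The cost of this design is that classes which ignore `mode` still have to declare it.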
The alternative (also not entirely satisfying) seems to force all classifier classes to have the same signature (data_source_name, mode='c') even though only DBDictClassifier uses the mode arg. Feel free to change it around. Skip From kennypitt at hotmail.com Wed Sep 10 11:46:31 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Sep 10 10:48:32 2003 Subject: [spambayes-dev] sb_* in config files? In-Reply-To: <16223.12550.849888.147657@montanaro.dyndns.org> Message-ID: <002a01c377aa$54876190$300a10ac@spidynamics.com> Skip Montanaro wrote: > Should the (for example) pop3proxy section of the ini/config file be > changed to sb_server? Ditto for other changed scripts who happen to > have their own section. > My first impression is -1, at least for pop3proxy. While the sb_server script contains the implementation of the POP3 proxying functionality, the pop3proxy section contains settings that control the POP3 proxying behavior and not the behavior of the sb_server script in general. The name [sb_server] seems a little too generic to me in regards to the real meaning of the settings. -- Kenny Pitt From vanhorn at whidbey.com Wed Sep 10 10:11:23 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Wed Sep 10 12:11:31 2003 Subject: [spambayes-dev] pop3proxy suggestion References: <1ED4ECF91CDED24C8D012BCF2B034F130332A996@its-xchg4.massey.ac.nz> <16223.11391.521632.905968@montanaro.dyndns.org> Message-ID: <3F5F4D2B.EC889ACC@whidbey.com> Skip Montanaro wrote: > Tony> Where are you thinking of putting the score? In a column of it's > Tony> own? After the 'show clues' link? If you describe what you want, > Tony> I can add it for you easily enough. > > I think it belongs in its own column. One reason for separating it out is > that eventually it would be nice to allow users to sort their review page by > From, Subject or score. This could become valuable down the road, assuming that there really is value to keeping a balance between spam and ham in training. 
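Once the score has its own column, sorting the review page by From, Subject or score, as Skip suggests, is mostly a matter of choosing a sort key. Below is a minimal sketch that assumes each review row is a plain dict with hypothetical `'from'`, `'subject'` and `'score'` keys. The rows that `_appendMessages()` in ProxyUI.py actually builds are manipulated through attributes (as in the `row.classify.href` line Tony quotes), so the sketch only illustrates the ordering logic.

```python
def sort_review_rows(rows, column="score", descending=None):
    """Return review rows sorted by one column.

    Scores default to highest-first (most spam-like at the top); the text
    columns default to A-Z and are compared case-insensitively.
    """
    if column not in ("from", "subject", "score"):
        raise ValueError("unknown column: %r" % (column,))
    if descending is None:
        descending = (column == "score")
    if column == "score":
        key = lambda row: row["score"]
    else:
        key = lambda row: row[column].lower()
    # sorted() is stable (also with reverse=True), so rows with equal keys
    # keep the order in which they arrived.
    return sorted(rows, key=key, reverse=descending)
```

Putting the highest scores first keeps the most spam-like messages together at the top, which is where a reviewer deciding what to train on would look first.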
I currently train on every message. Of my two proxies, one is staying pretty closely balanced (lots of list and admin ham), but the other has been trained on Spam: 11691 Ham: 7288. I can see potential value in training that on all ham, all unsures, and borderline spam to avoid the growing imbalance. Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From rob at hooft.net Wed Sep 10 19:49:23 2003 From: rob at hooft.net (Rob Hooft) Date: Wed Sep 10 12:49:20 2003 Subject: [spambayes-dev] Names and webname Message-ID: <3F5F5613.6060301@hooft.net> Two things after catching up with my E-mail after the holidays: 1) The naming of the files: if the names of programs should be uniquified, why use sb_* instead of the more obvious spambayes_*? Only to keep with the short-name habit of unix? to try and be close to the 8+3 limit? 2) spambayes.org is running out. We're not using it at all so far. Should I extend it? Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From kennypitt at hotmail.com Wed Sep 10 16:24:59 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Sep 10 15:25:38 2003 Subject: [spambayes-dev] FW: Web Site In-Reply-To: <16223.11809.909587.568039@montanaro.dyndns.org> Message-ID: <003801c377d1$4583c680$300a10ac@spidynamics.com> Skip Montanaro wrote: > Tony> I'm completely on the fence with this one, but I imagine that > Tony> other developers aren't, so a discussion on spambayes-dev is the > Tony> best idea. Do people: > Tony> (a) like the current design > Tony> (b) want a new design, but not radically different > Tony> (c) want a radically different design? 
> > I think it would be fine it we tracked the Python site's design with > slight modifications. Note that the ht2html used for the Python site > now groks ReST, so many of the pages are no longer written in HTML > directly. [snip] If anyone is interested in what the SpamBayes site would look like using the Python site design, the attached ZIP contains a sample of the SpamBayes home page. I created a new version of the SpamBayes logo in the python.org style, replaced style.css with a copy of the python.org stylesheet, and used a generator based on the PDOGenerator included with the ht2html distribution to generate the html. -- Kenny Pitt -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes_index.zip Type: application/x-zip-compressed Size: 10702 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030910/2f07e651/spambayes_index.bin From richie at entrian.com Wed Sep 10 22:32:50 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Sep 10 16:32:56 2003 Subject: [spambayes-dev] Names & 1.0a6 release In-Reply-To: <1063157059.4531.9.camel@porsche.hq> References: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> <9vkslvcqjqmkkiqq62ahte22cpt4a6posm@4ax.com> <1063157059.4531.9.camel@porsche.hq> Message-ID: [Remi] > What can be done under unix (Linux) is to have everything in > /usr/local/spambayes and then soft link in /usr/local/bin for all the > executables. Sure, but that still means the names have to live alongside everything else on your path. The choice is to either use globally unique names and put them on the PATH, or use whatever names we like and keep them off the PATH. Whether the things on your PATH are the executables or links to them makes no difference.
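Richie's choice turns on whether a name is already taken by something on the PATH. Below is a minimal sketch of checking that before installing, using the standard library's `shutil.which()` (Python 3.3 and later). `name_clashes` is a hypothetical helper, and the script names in the check are simply examples from this thread.

```python
import os
import shutil


def name_clashes(names, path=None):
    """Map each script name to the existing executable it would collide with.

    `path` defaults to the current PATH. Names with no match on the path
    are omitted, so an empty result means every name is free to install.
    """
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    clashes = {}
    for name in names:
        found = shutil.which(name, path=path)
        if found is not None:
            clashes[name] = found
    return clashes
```

Any name that appears in the result would either shadow an existing program or be shadowed by it, depending on where the install directory sits in PATH order.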
-- Richie Hindle richie@entrian.com From richie at entrian.com Wed Sep 10 22:33:41 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Sep 10 16:33:47 2003 Subject: [spambayes-dev] pop3proxy execution "rules" on Windows In-Reply-To: <3cf801c37409$403b1f50$f502a8c0@eden> References: <3cf801c37409$403b1f50$f502a8c0@eden> Message-ID: [Mark] > My idea was basically to document: > * The SpamBayes proxy is a per user program - therefore, it doesn't run as a > service. > * People want a service, even though they shouldn't. So we have provided > one - but if you use it, you must configure it yourself and the non-service > version of the proxy won't work (but the tray-bar icon will - in that case > it will be controlling the service rather than running the proxy) When you put it like that, it makes a lot of sense. > We stick with the plan you outlined, unless Richie has more comments. Nope, nothing more to add. -- Richie Hindle richie@entrian.com From richie at entrian.com Wed Sep 10 22:41:34 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Sep 10 16:41:41 2003 Subject: [spambayes-dev] Names and webname In-Reply-To: <3F5F5613.6060301@hooft.net> References: <3F5F5613.6060301@hooft.net> Message-ID: [Rob] > spambayes.org is running out. We're not using it at all so far. > Should I extend it? Yes please - we *are* using it, in that when I'm on someone else's machine (away from my bookmarks) and I want to visit the Spambayes website, it's "spambayes.org" that I type into the address bar. 8-) [I'll take over the DNS - and the fees - if you don't want it any more.] 
-- Richie Hindle richie@entrian.com From papaDoc at videotron.ca Wed Sep 10 17:54:00 2003 From: papaDoc at videotron.ca (papaDoc) Date: Wed Sep 10 16:53:12 2003 Subject: [spambayes-dev] Names & 1.0a6 release In-Reply-To: References: <1ED4ECF91CDED24C8D012BCF2B034F130212AE65@its-xchg4.massey.ac.nz> <9vkslvcqjqmkkiqq62ahte22cpt4a6posm@4ax.com> <1063157059.4531.9.camel@porsche.hq> Message-ID: <3F5F8F68.9000608@videotron.ca> Richie Hindle wrote: >[Remi] > > >>What can be done under unix (Linux) is to have everything in >>/usr/local/spambayes and then soft link in /usr/local/bin for all the >>executables. >> >> > >Sure, but that still means the names have to live alongside everything >else on your path. > Yes, but the PATH variable will stay the same length if /usr/local/bin is already in your PATH. The "beauty of this" is: if the spambayes executables (pop3proxy.py, hammie.py) are changed to have unique names, then their names will be less obvious, since we cannot take just any name. If I take a quick look at the directory /usr/local/bin, sb_server.py won't tell me much about what it is. If I have a soft link, I will know it is part of spambayes.

    cd /usr/local/bin; ls
    > sb_server.py
    > a_program
    > a_second_program
    ls -l
    > sb_server.py -> /usr/local/spambayes/sb_server.py
    > a_program
    > a_second_program

Sure, you can run "sb_server.py -h" to find out what the program does. >The choice is to either use globally unique names and >put them on the PATH, or use whatever names we like and keep them off the >PATH. > So I prefer to have spambayes in its own directory; if needed, we can have soft links in /usr/local/bin ;-) >Whether the things on your PATH are the executables or links to >them makes no difference. > It makes some difference: it is easier to know where a program is coming from, and easier to remove it.
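The `ls -l` check Remi describes can be automated. Below is a minimal sketch that assumes the /usr/local/spambayes and /usr/local/bin layout from this thread. It classifies every entry of a bin directory as a symlink into the install directory, a symlink somewhere else, or an ordinary file, which is the same information the `->` column gives. `describe_bin_dir` is a hypothetical helper, and `os.path.commonpath` needs Python 3.5 or later.

```python
import os


def describe_bin_dir(bin_dir, install_dir):
    """Classify each entry of bin_dir by where it comes from.

    Returns a dict mapping name -> "install" (symlink into install_dir),
    "link" (symlink elsewhere) or "file" (anything that is not a symlink).
    """
    install_dir = os.path.realpath(install_dir)
    origins = {}
    for name in sorted(os.listdir(bin_dir)):
        entry = os.path.join(bin_dir, name)
        if os.path.islink(entry):
            target = os.path.realpath(entry)
            # commonpath compares whole path components, so a sibling such
            # as /usr/local/spambayes2 is not mistaken for the install dir.
            inside = os.path.commonpath([target, install_dir]) == install_dir
            origins[name] = "install" if inside else "link"
        else:
            origins[name] = "file"
    return origins
```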
Remi From skip at pobox.com Wed Sep 10 16:57:49 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 10 16:58:01 2003 Subject: [spambayes-dev] spambayes.org Message-ID: <16223.36941.876796.795768@montanaro.dyndns.org> I saw Richie's response about the spambayes.org domain but not Rob's original note. Perhaps the PSF could be persuaded to pick up the tab on it. Skip From mhammond at skippinet.com.au Thu Sep 11 10:24:08 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Sep 10 19:24:19 2003 Subject: [spambayes-dev] FW: Web Site In-Reply-To: <16223.11809.909587.568039@montanaro.dyndns.org> Message-ID: <351b01c377f2$a34d02b0$f502a8c0@eden> > I think it would be fine it we tracked the Python site's > design with slight modifications. While I have no specific objection, I do note that SpamBayes is more "user" oriented, whereas Python is not. Someone looking at Python is likely to already have certain technical skills, and appreciates a "low key" design. SpamBayes users are less likely to be technically literate beyond using their mail program. Also, if the future of python.org truly is http://pollenation.net/assets/public/python-main-2.html, we should be thinking about that instead :) I have no further details on that page - if I understand things correctly, it is primarily being discussed on the marketing-python list. Mark From ta-meyer at ihug.co.nz Thu Sep 11 13:12:50 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Wed Sep 10 20:14:07 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayes storage.py, 1.31, 1.32 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303346327@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2923@its-xchg4.massey.ac.nz> > Modified Files: > storage.py > Log Message: > Bug 803501: Fix the "No dbm modules available" message to > print rather than crash.

    """
    !         print >> sys.stderr, "You do not have a dbm module available " \
                      "to use. You need to either use a pickle (see the FAQ)" \
                      ", use Python 2.3 (or above), or install a dbm module " \
                      "such as bsddb (see http://sf.net/projects/pybsddb)."
    -         import sys
    """

I'm willing to believe that this fixes the problem, but I don't see how this works, and in the interests of bettering my Python understanding, could someone explain it? sys is imported at the top of storage.py, so I see that the "import sys" is no longer necessary. What I don't get is how there could be this error:

    """
      File "/Users/chris/Desktop/spambayes-1.0a5/spambayes/storage.py", line 652, in open_storage
        print >> sys.stderr, """\
    UnboundLocalError: local variable 'sys' referenced before Assignment
    """

when sys has been imported. Especially since the change is removing a line *after* the crash. What am I missing? =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Sep 11 13:20:12 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 10 20:20:28 2003 Subject: [spambayes-dev] Names and webname Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332ACD6@its-xchg4.massey.ac.nz> [Rob] > 1) The naming of the files: if the names of programs should be > uniquified, why use sb_* instead of the more obvious spambayes_*? IIRC, no-one pushed for spambayes_*, so it wasn't really considered. Personally, I like sb_* a lot more, though, because I quite often type out the commands by hand, and that saves me 7 keystrokes..."scripts/sb_server.py" is already 8 keystrokes longer than "pop3proxy.py". [Rob] > 2) spambayes.org is running out. We're not using it at all so far. > Should I extend it? Yes. It is used - for example it's on the bottom of every page of the web interface, so is the obvious way for pop3proxy/imapfilter people to find the website. It also frees spambayes from being tied to sf (if python moved elsewhere, I would imagine that spambayes would consider it, too). [Skip] > Perhaps the PSF could be persuaded to pick up the tab on it.
+1 :) I can't recall if the PSF donation link manages to record a 'reason' for donating or not, but I suspect that the donations for spambayes (based on messages posted) might cover it. [Richie] > [I'll take over the DNS - and the fees - if you don't want it any more.] I'd be willing to contribute towards the fees (but have no desire to manage another domain) if this is necessary. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Sep 11 13:30:17 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 10 20:30:31 2003 Subject: [spambayes-dev] sb_* in config files? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332ACE7@its-xchg4.massey.ac.nz> > Should the (for example) pop3proxy section of the ini/config > file be changed to sb_server? Ditto for other changed > scripts who happen to have their own section. I had considered this too and didn't mention anything because I figured it would be a real PITA :) I agree with the person (sorry, lost the email) that said that changing the POP3 proxy options is -1. These options are still about proxying a POP3 server, so it makes sense for them to have their own section - although I suspect that at some point in the (not near) future the caching options in that section will end up moving to the "Storage" section, and maybe the "notate_to" and "notate_subject" options will move to the "Headers" section (they probably should be there now). I don't use hammiefilter/sb_filter, but I'm not sure that the "hammie" options should be changed either. There are only three left, and two (the 'debug' header names) will probably be deprecated at some point [1]. The remaining option could probably find a new home at that point, in whatever section seems most appropriate at that time. Changing the "hammie" options is probably even trickier than the pop3proxy (etc) ones, because the pop3proxy people get an interface to configure, and so don't actually know what the options are called [2]. 
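Options such as `notate_to` and `notate_subject` might later move to the "Headers" section. If they do, existing config files could keep working by checking the new section first and then falling back to the old one. The sketch below uses Python's standard `configparser`. The helper name and option values are illustrative only and are not taken from the SpamBayes Options code.

```python
import configparser

def get_moved_option(config, new_section, old_section, name, default=None):
    """Look up an option in its new section, falling back to its old one.

    Hypothetical helper: a value in the new section takes precedence, and a
    file written before the move is still honoured.
    """
    for section in (new_section, old_section):
        # has_option() returns False when the section itself is missing.
        if config.has_option(section, name):
            return config.get(section, name)
    return default

old_style = configparser.ConfigParser()
old_style.read_string("[pop3proxy]\nnotate_to = spam\n")
print(get_moved_option(old_style, "Headers", "pop3proxy", "notate_to"))  # spam

new_style = configparser.ConfigParser()
new_style.read_string("[Headers]\nnotate_to = unsure\n[pop3proxy]\nnotate_to = spam\n")
print(get_moved_option(new_style, "Headers", "pop3proxy", "notate_to"))  # unsure
```

Checking both sections keeps the move invisible to users who edit their config files by hand. Users who configure through the web interface would never see the old section name.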
=Tony Meyer [1] It's either my bad or Tim Stone's that there are the 'debug' header options in 'hammie' and the 'evidence' header options in 'headers', which do the same thing. Whatever the options end up being called, I imagine that the code for doing this will be centralised at some point (if hammie used message.py, for example). [2] If Skip finds that the sb_server/sb_upload combo works well, then I'll probably add a config page to the ui for those sorts of users for 1.1a1. (Changing the focus of sb_server away from pop3proxy means changes to what's presented anyway). From ta-meyer at ihug.co.nz Thu Sep 11 13:39:33 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Wed Sep 10 20:39:38 2003 Subject: [spambayes-dev] Feature freeze & branch Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2925@its-xchg4.massey.ac.nz> IIRC it wasn't really discussed, but I presume that the way things will go is that in a couple of weeks or so (whenever we don't seem to be getting any more 1.0a5 bug reports, and have fixed all the open ones), we'll put out 1.0b1, then after a similar period, 1.0rc1 and 1.0. All of this makes quite a while before new features are added. What about opening a cvs branch for 1.1 that we can merge back in after 1.0 is released, which isn't feature frozen? Any objections? Better ideas? =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Sep 11 13:40:36 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 10 20:40:42 2003 Subject: [spambayes-dev] Feature freeze & branch Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332ACFE@its-xchg4.massey.ac.nz> > IIRC it wasn't really discussed, but I presume that the way > things will go is that in a couple of weeks or so (whenever > we don't seem to be getting any more 1.0a5 bug reports, and > have fixed all the open ones), we'll put out 1.0b1, then > after a similar period, 1.0rc1 and 1.0. Oops. I forgot about 1.0a6. Insert the appropriate statements in there, and the rest stands as is...
=Tony Meyer From tim.one at comcast.net Wed Sep 10 22:36:36 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Sep 10 21:36:43 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayesstorage.py, 1.31, 1.32 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2923@its-xchg4.massey.ac.nz> Message-ID: >> Modified Files: >> storage.py >> Log Message: >> Bug 803501: Fix the "No dbm modules available" message to >> print rather than crash. > > """ > ! print >> sys.stderr, "You do not have a dbm module > available " \ > "to use. You need to either use a pickle (see > the FAQ)" \ > ", use Python 2.3 (or above), or install a dbm > module " \ > "such as bsddb (see > http://sf.net/projects/pybsddb)." > - import sys > """ > > I'm willing to believe that this fixes the problem, but I don't see > how this works, and in the interests of bettering my Python > understanding, could someone explain it? > > sys is imported at the top of storage.py, so I see that the "import > sys" is no longer necessary. What I don't get is how there could be > this error: > > """ > File "/Users/chris/Desktop/spambayes-1.0a5/spambayes/storage.py", > line 652, in open_storage > > print >> sys.stderr, """\ > > UnboundLocalError: local variable 'sys' referenced before Assignment > """ > > when sys has been imported. Especially since the change is removing > a line *after* the crash. What am I missing? "import xyz" binds the name xyz in the local scope, to the module object being imported. It's exactly like an assignment statement this way. So you get an UnboundLocalError for exactly the same reason you get one if you run this: x = 1 def f(): print x x = 2 f() Traceback (most recent call last): File "temp2.py", line 5, in ? f() File "temp2.py", line 3, in f print x UnboundLocalError: local variable 'x' referenced before assignment The local name x within f has nothing to do with the global name x, and the local x within f is referenced before it's been bound to a value. 
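The same rule applies to the storage.py case and can be reproduced in Python 3. The function names below are made up for illustration. An `import sys` anywhere in a function body binds `sys` as a local for the whole body, so any use above the import fails. Removing the inner import, as the checkin did, lets the module-level import be found again.

```python
import io
import sys
from contextlib import redirect_stderr

def broken_report():
    # The later "import sys" makes sys local to this whole function,
    # so evaluating sys.stderr here happens before that local is bound.
    print("You do not have a dbm module available to use.", file=sys.stderr)
    import sys
    return "unreachable"

def fixed_report():
    # With no local binding, sys resolves to the module-level import.
    print("You do not have a dbm module available to use.", file=sys.stderr)
    return "reported"

try:
    broken_report()
except UnboundLocalError as exc:
    print("broken_report:", type(exc).__name__)  # broken_report: UnboundLocalError

captured = io.StringIO()
with redirect_stderr(captured):
    result = fixed_report()
print("fixed_report:", result)  # fixed_report: reported
```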
The same thing happens with sys. From ta-meyer at ihug.co.nz Thu Sep 11 14:42:27 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Wed Sep 10 21:42:47 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayesstorage.py, 1.31, 1.32 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303346372@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AE8F@its-xchg4.massey.ac.nz> > "import xyz" binds the name xyz in the local scope, to the > module object being imported. It's exactly like an > assignment statement this way. [...] > The local name x within f has nothing to do with the global > name x, and the local x within f is referenced before it's > been bound to a value. The same thing happens with sys. Ah, so it was the _import_, not the sys.stderr, that was the problem. Now I understand :) Thanks, Tony From leon at isdn.net Wed Sep 10 21:47:43 2003 From: leon at isdn.net (Leon Oosterwijk) Date: Wed Sep 10 21:47:52 2003 Subject: [spambayes-dev] Python programming for outlook plugins & spambayes uninstall In-Reply-To: <16223.13593.84364.826506@montanaro.dyndns.org> Message-ID: <006701c37806$b1ec34d0$720241cf@galvatron> All, I'm new both to python and spambayes. So far I'm impressed with both. I've got 2 questions: 1: I installed spambayes from source. It created the plugin just fine. However, there is no way to remove the plugin. Can anyone shed some light on this? 2: I'm planning to write my own spam filter. This filter will also feature a plugin to outlook. I'm rather new to programming plugins for outlook. Does anyone have any good reading they can recommend? I found some reading which talks about using VB6. It seems you need CDO and there are tons of incompatibilities between the different versions of outlook. It seems python would be a cleaner way to go.
The spam filter I'm trying to write is a thesis project that will feature server-side or centralized spam filtering and a mechanism for users to report spam (for learning) by means of a network where their credentials are authorized by trust metrics (see Raph Levien's work for more details). Thank you. Leon From tim.one at comcast.net Wed Sep 10 22:59:37 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Sep 10 21:59:47 2003 Subject: [spambayes-dev] Names and webname In-Reply-To: <3F5F5613.6060301@hooft.net> Message-ID: [Rob Hooft] > Two things after catching up with my E-mail after the holidays: > > 1) The naming of the files: if the names of programs should be > uniquified, why use sb_* instead of the more obvious spambayes_*? Only > to keep with the short-name habit of unix? to try and be close to the > 8+3 limit? The answers so far seem to be "short-name habit". Some fights aren't worth the energy <0.8 wink>. > 2) spambayes.org is running out. We're not using it at all so far. > Should I extend it? Yes, please. If you don't want to pay for it, and/or not hassle with ongoing registration, just say so. For example, Skip mentioned that the PSF might be persuaded to pay the cost, and, if they don't want to, I'd be happy to pay for it. In fact, it would cost me less overall to pay for it than to spend time arguing the case to the PSF.
From jeremy at alum.mit.edu Wed Sep 10 23:39:18 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed Sep 10 22:39:24 2003 Subject: [spambayes-dev] Feature freeze & branch In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2925@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2925@its-xchg4.massey.ac.nz> Message-ID: <1063247958.2069.141.camel@localhost.localdomain> On Wed, 2003-09-10 at 20:39, Tony Meyer wrote: > IIRC it wasn't really discussed, but I presume that the way things will go > is that in a couple of weeks or so (whenever we don't seem to be getting any > more 1.0a5 bug reports, and have fixed all the open ones), we'll put out > 1.0b1, then after a similar period, 1.0rc1 and 1.0. > > All of this makes quite a while before new features are added. What about > opening a cvs branch for 1.1 that we can merge back in after 1.0 is > released, which isn't feature frozen? Any objections? Better ideas? In the Zope world, we create a branch once the beta release occurs. Any bug fixing on the road to a final release occurs on a branch. That leaves the head free for development while the bugs are fixed in the beta cycle. If there are lots of new features people are eager to work on, it may be easier to use the branch this way. Jeremy From jeremy at alum.mit.edu Wed Sep 10 23:42:57 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed Sep 10 22:43:04 2003 Subject: [spambayes-dev] pop3proxy suggestion In-Reply-To: <16223.11391.521632.905968@montanaro.dyndns.org> References: <1ED4ECF91CDED24C8D012BCF2B034F130332A996@its-xchg4.massey.ac.nz> <16223.11391.521632.905968@montanaro.dyndns.org> Message-ID: <1063248176.2069.144.camel@localhost.localdomain> On Wed, 2003-09-10 at 09:51, Skip Montanaro wrote: > Tony> Where are you thinking of putting the score? In a column of it's > Tony> own? After the 'show clues' link? If you describe what you want, > Tony> I can add it for you easily enough. > > I think it belongs in its own column. 
One reason for separating it out is > that eventually it would be nice to allow users to sort their review page by > From, Subject or score. Yes. This would make the review page much more useful. I find it very difficult to review hundreds of messages at a time, I usually end up just discarding everything but one or two messages that I have to hunt around for. Jeremy From T.A.Meyer at massey.ac.nz Thu Sep 11 17:36:32 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 11 00:36:52 2003 Subject: [spambayes-dev] Python programming for outlook plugins & spambayesuninstall Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332AE3E@its-xchg4.massey.ac.nz> > I installed spambayes from source. It created the plugin just > fine. However, there is now way to remove the plugin. Can > anyone shed some light on this. Run "addin.py --unregister". > I'm planning to write my own spam filter. This filter will > also feature a plugin to outlook. I'm rather new to > programming plugins for outlook. Does anyone have any good > reading they can recommend? The SpamBayes Outlook plug-in source code. Seriously. Mark's win32 programming book could very well also be of use; I haven't read it, but have seen it recommended a lot ;) Oh, and msdn.microsoft.com. =Tony Meyer From mhammond at skippinet.com.au Thu Sep 11 16:28:57 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Sep 11 01:29:02 2003 Subject: [spambayes-dev] Python programming for outlook plugins & spambayesuninstall In-Reply-To: <006701c37806$b1ec34d0$720241cf@galvatron> Message-ID: <00c301c37825$9b770ee0$f502a8c0@eden> > I installed spambayes from source. It created the plugin just fine. > However, there is now way to remove the plugin. Can anyone shed some > light on this. Run "addin.py --unregister". Note that the toolbar items are not removed - we don't really know how to do that at uninstall time. > 2: > I'm planning to write my own spam filter. This filter will > also feature > a plugin to outlook. 
I'm rather new to programming plugins > for outlook. > Does anyone have any good reading they can recommend? I found some > reading which talks about using VB6. it seems you need CDO > and there are > tons of incompatibilities between the different versions of > outlook. It > seems python would be a cleaner way to go. Nope - just the various samples all over the web. Note that using CDO will be a problem as newer outlooks will force the security dialogs. You need to either stick to the Outlook object model, or use extended MAPI. Note VB can't use extended MAPI, so a tool named "Redemption" is very useful (but Python does support it, so we don't need that tool) > The spam filter I'm trying to write is a thesis project that will > feature server-side or centralized spam filtering and a mechanism for > users to report spam (for learning) by means of a network where their > credentials are authorized by trust metrics (see raph > levien's work for > more details). I think some systems like that already exist, but I assume you have already found them :) Mark. From leon at isdn.net Thu Sep 11 02:13:48 2003 From: leon at isdn.net (Leon Oosterwijk) Date: Thu Sep 11 02:14:02 2003 Subject: [spambayes-dev] Python programming for outlook plugins & spambayesuninstall In-Reply-To: <00c301c37825$9b770ee0$f502a8c0@eden> Message-ID: <000401c3782b$df1c0c30$720241cf@galvatron> Mark (and Tony), Thank you for your feedback. I've been doing some reading and tinkering. I noticed that the "unregister" does not remove the tool bars. With version 10a4 I was not able to re-register the source version and make it all work. I downloaded 10a5. This version did work, but only after adding all the missing files to the outlook2000\dialogs\resources directory and making changes to msgstore.py (for some reason errored on: folder_eid, ret_class = store.GetReceiveFolder(msg_class, 0) ) perhaps because my folder names are dutch. 
While doing all the research on outlook automation it struck me that there are a lot of incompatible ways of interacting with the different outlook versions. Spambayes was the first program I found that seemed to handle all versions of outlook while not needing tons of tedious code. My compliments to you. I noticed while messing with the dialogs.h/dialogs.rc files that you seem to import these from visual studio. I assume you do some of your interface design in this program and then convert this over somehow. Do you have more information on this? There are a lot of programs that do server-side spam filtering, and also a lot that do client side spam filtering (like spambayes). The only product I know that neatly ties them together is cloudmark. They went paid I believe and I wasn't too impressed when I used their application last year. Spam filtering seemed like a neat topic, but the implementation of a working trust-metric system is going to be the meat of the academic merit this project has. As far as I know there are only a handful trust metric systems publicly in deployment on the internet right now. Thank you. Leon > -----Original Message----- > From: Mark Hammond [mailto:mhammond@skippinet.com.au] > Sent: Thursday, September 11, 2003 12:29 AM > To: 'Leon Oosterwijk'; spambayes-dev@python.org > Subject: RE: [spambayes-dev] Python programming for outlook > plugins & spambayesuninstall > > > > I installed spambayes from source. It created the plugin just fine. > > However, there is now way to remove the plugin. Can anyone shed some > > light on this. > > Run "addin.py --unregister". Note that the toolbar items are > not removed - > we don't really know how to do that at uninstall time. > > > 2: > > I'm planning to write my own spam filter. This filter will > > also feature > > a plugin to outlook. I'm rather new to programming plugins > > for outlook. > > Does anyone have any good reading they can recommend? I found some > > reading which talks about using VB6. 
it seems you need CDO > > and there are > > tons of incompatibilities between the different versions of > > outlook. It > > seems python would be a cleaner way to go. > > Nope - just the various samples all over the web. Note that > using CDO will > be a problem as newer outlooks will force the security > dialogs. You need to > either stick to the Outlook object model, or use extended > MAPI. Note VB > can't use extended MAPI, so a tool named "Redemption" is very > useful (but > Python does support it, so we don't need that tool) > > > The spam filter I'm trying to write is a thesis project that will > > feature server-side or centralized spam filtering and a > mechanism for > > users to report spam (for learning) by means of a network > where their > > credentials are authorized by trust metrics (see raph > > levien's work for > > more details). > > I think some systems like that already exist, but I assume > you have already > found them :) > > Mark. > From richie at entrian.com Thu Sep 11 08:51:31 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Sep 11 02:51:37 2003 Subject: [spambayes-dev] Feature freeze & branch In-Reply-To: <1063247958.2069.141.camel@localhost.localdomain> References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2925@its-xchg4.massey.ac.nz> <1063247958.2069.141.camel@localhost.localdomain> Message-ID: [Tony] > All of this makes quite a while before new features are added. What about > opening a cvs branch for 1.1 that we can merge back in after 1.0 is > released, which isn't feature frozen? Any objections? Better ideas? [Jeremy] > In the Zope world, we create a branch once the beta release occurs. Any > bug fixing on the road to a final release occurs on a branch. That > leaves the head free for development while the bugs are fixed in the > beta cycle. If there are lots of new features people are eager to work > on, it may be easier to use the branch this way. +1 This is how we do things too (almost). 
The release branch is created as soon as we think we're ready for a release, and the release candidate is taken from the branch. -- Richie Hindle richie@entrian.com From mhammond at skippinet.com.au Thu Sep 11 17:52:56 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Sep 11 02:52:55 2003 Subject: [spambayes-dev] Python programming for outlook plugins & spambayesuninstall In-Reply-To: <000401c3782b$df1c0c30$720241cf@galvatron> Message-ID: <00f601c37831$54a95520$f502a8c0@eden> > Thank you for your feedback. I've been doing some reading and > tinkering. > I noticed that the "unregister" does not remove the tool bars. With > version 10a4 I was not able to re-register the source version and make > it all work. I downloaded 10a5. This version did work, but only after > adding all the missing files to the outlook2000\dialogs\resources > directory I think that has been fixed for the next source release. > and making changes to msgstore.py (for some reason > errored on: > folder_eid, ret_class = store.GetReceiveFolder(msg_class, 0) ) perhaps > because my folder names are dutch. Can you please send me the specific exception you got? > While doing all the research on outlook automation it struck me that > there are a lot of incompatible ways of interacting with the different > outlook versions. Spambayes was the first program I found > that seemed to > handle all versions of outlook while not needing tons of tedious code. > My compliments to you. Thanks :) Generally it is just a matter of choosing the right tool for the job :) > I noticed while messing with the dialogs.h/dialogs.rc files that you > seem to import these from visual studio. I assume you do some of your > interface design in this program and then convert this over > somehow. Do > you have more information on this? Not really - we rolled it ourselves. Adam Walker hacked together a script to parse these rc files - you can find them in the directory along with the rc file. 
The parent directory - Outlook2000\dialogs, and specifically dlgcore.py, has code that is largely agnostic to how resources are loaded or parsed (so long as they are Windows dialog structures!), and creates a generic "windows dialog". The other files in the "dialogs" directory are very specific to "dialogs that get data from a SpamBayes OptionsClass" - ie, it is not really specific to SpamBayes, but is specific to the spambayes OptionsClass module. Currently though, Outlook is the only app that defines dialogs using this. It is feasible that other SpamBayes apps, eg the pop3proxy toolbar, could also use dialogs from this system (in which case the code would get restructured). > There are a lot of programs that do server-side spam > filtering, and also > a lot that do client side spam filtering (like spambayes). The only > product I know that neatly ties them together is cloudmark. They went > paid I believe and I wasn't too impressed when I used their > application > last year. Our biggest issue with server side filtering is that we are not "rule" based as such, and use data unique to each user. This just doesn't lend itself to server side apps the way things stand. Mark. From anthony at interlink.com.au Thu Sep 11 18:37:13 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Sep 11 03:37:49 2003 Subject: [spambayes-dev] wierdness in generated faq.ht Message-ID: <200309110737.h8B7bEIx011089@localhost.localdomain> I'm seeing stuff like:
  • 1.3&&&&&&What online resources are available?
  • in the faq.ht file. What the heck?? Is anyone else getting this? -- Anthony Baxter It's never too late to have a happy childhood. From T.A.Meyer at massey.ac.nz Thu Sep 11 20:58:25 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 11 03:58:49 2003 Subject: [spambayes-dev] wierdness in generated faq.ht Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332AEA6@its-xchg4.massey.ac.nz> > I'm seeing stuff like: > >
  • href="#what-online-resources-are-available" id="id6" na > me="id6">1.3&&&&&&What online resources are available?
  • > > in the faq.ht file. What the heck?? Is anyone else getting this? I get this sort of thing:
  • 1.3??????What online resources are available?
  • Is the "&&&&&&" in your quote the same as the "??????" (those are capital A's with accents) that I get? In the final html everything is ok again though, so I figured this was some sort of .ht magic. BTW I just checked in a fix for an unterminated 'interpreted text' phrase, which was causing an error, but that doesn't seem to have changed anything. =Tony Meyer From anthony at interlink.com.au Thu Sep 11 21:59:58 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Sep 11 07:01:17 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] spambayes/spambayes storage.py, 1.31, 1.32 In-Reply-To: Message-ID: <200309111059.h8BAxwWq012406@localhost.localdomain> [redirecting away from the -checkins list to the -dev list] >>> Richie Hindle wrote > I'm not asking that people test every part of the code after making an > edit. I'm asking that people *run* the specific code they've edited - > that should be no kind of "ask", let alone a big one. Maybe as part of the goals for 1.1, we should work at a complete/thorough test suite? Anthony -- Anthony Baxter It's never too late to have a happy childhood. From skip at pobox.com Thu Sep 11 09:32:36 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 11 09:32:58 2003 Subject: [spambayes-dev] FW: Web Site In-Reply-To: <351b01c377f2$a34d02b0$f502a8c0@eden> References: <16223.11809.909587.568039@montanaro.dyndns.org> <351b01c377f2$a34d02b0$f502a8c0@eden> Message-ID: <16224.31092.319588.373310@montanaro.dyndns.org> Mark> Also, if the future of python.org truly is Mark> http://pollenation.net/assets/public/python-main-2.html, we should Mark> be thinking about that instead :) I have no further details on Mark> that page - if I understand things correctly, it is primarily Mark> being discussed on the marketting-python list. In my opinion, it was posted to c.l.py improperly. At the very least, it was posted out of context. 
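The accented capital A's have a simple byte-level explanation, assuming the generated file is UTF-8 and the viewer decodes it as Latin-1. A no-break space (U+00A0) is encoded in UTF-8 as the two bytes 0xC2 0xA0, and 0xC2 is "Â" in Latin-1. Three such spaces would therefore show up as six characters, which would fit the six-character runs quoted above. The short Python check below only verifies the byte-level claim. The choice of no-break spaces as the separator is an assumption.

```python
import unicodedata

# Assumption: the separator between "1.3" and the title is a no-break space.
nbsp = "\u00a0"

utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0'

# Reading the UTF-8 bytes as Latin-1 yields two characters, the first an accented A.
misread = utf8_bytes.decode("latin-1")
print(len(misread), unicodedata.name(misread[0]))  # 2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX

# Three no-break spaces misread this way become six characters.
print(len((nbsp * 3).encode("utf-8").decode("latin-1")))  # 6

# Decoding with the correct codec round-trips to the original character.
print(utf8_bytes.decode("utf-8") == nbsp)  # True
```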
Consequently, Tim Parkin had to withstand a lot of needless slings and arrows for what was just something to talk about within the pydotorg-redesign mailing list. Several other mockups were also presented to that group. From skip at pobox.com Thu Sep 11 10:00:15 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 11 10:00:39 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayes storage.py, 1.31, 1.32 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2927@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1303346327@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F13026F2927@its-xchg4.massey.ac.nz> Message-ID: <16224.32751.262014.315275@montanaro.dyndns.org> Tony> I've checked in a unittest (in test_storage.py) that tests that we Tony> still fail (correctly) when no dbm modules are available. This is Tony> my first ever unittest, so I'd welcome any critiques. This fails for me. I've so far traced it back to this install error: running install_lib warning: install_lib: 'build/lib' does not exist -- no Python modules to install running install_scripts Something must be hosed in the latest version of setup.py, but I'm not a distutils expert. I don't see anything obviously wrong with it. If you delete your build directory then try to install, does it fail for you? I'm running Python from CVS as of August 11th and SpamBayes from CVS. Skip From skip at pobox.com Thu Sep 11 10:03:52 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 11 10:04:23 2003 Subject: [spambayes-dev] wierdness in generated faq.ht In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332AEA6@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130332AEA6@its-xchg4.massey.ac.nz> Message-ID: <16224.32968.495083.620064@montanaro.dyndns.org> Tony> Is the "&&&&&&" in your quote the same as the "??????" (those are Tony> capital A's with accents) that I get? 
Tony> In the final html everything is ok again though, so I figured this Tony> was some sort of .ht magic. Yes, by default ReST uses UTF-8. Those weird characters are Unicode hard spaces I think. Skip From skip at pobox.com Thu Sep 11 10:09:03 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 11 10:09:23 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] spambayes/spambayes storage.py, 1.31, 1.32 In-Reply-To: <200309111059.h8BAxwWq012406@localhost.localdomain> References: <200309111059.h8BAxwWq012406@localhost.localdomain> Message-ID: <16224.33279.463658.236459@montanaro.dyndns.org> Anthony> Maybe as part of the goals for 1.1, we should work at a Anthony> complete/thorough test suite? Even for a project the size of SpamBayes (not really very big), I think that's a fair amount to expect (it might delay 1.1 for quite awhile). I would prefer we approach it incrementally. Every time you check in a change you should add tests to cover the changed code. Adding another test or two would be a nice bonus. Skip From skip at pobox.com Thu Sep 11 10:23:02 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 11 10:23:13 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayes storage.py, 1.31, 1.32 In-Reply-To: <16224.32751.262014.315275@montanaro.dyndns.org> References: <1ED4ECF91CDED24C8D012BCF2B034F1303346327@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F13026F2927@its-xchg4.massey.ac.nz> <16224.32751.262014.315275@montanaro.dyndns.org> Message-ID: <16224.34118.653563.527102@montanaro.dyndns.org> Skip> running install_lib Skip> warning: install_lib: 'build/lib' does not exist -- no Python modules to install Skip> running install_scripts Skip> Something must be hosed in the latest version of setup.py, ... Found and fixed. 
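The following is a minimal sketch of the kind of test Tony describes adding to test_storage.py. It follows the suggestion that each checked-in change come with tests. The sketch simulates a machine with no dbm module by mapping the module names to `None` in `sys.modules`. The `open_storage` below is a simplified, hypothetical stand-in written for this sketch, not the real function in storage.py. The candidate module names and the print-then-exit behaviour are assumptions.

```python
import io
import sys
import unittest
from unittest import mock

# Hypothetical list of dbm implementations to try, in order of preference.
DBM_CANDIDATES = ("dbm", "bsddb3")

def open_storage(filename):
    """Simplified stand-in for storage.open_storage (not the real code).

    Returns (module, filename) for the first importable dbm module. If none
    can be imported, prints a message to stderr and exits (assumed behaviour).
    """
    for name in DBM_CANDIDATES:
        try:
            module = __import__(name)
        except ImportError:
            continue
        return module, filename
    print("You do not have a dbm module available to use.", file=sys.stderr)
    raise SystemExit(1)

class NoDbmAvailableTest(unittest.TestCase):
    def test_reports_and_exits_without_dbm(self):
        # Mapping a name to None in sys.modules makes importing it raise ImportError.
        hidden = {name: None for name in DBM_CANDIDATES}
        with mock.patch.dict(sys.modules, hidden):
            with mock.patch("sys.stderr", new_callable=io.StringIO) as err:
                with self.assertRaises(SystemExit):
                    open_storage("test.db")
        self.assertIn("dbm module", err.getvalue())
```

Saved in a test module, this runs under `python -m unittest`. `mock.patch.dict` restores `sys.modules` when the block exits, so the real dbm modules are visible again to later tests.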
Skip From skip at pobox.com Thu Sep 11 17:36:36 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 11 17:36:47 2003 Subject: [spambayes-dev] sb_server doesn't display limited set of messages anymore Message-ID: <16224.60132.476538.415314@montanaro.dyndns.org> I thought that someone (me, perhaps) had modified the pop3proxy review page to only display about 20 messages per section at a time in each of the three score categories. I visited my review page this afternoon after ignoring it for about 24 hours and was presented with roughly a day's worth of messages. Took awhile just to display the page. I then scored the unsures and decided to simply Discard the hams and spams. I've been waiting for several minutes now for JavaScript ohHeader('Ham', 'Discard') function to execute. I anticipate a similar wait when I do the same for the spam section. Did I miss a configuration parameter, simply misremember how it used to work or did it get changed back to display all messages all the time? I get a huge amount of mail. If I forget to check the review page for awhile I would really like to only see a handful of messages at a time. This is one reason why. Thx, Skip From T.A.Meyer at massey.ac.nz Fri Sep 12 14:15:17 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 11 21:15:28 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] spambayes/spambayesstorage.py, 1.31, 1.32 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332B091@its-xchg4.massey.ac.nz> [Anthony] > Maybe as part of the goals for 1.1, we should work at a > complete/thorough test suite? [Skip] > I would prefer we approach it incrementally. Every time > you check in a change you should add tests to cover the > changed code. Adding another test or two would be a nice bonus. +1. [Richie] > I'm not asking that people test every part of the code > after making an edit. 
> I'm asking that people *run* the specific code they've edited

I'm sure that I'm easily the worst culprit at this; I've been trying to improve my behaviour and will try harder; my apologies. This particular case was probably a bad example (you should have picked one of my earlier screw-ups ;) because Skip's edit shouldn't have broken anything, and would have involved running various different setups to test it all. The problem (traceable to me...) was not following nice style guidelines and not putting the import at the top [1]. So anyway: bad me, I will improve, but +1 to making things easier to test.

=Tony Meyer

[1] Until Tim explained it a few messages back, I didn't realise that this would be a problem; the same sort of thing would happily work in C...

From tim.one at comcast.net Thu Sep 11 22:34:30 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Sep 11 21:34:32 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins]spambayes/spambayesstorage.py, 1.31, 1.32 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332B091@its-xchg4.massey.ac.nz> Message-ID:

[Tony Meyer]
> ...
> [1] Until Tim explained it a few messages back, I didn't realise that
> this would be a problem; the same sort of thing would happily work in
> C...

Actually not, and the failure could have been much worse in C. A C lookalike would be

    int sys = 4;

    int whatever()
    {
        int sys;
        call_something(sys);
        sys = 3;
    }

    whatever();

That's exactly the same thing: the file-scope "sys" and the whatever-scope "sys" have nothing in common, and the whatever-scope sys is referenced before a value is assigned to it. In the case of C, the behavior is undefined, and can easily lead to, e.g., a segfault and core dump. The only thing that makes it hard to see the equivalence is that Python doesn't have *explicit* local (C "auto scope") declarations.
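Tim's point can be checked directly in Python. This is a minimal sketch, and the function names `broken`, `fixed` and `raises_unbound` are made up for illustration. Because the `import sys` inside `broken()` binds the name, the compiler treats `sys` as local to the whole function body. The earlier reference therefore fails with UnboundLocalError instead of picking up the module-level `sys`.

```python
import sys  # module-level ("global") binding of the name sys


def broken():
    # The import further down binds 'sys' in this function, so the compiler
    # treats 'sys' as local for the WHOLE body; the global is never consulted.
    platform = sys.platform  # local 'sys' isn't bound yet -> UnboundLocalError
    import sys
    return platform


def fixed():
    # No local binding of 'sys' here, so the module-level import is used.
    return sys.platform


def raises_unbound(func):
    """Return True if calling func() raises UnboundLocalError."""
    try:
        func()
    except UnboundLocalError:
        return True
    return False


print(raises_unbound(broken))                # True
print(raises_unbound(fixed))                 # False
print("sys" in broken.__code__.co_varnames)  # True: compiled as a local name
```

The last line shows the "floated up" behaviour Tim describes later in the thread. `sys` is recorded as a local of `broken` at compile time, no matter where in the body the import appears.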
The good news is that Python knows when a reference to an unbound variable is made, and raises an exception (instead of, as most C implementations end up doing, using whatever trash bits happen to be sitting on the HW stack).

From T.A.Meyer at massey.ac.nz Fri Sep 12 15:09:22 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 11 22:09:39 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins]spambayes/spambayesstorage.py, 1.31, 1.32 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332B0D2@its-xchg4.massey.ac.nz>

> Actually not, and the failure could have been much worse in
> C. A C lookalike would be
>
>     int sys = 4;
>
>     int whatever()
>     {
>         int sys;
>         call_something(sys);
>         sys = 3;
>     }
>
>     whatever();

Ok, so I meant C++:

    int sys = 4;                 # sys = 4

    int whatever()               # def whatever():
    {
        call_something(sys);     #     call_something(sys)
        int sys = 3;             #     sys = 3
    }

    int main()
    {
        whatever();              # whatever()
    }

Where until the local "int sys" the sys referred to is the global one, and then after that it's the local one. This is what the Python *looks* like (since the "int sys" in the C example isn't in the Python), although not what it is actually like. I do understand that even in C++ it's not good practice, and will in future place my imports where they belong.

=Tony Meyer

From tim.one at comcast.net Thu Sep 11 23:22:26 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Sep 11 22:22:40 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins]spambayes/spambayesstorage.py, 1.31, 1.32 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332B0D2@its-xchg4.massey.ac.nz> Message-ID:

[Tony Meyer]
> Ok, so I meant C++:
>
>     int sys = 4;                 # sys = 4
>
>     int whatever()               # def whatever():
>     {
>         call_something(sys);     #     call_something(sys)
>         int sys = 3;             #     sys = 3
>     }
>
>     int main()
>     {
>         whatever();              # whatever()
>     }
>
> Where until the local "int sys" the sys referred to is the global one,
> and then after that it's the local one.
> This is what the Python *looks* like (since the "int sys" in the C example isn't in the
> Python), although not what it is actually like.

Good point! Python actually acts as if the declaration were floated up to the top of the block defining the function.

> I do understand that even in C++ it's not good practice, and will in
> future place my imports where they belong.

There are worse sins, Tony -- it was an easy mistake to fall into, and especially if you read Mark's code.

From mhammond at skippinet.com.au Fri Sep 12 13:41:50 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Sep 11 23:06:46 2003 Subject: [spambayes-dev] Re:[Spambayes-checkins]spambayes/spambayesstorage.py, 1.31, 1.32 In-Reply-To: Message-ID: <03d601c378d7$6b7ece00$f502a8c0@eden>

> There are worse sins, Tony -- it was an easy mistake to fall into, and
> especially if you read Mark's code.

Think global, act local!

Mark.

From tim.one at comcast.net Fri Sep 12 00:14:46 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Sep 11 23:14:51 2003 Subject: [spambayes-dev] Re:[Spambayes-checkins]spambayes/spambayesstorage.py, 1.31, 1.32 In-Reply-To: <03d601c378d7$6b7ece00$f502a8c0@eden> Message-ID:

[Tim, on spraying imports at random spots within a function body]
>> There are worse sins, Tony -- it was an easy mistake to fall into,
>> and especially if you read Mark's code.

[Mark Hammond]
> Think global, act local!

LOL! If we had a spambayes QOTW, it would take top honors.

From mjeaya069 at rogers.com Fri Sep 12 03:33:59 2003 From: mjeaya069 at rogers.com (Mark Jeays) Date: Fri Sep 12 02:31:29 2003 Subject: [spambayes-dev] More stats talk (warning, long) Message-ID:

Hello all,

I posted a message about a stats feature a while ago and have been working on some concepts, as well as some code. Using the old (0.3) version I did have something that was working for me on the source code version.
I appropriated the greyed-out "Advanced" button and made the dialog that I posted to the list at http://mail.python.org/pipermail/spambayes/attachments/20030813/a1d9b90f/stats-0001.jpg

I was reasonably happy with this and getting ready to post it but then the GUI design changed significantly (definitely for the better!) and so here I am. I'm afraid I don't have quite enough python experience (or CVS experience, for that matter, right now) to make a working patch. However, I do have some code that I wanted to send to the list, in the hopes that someone with some more knowledge of the GUI might fairly easily be able to slot this into the 1.1 branch of the project. I have looked around in the dialogs.rc file and so forth but I am only just "Learning Python" with the book of the same name and this is a bit beyond me right now.

I'm not sure if a stats tab (there's lots of room next to the other four tabs...) needs to be justified, but:

* there's at least some demand
* it's worthwhile to provide some sort of feedback to the user on how effective the program is
* I like numbers ;)

Some of the stats that can fairly easily be generated from this are things like:

* total number of spam/ham/unsure/total emails found, average per day
* session totals of same since you loaded Outlook
* false positives, false negatives (add to count when Recover or Delete key is pressed, if not already in the appropriate category).
* All sorts of neat stats such as accuracy, error rate, spam recall, spam precision, etc. as outlined in: http://www.ics.forth.gr/~potamias/mlnia/paper_2.pdf (not all in sample code attached but easy to calculate)

There is one potential problem, which is that I think without integrating the stats with the database it would be difficult to have a rock-solid accounting system for every email, for at least two reasons: people can classify on buttons multiple times, and the existence of the "unsure" category makes things difficult.
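For reference, the measures named in the last bullet above (accuracy, error rate, spam recall, spam precision) have standard definitions. Here is a minimal sketch. It assumes spam is treated as the "positive" class and that Unsure messages are simply left out of all four counts, which is only one of the options discussed in this thread. `classification_metrics` is a hypothetical helper, not part of SpamBayes.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures for a two-class spam filter.

    tp: spam classified as spam      fn: spam classified as ham
    tn: ham classified as ham        fp: ham classified as spam
    Unsure messages are assumed to be excluded from all four counts.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy if total else 0.0,
        # fraction of all spam that the filter caught
        "spam_recall": tp / (tp + fn) if tp + fn else 0.0,
        # fraction of messages called spam that really were spam
        "spam_precision": tp / (tp + fp) if tp + fp else 0.0,
    }


m = classification_metrics(tp=90, fp=1, tn=99, fn=10)
print(m)
```

Recall and precision pull in different directions. A filter tuned to never produce a false positive will usually catch less spam, so it is worth reporting both rather than accuracy alone.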
Conceptually, every email is either ham or spam. A false positive occurs when a ham is categorized as spam, and a false negative occurs when a spam is categorized as ham. I'm not too sure how to fit "unsures" into this scheme. One possibility is to simply not count them. But, if a particular mail is rated as "unsure", it's not a FP or FN, and it's not correct either. Any ideas on how to handle this? My thought (implemented in the attachment) on this was to just not count unsure emails in the total until a categorization has occurred (as I always train "unsure"s immediately, this discrepancy wouldn't show up in the stats). The total emails reported would then be the actual total received minus the number of untrained "unsure"s. I'd personally rather there was no "unsure" category (and have occasionally set the spam and unsure thresholds to the same number to do this), but that would be an example of the tail (or at least a few hairs on the tail) wagging the dog... ;) As for the clicking-multiple times problem, using what I have, if someone does a Delete and Recover on the same message it would count as both a FP and FN, which can't really be true. For my purposes this seems like something that I wouldn't worry about, although I can understand objections here. I don't know how to get around this without a more tightly-integrated stats concept. Anyway, I think what is suitable is the following: * Email comes in, if ham or spam, increment appropriate count, if unsure do nothing for now. * When an "unsure" gets Recovered or Deleted, increment ham or spam counter * When an email classified as ham gets "deleted as spam", increment false negative counter * When an email classified as spam gets "recovered from spam", increment false positive counter. * Unsures would be counted as they are right now, i.e. by filter.py. My feeling is that if it goes to the Unsure folder then the user is going to classify the message by hand. 
I don't really like the idea of having a third "Unsure" category for stats purposes alongside ham and spam, since it reflects lack of confidence by the program rather than a reality.

If you read this far, thank you! If this isn't suitable, that's fine, I learned something anyway and had fun playing around with spambayes and python.

Regards,
Mark Jeays

Attached are two files:
* sb-stats.txt is an expansion of the Stats class in manager.py
* sb-message.txt is some code to output the various stats (not hooked up to any GUI), to go in a potential new StatsDialog.py (or similar)
-------------- next part --------------
# expansion of existing Stats class in manager.py
# to use this, these methods would be called in addin.py
# num_unsure, num_seen, num_spam are already incremented in filter.py
# in ButtonDeleteAsSpamEvent there would be
#     self.manager.stats.num_fn += 1
#     if message is 'Unsure' then
#         self.manager.stats.num_spam += 1
# in ButtonRecoverFromSpamEvent there would be
#     self.manager.stats.num_fp += 1
#     if message is 'Unsure' then
#         self.manager.stats.num_ham += 1
# in OnDisconnection, call
#     StoreAll()

import time
import _winreg

class Stats:
    def get_time(self):
        # return initial time. if it's not there, initialize with current time
        try:
            temp = _winreg.QueryValueEx(self.key, "time")
            return temp[0]
        except EnvironmentError:
            thetime = int(time.time())
            _winreg.SetValueEx(self.key, "time", None, _winreg.REG_DWORD, thetime)
            return thetime

    def get(self, item):
        # wrapper around QueryValueEx; a missing value reads as 0
        try:
            temp = _winreg.QueryValueEx(self.key, item)
            return temp[0]
        except EnvironmentError:
            return 0

    def StoreAll(self):
        # store everything to registry
        #print "init_time: %d, init_seen: %d, init_spam: %d, init_unsure: %d, init_fp: %d, init_fn: %d" % (self.init_time, self.init_seen, self.init_spam, self.init_unsure, self.init_fp, self.init_fn)
        #print "num_seen: %d, num_spam: %d, num_unsure: %d, num_fp: %d, num_fn: %d" % (self.num_seen, self.num_spam, self.num_unsure, self.num_fp, self.num_fn)
        self.key = _winreg.OpenKey(self.root, self.regkey, 0, _winreg.KEY_ALL_ACCESS)
        self.store("num_seen", self.num_seen + self.init_seen)
        self.store("num_spam", self.num_spam + self.init_spam)
        self.store("num_unsure", self.num_unsure + self.init_unsure)
        self.store("num_fp", self.num_fp + self.init_fp)
        self.store("num_fn", self.num_fn + self.init_fn)
        _winreg.CloseKey(self.key)

    def store(self, item, value):
        # wrapper around SetValueEx
        try:
            _winreg.SetValueEx(self.key, item, None, _winreg.REG_DWORD, value)
        except EnvironmentError:
            print "Failed to set item %s with value %d in registry" % (item, value)

    def __init__(self):
        self.root = _winreg.HKEY_CURRENT_USER
        self.regkey = "Software\\Microsoft\\Office\\Outlook\\Addins\\SpamBayes.OutlookAddin"
        self.num_seen = self.num_spam = self.num_unsure = 0
        self.key = _winreg.OpenKey(self.root, self.regkey, 0, _winreg.KEY_ALL_ACCESS)
        self.init_time = self.get_time()
        self.start_time = int(time.time())
        self.init_seen = self.get("num_seen")
        self.init_spam = self.get("num_spam")
        self.init_unsure = self.get("num_unsure")
        self.init_fp = self.get("num_fp")
        self.init_fn = self.get("num_fn")
        self.num_fp = 0
        self.num_fn = 0
        _winreg.CloseKey(self.key)
        #print "init_time: %d, init_seen: %d, init_spam: %d, init_unsure: %d, init_fp: %d, init_fn: %d" % (self.init_time, self.init_seen, self.init_spam, self.init_unsure, self.init_fp, self.init_fn)
-------------- next part --------------
import time

def getStatsMessage(self):
    # return a string with info
    output = ""
    stats = self.mgr.stats
    emails = stats.num_seen + stats.init_seen
    spam = stats.num_spam + stats.init_spam
    unsure = stats.num_unsure + stats.init_unsure
    ham = emails - spam
    currentham = stats.num_seen - stats.num_spam
    fp = stats.num_fp + stats.init_fp
    fn = stats.num_fn + stats.init_fn
    # the raw spam and ham counts include the misclassified mail; correct them
    spam = spam + fn - fp
    ham = ham + fp - fn
    wrong = fp + fn
    starttime = stats.init_time
    currenttime = float(int(time.time()))
    elapsedtime = currenttime-starttime
    days = float((currenttime-starttime)/86400)
    hours = (days - int(days)) * 24
    currentdays = float((currenttime-stats.start_time)/86400)
    currenthours = (currentdays - int(currentdays)) * 24
    emails = float(emails)
    spam = float(spam)
    ham = float(ham)
    fn = float(fn)
    fp = float(fp)
    output += "This session: Spam: %d, Unsure: %d, Ham: %d, Emails: %d \n" % (stats.num_spam, stats.num_unsure, currentham, stats.num_seen)
    output += "Per Day: Spam: %0.2f, Unsure: %0.2f, Ham: %0.2f, Emails: %0.2f\n" % (stats.num_spam/currentdays, stats.num_unsure/currentdays, currentham/currentdays, stats.num_seen/currentdays)
    output += "Totals: Spam: %d, Unsure: %d, Ham: %d, Emails: %d \n" % (spam, unsure, ham, emails)
    output += "Per Day: Spam: %0.2f, Unsure: %0.2f, Ham: %0.2f, Emails: %0.2f\n" % (spam/days, unsure/days, ham/days, emails/days)
    output += "This session: %d d %d h. Total days counting: %d d %d h\n" % (currentdays, currenthours, days, hours)
    output += "Incorrect evaluations: False Positives: %d, False Negatives: %d\n" % (fp, fn)
    output += "Number of incorrectly evaluated per day %0.2f\n" % ((wrong)/days)
    if emails > 0:
        output += "Percent of email that is spam: %0.2f%%\n" % (100*spam/emails)
    if spam > 0:
        output += "Percent correct on spam: %0.2f%% " % (100*(spam-fn)/(spam))
        if fn > 0:
            output += "(1 in %d spam was misclassified)\n" % (spam/fn)
        else:
            output += "(None misclassified!)\n"
    if ham > 0:
        output += "Percent correct on ham: %0.2f%% " % (100*(ham-fp)/(ham))
        if fp > 0:
            output += "(1 in %d ham was misclassified)\n" % (ham/fp)
        else:
            output += "(None misclassified!)\n"
    if emails > 0:
        output += "Percent correct on all emails: %0.2f%%\n" % (100*(emails-wrong)/(emails))
    output += "(Unsure not counted as spam or ham, or in totals)"
    return output

From Andrew.Stickland at perwill.com Fri Sep 12 13:25:53 2003 From: Andrew.Stickland at perwill.com (Andrew Stickland) Date: Fri Sep 12 07:25:59 2003 Subject: [spambayes-dev] Possible enhancement Message-ID: Hi, I've just become a user of this excellent software (Outlook plugin) as we're trialling it for our organisation. It's been great so far but as we use Exchange, I can't see an easy route for rolling out a company-wide solution. One option that could make this possible for a small organisation is to use a common database. The obvious barrier to this is that the config settings only allow you to define the data directory which also contains the INI files. A possible second barrier is locking of the database but I've not reviewed the code to see if this is a problem. May I take the liberty of suggesting a possible enhancement to the system such that it has an 'override' INI setting so that the database pointer could be redirected to a file server folder?
Regards Andrew Stickland phone: +44 (0)1420 545031 mobile: +44 (0)7736 557126 ******************************************************* This email has originated from Perwill plc (Registration No. 1906964) Office registered at: 13A Market Square, Alton, Hampshire, GU34 1UR, UK Tel: +44 (0)1420 545000 Fax: +44 (0)1420 545001 www.perwill.com ******************************************************* Privileged, confidential and/or copyright information may be contained in this email, and is only for the use of the intended addressee. To copy, forward, disclose or otherwise use it in any way if you are not the intended recipient or responsible for delivering to him/her is prohibited. If you receive this email by mistake, please advise the sender immediately, by using the reply facility in your email software. We may monitor the content of emails sent and received via our network for the purposes of ensuring compliance with policies and procedures. This message is subject to and does not create or vary any contractual relationships between Perwill plc and the recipient. ******************************************************* Any opinions expressed in the email are those of the sender and not necessarily of Perwill plc. ******************************************************* This email has been scanned for known viruses using McAfee WebShield 4.5 MR1a ******************************************************* From mhammond at skippinet.com.au Fri Sep 12 22:41:22 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Sep 12 08:27:08 2003 Subject: [spambayes-dev] More stats talk (warning, long) In-Reply-To: Message-ID: <00c301c37922$cb2abad0$f502a8c0@eden>

> I'm afraid I don't have quite enough python experience (or
> CVS experience, for that matter, right now) to make a working
> patch. However, I do have some code that I wanted to send to
> the list,

This is a good start - thanks! If you can, there are still some things you can do to help us slot this in.
Regarding CVS experience - my suggestion is that you simply start with a decent "diff" program, and ignore CVS. Almost any "diff" should do :) Take a copy of the files you are editing (say, addin_orig.py) before you start, and once you think you have a nice set of changes, run:

    C:\> diff -u addin_orig.py addin.py > addin.patch

Then you can review addin.patch, and check that all your changes still make sense now it is all finished :) I'd be happy to accept addin.patch and integrate it. Slowly we will get to adding a "cvs" to the start of that command :)

I'd suggest you do the following:

* Create a new stats.py file - this would be similar to the attached sb-stats.txt
* Add the following methods to the "Stats" class:
  - RecordFilterAction(self, disposition) # ie, "Yes", "No", "Unsure" or "Error"
  - RecordRecoverAction(self, recover_type)
* Have filter.py and addin.py call these methods. All counters etc are managed internally by the class, rather than externally set as filter.py does now.
* Add a method "GenerateReport()" to the "Stats" class - this would return a string - similar to sb-message.txt
* Mail me "stats.py", an "addin.patch", "filter.patch" and any other .patch file that becomes necessary.

This way we will have an excellent start, and the rest will eventually fall into place. "The rest" will then consist mainly of changes to "stats.py".

> There is one potential problem, which is that I think without integrating
> the stats with the database it would be difficult to have a rock-solid
> accounting system for every email, for at least two reasons: people can
> classify on buttons multiple times,

We manage this already, so that is no problem.

> and the existence of the "unsure" category makes things difficult.
>
> Conceptually, every email is either ham or spam. A false positive occurs
> when a ham is categorized as spam, and a false negative occurs when a spam
> is categorized as ham. I'm not too sure how to fit "unsures" into this
> scheme.
> One possibility is to simply not count them. But, if a particular
> mail is rated as "unsure", it's not a FP or FN, and it's not correct either.
> Any ideas on how to handle this?
>
> My thought (implemented in the attachment) on this was to just not count
> unsure emails in the total until a categorization has occurred (as I always
> train "unsure"s immediately, this discrepancy wouldn't show up in the
> stats). The total emails reported would then be the actual total received
> minus the number of untrained "unsure"s.

Yes, I think you are correct. We simply ignore unsure for certain stats. However, the key thing to do is get the start of a little "interface" all set up, and get even basic stats going. My pathetic excuse for stats prompted you to write this. One small step at a time, as long as it's in the right direction, will get there.

> As for the clicking-multiple times problem, using what I have, if someone
> does a Delete and Recover on the same message it would count as both a FP
> and FN, which can't really be true. For my purposes this seems like
> something that I wouldn't worry about, although I can understand objections
> here. I don't know how to get around this without a more tightly-integrated
> stats concept.

We can integrate our stats concept as tightly as we want - it just means a few more ".patch" files to attach :) We would be able to keep these counters in sync without too much trouble, and we don't have to get it right the first time.

> If you read this far, thank you! If this isn't suitable, that's fine, I
> learned something anyway and had fun playing around with spambayes and
> python.

I think it is all suitable, and I hope you keep playing and learning. If you can keep pushing this along as I mention above, then it will all happen fairly quickly. Before you know it you will be firing off patches too fast for me to keep up with :)

Mark.
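Mark's suggested interface can be combined with the counting rules from the original post. Ham and spam are counted at filter time. An Unsure is counted as ham or spam only once the user recovers or deletes it. An FP or FN is counted when a confident verdict is reversed. A rough Python sketch of that combination follows. Only the three method names come from the message above. Everything else is hypothetical: the `"RecoverFromSpam"`/`"DeleteAsSpam"` values, the extra `original_disposition` argument, the assumption that `"Yes"` means spam and `"No"` means ham, and the in-memory (rather than registry) storage.

```python
class Stats:
    """In-memory sketch of the stats counters discussed in this thread.

    Assumptions (hypothetical, not SpamBayes' actual API): disposition "Yes"
    means spam and "No" means ham; recover_type is "RecoverFromSpam" or
    "DeleteAsSpam"; original_disposition is the filter's verdict on the
    message the user is recovering or deleting.
    """

    def __init__(self):
        self.num_seen = 0    # every message the filter looked at
        self.num_ham = 0     # messages currently believed to be ham
        self.num_spam = 0    # messages currently believed to be spam
        self.num_unsure = 0  # messages the filter called Unsure
        self.num_error = 0   # messages the filter failed on
        self.num_fp = 0      # ham the filter confidently called spam
        self.num_fn = 0      # spam the filter confidently called ham

    def RecordFilterAction(self, disposition):
        self.num_seen += 1
        if disposition == "Yes":
            self.num_spam += 1
        elif disposition == "No":
            self.num_ham += 1
        elif disposition == "Unsure":
            # Neither ham nor spam yet: wait for the user to decide.
            self.num_unsure += 1
        else:
            self.num_error += 1

    def RecordRecoverAction(self, recover_type, original_disposition):
        if recover_type == "RecoverFromSpam":
            truth, wrong_verdict = "ham", "Yes"
        elif recover_type == "DeleteAsSpam":
            truth, wrong_verdict = "spam", "No"
        else:
            raise ValueError("unknown recover_type: %r" % (recover_type,))

        if original_disposition == wrong_verdict:
            # The filter was confidently wrong: count the mistake and take
            # the message out of the bucket it was first counted in.
            if truth == "ham":
                self.num_fp += 1
                self.num_spam -= 1
            else:
                self.num_fn += 1
                self.num_ham -= 1
        elif original_disposition != "Unsure":
            return  # already counted in the right bucket: nothing to change
        # Either an Unsure getting its first real verdict, or a corrected mistake.
        if truth == "ham":
            self.num_ham += 1
        else:
            self.num_spam += 1

    def GenerateReport(self):
        lines = [
            "Seen: %d  Ham: %d  Spam: %d  Unsure: %d"
            % (self.num_seen, self.num_ham, self.num_spam, self.num_unsure),
            "False positives: %d  False negatives: %d" % (self.num_fp, self.num_fn),
        ]
        if self.num_seen:
            lines.append("Unsure rate: %.1f%%" % (100.0 * self.num_unsure / self.num_seen))
        if self.num_ham:
            lines.append("FP rate: %.1f%% of ham" % (100.0 * self.num_fp / self.num_ham))
        if self.num_spam:
            lines.append("FN rate: %.1f%% of spam" % (100.0 * self.num_fn / self.num_spam))
        return "\n".join(lines)


stats = Stats()
for d in ["Yes", "No", "No", "Unsure"]:
    stats.RecordFilterAction(d)
stats.RecordRecoverAction("RecoverFromSpam", "Yes")  # a false positive
stats.RecordRecoverAction("DeleteAsSpam", "Unsure")  # an Unsure resolved as spam
print(stats.GenerateReport())
```

Keeping every counter inside the class means filter.py and addin.py only report events. That matches Mark's goal of having counters "managed internally by the class". It also leaves room to fix the double-click problem in one place later.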
From skip at pobox.com Fri Sep 12 09:15:32 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 12 09:15:45 2003 Subject: [spambayes-dev] sb_* in config files? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332ACE7@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130332ACE7@its-xchg4.massey.ac.nz> Message-ID: <16225.50932.176569.258930@montanaro.dyndns.org> Tony> [1] It's either my bad or Tim Stone's that there are the 'debug' Tony> header options in 'hammie' and the 'evidence' header options in Tony> 'headers', which do the same thing. Whatever the options end up Tony> being called, I imagine that the code for doing this will be Tony> centralised at some point (if hammie used message.py, for Tony> example). As long as 1.0a5-1.0a6 will create a fair amount of breakage anyway, can we correct this now, at least within the ini file stuff? BTW, I trust all the developers who don't use the Outlook plugin are running from CVS to help identify problems which need to be fixed as a result of the "grand renaming". ;-) Skip From kennypitt at hotmail.com Fri Sep 12 09:57:39 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Fri Sep 12 09:58:11 2003 Subject: [spambayes-dev] Outlook toolbar error Message-ID: <002701c37935$d76bf530$300a10ac@spidynamics.com> Attached is a tracelog of an error that I've been getting in the Outlook addin. The error occurs any time I start Outlook when the SpamBayes toolbar already exists. If I delete the toolbar and restart Outlook, everything works that first time. But on subsequent starts, SpamBayes always fails to add the sub-items to the SpamBayes drop-down. If I'm not mistaken, the problem started around addin.py rev 1.103 when we went to temporary sub-items. I'd be happy to look into this myself, but SourceForge CVS seems so hopelessly out-of-date right now that I thought I'd post first to find out if the problem has already been fixed. No sense in duplicating effort. 
-- Kenny Pitt -------------- next part -------------- SpamBayes Outlook Addin, version 0.7 (August 9, 2003) starting (with engine SpamBayes Beta2, version 0.2 (July 2003)) on Windows 5.0.2195 (Service Pack 4) using Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] SpamBayes: Watching for new messages in folder Inbox SpamBayes: Watching for new messages in folder Spam Processing 0 missed spam in folder 'Inbox' took 4.26814ms FAILED to add the toolbar item 'SpamBayesCommand.Manager' - (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) Traceback (most recent call last): File "C:\src\Tools\spambayes\spambayes\Outlook2000\addin.py", line 963, in _AddControl item = parent.Controls.Add(Type=control_type, Temporary=temporary) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 451, in __getattr__ return self._ApplyTypes_(*args) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 445, in _ApplyTypes_ return self._get_good_object_(self._oleobj_.InvokeTypes(*((dispid, 0, wFlags, retType, argTypes) + args)), user, resultCLSID) com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) FAILED to add the toolbar item 'SpamBayesCommand.Clues' - (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) Traceback (most recent call last): File "C:\src\Tools\spambayes\spambayes\Outlook2000\addin.py", line 963, in _AddControl item = parent.Controls.Add(Type=control_type, Temporary=temporary) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 451, in __getattr__ return self._ApplyTypes_(*args) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 445, in _ApplyTypes_ return self._get_good_object_(self._oleobj_.InvokeTypes(*((dispid, 0, wFlags, retType, argTypes) + args)), user, resultCLSID) com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, 
-2147467259), None) FAILED to add the toolbar item 'SpamBayesCommand.FilterNow' - (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) Traceback (most recent call last): File "C:\src\Tools\spambayes\spambayes\Outlook2000\addin.py", line 963, in _AddControl item = parent.Controls.Add(Type=control_type, Temporary=temporary) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 451, in __getattr__ return self._ApplyTypes_(*args) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 445, in _ApplyTypes_ return self._get_good_object_(self._oleobj_.InvokeTypes(*((dispid, 0, wFlags, retType, argTypes) + args)), user, resultCLSID) com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) FAILED to add the toolbar item 'SpamBayesCommand.CheckVersion' - (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) Traceback (most recent call last): File "C:\src\Tools\spambayes\spambayes\Outlook2000\addin.py", line 963, in _AddControl item = parent.Controls.Add(Type=control_type, Temporary=temporary) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 451, in __getattr__ return self._ApplyTypes_(*args) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 445, in _ApplyTypes_ return self._get_good_object_(self._oleobj_.InvokeTypes(*((dispid, 0, wFlags, retType, argTypes) + args)), user, resultCLSID) com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) FAILED to add the toolbar item 'SpamBayesCommand.HelpPopup' - (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) Traceback (most recent call last): File "C:\src\Tools\spambayes\spambayes\Outlook2000\addin.py", line 963, in _AddControl item = parent.Controls.Add(Type=control_type, Temporary=temporary) File 
"C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 451, in __getattr__ return self._ApplyTypes_(*args) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 445, in _ApplyTypes_ return self._get_good_object_(self._oleobj_.InvokeTypes(*((dispid, 0, wFlags, retType, argTypes) + args)), user, resultCLSID) com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) FAILED to add the toolbar item 'SpamBayesCommand.TestSuite' - (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) Traceback (most recent call last): File "C:\src\Tools\spambayes\spambayes\Outlook2000\addin.py", line 963, in _AddControl item = parent.Controls.Add(Type=control_type, Temporary=temporary) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 451, in __getattr__ return self._ApplyTypes_(*args) File "C:\Dev\Lang\Python23\lib\site-packages\win32com\client\__init__.py", line 445, in _ApplyTypes_ return self._get_good_object_(self._oleobj_.InvokeTypes(*((dispid, 0, wFlags, retType, argTypes) + args)), user, resultCLSID) com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) From papaDoc at videotron.ca Fri Sep 12 10:03:11 2003 From: papaDoc at videotron.ca (papaDoc) Date: Fri Sep 12 10:02:41 2003 Subject: [spambayes-dev] sb_* in config files? In-Reply-To: <16225.50932.176569.258930@montanaro.dyndns.org> References: <1ED4ECF91CDED24C8D012BCF2B034F130332ACE7@its-xchg4.massey.ac.nz> <16225.50932.176569.258930@montanaro.dyndns.org> Message-ID: <3F61D21F.7040708@videotron.ca> Hi, I want to help you on that even if I'm not a day to day developer (on the SB project ;-( ) but sf is always a couple of hours even days behind when you try to get the files from anonymous cvs .... 
>BTW, I trust all the developers who don't use the Outlook plugin are running >from CVS to help identify problems which need to be fixed as a result of the >"grand renaming". ;-) > Remi From skip at pobox.com Fri Sep 12 10:23:33 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 12 10:23:42 2003 Subject: [spambayes-dev] sb_* in config files? In-Reply-To: <3F61D21F.7040708@videotron.ca> References: <1ED4ECF91CDED24C8D012BCF2B034F130332ACE7@its-xchg4.massey.ac.nz> <16225.50932.176569.258930@montanaro.dyndns.org> <3F61D21F.7040708@videotron.ca> Message-ID: <16225.55013.111189.120385@montanaro.dyndns.org> Remi> I want to help you on that even if I'm not a day to day developer Remi> (on the SB project ;-( ) but sf is always a couple of hours even Remi> days behind when you try to get the files from anonymous cvs .... I'll set up an hourly tarfile on the Mojam server. I'll let you know when it's set up and where to grab the distro. Skip From tim.one at comcast.net Fri Sep 12 14:56:38 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Sep 12 14:56:40 2003 Subject: [spambayes-dev] More stats talk (warning, long) In-Reply-To: <00c301c37922$cb2abad0$f502a8c0@eden> Message-ID: [Mark Jeays] > ... > Conceptually, every email is either ham or spam. Not so in our system. About half the Unsures I get I throw away without training on, because after 30 seconds of staring at them I simply can't decide whether they're "really" ham or spam. That used to bother me a year ago, but doesn't anymore. If we were to classify all spambayes users as either "fat" or "skinny", *some* of them would get the point quicker . > A false positive occurs when a ham is categorized as spam, and a false > negative occurs when a spam is categorized as ham. That much is non-controversial. > I'm not too sure how to fit "unsures" into this scheme. Three categories don't fit into two, of course. The Unsure rate (% of email classified as unsure) is an interesting stat in its own right. 
The percentages of initial Unsures later trained as ham, trained as spam, and never trained, are also interesting. Pie charts come to mind. From popiel at wolfskeep.com Fri Sep 12 15:58:15 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Fri Sep 12 15:58:23 2003 Subject: [spambayes-dev] sb_* in config files? In-Reply-To: Message from Skip Montanaro of "Fri, 12 Sep 2003 08:15:32 CDT." <16225.50932.176569.258930@montanaro.dyndns.org> References: <1ED4ECF91CDED24C8D012BCF2B034F130332ACE7@its-xchg4.massey.ac.nz> <16225.50932.176569.258930@montanaro.dyndns.org> Message-ID: <20030912195815.DA1822DE90@cashew.wolfskeep.com> In message: <16225.50932.176569.258930@montanaro.dyndns.org> Skip Montanaro writes: > >BTW, I trust all the developers who don't use the Outlook plugin are running >from CVS to help identify problems which need to be fixed as a result of the >"grand renaming". ;-) To be honest, I haven't upgraded in many months... while I recognize that it would be a benefit to everyone else (and, eventually, myself once this project becomes mainstream enough for .deb packaging) if I were to do so, I haven't made the time to do so. Perhaps in about a week (I'm in crunch mode at work). - Alex From tim.one at comcast.net Fri Sep 12 23:36:25 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Sep 12 23:36:33 2003 Subject: [spambayes-dev] Give up on experimental_ham_spam_imbalance_adjustment? Message-ID: experimental_ham_spam_imbalance_adjustment has been True in the Outlook addin, but still False by default everywhere else (AFAIK). I'd like to ask everyone running the Outlook addin to change it to False in their default_bayes_customize.ini file, and just live with that for a week, noting any new peculiarities. What this option is all about: We first compute a word's spamprob by counting how often the word has appeared in ham and spam messages, and then doing some arithmetic to produce a reasonable ratio between 0 and 1. 
Call that "by-counting" spamprob p. p can't be used directly. At the extreme, if a word was seen in one spam and no ham before, p is exactly 1, and if seen in one ham and no spam before, exactly 0. If you feed any single spamprob of 0 or 1 into the combining math, the end result will be 0 or 1, regardless of what the other spamprobs feeding into it are. It's crazy in a statistical scheme to let one piece of evidence completely determine the outcome (that's what rule-based schemes are for, and they're brittle). Intuitively, the real problem is that our by-counting guess is *only* a guess, and has no claim to being absolute truth. It only reflects what we've trained on, so it's only as reliable as the assumption that what we've trained on is a perfect prediction of what we're going to see in the future -- but it isn't, and there's no way to know in advance how far off from future reality it is. Gary Robinson gave us a slick way to deal with this: instead of using the by-counting guess p, use a weighted average of p and 0.5 (unknown_word_prob). The weight given to 0.5 is fixed at 0.45 today (unknown_word_strength), and the weight given to p is what experimental_ham_spam_imbalance_adjustment is all about. When that's False, the weight given to p is the sum of the number of ham and spam the word appeared in. When True, and there's much more spam than ham in the training data, or much more ham than spam, the weight given to p can be much smaller than the sum of the number of messages the word has appeared in. This is *trying* to account for the intuition that when training data is wildly unbalanced, we have much less reason to be confident about how reliable a by-counting spamprob guess is. One concrete example: suppose we've trained on 30,000 ham and 100 spam, and the word "fudge" appeared in 100 of those ham and none of those spam. The ratio of ham it's appeared in is then 0.003333..., the ratio of spam it's appeared in is 0, and the by-counting spamprob p is 0/(0 + 0.003333...) = 0. If we see a new message containing "fudge", how much weight should we give to that spamprob of 0? When experimental_ham_spam_imbalance_adjustment is False, we give it a weight of 100 (the total # of training msgs it's appeared in), and the weighted-average spamprob is

    0.0022399      option False

When the option is True, we give it a weight of 0.333333333, and the weighted-average spamprob is then the (very! by a factor of > 128) much milder

    0.287234043    option True

Who knows? We've trained on so little spam (compared to ham) that they're both wild-ass guesses. Suppose the world had been a little different: all the same, except that "fudge" had appeared in a single training spam. Then the by-counting spamprob p would zoom from 0 (certain ham) to 0.75 (probably spam). That's an enormous difference for a 1-out-of-30100-messages change, and that alone is a reason for being suspicious about a spamprob as strong as 0.0022399 in the slightly different world we started with. In our new slightly different world, the straight and adjusted weighted-average guesses are

    0.7488911      option False
    0.686915888    option True

instead. There are several things to note:

1. When the option is True, small changes in training data make smaller changes in final spamprob guesses. I happen to think that's good, but the data may not agree.
2. The difference between True and False can be gigantic when (a) there is wild imbalance; and, (b) a token has never (yet) appeared in the class with the smaller amount of training data. The difference between 0.0022399 and 0.287234043 above is huge: the difference between a very strong clue and a mild clue.
3. The difference between True and False can't be extreme for a token that's appeared in at least one ham and one spam. The difference between 0.7488911 and 0.686915888 above is real but hardly dramatic.

Now in some earlier tests, some people (including me) reported better results with unbalanced training data when setting the option True.
But the imbalances I tried were much milder than some of the imbalances reported by actual Outlook users (which have exceeded the factor of 300(!) in the 30000-ham-plus-100-spam example in this message). There's also a difference between most of our testing and real life: in most testing, a classifier is built and then predicts some hundreds or thousands of new messages. That's not how the Outlook client is *used*, though. In real life, a relatively small batch of messages comes in and then the classifier is quickly trained on mistakes and unsures. One unfortunate consequence of setting the option True is that adding even more training data to the classification with the wildly larger number of examples has little effect. The system is already unhappy with the massive imbalance, and increasing the imbalance just causes it to give even less weight to the over-represented classification. This may actually be good for a static classifier that predicts thousands of messages without further training, but all the evidence I'm seeing from real users with wild imbalance is that it frustrates them due to the lack of instant gratification as they futilely increase the imbalance again and again. So I think this option just doesn't work in real life, but want to be a little cautious about changing the default (I seem naturally to tend toward a 2-to-1 imbalance, in favor of spam, in the 3 classifiers I use commonly, and the imbalance adjustment hasn't seemed to hurt me a bit; I'm trying it the other way now, and that doesn't seem to be hurting me a bit either). Let me try to make the core a bit clearer here: in the example at the start, the adjustment hates a spamprob as strong as 0.0022399, because as soon as "fudge" appears in the first spam, a hammy spamprob as strong as 0.0022399 has a very good chance of making the msg classify as Unsure instead of as Spam. The adjusted spamprob of 0.287234043 is very much less likely to cause such a mistake.
That's what the adjustment is *trying* to accomplish, and it succeeds at that. OTOH, if ham containing "fudge" continues to come in and get classed as Unsure, training even more of those as ham doesn't do much to reduce "fudge"'s adjusted spamprob, and users get frustrated. Also, because of the way the addin gets used in real life, as soon as that first spam containing "fudge" *does* come in and get classed as Unsure (or even as Ham), the user will train on that instantly, and the unadjusted spamprob will then instantly increase from the powerful 0.0022399 to the so-so 0.7488911, and "fudge" will never cause a problem again. From T.A.Meyer at massey.ac.nz Fri Sep 12 23:40:19 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 12 23:40:30 2003 Subject: [spambayes-dev] Outlook toolbar error Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332B1DD@its-xchg4.massey.ac.nz> > Attached is a tracelog of an error that I've been getting in > the Outlook addin. The error occurs any time I start Outlook > when the SpamBayes toolbar already exists. If I delete the > toolbar and restart Outlook, everything works that first > time. But on subsequent starts, SpamBayes always fails to > add the sub-items to the SpamBayes drop-down. A problem sounding a lot like this was recently fixed, yes. I can't say for sure that it's the same one, but it sounds like it. It was perhaps a week ago now that Mark checked in the (working) fix, so I would hope that current anon cvs would include it. =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Sep 13 00:18:05 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Sep 13 00:18:18 2003 Subject: [spambayes-dev] Give up onexperimental_ham_spam_imbalance_adjustment? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332B1DF@its-xchg4.massey.ac.nz> > I'd like to ask everyone running the Outlook addin to change > it to False in their default_bayes_customize.ini file, and just > live with that for a week, noting any new peculiarities. 
Since around April/May, I've had this option off, and I generally run with an imbalance of roughly 1:10 ham to spam [1] - it's 418:4660 at the moment. I've been happy with the results, both in terms of correct classification and training speed. Do you want people like me, who have it off, to turn it on for a week? (If I do, I'll turn off the mixed uni/bigram scheme for the week, too). I think the option tends to help with little imbalances (up to 1:5, for example), and then starts to confuse people. Unfortunately, in real life this teaches people the wrong thing - they train and things improve, so they keep doing it, and then it starts to go wrong again. If this is true (that it's good up to a certain imbalance) then the plug-in could be smart enough to disable the option if the imbalance reached a certain level - or it could warn the user that their training method isn't that good (I know, I should test whether I'm right, but I don't have the time at the moment). Or if the plug-in was even smarter ;) then it could auto-manage the corpora. If the user is training on too many spam, start automatically training on all messages that are replied to. If the user is training on too many ham, start subscribing her to junk lists. In the meantime, I think the default could be changed to False. At least the reason for things going 'wrong' is then more obvious to the people that have no idea how it works. =Tony Meyer [1] Because I'm too lazy to create a sub-corpus of my spam collection, or ham collection, so I use almost all my spam (to this address), plus misclassified mail and whatever's in the inbox at training time.
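A minimal Python sketch of the arithmetic in Tim Peters' "fudge" example earlier in this thread. The smoothed spamprob is the weighted average he describes, (s*x + n*p) / (s + n), with x = unknown_word_prob = 0.5, s = unknown_word_strength = 0.45, and n the weight given to the by-counting p. How n shrinks when experimental_ham_spam_imbalance_adjustment is True is an assumption here: each count is scaled by min(nspam/nham, 1) or min(nham/nspam, 1). That choice reproduces his weight of 0.333... and all four quoted spamprobs, but it is not copied from classifier.py.

```python
def spamprob(hamcount, spamcount, nham, nspam, adjust_imbalance,
             unknown_word_prob=0.5, unknown_word_strength=0.45):
    """Smoothed spamprob for one token (illustrative sketch).

    hamcount/spamcount: training ham/spam messages the token appeared in.
    nham/nspam: total training ham/spam messages.
    Assumes the token was seen at least once, so p is well defined.
    """
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    # The "by-counting" spamprob p.
    p = spamratio / (hamratio + spamratio)

    if adjust_imbalance:
        # Assumed weighting: shrink counts from the over-represented class.
        spam2ham = min(nspam / nham, 1.0)
        ham2spam = min(nham / nspam, 1.0)
    else:
        spam2ham = ham2spam = 1.0
    n = hamcount * spam2ham + spamcount * ham2spam

    # Weighted average of p (weight n) and unknown_word_prob (weight s).
    s, x = unknown_word_strength, unknown_word_prob
    return (s * x + n * p) / (s + n)
```

With 30,000 ham and 100 spam, `spamprob(100, 0, 30000, 100, False)` comes out near 0.0022399 and the `True` variant near 0.287234043; giving "fudge" a single spam (`spamcount=1`) moves them to about 0.7488911 and 0.686915888.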
From T.A.Meyer at massey.ac.nz Sat Sep 13 00:28:40 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Sep 13 00:29:06 2003 Subject: [spambayes-dev] Possible enhancement Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332B1E1@its-xchg4.massey.ac.nz> > The obvious barrier to this is > that the config settings only allow you to define the data > directory which also contains the INI files The data_directory option tells the plug-in where to find all the user-specific files, not just the ini ones. If you change it, you'll also move the default_bayes_database.db and default_message_info.db files. > A possible second barrier is locking of the database but I've not > reviewed the code to see if this is a problem. I'll leave Mark to answer this, but I suspect it could be, if lots of people are going to be training this database at once. Changing (in the source code) the plug-in to use one of the SQL storage methods rather than the dbm that it uses at the moment might help. Or it might not ;) Are you sure that your users will all share a common definition of ham/spam, though? I'm also not sure how having a common database helps, if you have to install the plug-in on all the users' machines, plus teach them how to use (train) it. =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Sep 13 00:33:08 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Sep 13 00:34:31 2003 Subject: [spambayes-dev] sb_* in config files? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332B1E2@its-xchg4.massey.ac.nz> Tony> [1] It's either my bad or Tim Stone's that there are the 'debug' Tony> header options in 'hammie' and the 'evidence' header options in Tony> 'headers', which do the same thing. Whatever the options end up Tony> being called, I imagine that the code for doing this will be Tony> centralised at some point (if hammie used message.py, for Tony> example).
Skip> As long as 1.0a5-1.0a6 will create a fair amount of breakage Skip> anyway, can we correct this now, at least within the ini file stuff? Fine by me (the names bit). Do we use the name 'debug' header, or 'evidence' header? The former is the older name, but the latter is more easily understood (IMO) for end-users. OTOH, pop3proxy users are (hopefully) not exposed to the actual option names, so may not need to notice if one changes. =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Sep 13 01:15:49 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Sep 13 01:18:05 2003 Subject: [spambayes-dev] pop3proxy suggestion Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332B1E5@its-xchg4.massey.ac.nz> [adding a score column, and sorting by column] > Yes. This would make the review page much more useful. I > find it very difficult to review hundreds of messages at a > time, I usually end up just discarding everything but one or > two messages that I have to hunt around for. I've done the code to add an optional score column (Skip may also have done this independently by now, but is hopefully too busy with his move). I'll work up the 'sort by column' code, too. I haven't checked anything in yet, but will do so once we create a branch to hold the feature-frozen 1.0 (assuming that that is what we end up doing). =Tony Meyer From rob at hooft.net Sat Sep 13 06:02:43 2003 From: rob at hooft.net (Rob Hooft) Date: Sat Sep 13 06:02:54 2003 Subject: [spambayes-dev] Names and webname In-Reply-To: References: Message-ID: <3F62EB43.4010505@hooft.net> Tim Peters wrote: > [Rob Hooft] > >>2) spambayes.org is running out. We're not using it at all so far. >>Should I extend it? > > > Yes, please. If you don't want to pay for it, and/or not hassle with > ongoing registration, just say so. For example, Skip mentioned that the PSF > might be persuaded to pay the cost, and, if they don't want to, I'd be happy > to pay for it.
In fact, it would cost me less overall to pay for it than to > spend time arguing the case to the PSF. > I will pay for another year. I'm prepared to pay some money for spambayes, but I would appreciate some help at some point. If you argue the case with the PSF, I will sign my rights away to them happily. The domain is at Gandi.net. Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From skip at pobox.com Sat Sep 13 10:01:22 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Sep 13 10:02:02 2003 Subject: [spambayes-dev] sb_* in config files? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332B1E2@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130332B1E2@its-xchg4.massey.ac.nz> Message-ID: <16227.9010.23698.484789@montanaro.dyndns.org> Tony> Fine by me (the names bit). Do we use the name 'debug' header, or Tony> 'evidence' header? +1 for 'evidence'. 'debug' sounds too much like something normal users wouldn't approach although they might well be interested in the clues. Skip From skip at pobox.com Sat Sep 13 10:04:52 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Sep 13 10:05:18 2003 Subject: [spambayes-dev] pop3proxy suggestion In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332B1E5@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130332B1E5@its-xchg4.massey.ac.nz> Message-ID: <16227.9220.799129.810173@montanaro.dyndns.org> Tony> I've done the code to add an optional score column (Skip may also Tony> have done this independently by now, but is hopefully too busy Tony> with his move). I'll work up the 'sort by column' code, too. A score column and 'sort by column' would be great. I'm not terribly overwhelmed by the move at the moment. We've exited our old house and are living at my mother-in-law's while we wait for the new house's remodelling to be "complete enough" to throw some mattresses on the floor. In the meantime, the only decent Internet access I have is at work.
AOL on Windows 95 over a dialup line at my mum-in-law's just doesn't cut it. ;-) Skip From tim.one at comcast.net Sat Sep 13 20:24:24 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Sep 13 20:24:29 2003 Subject: [spambayes-dev] Give up onexperimental_ham_spam_imbalance_adjustment? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332B1DF@its-xchg4.massey.ac.nz> Message-ID: [Tony Meyer] > Since around April/May, I've had this option off, Why did you turn it off? > and I generally run with an imbalance of roughly 1:10 ham to spam [1] - > it's 418:4660 at the moment. I've been happy with the results, both > in terms of correct classification and training speed. It's good to know that the classification is OK for you. The option has no effect on training speed (the database built is identical regardless of the option's value -- anyone thinking of switching it off or on should know that you don't need to retrain -- you can switch as often as you like without retraining). Scoring is provably (although perhaps not measurably!) a little slower with the option True. > Do you want people like me, who have it off, to turn it on for a week? Nope. Since it's the default in the Outlook client, and I'm suggesting to change that default (or remove the code entirely), the most interesting question is whether changing it True -> False hurts anyone. > (If I do, I'll turn off the mixed uni/bigram scheme for the week, > too). > > I think the option tends to help with little imbalances (up to 1:5, > for example), It was tested earlier and the results were mixed. Unfortunately, that was around the time I got yanked from the project, and it was left hanging in that ambiguous state. We've been lax since then about getting loser code out of the codebase. > and then starts to confuse people. That part is demonstrably true. > Unfortunately, in real life this teaches people the wrong thing - they > train and things improve, so they keep doing it, and then it starts to > go wrong again.
If this is true (that it's good up to a certain > imbalance) then the plug-in could be smart enough to disable the option > if the imbalance reached a certain level - or it could warn the user > that their training method isn't that good (I know, I should test > whether I'm right, but I don't have the time at the moment). Tests were run on imbalance before this option existed, and we already know imbalance hurts, at least for cross-validation kinds of tests. But those differ from real-life training patterns in ways covered last time. Rob and I started running tests closer to real-life use (like modeling time-ordered mistake-based training), and at least I was surprised at how well they performed. I didn't run any tests like that with an eye on imbalance, though. > Or if the plug-in was even smarter ;) then it could auto-manage the > corpora. If the user is training on too many spam, start > automatically training on all messages that are replied to. If the > user is training on too many ham, start subscribing her to junk lists > . > > In the meantime, I think the default could be changed to False. At > least the reason for things going 'wrong' is then more obvious to the > people that have no idea how it works. If switching True -> False doesn't generate any "whoa! it's killing me!" reports from people currently using True, I'm more inclined to purge the code supporting experimental_ham_spam_imbalance_adjustment. It's limited to Classifier.probability(), and getting rid of the code would speed all scoring (albeit perhaps not measurably). From T.A.Meyer at massey.ac.nz Sun Sep 14 19:47:08 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Sep 14 19:47:23 2003 Subject: [spambayes-dev] Give up onexperimental_ham_spam_imbalance_adjustment? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130332B3BE@its-xchg4.massey.ac.nz> > [Tony Meyer] > > Since around April/May, I've had this option off, > > Why did you turn it off? 
I was getting a lot of unsures and figured out (read a post?) that turning the option off might help. It immediately helped matters, which meant that I didn't fall back to plan 2 (retrain with equal numbers). > The option has no effect on training speed [...] I meant 'training speed' as in the number of messages that have to be trained in order for a similar (to human eyes) message to be correctly classified. A different phrase would have made that clearer, but I can't think of one to use :) [...] > We've been lax since then about getting loser code out of the > codebase. I've occasionally wondered if there's any point keeping the gary_combining_scheme stuff in there. No-one's using that any more, are they? =Tony Meyer From tim.one at comcast.net Sun Sep 14 20:15:12 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Sep 14 20:17:04 2003 Subject: [spambayes-dev] Give up onexperimental_ham_spam_imbalance_adjustment? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332B3BE@its-xchg4.massey.ac.nz> Message-ID: [Tony Meyer] >>> Since around April/May, I've had this option off, [Tim] >> Why did you turn it off? [Tony] > I was getting a lot of unsures and figured out (read a post?) that > turning the option off might help. It immediately helped matters, > which meant that I didn't fall back to plan 2 (retrain with equal > numbers). That's highly relevant, then! The option has no effect if there is in fact an equal number of ham and spam training msgs, doesn't appear to make a spit's worth of difference in my 2::1 spam::ham classifier, and made trouble for you. >> The option has no effect on training speed > I meant 'training speed' as in the number of messages that have to be > trained in order for a similar (to human eyes) message to be correctly > classified. A different phrase would have made that clearer, but I > can't think of one to use :) Let's call it training efficiency: how much good you get out of a fixed (but secret) number of training messages.
For example, I'm sure the scheme mixing unigrams and bigrams has higher training efficiency than the current unigram-only scheme, although given *enough* training data all previous experiments didn't show higher accuracy either way. Enabling experimental_ham_spam_imbalance_adjustment has had a bad effect on training efficiency for a number of people with unbalanced training data (including you). I think it would still be better to get the training data in balance, but since it's not easy to force people to do that, it's tilting at windmills. >> We've been lax since then about getting loser code out of the >> codebase. > I've occasionally wondered if there's any point keeping the > gary_combining_scheme stuff in there. No-one's using that any more, > are they? We agreed to get rid of that long ago, IIRC around the time of the first alpha release. I guess it's just that nobody has gotten around to it yet. Feel encouraged. From adam.walker at rbwconsulting.com Sun Sep 14 20:29:14 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Sep 14 20:29:23 2003 Subject: [spambayes-dev] Documentation page on the website. Message-ID: <20030915002920.24B8C13E254@sack.dreamhost.com> The links for the "about" and "troubleshooting guide" on the documentation page of the website both end in "rev=HEAD&content-type=text/plain" rather than "rev=HEAD" which means they view the page source rather than the page. From T.A.Meyer at massey.ac.nz Sun Sep 14 20:39:06 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Sep 14 20:39:38 2003 Subject: [spambayes-dev] Documentation page on the website. Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471AE8@its-xchg4.massey.ac.nz> > The links for the "about" and "troubleshooting guide" on the > documentation page of the website both end in > "rev=HEAD&content-type=text/plain" rather than "rev=HEAD" > which means they view the page source rather than the page. Fixed. (You could have done this, too, you know...).
(The old version displayed fine in IE and Opera, although not in Mozilla.) =Tony Meyer From adam.walker at rbwconsulting.com Sun Sep 14 20:42:52 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Sep 14 20:43:07 2003 Subject: [spambayes-dev] Documentation page on the website. In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303471AE8@its-xchg4.massey.ac.nz> Message-ID: <20030915004255.4217713E236@sack.dreamhost.com> > -----Original Message----- > Fixed. (You could have done this, too, you know...). True, I could have fixed the page, but I don't know how to "build the site". > > (The old version displayed fine in IE and Opera, although not in > Mozilla.) Ah, I use Mozilla Firebird. From adam.walker at rbwconsulting.com Sun Sep 14 20:51:04 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Sun Sep 14 20:51:13 2003 Subject: [spambayes-dev] Give up onexperimental_ham_spam_imbalance_adjustment? In-Reply-To: Message-ID: <20030915005107.08EE613E222@sack.dreamhost.com> After switching to False on Friday, I've suffered no ill effects. I have a 2.4:1 imbalance in ham's favor. > -----Original Message----- > > experimental_ham_spam_imbalance_adjustment has been True in the Outlook > addin, but still False by default everywhere else (AFAIK). From T.A.Meyer at massey.ac.nz Sun Sep 14 21:12:18 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Sep 14 21:12:32 2003 Subject: [spambayes-dev] Give up onexperimental_ham_spam_imbalance_adjustment? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471B24@its-xchg4.massey.ac.nz> > Let's call it training efficiency: how much good you get out > of a fixed (but secret) number of training messages. Thanks, that's a nice term. > I think it would still be better to get the training data in balance, > but since it's not easy to force people to do that, it's tilting > at windmills. I still wonder if there is some way to do this automatically, but I wonder if it might end up being even more confusing.
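Tony's earlier suggestion, that the plug-in could switch the imbalance adjustment off, or warn the user, once the training data gets too lopsided, might look something like the hypothetical helper below. The 5:1 cutoff echoes his "up to 1:5, for example" and is not a tested value; nothing named imbalance_policy exists in SpamBayes.

```python
def imbalance_policy(nham, nspam, max_ratio=5.0):
    """Decide whether to apply the ham/spam imbalance adjustment.

    Hypothetical helper. Returns (use_adjustment, warning_or_None).
    """
    if nham == 0 or nspam == 0:
        return False, "Train on at least one ham and one spam first."
    ratio = max(nham, nspam) / min(nham, nspam)
    if ratio <= max_ratio:
        # Mild imbalance: keep the adjustment, no need to nag the user.
        return True, None
    bigger = "spam" if nspam > nham else "ham"
    smaller = "ham" if bigger == "spam" else "spam"
    return False, (f"Training data is {ratio:.1f}:1 in favor of {bigger}; "
                   f"consider training more {smaller}.")
```

With Tony's 418 ham and 4660 spam this returns `False` plus a warning mentioning 11.1:1, while Adam's 2.4:1 stays under the cutoff.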
[Tony] > I've occasionally wondered if there's any point keeping the > gary_combining_scheme stuff in there. No-one's using that any more, > are they? [Tim] > We agreed to get rid of that long ago, IIRC around the time > of the first alpha release. I guess it's just that nobody > has gotten around to it yet. Feel encouraged. Done (it was just the code in classifier.py and the option, right?). I tagged it with 'Last-Gary' in case anyone decides to go back to it. =Tony Meyer From T.A.Meyer at massey.ac.nz Sun Sep 14 21:34:15 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Sep 14 21:34:37 2003 Subject: [spambayes-dev] sb_* in config files? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471B50@its-xchg4.massey.ac.nz> Tony> Fine by me (the names bit). Do we use the name 'debug' header, or Tony> 'evidence' header? Skip> +1 for 'evidence'. 'debug' sounds too much like something normal Skip> users wouldn't approach although them might well be interested in Skip> the clues. That was my thought, too. On a similar note, the more I think about it, the more I think Skip is right in saying that since we're breaking all sorts of backwards compatibility at this point, we shouldn't hold off doing more that we can see will be necessary later. So I'd like to: * Change hammie to use the 'evidence' options instead of the 'debug' ones, as above. * Move the 'notate_to' and 'notate_subject' options out of the 'pop3proxy' section into the 'Headers' section. * Move the ham/spam/unknown cache options out of the 'pop3proxy' section into the 'Storage' section. I'm happy to do the edits, although I can only test sb_server (pop3proxy), sb_imapfilter and sb_pop3dnd, not sb_filter (hammiefilter). These will all break config files for anyone using non-default values for those options, though. Thoughts? 
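For anyone with non-default values, the option moves proposed above could be handled by a one-off migration of their ini file. A minimal sketch using Python's standard configparser; it assumes the cache options are spelled ham_cache, spam_cache and unknown_cache (the message does not name them) and that the target sections are spelled [Headers] and [Storage]. The migrate function is hypothetical, not part of SpamBayes, and the 'debug' to 'evidence' rename is left out because those option names aren't given.

```python
import configparser

# (old section, option) -> new section, per the proposal above.
# The three cache option names are assumptions for illustration.
MOVES = {
    ("pop3proxy", "notate_to"): "Headers",
    ("pop3proxy", "notate_subject"): "Headers",
    ("pop3proxy", "ham_cache"): "Storage",
    ("pop3proxy", "spam_cache"): "Storage",
    ("pop3proxy", "unknown_cache"): "Storage",
}

def migrate(config):
    """Move options in place; return a list of (option, new_section) moved."""
    moved = []
    for (old_section, option), new_section in MOVES.items():
        if not config.has_option(old_section, option):
            continue
        value = config.get(old_section, option, raw=True)
        if not config.has_section(new_section):
            config.add_section(new_section)
        # Don't clobber a value the user already set in the new location.
        if not config.has_option(new_section, option):
            config.set(new_section, option, value)
        config.remove_option(old_section, option)
        moved.append((option, new_section))
    return moved
```

Creating the parser with `configparser.ConfigParser(interpolation=None)` avoids errors if a value such as a Windows path contains a `%`.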
BTW, apart from this there doesn't seem to be anything stopping the 1.0a6 release (which I planned for last Thursday ;) so once there seems to be an agreement on the above, I'll package it together and release it. =Tony Meyer From mhammond at skippinet.com.au Sun Sep 14 21:55:38 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Sep 14 22:01:46 2003 Subject: [spambayes-dev] Test windows binaries available Message-ID: <006b01c37b2c$76a6db90$f502a8c0@eden> Just for the sake of getting *anything* out for comment, I have put up http://starship.python.net/crew/mhammond/downloads/SpamBayes_TestBinaries.zip This is *not* intended for users (but it should be close!) This zip file contains a single directory with the following programs: * pop3proxy_tray.exe - a Windows GUI program * sb_server.exe - a console program (so presumably for "test" purposes only) * pop3proxy_service.exe - A Windows service - "-install" to install it * spambayes_addin.dll.dll - The Outlook addin (the extra ".dll" is also a mistake, but it still works :) Use "regsvr32.exe" to register. * outlook_dump_props.exe - The Outlook "dump_props.py" program, which could be useful to have in the field for bug diagnosis. * manager.exe - created from Outlook's "manager.py", and I forgot to remove it - was just for test purposes. All that for a 2.8Mb .zip :) Known problems: * No Outlook docs are installed. * This is all built from CVS - no smarts have been added for the "Application Data" directory etc. You will need to move/copy your existing database and configuration to the directory where you run these executables from. * pop3proxy_tray.exe redirects its output similarly to Outlook (%TEMP%\SpamBayesServer1.log). The service doesn't redirect - it probably should. Features: * Built using py2exe (see spambayes\windows\py2exe\readme.txt). To build the service, you need win32all built from sources (which we hope to fix), but the rest should build OK.
* pop3proxy_tray.exe has all icon resources built into the executable - ditto for the bitmap resources used by the Outlook addin. * All .pyc files are shared in a single "library.zip". This means a nice small distribution, new executables are very cheap, and we could even "patch" simple upgrades by updating the .zip file of an existing installation!! It would be good if a few people could suck it and see what happens :) Mark. From T.A.Meyer at massey.ac.nz Mon Sep 15 00:05:08 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 15 00:05:24 2003 Subject: [spambayes-dev] Test windows binaries available Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471C67@its-xchg4.massey.ac.nz> > It would be good if a few people could suck it and see what happens :) Results below. > This zip file contains a single directory with the following programs: > * pop3proxy_tray.exe - a Windows GUI program If I choose "exit spambayes", nothing happens. (I have to kill it with the task manager). If I choose "stop spambayes", nothing happens. However, this doesn't work for me with the source version either. > * sb_server.exe - a console program (so presumably for "test" > purposes only) This seems to work fine for me. > * pop3proxy_service.exe - A Windows service - "-install" to install it I can install this, but not start it unless I use "-debug". I tried to stop it with cntl-c and got: """ Stopping debug service. The service is still shutting down... [that line 35 more times] Error 0xC000000B - The Python service control handler failed. 
File "win32serviceutil.pyc", line 649, in ServiceCtrlHandler File "pop3proxy_service.pyc", line 89, in SvcStop File "sb_server.pyc", line 770, in stop File "urllib2.pyc", line 136, in urlopen File "urllib2.pyc", line 333, in open File "urllib2.pyc", line 313, in _call_chain File "urllib2.pyc", line 849, in http_open File "urllib2.pyc", line 843, in do_open File "urllib2.pyc", line 359, in error File "urllib2.pyc", line 313, in _call_chain File "urllib2.pyc", line 419, in http_error_default urllib2.HTTPError: HTTP Error 503: Service Unavailable The service is still shutting down... [that line 23 more times] The worker failed to stop - aborting it anyway Info 0x40001004 - The pop3proxy service has stopped after 0 sessions (0 ham, 0 spam, 0 unsure). """ Does that matter? Stopping it via the ui worked fine. > * outlook_dump_props.exe - The Outlook "dump_props.py" > program, which could be useful to have in the field for bug diagnosis. I get a "can't find default message store" if I run this without specifying a folder. Does that matter? =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Sep 15 00:11:27 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 15 00:11:44 2003 Subject: [spambayes-dev] Test windows binaries available Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471C78@its-xchg4.massey.ac.nz> > * This is all built from CVS - no smarts have been added for > the "Application Data" directory etc. You will need to > move/copy your existing database and configuration to the > directory where you run these executables from. One other thing I noticed when cleaning up. The service looks for all of this in the directory's parent, not the directory itself. Adding the smarts would fix that, of course. Speaking of those smarts, to implement this, are we just running a script that sets the appropriate options in the bayescustomize.ini file? And then setting the BAYESCUSTOMIZE envar to the location of that file? Or is there something more complicated planned?
=Tony Meyer From mhammond at skippinet.com.au Mon Sep 15 01:18:03 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Sep 15 01:18:11 2003 Subject: [spambayes-dev] Test windows binaries available In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303471C67@its-xchg4.massey.ac.nz> Message-ID: <00cc01c37b48$bd8afc50$f502a8c0@eden> > > This zip file contains a single directory with the > following programs: > > * pop3proxy_tray.exe - a Windows GUI program > > If I choose "exit spambayes", nothing happens. (I have to > kill it with > the task manager). > If I choose "stop spambayes", nothing happens. However, this doesn't > work for me with the source version either. Please see the log file . > > * pop3proxy_service.exe - A Windows service - "-install" to > install it > > I can install this, but not start it unless I use "-debug". > I tried to > stop it with Ctrl-C and got: > """ > Stopping debug service. > The service is still shutting down... > [that line 35 more times] > Error 0xC000000B - The Python service control handler failed. Probably the same error that the tray got trying to shut down. How did you try and start it? "net start pop3proxy"? Note I just saw pop3proxy_tray etc get confused with multiple proxies running. > I get a "can't find default message store" if I run this without > specifying a folder. Does that matter? Not to me. From the next mail: > Speaking of those smarts, to implement this, are we just running a > script that sets the appropriate options in the bayescustomize.ini file? > And then setting the BAYESCUSTOMIZE envar to the location of that file? > Or is there something more complicated planned? I've no idea - but presumably we want all programs to use the same logic, so a "script" doesn't make much sense. Further, we probably still want some kind of mutex to prevent multiple apps accidentally starting. It may make sense to have log redirection logic there too. Someone will have to "just do it" :) Mark.
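The "mutex to prevent multiple apps accidentally starting" could take several forms. Below is a minimal sketch, assuming a lock file in the shared data directory rather than a true Windows mutex. The class and the lock-file name are hypothetical, not SpamBayes code. On Windows, a named mutex (pywin32's win32event.CreateMutex, then checking win32api.GetLastError() for winerror.ERROR_ALREADY_EXISTS) has one advantage: the OS closes the handle when the owning process exits, so a crash cannot leave it behind. A lock file left by a crash has to be removed by hand.

```python
import os
import tempfile


class SingleInstanceLock:
    """Stop a second SpamBayes program from starting on the same data.

    Lock-file sketch (hypothetical, not SpamBayes code): os.open() with
    O_CREAT | O_EXCL either creates the file or fails atomically, so only
    one process can hold the lock at a time.
    """

    def __init__(self, data_dir, name="spambayes.lock"):
        self.path = os.path.join(data_dir, name)
        self.fd = None

    def acquire(self):
        """Return True if we now hold the lock, False if another copy does."""
        try:
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        # Record our PID so a human can see which process owns the lock.
        os.write(self.fd, str(os.getpid()).encode("ascii"))
        return True

    def release(self):
        """Close and delete the lock file if we hold it."""
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None


if __name__ == "__main__":
    data_dir = tempfile.mkdtemp()
    tray = SingleInstanceLock(data_dir)
    service = SingleInstanceLock(data_dir)
    print(tray.acquire())     # True: the first program gets the lock
    print(service.acquire())  # False: a second program must not start
    tray.release()
    print(service.acquire())  # True: the lock is free again
    service.release()
```

Because every program would go through the same check, this also fits Mark's point that all programs should share one piece of startup logic rather than relying on a separate script.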
-------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2564 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030915/049a324c/winmail.bin From vanhorn at whidbey.com Mon Sep 15 01:34:26 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Sep 15 01:34:46 2003 Subject: [spambayes-dev] New behaviour? References: <00cc01c37b48$bd8afc50$f502a8c0@eden> Message-ID: <3F654F62.A5B0112D@whidbey.com> I was away from the computer for an hour or so just now, and when I came back there were a couple of other messages and three or four messages from this list. I read them, then went to the UI to review, and there were only two hams listed, neither of which was among the messages about the "Test windows binaries". I dug them out, and they had X-SpamBayes headers, but they weren't on the review page, even after a refresh. Are the most solid hams left off, since they don't need training? The two messages I looked at had X-Spambayes-Spam-Probability: 1.11022302463e-016 and 0.0 (not rounded, really 0.0). Is this new with 1a5? Or am I dealing with something odd in the proxy? (Yes, pop3proxy, Win2K) Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From T.A.Meyer at massey.ac.nz Mon Sep 15 02:02:26 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 15 02:02:37 2003 Subject: [spambayes-dev] Test windows binaries available Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471CE9@its-xchg4.massey.ac.nz> > > If I choose "exit spambayes", nothing happens. > Please see the log file .
Your recent checkin fixes this (well, in source, so presumably also the binary). > How did you try and start it? "net start pop3proxy"? No, but if I thought for a bit longer I might have done ;) Running "pop3proxy_service" gave me a message saying that it was trying to start it, so I thought it would do it for me. "net start pop3proxy" works fine (as did starting it manually in the Windows Services GUI). > > Speaking of those smarts, to implement this, are we just running a > > script that sets the appropriate options in the bayescustomize.ini > > file? > > And then setting the BAYESCUSTOMIZE envar to the location of that file? > > Or is there something more complicated planned? > > I've no idea - but presumably we want all programs to use the > same logic, so a "script" doesn't make much sense. I meant a setup script, which the installer would run. It could create the appropriate bayescustomize file, which tells pop3proxy where to find everything. I presume it could also set the BAYESCUSTOMIZE envar to the location of that file, which would mean that the settings were shared by all the programs (except Outlook? If BAYESCUSTOMIZE is set, does the plug-in load it as well as the default_bayes_customize.ini file?). If we want to avoid the envar, then I guess we need to modify pop3proxy_tray and pop3proxy_service to load in the ini from the appropriate place. > Further, > we probably still want some kind of mutex to prevent multiple > apps accidentally starting. It may make sense to have log > redirection logic there too. Someone will have to "just do it" :) I don't mind having a go at writing this, although: 1. I only half get how this should be done, so I'll need a lot of kicking towards the goal. 2. I won't have time this week, unless other work goes amazingly well.
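The setup-script idea (the installer writes a bayescustomize.ini somewhere per-user, then points BAYESCUSTOMIZE at it) might look roughly like the sketch below. The function names are hypothetical. The [Storage] persistent_storage_file option is only an assumed placeholder for whatever options the installer actually needs to set. Changing os.environ affects only the running process and the programs it launches. Making the variable stick for every later SpamBayes program would need a persistent per-user setting (on Windows, typically the registry), which this sketch does not attempt.

```python
import configparser
import os
import tempfile


def write_customize_file(data_dir, settings):
    """Create data_dir if needed and write bayescustomize.ini into it.

    `settings` maps section name -> {option name: string value}.
    Returns the full path of the file written.
    """
    os.makedirs(data_dir, exist_ok=True)
    parser = configparser.ConfigParser()
    for section, options in settings.items():
        parser[section] = options
    path = os.path.join(data_dir, "bayescustomize.ini")
    with open(path, "w") as f:
        parser.write(f)
    return path


def set_bayescustomize(path, env=None):
    """Point BAYESCUSTOMIZE at `path` in `env` (default: os.environ).

    Only this process and the programs it starts will see the change.
    """
    if env is None:
        env = os.environ
    env["BAYESCUSTOMIZE"] = path
    return env


if __name__ == "__main__":
    # A throwaway directory stands in for the real per-user data directory.
    data_dir = os.path.join(tempfile.mkdtemp(), "SpamBayes", "Proxy")
    # Assumed placeholder option; substitute the options SpamBayes really uses.
    ini = write_customize_file(
        data_dir, {"Storage": {"persistent_storage_file": "hammie.db"}})
    set_bayescustomize(ini)
    print(os.environ["BAYESCUSTOMIZE"])
```

Writing the file and setting the variable are kept separate because of the alternative raised above. If the envar is avoided, pop3proxy_tray and pop3proxy_service could be taught to read the file from the same location directly.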
=Tony Meyer From Andrew.Stickland at perwill.com Mon Sep 15 04:21:55 2003 From: Andrew.Stickland at perwill.com (Andrew Stickland) Date: Mon Sep 15 04:22:09 2003 Subject: [spambayes-dev] Possible enhancement Message-ID: Tony, Thanks for your response. I understand that changing the "data_directory" setting would affect both the config and the database, which was why I suggested the possibility of having two settings. We think there is a huge common ground on SPAM and most of the users won't bother with the 'training' in any case. Regards Andrew Stickland phone: +44 (0)1420 545031 mobile: +44 (0)7736 557126 -----Original Message----- From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] Sent: 13 September 2003 05:29 To: Andrew Stickland; spambayes-dev@python.org Subject: RE: [spambayes-dev] Possible enhancement > The obvious barrier to this is > that the config settings only allow you to define the data > directory which also contains the INI files The data_directory option tells the plug-in where to find all the user-specific files, not just the ini ones. If you change it, you'll also move the default_bayes_database.db and default_message_info.db files. > A possible second barrier is locking of the database but I've not > reviewed the code to see if this is a problem. I'll leave Mark to answer this, but I suspect it could be, if lots of people are going to be training this database at once. Changing (in the source code) the plug-in to use one of the SQL storage methods rather than the dbm that it uses at the moment might help. Or it might not ;) Are you sure that your users will all share a common definition of ham/spam, though? I'm also not sure how having a common database helps, if you have to install the plug-in on all the users' machines, plus teach them how to use (train) it. =Tony Meyer ******************************************************* This email has originated from Perwill plc (Registration No.
1906964) Office registered at: 13A Market Square, Alton, Hampshire, GU34 1UR, UK Tel: +44 (0)1420 545000 Fax: +44 (0)1420 545001 www.perwill.com ******************************************************* Privileged, confidential and/or copyright information may be contained in this email, and is only for the use of the intended addressee. To copy, forward, disclose or otherwise use it in any way if you are not the intended recipient or responsible for delivering to him/her is prohibited. If you receive this email by mistake, please advise the sender immediately, by using the reply facility in your email software. We may monitor the content of emails sent and received via our network for the purposes of ensuring compliance with policies and procedures. This message is subject to and does not create or vary any contractual relationships between Perwill plc and the recipient. ******************************************************* Any opinions expressed in the email are those of the sender and not necessarily of Perwill plc. ******************************************************* This email has been scanned for known viruses using McAfee WebShield 4.5 MR1a ******************************************************* From T.A.Meyer at massey.ac.nz Mon Sep 15 05:51:59 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 15 05:52:13 2003 Subject: [spambayes-dev] pop3proxy suggestion Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471CF0@its-xchg4.massey.ac.nz> > A score column and 'sort by column' would be great. I've updated the code so that: * You provide a list (in the options) of headers you want displayed, which defaults to ("Subject", "From"), but can include any. (This removes the need for the display_to option). * You can include a column with the score. * You can include a column with the date received (unlikely to be of much use, but 99% of the code was needed for other things). * If you click a column heading messages are sorted by that column. 
A second click reverses the sorting. * You can show clues *or tokens* for a message. * If they are available (i.e. if the right header options are set), the score & clues that the message originally received are displayed in 'show clues' as well as current score/clues. * Show clues/tokens includes the ham/spam count for that token, like the Outlook show clues does. * An advanced version of the 'find token' boxes (on the front page) is available (hidden by default). * A much improved version of the 'find message' box (on the front page) is available (to search for a message by subject, and so on). Once 1.0a6 is out, I'll create a branch for any remaining 1.0 bugfixes (as suggested) and then check all of the above into the main branch. =Tony Meyer From skip at pobox.com Mon Sep 15 09:48:39 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Sep 15 09:48:56 2003 Subject: [spambayes-dev] sb_* in config files? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303471B50@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1303471B50@its-xchg4.massey.ac.nz> Message-ID: <16229.49975.807603.946000@montanaro.dyndns.org> Tony> I'm happy to do the edits, although I can only test sb_server Tony> (pop3proxy), sb_imapfilter and sb_pop3dnd, not sb_filter Tony> (hammiefilter). I can test the last one. Send me a note when you have something you'd like checked. Skip From kennypitt at hotmail.com Mon Sep 15 09:49:54 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Sep 15 09:50:29 2003 Subject: [spambayes-dev] Outlook toolbar error In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130332B1DD@its-xchg4.massey.ac.nz> Message-ID: <000001c37b90$3fbc54c0$300a10ac@spidynamics.com> Meyer, Tony wrote: >> Attached is a tracelog of an error that I've been getting in >> the Outlook addin. The error occurs any time I start Outlook >> when the SpamBayes toolbar already exists. If I delete the >> toolbar and restart Outlook, everything works that first >> time. 
But on subsequent starts, SpamBayes always fails to >> add the sub-items to the SpamBayes drop-down. > > A problem sounding a lot like this was recently fixed, yes. I can't > say for sure that it's the same one, but it sounds like it. It was > perhaps a week ago now that Mark checked in the (working) fix, so I > would hope that current anon cvs would include it. > > =Tony Meyer Thanks for the update, guys. I did a "cvs update" this morning, and for the first time in 7 days it actually updated something. The drop-down seems to be working fine now. -- Kenny Pitt From vanhorn at whidbey.com Mon Sep 15 10:53:00 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Sep 15 10:53:41 2003 Subject: [spambayes-dev] pop3proxy suggestion References: <1ED4ECF91CDED24C8D012BCF2B034F1303471CF0@its-xchg4.massey.ac.nz> Message-ID: <3F65D24C.D6800F87@whidbey.com> Wow! That's quite a collection of improvements. Can I raise another possibility while you're at it? Specifically, for those whose H/S balance isn't balanced, how about an option to set the default training behaviour for each category? Obviously, you will always want to train on Unsures, but if you have a filter that's running 8 to 1 in Ham vs Spam (and I have one of those) you may want to start discarding the predominant category for a while, and setting that once a week or month is more likely to work than having to click the Discard header every training session. Van From jeremy at alum.mit.edu Mon Sep 15 15:19:30 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon Sep 15 15:19:33 2003 Subject: [spambayes-dev] pop3proxy suggestion In-Reply-To: <3F65D24C.D6800F87@whidbey.com> References: <1ED4ECF91CDED24C8D012BCF2B034F1303471CF0@its-xchg4.massey.ac.nz> <3F65D24C.D6800F87@whidbey.com> Message-ID: <1063653570.2093.156.camel@localhost.localdomain> On Mon, 2003-09-15 at 10:53, G. Armour Van Horn wrote: > Specifically, for those whose H/S balance isn't balanced, how about an > option to set the default training behaviour for each category?
Obviously, > you will always want to train on Unsures, but if you have a filter that's > running 8 to 1 in Ham vs Spam (and I have one of those) you may want to > start discarding the predominant category for a while, and setting that once > a week or month is more likely to work than having to click the Discard > header every training session. I only train on unsures and false positives/negatives. I'd certainly be happy if I could set discard to be the default for ham and spam. When you're training on 100s of messages, it takes several seconds to switch all the entries. Jeremy From vanhorn at whidbey.com Mon Sep 15 15:29:50 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Sep 15 15:30:10 2003 Subject: [spambayes-dev] pop3proxy suggestion References: <1ED4ECF91CDED24C8D012BCF2B034F1303471CF0@its-xchg4.massey.ac.nz> <3F65D24C.D6800F87@whidbey.com> <1063653570.2093.156.camel@localhost.localdomain> Message-ID: <3F66132E.E93FBBD9@whidbey.com> Were you not aware that you can click on the word "Discard" and have it set that for all the messages in that section? Not knowing that, you must *really* want my change! But I still do: I want to zip in and check the Unsures and any errors, then train on those and the Ham so I can get back into balance. I would like to set Discard as the default for Spam. Others might want to set that for Ham, or for both Ham and Spam. I do worry a little about not training on any current messages; I think over time we're going to wish for a way to dump old clues, as they must dilute the potency of recent clues. For now, I'd just like to make it a little more likely that I'll follow my chosen training regimen automatically rather than have to remember to set it every single time. Van From T.A.Meyer at massey.ac.nz Mon Sep 15 18:16:33 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 15 18:17:02 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayes Options.py, 1.72, 1.73 classifier.py, 1.9, 1.10 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471DFB@its-xchg4.massey.ac.nz> > Should we also drop the 'use_chi_squared_combining' option? IMO, no. There's a chance that down the line someone will come up with an alternative combining method and want to test that out. Having the framework in place to easily swap between the two makes things easier then, and doesn't really cost anything now. > Currently, setting 'use_chi_squared_combining' to false > leaves you with no Classifier.spamprob() implementation. That was also the case before (if you set use_chi_squared_combining to False, and use_gary_combining to False). It kinda fits too, as that's what you've set.
A case could be made for a default, but I doubt it really matters since only people that know what they are doing with the code are likely to muck about with the combining method. (A good solution, IMO, would be to have a new option 'combining_method', which defaulted to 'chi2_spamprob', but could be the name of any classifier function). =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Sep 15 19:04:44 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 15 19:05:03 2003 Subject: [spambayes-dev] pop3proxy suggestion Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471E50@its-xchg4.massey.ac.nz> > Wow! That's quite a collection of improvements. These were mostly things that were requested around the time the feature freeze started, which I didn't have time to get to then. > Can I raise another possibility while you're at it? Sure :) > Specifically, for those whose H/S balance isn't balanced, how > about an option to set the default training behaviour for > each category? This sounds reasonable - various people have asked for something along these lines, too. I've done this too, so I'll check it in with the rest. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Sep 15 21:43:40 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 15 21:45:19 2003 Subject: [spambayes-dev] sb_* in config files? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471F50@its-xchg4.massey.ac.nz> Tony> I'm happy to do the edits, although I can only test sb_server Tony> (pop3proxy), sb_imapfilter and sb_pop3dnd, not sb_filter Tony> (hammiefilter). Skip> I can test the last one. Send me a note when you have Skip> something you'd like checked. Attached are the necessary diffs. These should let you test sb_filter and (if you have a chance) the sb_server with sb_upload setup as well. Thanks, Tony -------------- next part -------------- A non-text attachment was scrubbed...
Name: sb_server.diff Type: application/octet-stream Size: 4553 bytes Desc: sb_server.diff Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030916/896ca315/sb_server.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: hammie.diff Type: application/octet-stream Size: 813 bytes Desc: hammie.diff Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030916/896ca315/hammie.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: message.diff Type: application/octet-stream Size: 1143 bytes Desc: message.diff Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030916/896ca315/message.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: ProxyUI.diff Type: application/octet-stream Size: 1463 bytes Desc: ProxyUI.diff Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030916/896ca315/ProxyUI.obj From tim.one at comcast.net Mon Sep 15 22:43:29 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Sep 15 22:43:36 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayesOptions.py, 1.72, 1.73 classifier.py, 1.9, 1.10 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303471DFB@its-xchg4.massey.ac.nz> Message-ID: ... [Kenny Pitt] >> Currently, setting 'use_chi_squared_combining' to false >> leaves you with no Classifier.spamprob() implementation. [Meyer, Tony] > That was also the case before (if you set use_chi_squared_combining to > False, and use_gary_combining to False). It kinda fits too, as that's > what you've set. A case could be made for a default, use_chi_squared_combining defaults to True now. > but I doubt it really matters since only people that know what they are > doing with the code are likely to muck about with the combining method.
> (A good solution, IMO, would be to have a new option 'combining_method', > which defaulted to 'chi2_spamprob', but could be the name of any > classifier function). Let's stick to the simplest thing that could possibly work, please. An enumerated type with one value (and which has had only one interesting value for a year) would be pretty severe overkill! From T.A.Meyer at massey.ac.nz Mon Sep 15 22:49:23 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 15 22:49:42 2003 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayesOptions.py, 1.72, 1.73 classifier.py, 1.9, 1.10 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303471FCF@its-xchg4.massey.ac.nz> [Tony] > (A good solution, IMO, would be to have a new option > 'combining_method', which defaulted to 'chi2_spamprob', but > could be the name of any classifier function). [Tim] > Let's stick to the simplest thing that could possibly work, > please. An enumerated type with one value (and which has had > only one interesting value for a year) would be pretty severe > overkill! I agree. I meant that this might be a good solution if/when an alternative combining method is introduced, although that's not clear from what I wrote. I'm happy with things as they are (I did the edit, after all). =Tony Meyer From ta-meyer at ihug.co.nz Tue Sep 16 00:58:15 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 16 00:59:25 2003 Subject: [spambayes-dev] Saving the pickle to a temp file first Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AED4@its-xchg4.massey.ac.nz> I've been going through the remaining open bugs, hoping to close a couple, and I came across this one (from April): [ 715248 ] Pickle classifier should save to a temp file first TimS says that he's going to put this code into storage.py, but it's not there now. Is there a need for it? Should this tracker just be closed?
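Tracker item 715248 proposes writing the new pickle somewhere else first and only swapping it in once it is complete. That way, a crash or an exception mid-save cannot leave a truncated classifier behind. Below is a minimal sketch, assuming Python 3.3+ for os.replace and using a plain dict in place of the real classifier. save_pickle_atomically is a hypothetical name, not a function in storage.py. Python 2's os.rename raises an error on Windows if the target already exists, so code written for it would need a different final step.

```python
import os
import pickle
import tempfile


def save_pickle_atomically(obj, path):
    """Pickle `obj` to `path` without ever leaving a half-written file.

    The data goes to a temporary file in the same directory first; only
    after it has been flushed to disk is it renamed over `path`.  If
    pickling fails, the previous file at `path` is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())
        # os.replace overwrites an existing target on POSIX and Windows.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "classifier.pickle")
    save_pickle_atomically({"nham": 10, "nspam": 20}, target)
    with open(target, "rb") as f:
        print(pickle.load(f))  # {'nham': 10, 'nspam': 20}
```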
=Tony Meyer From mhammond at skippinet.com.au Tue Sep 16 05:29:43 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Sep 16 05:29:49 2003 Subject: [spambayes-dev] Test windows binaries available In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303471CE9@its-xchg4.massey.ac.nz> Message-ID: <003e01c37c35$104b7f40$f502a8c0@eden> I've put another zip of binaries at http://starship.python.net/crew/mhammond/download/SpamBayes_TestBinaries.zip This version is pretty much complete! Features over the previous version: * Latest CVS - includes latest "pop3tray controlling the service" code. * 3 sub-directories - "outlook" (with the Outlook addin, all doc files etc), "proxy" (with the 3 proxy related executables) and "lib" (with Python and all extensions etc) * Outlook addin works 100% - better than the existing binary in a couple of areas. Addin docs are all included. As with last time, it is from current CVS - no additional smarts have been added - so the behaviour should be identical to running from source code. The multiple directories will make it trivial to create an installer - the "lib" directory is always installed, and the "outlook" and "proxy" directories can simply be included or excluded in their entirety. I'm of the impression that even without a proper "installer", this .zip file should be suitable for anyone who runs the source-code version - the same basic bugs, issues and considerations will apply. Existing Outlook users: You can test it too - simply register the new DLL ("regsvr32 outlook\spambayes_addin.dll") and restart Outlook. To go back to the version you were using, just re-register the old one ("addin.py" for source-code users, "regsvr32 old_spambayes_dir\spambayes_addin.dll" for binary users). Please let me know if you try it. Thanks, Mark. From mhammond at skippinet.com.au Tue Sep 16 08:24:09 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Sep 16 08:24:07 2003 Subject: [spambayes-dev] sourceforge logo in proxy pages? Message-ID: <007301c37c4d$6e6c6770$f502a8c0@eden> How evil would it be to add the source-forge logo on the local "pages" pop3proxy et al use? Not-that-I-care-about-stinking-SF-rankings ly, Mark. From skip at pobox.com Tue Sep 16 10:06:16 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 16 10:06:32 2003 Subject: [spambayes-dev] sb_* in config files? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303471F50@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1303471F50@its-xchg4.massey.ac.nz> Message-ID: <16231.6360.936838.559547@montanaro.dyndns.org> Tony> Attached are the necessary diffs. I was thinking more along the lines of you sending me a note once you'd checked things in. In any case, your first patch didn't apply correctly: % patch -p0 < ~/tmp/sb_server.diff patching file sb_server.py Hunk #1 FAILED at 458. Hunk #2 FAILED at 583. Hunk #3 FAILED at 631. 3 out of 3 hunks FAILED -- saving rejects to file sb_server.py.rej probably because your diff has a mixture of line endings. (But I'm not going to mess around trying to get it to work. There's more than ample chance I would screw something up.) I think it's fine to just check in your changes and let us know so we can test them.
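If the diff really mixes CRLF and LF line endings, that can be checked and normalised before running patch -p0. Below is a minimal sketch with hypothetical helper names. It assumes the checked-out sb_server.py uses LF endings. If the working copy uses CRLF instead, converting only the patch would still leave the hunks mismatched.

```python
def line_ending_counts(data):
    """Count CRLF, lone LF and lone CR line endings in a bytes object."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


def normalize_line_endings(data):
    """Convert CRLF and lone CR endings to LF."""
    # Replace CRLF first so it does not turn into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_patch_file(src, dst):
    """Write a copy of the patch at `src` with LF-only endings to `dst`."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(normalize_line_endings(data))


if __name__ == "__main__":
    mixed = b"--- a\r\n+++ b\n@@ -1 +1 @@\r\n-x\n+y\r\n"
    print(line_ending_counts(mixed))
    # {'crlf': 3, 'lf': 2, 'cr': 0}
    print(line_ending_counts(normalize_line_endings(mixed)))
    # {'crlf': 0, 'lf': 5, 'cr': 0}
```

A non-zero count in more than one bucket is the "mixture of line endings" that would make patch reject hunks that otherwise match.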
Skip From skip at pobox.com Tue Sep 16 10:13:51 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 16 10:14:07 2003 Subject: [spambayes-dev] Saving the pickle to a temp file first In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AED4@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130212AED4@its-xchg4.massey.ac.nz> Message-ID: <16231.6815.962475.809326@montanaro.dyndns.org> Tony> [ 715248 ] Pickle classifier should save to a temp file first Tony> TimS says that he's going to put this code into storage.py, but Tony> it's not there now. Is there a need for it? Should this tracker Tony> just be closed? I would leave it open. I haven't looked at the code, but the concept is good. Skip From mhammond at skippinet.com.au Tue Sep 16 10:20:12 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Sep 16 10:20:14 2003 Subject: [spambayes-dev] Test windows binaries available In-Reply-To: <003e01c37c35$104b7f40$f502a8c0@eden> Message-ID: <009101c37c5d$a4824770$f502a8c0@eden> > I've put another zip of binaries at > http://starship.python.net/crew/mhammond/download/SpamBayes_TestBinaries.zip And yet another! > As with last time, it is from current CVS - no additional > smarts have been added - so the behaviour should be identical > to running from source code. This new version contains the smarts I just checked in for the proxy/server - new users should have their data stored under "Application Data" - existing users should continue to use whatever they used before. Mark. From richie at entrian.com Wed Sep 17 06:04:53 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Sep 17 07:07:59 2003 Subject: [spambayes-dev] Test windows binaries available Message-ID: Mark, I've tried the binaries with mixed results. 8-) I don't have time to debug any of these problems myself (sorry!), so here's a report of what happened: Running pop3proxy_service.exe, I get a box popping up: The procedure entry point ?PyWinObject_FromSECURITY_DESCRIPTOR@@YAPAU_object@@PAX@Z could not be located in the dynamic link library PyWinTypes23.dll. This is followed by a Python exception from pop3proxy_service.pyc line 24: ImportError: DLL load failed: The specified procedure could not be found. I think this is because it's picking up PyWinTypes23.dll from c:\windows\system32 rather than the lib directory. Copying pop3proxy_service.exe to the lib directory and running it from there fixed the problem (in that I get "Service 'pop3proxy' (SpamBayes Service) installed" when I do -install). Running "pop3proxy_service.exe -help" displays the help but then says: Connecting to the service control manager.... Could not start the service - error 997 "net start pop3proxy" gives: The SpamBayes Service service is starting. The SpamBayes Service service could not be started. The service did not report an error. More help is available by typing NET HELPMSG 3534. and the event log contains: The description for Event ID ( 3 ) in Source ( pop3proxy_service ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details.
The following information is part of the event: File "win32serviceutil.pyc", line 663, in SvcRun File "pop3proxy_service.pyc", line 118, in SvcDoRun, exceptions.AttributeError, 'NoneType' object has no attribute 'encode'. I gave up on the service at that point and ran sb_server.exe. I edited my configuration on the Configuration page, but the Save button gives me a 500: File "spambayes\ProxyUI.pyc", line 514, in reReadOptions ImportError: No module named Options The options were saved before that exception fired (ie. my bayescustomize.ini has the correct [pop3proxy] entries), so I shut down (via "Save and Shutdown" and restarted. The proxy now isn't picking up my ini file settings. The home page reports "No POP3 proxies running." and the config file doesn't have any "Remote Servers" or "SpamBayes Ports". It does say "Your options are stored in C:\Documents and Settings\rjh\Application Data\SpamBayes\Proxy\bayescustomize.ini." which contains the correct [pop3proxy] entries - bizarre. Restarting with my POP3 server specified on the command line ("sb_server mail") looks good at first: Loading database... Listener on port 110 is proxying mail:110 but the home page says "POP3 proxy running on , proxying to ." However, the proxy is running, and classifies messages fine. Training through the web works, but seems very slow and only uses 1% or 2% CPU when training on an mbox file (that could be normal - I've never looked at the load before - but it seems odd). Although my ini file is at C:\Documents and Settings\rjh\Application Data\SpamBayes\Proxy\bayescustomize.ini, the database is at C:\src\tests\spambayes_binaries\lib\hammie.db (presumably it's in lib rather than proxy because that's where I copied the exe file). The pop3proxy cache directories and spambayes.messageinfo.db are in lib as well. pop3proxy_tray.exe doesn't want to talk to my sb_server.exe at all. 
The commands that launch web pages work OK, but neither "Stop SpamBayes" nor "Exit SpamBayes" stops sb_server.exe. "Stop SpamBayes" does turn the icon red though. 8-) The tray command "Check for latest version" says "Error checking the latest version." SpamBayesServer1.log contains this: Traceback (most recent call last): File "pop3proxy_tray.py", line 436, in CheckVersion File "spambayes\Version.pyc", line 108, in get_version_string KeyError: 'Full Description Binary' Sorry I can't do more to help track these problems down... -- Richie Hindle richie@entrian.com From skip at pobox.com Wed Sep 17 13:54:42 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 17 13:54:58 2003 Subject: [spambayes-dev] Generating a sample training database In-Reply-To: <3F689881.3080705@parducci.net> References: <20030916174601.X36425@thermonuclear.org> <16232.27992.594713.446593@montanaro.dyndns.org> <3F689881.3080705@parducci.net> Message-ID: <16232.40930.934829.840568@montanaro.dyndns.org> bill> since the skew can work both ways (should someone like tim include bill> their extracurricular activities in the ham training sample :o), bill> wouldn't it make sense to create a number of initial databases bill> with *only* spam in them and let the user train an appropriate bill> amount of ham as part of the install? anecdotal evidence suggests bill> that just about everyone has some ham laying around, yet not bill> everyone keeps spam about. In theory, however I don't think it's trivial for people using the Outlook plugin to train on a single mail message. They'd have to move several messages from valid hammy mailboxes to a new one, train on it, then move the messages back to their original locations. We'll just have to try it and see. I'll take the lead in grabbing the ham and spam and putting together a sample training database (pickle format seems easiest). 
If you'd like to contribute (no more than two ham and two spam per person please), forward such messages to me and make sure the Subject: includes "Sample Ham" or "Sample Spam". I will filter such messages out using procmail before SpamBayes can see them. Skip From T.A.Meyer at massey.ac.nz Wed Sep 17 19:06:57 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 17 19:07:10 2003 Subject: [spambayes-dev] Test windows binaries available Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303472631@its-xchg4.massey.ac.nz> > I've put another zip of binaries at > http://starship.python.net/crew/mhammond/download/SpamBayes_TestBinaries.zip > > And yet another! Have you removed this since Richie got it? I get a 404 at the above. =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 17 19:08:53 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 17 19:09:01 2003 Subject: [spambayes-dev] Generating a sample training database Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303472634@its-xchg4.massey.ac.nz> [Skip] > I'll take the lead in grabbing the ham and spam and putting > together a sample training database (pickle format seems > easiest). If you'd like to contribute (no more than two ham > and two spam per person please), forward such messages to me Is there any particular sort of message that we should contribute? Something extremely hammy/spammy? Something that we think is really generic? Or just any random message we click on? =Tony Meyer From mhammond at skippinet.com.au Wed Sep 17 20:19:10 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Sep 17 20:19:04 2003 Subject: [spambayes-dev] Test windows binaries available In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303472631@its-xchg4.massey.ac.nz> Message-ID: <047601c37d7a$7bd33ff0$f502a8c0@eden> > > I've put another zip of binaries at > > > http://starship.python.net/crew/mhammond/download/SpamBayes_TestBinaries.zip > > > > And yet another! > > Have you removed this since Richie got it?
I get a 404 at the above. http://starship.python.net/crew/mhammond/downloads/SpamBayes_TestBinaries I left out the 's' in 'downloads'. I'll probably get a new one up in a day or 2 after looking at these problems Mark. From tim.one at comcast.net Wed Sep 17 22:31:16 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Sep 17 22:31:27 2003 Subject: [spambayes-dev] Generating a sample training database In-Reply-To: <16232.40930.934829.840568@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > In theory, however I don't think it's trivial for people using the > Outlook plugin to train on a single mail message. They'd have to > move several messages from valid hammy mailboxes to a new one, train > on it, then move the messages back to their original locations. It's not that hard, although I don't know how many users understand all the things they can do with Outlook. For example, I keep distinct ham and spam training folders (and those are *all* I train from). When I want to train on, e.g., a selection of ham, I Ctrl-Left-Click the ones I want, and drag the multi-selection to the ham training folder while holding the right mouse button down. When the button is released, a little menu pops up asking whether I want to Move (the messages), Copy (the messages), or Cancel (forget the whole thing). I select Copy, and that's the end of it. It takes much longer to read this sentence than to perform the whole operation. There's an even simpler way to copy: drag the selection to the desired folder while holding the Ctrl key down.
I can never remember that, though (when using extreme shortcuts, I always end up copying when I want to move, and vice versa), so stick to the method that asks me what I want when it's nearly over (btw, same thing (depress right button while dragging) works in Windows Explorer for copying, moving, or linking files between folders). > We'll just have to try it and see. > > I'll take the lead in grabbing the ham and spam and putting together a > sample training database Cool! Thank you. > (pickle format seems easiest). If you'd like to contribute (no more > than two ham and two spam per person please), forward such messages to > me and make sure the Subject: includes "Sample Ham" or "Sample Spam". > I will filter such messages out using procmail before SpamBayes can > see them. Offhand, I suggest disabling all header-line clue generation except for Subject line. From tim.one at comcast.net Wed Sep 17 22:35:36 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Sep 17 22:35:43 2003 Subject: [spambayes-dev] Generating a sample training database In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303472634@its-xchg4.massey.ac.nz> Message-ID: [Skip] >> I'll take the lead in grabbing the ham and spam and putting >> together a sample training database (pickle format seems >> easiest). If you'd like to contribute (no more than two ham >> and two spam per person please), forward such messages to me [Tony Meyer] > Is there any particular sort of message that we should contribute? > Something extremely hammy/spammy? Something that we think is really > generic? Or just any random message we click on? I suggest only msgs that score 1.00 (rounded) and 0.00 (rounded) when originally received (not after training on them) -- we're trying to catch a good deal of blatant spam with a starter database, and can't fine-tune anyway. You probably don't want to forward ham containing personal details. 
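Tim's selection rule above, combined with Skip's cap of two ham and two spam per contributor, can be sketched as a small filter. This is a minimal sketch, assuming each candidate is a (message_id, score) pair whose score is a float in [0, 1] as originally received, and reading "rounded" as rounded to two decimal places (so 0.999 counts as 1.00 but 0.994 does not). The function names are hypothetical and not part of SpamBayes.

```python
def starter_label(score):
    """Classify an arrival-time score as 'spam', 'ham', or None.

    Assumes "rounded" means rounded to two decimal places, so a score
    qualifies only if it formats as exactly "1.00" or "0.00".
    """
    shown = f"{score:.2f}"
    if shown == "1.00":
        return "spam"
    if shown == "0.00":
        return "ham"
    return None  # anything in between is not blatant enough


def pick_starter_messages(scored, per_class=2):
    """Keep at most `per_class` blatant ham and spam, in input order."""
    picked = {"ham": [], "spam": []}
    for msg_id, score in scored:
        label = starter_label(score)
        if label is not None and len(picked[label]) < per_class:
            picked[label].append(msg_id)
    return picked
```

Ham picked this way should still be checked by hand for personal details before it is forwarded.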
From mhammond at skippinet.com.au Wed Sep 17 23:22:24 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Sep 17 23:22:18 2003 Subject: [spambayes-dev] Generating a sample training database In-Reply-To: Message-ID: <04d201c37d94$14c03b00$f502a8c0@eden> > There's an even simpler way to copy: drag the selection to > the desired > folder while holding the Ctrl key down. I can never remember > that, though > (when using extreme shortcuts, I always end up copying when I > want to move, > and vice versa), so stick to the method that asks me what I > want when it's > nearly over (btw, same thing (depress right button while > dragging) works in > Windows Explorer for copying, moving, or linking files > between folders). I can never remember - but I just look for the little "+" sign in the drag icon - if there is a "+", then new items will be added (ie, items will be copied). Without the "+", no new items are added, so it is a move. Windows Explorer gives the same hints. Mark. From T.A.Meyer at massey.ac.nz Wed Sep 17 23:52:18 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 17 23:52:35 2003 Subject: [spambayes-dev] Test windows binaries available Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13034727CA@its-xchg4.massey.ac.nz> > I've tried the binaries with mixed results. 8-) Me, too, but better than yours :) If I try to run pop3proxy_tray it immediately dies. The log has: """ Traceback (most recent call last): File "pop3proxy_tray.py", line 459, in ? File "pop3proxy_tray.py", line 455, in main File "pop3proxy_tray.py", line 194, in __init__ File "pop3proxy_tray.py", line 383, in StartStop File "pop3proxy_tray.py", line 248, in StartService pywintypes.error: (3, 'StartService', 'The system cannot find the path specified .') """ I then figured out this is because I still had the old (binary test 1) version of the service installed :) Once Mark explained how to remove a service, it worked fine. "pop3proxy_service -remove" (duh!) 
> Running pop3proxy_service.exe, I get a box popping up: > > The procedure entry point > ?PyWinObject_FromSECURITY_DESCRIPTOR@@YAPAU_object@@PAX@Z > could not be located in the dynamic link library PyWinTypes23.dll. I don't get this. Is this perhaps because I have a later version of the win32 extensions than Richie? > Running "pop3proxy_service.exe -help" displays the help but then says: > > Connecting to the service control manager.... > Could not start the service - error 997 Is there some way to get it to exit after displaying the help, rather than trying to start the service? I gather that it's Windows (or the win32 extensions?) that's displaying the message rather than code in pop3proxy_service.py. > "net start pop3proxy" gives: > > The SpamBayes Service service is starting. I don't get any of those problems - I get: The SpamBayes Service service was started successfully. > I edited my configuration on the Configuration page, but the > Save button gives me a 500: Me, too. I presume this is because ProxyUI does "from Options import options" instead of "from spambayes.Options import options". I've checked in a fix for this. > The proxy now isn't picking up my ini file settings. [...] > It does say "Your options are stored in C:\Documents and > Settings\rjh\Application > Data\SpamBayes\Proxy\bayescustomize.ini." which contains the > correct [pop3proxy] entries - bizarre. This is because Options.py sets up the path *after* loading in any files ;) I've checked in a fix for this, too. > Restarting with my POP3 server specified on the command line > ("sb_server mail") looks good at first: [...] > but the home page says "POP3 proxy running on , proxying to This happened in the source, too. I probably introduced it when I abstracted out the prepare/start/stop functions. I've checked in a fix. 
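The options bug Tony describes is an ordering problem: the per-user options directory has to be resolved before any ini file is read, or the file that gets loaded is not the one the UI reports. A minimal sketch of the intended order, using Python 3's configparser as a stand-in for the SpamBayes Options machinery; the helper name `load_user_options` and the SpamBayes/Proxy layout (taken from the path Richie quoted) are assumptions, not the checked-in fix.

```python
import configparser
import os


def load_user_options(appdata_dir, ini_name="bayescustomize.ini"):
    """Hypothetical helper: resolve the options path, *then* read it."""
    # Step 1: decide where the options live before touching any file.
    options_dir = os.path.join(appdata_dir, "SpamBayes", "Proxy")
    ini_path = os.path.join(options_dir, ini_name)

    # Step 2: only now read it. ConfigParser.read() silently skips
    # files that do not exist, so a first run gets an empty parser.
    parser = configparser.ConfigParser()
    parser.read(ini_path)
    return ini_path, parser
```

ConfigParser.read() also returns the list of files it actually parsed, which is a convenient way to log whether the per-user file was found.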
> Although my ini file is at C:\Documents and > Settings\rjh\Application > Data\SpamBayes\Proxy\bayescustomize.ini, the database is at > C:\src\tests\spambayes_binaries\lib\hammie.db Nothing sets the "Storage":"persistent_storage_file" option, so this still defaults to the cwd. I've checked in a fix that will set the option if we follow the path through to using the app data directory, and will save the config file. Note that this won't happen if the config file already exists in this location. > pop3proxy_tray.exe doesn't want to talk to my sb_server.exe > at all. It's not clear from this, but if you were running pop3proxy_tray *and* sb_server then that's wrong. pop3proxy_tray will launch sb_server (well, the appropriate bits of sb_server.py) in a separate thread if it can't use the service. > The commands that launch web pages work OK, but > neither "Stop SpamBayes" nor "Exit SpamBayes" stops > sb_server.exe. This works for me if I use urllib.urlopen and not urllib2.urlopen. I don't know why, but that's the case here, so I suspect for Richie (and any other user) as well. It is very slow, but I suppose this is because it's sending a http page and waiting for the response. I've checked in the change. > The tray command "Check for latest version" says "Error > checking the latest version." SpamBayesServer1.log contains this: > > Traceback (most recent call last): > File "pop3proxy_tray.py", line 436, in CheckVersion > File "spambayes\Version.pyc", line 108, in get_version_string > KeyError: 'Full Description Binary' I get this too, although running from source works fine. I wondered if the Version.pyc was too old, but I replaced the one in the archive with mine (I presume this would work) and that didn't help. I'm not sure what else to try! 
=Tony Meyer From ta-meyer at ihug.co.nz Thu Sep 18 00:05:48 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Thu Sep 18 00:05:55 2003 Subject: [spambayes-dev] Options change and 1.0a6 release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2947@its-xchg4.massey.ac.nz> I've checked in the options changes that were discussed over the last week or so. *** This may break your configuration files. *** In particular: * If you use sb_filter and use the debug_header option, you need to use the "Headers":"include_evidence" and "Headers":"evidence_header_name" options instead. * If you use the notate_to or notate_subject options, you need to move these to the "Headers" section of your configuration file (rather than the "pop3proxy" section). * If you had sb_server storing your caches in a non-default location, you need to move these three options from the "pop3proxy" section to the "storage" section. * If you had tweaked other pop3proxy storage options (to use gzip, to not cache bulk/large mail, to not cache, or the expiry time), you need to move these from the "pop3proxy" section to the "storage" section. If you don't, you'll (hopefully :) notice that you get the "invalid option" warning, and you'll go back to the default settings. I believe this is the last thing that was scheduled before 1.0a6. The main changes have been in effect for a couple of weeks now, so we can hope that there aren't any really major bugs left. Unless someone objects (and quickly ;) I'll try and put 1.0a6 out tomorrow morning (so in about 20 hours). =Tony Meyer From tim.one at comcast.net Thu Sep 18 00:24:58 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Sep 18 00:25:03 2003 Subject: [spambayes-dev] Options change and 1.0a6 release In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2947@its-xchg4.massey.ac.nz> Message-ID: [Tony Meyer] > I've checked in the options changes that were discussed over the last > week or so. > > *** This may break your configuration files. *** > > ... 
> > I believe this is the last thing that was scheduled before 1.0a6. > The main changes have been in effect for a couple of weeks now, so we > can hope that there aren't any really major bugs left. Unless > someone objects (and quickly ;) I'll try and put 1.0a6 out tomorrow > morning (so in about 20 hours). I'd like to see us change the default address-headers option value to address_headers: from to cc sender reply-to for the release (the Outlook client has done that all along), and change the Outlook client to use experimental_ham_spam_imbalance_adjustment: False (as everyone other than the Outlook client has done all along). We've had several reports of the latter change helping, and no reports of it hurting. I'll rip out the support code after the release (in order to minimize the chance of release breakage). The address_headers change is long overdue; as explained before, the current address_headers: from default is a leftover from the days when several of us were testing with strongly mixed-source corpora. But these days, if, e.g., all the email that gets sent to your myspamhoneypot@myisp.com address is spam, there's no reason to blind the classifier to that fine To: clue. From ta-meyer at ihug.co.nz Thu Sep 18 00:31:41 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Thu Sep 18 00:31:48 2003 Subject: [spambayes-dev] Options change and 1.0a6 release In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303531D47@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AEEA@its-xchg4.massey.ac.nz> > I'd like to see us change the default address-headers option value to > address_headers: from to cc sender reply-to Done. > and change the Outlook client to use > experimental_ham_spam_imbalance_adjustment: False Done.
=Tony Meyer From mhammond at skippinet.com.au Thu Sep 18 01:02:16 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Sep 18 01:02:09 2003 Subject: [spambayes-dev] Prevent multiple servers on Windows Message-ID: <051601c37da2$07b87950$f502a8c0@eden> We have touched a little on this before, and I suspect that at least one of Richie's problems with the binary could be explained by having the proxy running multiple times. I propose that for Windows, we hack in a simple mutex to prevent *any* of the server based apps from starting if any of them are already running. We can fix it later :) Does anyone object to the following patch? There will have to be changes to pop3proxy_tray/service, but they are trivial. Mark. Index: sb_server.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/scripts/sb_server.py,v retrieving revision 1.4 diff -u -r1.4 sb_server.py --- sb_server.py 13 Sep 2003 05:10:49 -0000 1.4 +++ sb_server.py 18 Sep 2003 04:55:17 -0000 @@ -118,6 +118,10 @@ newsoft = min(hard, max(soft, 1024*2048)) resource.setrlimit(resource.RLIMIT_STACK, (newsoft, hard)) +# exception may be raised if we are already running and check such things. +class AlreadyRunningException(Exception): + pass + # number to add to STAT length for each msg to fudge for spambayes headers HEADER_SIZE_FUDGE_FACTOR = 512 @@ -726,6 +730,10 @@ state.bayes.store() state.bayes.close() + try: + state.windows_mutex.Close() + except AttributeError: + pass state = State() prepare(state) @@ -743,6 +751,29 @@ Dibbler.run(launchBrowser=launchUI) def prepare(state): + # If we can, prevent multiple servers from running at the same time. + # This can be refactored later if other platforms ever want to do anything + # similar. 
+ if sys.platform.startswith("win"): + try: + import win32event, win32api, winerror + # ideally, the mutex name could include either the username, or + # the munged path to the INI file - this would mean we would allow + # multiple starts so long as they weren't for the same user. + # However, as of now, the service version is likely to start as + # a different user, so a single mutex is best for now. + mutex_name = "SpamBayesServer" + hmutex = win32event.CreateMutex(None, True, mutex_name) + if win32api.GetLastError()==winerror.ERROR_ALREADY_EXISTS: + win32api.CloseHandle(hmutex) + raise AlreadyRunningException + # remember the handle, but no real need to explicitly close it + # when done, as it will die with the process. + state.windows_mutex = hmutex + except ImportError: + # no win32all - no worries, just start + pass + # Do whatever we've been asked to do... state.createWorkers() @@ -808,7 +839,12 @@ print get_version_string("POP3 Proxy") print "and engine %s.\n" % (get_version_string(),) - prepare(state=state) + try: + prepare(state=state) + except AlreadyRunningException: + print "ERROR: The proxy is already running on this machine." + print "Please stop the existing proxy and try again" + return if 0 <= len(args) <= 2: # Normal usage, with optional server name and port number. From T.A.Meyer at massey.ac.nz Thu Sep 18 01:22:55 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 18 01:23:09 2003 Subject: [spambayes-dev] Prevent multiple servers on Windows Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130347282B@its-xchg4.massey.ac.nz> > Does anyone object to the following patch? +1 here, except: > @@ -808,7 +839,12 @@ > print get_version_string("POP3 Proxy") > print "and engine %s.\n" % (get_version_string(),) > > - prepare(state=state) > + try: > + prepare(state=state) > + except AlreadyRunningException: > + print "ERROR: The proxy is already running on this machine." 
> + print "Please stop the existing proxy and try again" > + return > > if 0 <= len(args) <= 2: > # Normal usage, with optional server name and port number. This isn't against latest cvs. I had to move the prepare call to later on because otherwise the ui info isn't built properly if you specify stuff on the command line. It's a simple enough change, though. =Tony Meyer From richie at entrian.com Thu Sep 18 02:40:24 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Sep 18 02:40:34 2003 Subject: [spambayes-dev] Test windows binaries available In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13034727CA@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13034727CA@its-xchg4.massey.ac.nz> Message-ID: <0ekimvkdun2m4nqfjvaaamo0tik05eu20o@4ax.com> [Richie] > pop3proxy_tray.exe doesn't want to talk to my sb_server.exe > at all. [Tony] > It's not clear from this, but if you were running pop3proxy_tray *and* > sb_server then that's wrong. pop3proxy_tray will launch sb_server > (well, the appropriate bits of sb_server.py) in a separate thread if it > can't use the service. Ah! My copy of the manual must be out of date. 8-) -- Richie Hindle richie@entrian.com From richie at entrian.com Thu Sep 18 02:41:53 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Sep 18 02:42:03 2003 Subject: [spambayes-dev] Prevent multiple servers on Windows In-Reply-To: <051601c37da2$07b87950$f502a8c0@eden> References: <051601c37da2$07b87950$f502a8c0@eden> Message-ID: [Mark] > I propose that for Windows, we hack in a simple mutex to prevent *any* of > the server based apps from starting if any of them are already running. We > can fix it later :) Sure. One nit: the tray application ought to pop up a message box when it happens, rather than printing to the console/log. 
-- Richie Hindle richie@entrian.com From mhammond at skippinet.com.au Thu Sep 18 09:11:35 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Sep 18 09:11:30 2003 Subject: [spambayes-dev] Prevent multiple servers on Windows In-Reply-To: Message-ID: <060701c37de6$63fad980$f502a8c0@eden> [Richie] > [Mark] > > I propose that for Windows, we hack in a simple mutex to > prevent *any* of > > the server based apps from starting if any of them are > already running. We > > can fix it later :) > > Sure. One nit: the tray application ought to pop up a > message box when it > happens, rather than printing to the console/log. It is all too hard :) The current pop3proxy_tray app restarts the server using the same "state" object, but only calling prepare() once. The "correct" fix appears to be to give the State object init() and close() methods, and anything else just starts looking too ugly even for my eyes. Further, pop3proxy tray needs a little work in that it tends to assume its "self.started" variable, which would be reasonable if not for an external service and console .exe that all compete for being the one true server. None of this is too hard, but is very risky given the 0.6 release. I see 2 options: 1) Forget these new changes for now, and release 0.6 as it stands. This can still include a windows binary - if it is good enough for source-code users, it is good enough for binary users (except for the sister factor) 2) I check in some fairly intrusive changes that *should* all work. We delay 0.6 until things are again shaken out. I don't really care. Option (1) means saving my changes in a diff and uploading them to sourceforge. No decisive replies defaults to (1) Mark.
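Mark's suggested fix gives State explicit init() and close() methods, so the tray app can stop and restart the server without leaking the single-instance lock. A minimal sketch, assuming a lock file created with O_CREAT | O_EXCL in place of the win32event named mutex from the patch (a swapped-in, portable technique, not what the patch does); the class shape, `restart()` and the lock file name are hypothetical.

```python
import os


class AlreadyRunningException(Exception):
    """Raised when another server instance already holds the lock."""


class State:
    """Hypothetical State with explicit init()/close() methods.

    Unlike a Windows named mutex, a lock file is left behind if the
    process dies without calling close(), so stale locks need handling.
    """

    def __init__(self, lock_path):
        self.lock_path = lock_path
        self._lock_fd = None

    def init(self):
        if self._lock_fd is not None:
            return  # this object already holds the lock
        try:
            # Atomic create-if-absent: fails if any other holder exists.
            self._lock_fd = os.open(self.lock_path,
                                    os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise AlreadyRunningException(self.lock_path) from None
        # Creating workers, opening the database, etc. would go here.

    def close(self):
        if self._lock_fd is None:
            return
        # Storing and closing the database would go here.
        os.close(self._lock_fd)
        os.remove(self.lock_path)
        self._lock_fd = None

    def restart(self):
        # What the tray app's Stop/Start would do with one State object.
        self.close()
        self.init()
```

Pairing acquisition and release in init()/close() is what lets the tray app reuse one State object, instead of relying on prepare() being called exactly once.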
From skip at pobox.com Thu Sep 18 09:36:40 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 18 09:36:56 2003 Subject: [spambayes-dev] Generating a sample training database In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303472634@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1303472634@its-xchg4.massey.ac.nz> Message-ID: <16233.46312.16079.925144@montanaro.dyndns.org> >> I'll take the lead in grabbing the ham and spam and putting together >> a sample training database.... Tony> Is there any particular sort of message that we should contribute? Not that I can think of. The only obvious criterion is that the contents of the ham message(s) you send me not be sensitive. Tony> Something extremely hammy/spammy? Something that we think is Tony> really generic? Or just any random message we click on? Any old random messages should be fine I think. Skip From skip at pobox.com Thu Sep 18 14:16:38 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 18 14:16:53 2003 Subject: [spambayes-dev] New SB option in CVS: html_ui:rows_per_section Message-ID: <16233.63110.429376.640670@montanaro.dyndns.org> I found it impossible to use sb_server (aka pop3proxy) if I had a large number of untrained mails. If I clicked a Discard link to set all the radio buttons in a section I was never patient enough to let the operation complete. I always wound up killing Safari after five or ten minutes. It turns out that Safari's JavaScript implementation just sucks in this regard. I modified the onHeader() function to call "alert(i)" every 100 iterations. Each interval between alert popups got longer and longer. Internet Explorer for the Mac was consistent in this regard. This suggests that Apple decided to use a linked list instead of an array for this particular data structure. At any rate, I just checked in a new option for the [html_ui] section of your ini file: "rows_per_section". 
It specifies (oddly enough) the maximum number of rows to display in each section of the review page. It defaults to 10000, so you shouldn't see a difference if you don't set it. On the other hand, if you set it to a "reasonable" number, say 20 or 50, you should find your review page displays faster and your keyboard doesn't sprout cobwebs while you wait for JavaScript to set all the radio buttons in a section to "Discard", at least if you use Safari. Skip From richie at entrian.com Thu Sep 18 14:20:37 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Sep 18 14:20:48 2003 Subject: [spambayes-dev] Prevent multiple servers on Windows In-Reply-To: <060701c37de6$63fad980$f502a8c0@eden> References: <060701c37de6$63fad980$f502a8c0@eden> Message-ID: <8mtjmvcpooa0dojfensoat1fn0jvroq8q3@4ax.com> > It is all too hard :) Isn't everything? 8-) > The "correct" fix appears to be to > give the State object init() and close() methods, and anything else just > starts looking too ugly even for my eyes . Don't know - I'd have to have a very close look at it. The whole start/stop business kind of grew - and under the hands of many people - rather than being designed in from the start. > 1) Forget these new changes for now, and release 0.6 as it stands. This can > still include a windows binary - if it is good enough for source-code users, > it is good enough for binary users (except for the sister factor) > > 2) I check in some fairly intrusive changes that *should* all work . > We delay 0.6 until things are again shaken out. > > I don't really care. Option (1) means saving my changes in a diff and > uploading them to sourceforge. No decisive replies defaults to (1) I would vote for leaving the edits out of 1.0a6 - that was originally supposed to be a name-change-only release! You could always commit your changes on a branch. 
-- Richie Hindle richie@entrian.com From richie at entrian.com Thu Sep 18 14:41:54 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Sep 18 14:42:04 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] spambayes/spambayes storage.py, 1.33, 1.34 Message-ID: [Skip] > * one import per line I've never understood why the coding standard recommends that - is there a concrete reason I'm missing, or is it a matter of taste? -- Richie Hindle richie@entrian.com From tim.one at comcast.net Thu Sep 18 15:06:58 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Sep 18 15:07:04 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] spambayes/spambayesstorage.py, 1.33, 1.34 In-Reply-To: Message-ID: [Skip] >> * one import per line [Richie] > I've never understood why the coding standard recommends that - is > there a concrete reason I'm missing, or is it a matter of taste? Some people like to sort import statements alphabetically. Guido actually appears to like to sort them by increasing length(!). One import per line facilitates both crucial practices . iow-"no"-ly y'rs - tim From skip at pobox.com Thu Sep 18 16:52:42 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 18 16:53:55 2003 Subject: [spambayes-dev] Re: [Spambayes] Re: mboxtrain croaks on spam mbox file In-Reply-To: References: <16233.61616.898005.416160@montanaro.dyndns.org> Message-ID: <16234.6938.783202.843896@montanaro.dyndns.org> (Redirecting to spambayes-dev.) Drew> Anyway, after running mboxtrain on all these baby MH's, I finally Drew> found the culprit, #1688: ... Thanks. That's enough to begin tracking down the error, though I'm not the best geek for the job. Here's what I see so far. The problem appears when trying to decode the Content-Disposition header of the attachment. I know nothing about MIME email, so all I've been able to do is follow my nose to see where it leads. 
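For reference, RFC 2231 lets a long parameter be split into numbered continuations (name*0, name*1, ...) that start at 0 and run without gaps; the Nessus test message quoted in this thread uses filename*1 and filename*2, so it is malformed by design. A minimal sketch of the joining step, using a hypothetical helper rather than the email package's own code: it splits naively on ";" (so it breaks on quoted values containing ";") and does not handle the charset'language'%XX encoded form (name*N*=...).

```python
import re

# Matches "name*N", a plain RFC 2231 continuation parameter name.
_CONTINUATION = re.compile(r"^(?P<name>[^*]+)\*(?P<num>\d+)$")


def join_continuations(header_value):
    """Return (main_value, params, problems) for a MIME header value."""
    parts = [p.strip() for p in header_value.split(";")]
    main_value, rest = parts[0], parts[1:]
    params, pieces, problems = {}, {}, []
    for part in rest:
        if "=" not in part:
            continue
        name, _, value = part.partition("=")
        name, value = name.strip().lower(), value.strip().strip('"')
        m = _CONTINUATION.match(name)
        if m:
            pieces.setdefault(m.group("name"), {})[int(m.group("num"))] = value
        else:
            params[name] = value
    for name, numbered in pieces.items():
        indexes = sorted(numbered)
        # Well-formed continuations are exactly 0, 1, 2, ...
        if indexes != list(range(len(indexes))):
            problems.append(f"{name}: continuations {indexes} do not start at 0 or have gaps")
        params[name] = "".join(numbered[i] for i in indexes)
    return main_value, params, problems
```

Joining in index order yields "eicar.com" here, matching the filename Skip reports below, while the problems list records that the numbering was irregular.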
In email.Message.Message, the get_filename() method calls get_param(), which calls _get_params_preserve, which calls get() to grab the raw header contents, which is attachment; filename*1="eicar."; filename*2="com" It then splits that into [('attachment',''), ('filename*1','"eicar."'), ('filename*2','"com"')] and passes this to email.Utils.decode_params(). That's as far as I went, because the docstring didn't seem to match what was being passed in: params is a sequence of 2-tuples containing (content type, string value). and doesn't say anything about what's supposed to be returned. I can't believe the first elements of those tuples can be considered "content type" values in the usual MIME sense. I suspect email.Utils.decode_params() is either being called incorrectly and thus returning garbage or isn't described properly in its docstring. In this case it returns [('attachment', ''), ('filename', (None, None, '"eicar.com"'))] Skip Drew> ,---- Drew> | From nobody Thu Aug 7 09:55:57 2003 Drew> | Return-Path: Drew> | X-Gnus-Mail-Source: maildir:~/Maildir/inbox/new Drew> | Message-ID: Drew> | Delivered-To: aar@williams.mc.vanderbilt.edu Drew> | Received: (qmail 24184 invoked by alias); 7 Aug 2003 06:35:45 -0000 Drew> | Delivered-To: postmaster@williams.mc.vanderbilt.edu Drew> | Received: (qmail 24126 invoked from network); 7 Aug 2003 06:35:44 -0000 Drew> | Received: from unknown (HELO nessus) (160.129.223.39) Drew> | by williams.mc.vanderbilt.edu with SMTP; 7 Aug 2003 06:35:44 -0000 Drew> | From: nobody@example.com Drew> | To: postmaster@[160.129.208.222] Drew> | Organization: Nessus kabale Drew> | MIME-Version: 1.0 Drew> | Subject: Nessus antivirus test 3: alternative base64 attachment Drew> | Content-Type: multipart/mixed; boundary="=-=-=" Drew> | Xref: williams spam-archive-1:1689 Drew> | Lines: 13 Drew> | X-Gnus-Article-Number: 1689 Mon Aug 11 11:08:05 2003 Drew> | Drew> | Drew> | --=-=-= Drew> | Drew> | If you can read or execute the attachment, this means that 
you do not Drew> | have an antivirus, or that it was disabled. Drew> | Drew> | --=-=-= Drew> | Content-Type: application/octet-stream Drew> | Content-Disposition: attachment; filename*1="eicar."; filename*2="com" Drew> | Content-Description: EICAR test file Drew> | Drew> | X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H* Drew> | --=-=-=-- Drew> `---- From skip at pobox.com Thu Sep 18 17:13:00 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 18 17:13:12 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins] spambayes/spambayesstorage.py, 1.33, 1.34 In-Reply-To: References: Message-ID: <16234.8156.537783.634957@montanaro.dyndns.org> Tim> [Skip] >>> * one import per line Tim> [Richie] >> I've never understood why the coding standard recommends that - is >> there a concrete reason I'm missing, or is it a matter of taste? Aside from Tim's excellent summary of why you'd want to do this (including the mysterious "dutch lengthening import technique"), I have two other reasons: * In this case, I split the import simply to be consistent with the other imports in the file, which were all one per line. * I like to split my imports into two groups, modules which are part of the Python distribution and modules which I wrote or which are otherwise third party. That just makes it easier to tell if I need to include other modules when tossing something out to my website. Skip From T.A.Meyer at massey.ac.nz Thu Sep 18 17:46:48 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 18 17:47:00 2003 Subject: [spambayes-dev] Re: [Spambayes-checkins]spambayes/spambayesstorage.py, 1.33, 1.34 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303472922@its-xchg4.massey.ac.nz> [Tim] > Guido actually appears to like to sort them by increasing length(!). So do I! 
You've got to admit it makes it look prettier ;) > * I like to split my imports into two groups, modules > which are part of the Python distribution and modules which I wrote or > which are otherwise third party. That just makes it easier to tell if > I need to include other modules when tossing something out to my website. I like this too, although I tend to use three groups - part of the Python distribution, part of that project (spambayes.*, for example), and third party. One reason that lends itself to one-per-line, IMO, is that it makes a diff more obvious - you can see at a glance which import has gone, rather than having to look for which one is missing. Similarly, it makes things a bit easier for merging with cvs (at least with the software I use) - if two people make edits to an import line (say one adds an import and one removes one), then this will end up being a conflict I have to manually resolve. One per line works out nicely. OTOH, cvs doesn't really like shuffling imports about to satisfy the aesthetic increasing length criteria :( Just my 2c. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Sep 18 17:58:34 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 18 17:58:40 2003 Subject: [spambayes-dev] Prevent multiple servers on Windows Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303472943@its-xchg4.massey.ac.nz> > None of this is too hard, but is very risky given the 0.6 > release. I see 2 options: > > 1) Forget these new changes for now, and release 0.6 as it > stands. This can still include a windows binary - if it is > good enough for source-code users, it is good enough for > binary users (except for the sister factor) > > 2) I check in some fairly intrusive changes that *should* all > work . We delay 0.6 until things are again shaken out. I agree with Richie and vote for #1. 
0.6 has been delayed several times already, and the theory was that it would be identical to 0.5 apart from the name change & option removal, which is already untrue. I doubt (sorry if I'm wrong) that anyone apart from me, you and Richie will have an opinion on this, so I'm going to go ahead and get 1.0a6 ready. It can always be pulled if I'm wrong ;) =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Sep 18 21:38:05 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 18 21:38:16 2003 Subject: [spambayes-dev] 1.0a6 Release candidates Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130359920D@its-xchg4.massey.ac.nz> Release candidates for 1.0a6 are available from: and If anyone would like to test them out, that would be great. Apart from that, everything is ready for the release; I just have to upload the files to sf, commit the website changes, and send the announce email. In particular, if someone with a *nix box could check that the tar.gz has the correct line endings (I've been burnt by that before), that would be great. I followed Richie's instructions in README-DEVEL.txt (even downloading WinCVS just for that), but I'm not sure that it worked right. A linux box arrives here on Monday, but I'd like to get it out before that... Thanks, Tony From T.A.Meyer at massey.ac.nz Thu Sep 18 22:06:23 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Sep 18 22:06:37 2003 Subject: [spambayes-dev] 1.0a6 Release candidates Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599245@its-xchg4.massey.ac.nz> > Everything looks fine on my side on Linux > Emacs and more do the right line splitting Great, thanks. > But why the .tar.gz is 1.05a and the .zip is 1.06a ? Oops. Because I forgot to change the number before I made the .tar.gz. I'll fix that.
=Tony Meyer From mhammond at skippinet.com.au Fri Sep 19 01:47:37 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Sep 19 01:47:34 2003 Subject: [spambayes-dev] Test Binary installer Message-ID: <07f101c37e71$8a422220$f502a8c0@eden> I need a life! I created an installer executable for SpamBayes, which includes Outlook and the proxy apps. It tries to detect the most appropriate one to use, and warns if you try and install both (although it does let you). It optionally adds a shortcut to your startup folder, and does a few other funky things. http://starship.python.net/crew/mhammond/downloads/SpamBayes-Setup.exe It is branded as "1.0a6", and is known to *not* work on my Win9x box. Mark. From richie at entrian.com Fri Sep 19 03:08:33 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Sep 19 03:08:44 2003 Subject: [spambayes-dev] 1.0a6 Release candidates Message-ID: Tony, 1.0a6 works like a charm for me (POP3 proxy on WinXP). Thanks for building it (it's always more work than you think it will be 8-) One packaging nit: you should number release candidates as such, eg. 1.0a6rc1 - otherwise there could be several 1.0a6 packages floating around, and it's hard to tell which is which. One functionality nit: setup.py nicely removed my old scripts, but didn't remove the corresponding .pyc files (some of the scripts import each other, which is why you get .pyc files). I'll fix up README_DEVEL.txt and setup.py this weekend, unless someone beats me to it. -- Richie Hindle richie@entrian.com From T.A.Meyer at massey.ac.nz Fri Sep 19 03:22:14 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 19 03:22:27 2003 Subject: [spambayes-dev] 1.0a6 Release candidates Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599323@its-xchg4.massey.ac.nz> > 1.0a6 works like a charm for me (POP3 proxy on WinXP). > Thanks for building it (it's always more work than you think > it will be 8-) No worries.
1.0b1 must be your turn again ;) > One packaging nit: you should number release candidates as > such, eg. 1.0a6rc1 - otherwise there could be several 1.0a6 > packages floating around, and it's hard to tell which is which. Will do. > One functionality nit: setup.py nicely removed my old > scripts, but didn't remove the corresponding .pyc files (some > of the script import each other, which is why you get .pyc files). I've checked this in. It deletes .pyo as well, in case anyone had it set up to generate those. I've just got it to quietly delete them (if it is deleting the others) since it seems implicit that the user would like the .pyc/o variants gone too. > I'll fix up README_DEVEL.txt and setup.py this weekend, > unless someone beats me to it. I did setup.py, but I'll leave you to add stuff to README-DEVEL about release candidates. Thanks, Tony From T.A.Meyer at massey.ac.nz Fri Sep 19 03:48:33 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 19 03:48:45 2003 Subject: [spambayes-dev] Test Binary installer Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599324@its-xchg4.massey.ac.nz> > I need a life! Hopefully the bourbon sufficed ;) > I created an installer executable for SpamBayes, which > includes Outlook and the proxy apps. All looks good here (Eudora, WinXP), except for: * The KeyError when checking for the latest version. Are you sure that the pyc is from latest cvs? * Saving the options still doesn't work correctly (the same ImportError is generated). Either this isn't latest cvs, or my fix didn't work. * The tooltip on the tray app doesn't seem to update. I doubt this has anything to do with the installer, though. Given that stopping SpamBayes takes so long via the tray app (well, here at least), I wonder if we should indicate that it's doing something. Either change the tray icon to a 'working' icon, or the cursor.
One other thing is that if I "Save and Shutdown" from the web interface, the tray app doesn't (know to) update (again, nothing to do with the installer). I wonder whether this is just one more thing to consider when looking at the start/stop/prepare stuff. > It tries to detect the most appropriate one to use, This worked nicely for me. > and warns if you try and install both (although it does let you). As did this, and the desktop & startup links. > It is branded as "1.0a6" Hmmm. What's your plan for branding the release? A "1.0a6" installer will confuse those Outlook people using version 008.1, won't it? It's nice that it ties in with the source release (and will probably almost match it), but it would probably be ok if it was 009. =Tony Meyer From papaDoc at videotron.ca Fri Sep 19 08:28:57 2003 From: papaDoc at videotron.ca (papaDoc) Date: Fri Sep 19 08:27:59 2003 Subject: [spambayes-dev] Spam and ham count can be negative ! Message-ID: <3F6AF689.7040501@videotron.ca> Hi, I'm trying to patch sb_mboxtrain to train only on emails that are no more than x days old. I want to do that because I keep all my email, and if I train on all of it the db becomes too big. When I tried my new sb_mboxtrain I got the following message (or something similar): "The spam count will be negative." OK, let's see how this is possible. I used sb_mboxtrain to create my database with only 10 spams and 10 hams (a deliberately small amount, since I want some errors for the purpose of the demonstration). I have been using hammie to filter my mail for several days. Since it makes some errors, I move some emails from the spam folder to the ham folder and vice versa... Every night I run sb_mboxtrain on my ham and spam folders. Look at the code below (the original part of sb_mboxtrain): if a message already has a header (e.g. hammie has added the spam header), it is untrained before being retrained as the right class, ham or spam. The problem is that I NEVER TRAINED the database on this email. So the spam or ham count can become negative!!!! The second problem is that I don't know how to solve this.....

    if is_spam:
        spamtxt = options["Headers", "header_spam_string"]
    else:
        spamtxt = options["Headers", "header_ham_string"]
    oldtxt = msg.get(TRAINED_HDR)
    if force:
        # Train no matter what.
        if oldtxt != None:
            del msg[TRAINED_HDR]
    elif oldtxt == spamtxt:
        # Skip this one, we've already trained with it.
        return False
    elif oldtxt != None:
        # It's been trained, but as something else. Untrain.
        del msg[TRAINED_HDR]
        h.untrain(msg, not is_spam)
    h.train(msg, is_spam)
    msg.add_header(TRAINED_HDR, spamtxt)
    return True

Remi papaDoc@videotron.ca From skip at pobox.com Fri Sep 19 09:06:49 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Sep 19 09:07:10 2003 Subject: [spambayes-dev] 1.0a6 Release candidates In-Reply-To: References: Message-ID: <16234.65385.48717.919535@montanaro.dyndns.org> Richie> One functionality nit: setup.py nicely removed my old scripts, Richie> but didn't remove the corresponding .pyc files (some of the Richie> script import each other, which is why you get .pyc files). Hmmm... Okay, I'll fix that in setup.py. Skip From tdickenson at geminidataloggers.com Fri Sep 19 09:11:13 2003 From: tdickenson at geminidataloggers.com (Toby Dickenson) Date: Fri Sep 19 09:11:19 2003 Subject: [spambayes-dev] Spam and ham count can be negative ! In-Reply-To: <3F6AF689.7040501@videotron.ca> References: <3F6AF689.7040501@videotron.ca> Message-ID: <200309191411.13717.tdickenson@geminidataloggers.com> On Friday 19 September 2003 13:28, papaDoc wrote: > Hi, > > > I'm trying to patch sb_mboxtrain to train on only the email not more > than x days old. I want to do that since I'm keeping all my email and > if I train on all my emails the db becomes too big. I'm doing a similar thing, running sb_mboxtrain every night to do a full train on all emails from the last few years.
I actually run a wrapper around sb_mboxtrain that sniffs the location of my ham and spam folders from my kmail configuration. >The problem is I NEVER TRAINED the database on this email. > So the spam or ham count can becomes negative !!!! You need the -f switch to sb_mboxtrain. -- Toby Dickenson From tim at fourstonesexpressions.com Fri Sep 19 14:57:44 2003 From: tim at fourstonesexpressions.com (Tim Stone) Date: Fri Sep 19 14:57:54 2003 Subject: [spambayes-dev] An interesting email with interesting ramifications Message-ID: This scored a solid unsure for me, which is completely understandable... The pertinent text is: Aoccdrnig to a rscheearch sdtuy at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is that the frist and lsat ltteer be at the rghit pclae. The rset can be a total mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. -- Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone www.fourstonesExpressions.com From aaraines at pobox.com Fri Sep 19 18:04:49 2003 From: aaraines at pobox.com (Andrew A. Raines) Date: Fri Sep 19 18:20:36 2003 Subject: [spambayes-dev] Re: An interesting email with interesting ramifications References: Message-ID: Tim Stone writes: > This scored a solid unsure for me, which is completely > understandable... The pertinent text is: > > Aoccdrnig to a rscheearch sdtuy at Cmabrigde Uinervtisy, it deosn't > mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt > tihng is that the frist and lsat ltteer be at the rghit pclae. The > rset can be a total mses and you can sitll raed it wouthit > porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by > istlef, but the wrod as a wlohe. Got the same one. However: X-Spambayes-Classification: ham; 0.00 But I'm getting so many stinkin' false negatives lately, I'm not surprised.
-Drew From T.A.Meyer at massey.ac.nz Fri Sep 19 18:36:56 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 19 18:38:24 2003 Subject: [spambayes-dev] An interesting email with interesting ramifications Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599336@its-xchg4.massey.ac.nz> > This scored a solid unsure for me, which is completely > understandable... > The pertinent text is: > > Aoccdrnig to a rscheearch sdtuy at Cmabrigde Uinervtisy, it > deosn't mttaer > in waht oredr the ltteers in a wrod are, the olny iprmoetnt > tihng is that > the frist and lsat ltteer be at the rghit pclae. The rset can > be a total > mses and you can sitll raed it wouthit porbelm. Tihs is > bcuseae the huamn > mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. I've received six of these now, including three where I don't recognise the sender's address. All of them have been solidly ham so far. I'm starting to think they're spam, though... Presumably, training on a single one of these would mean that the rest were easily classified. It seems unlikely that any of those words would already be tokens. =Tony Meyer From ta-meyer at ihug.co.nz Fri Sep 19 19:08:39 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Fri Sep 19 19:08:51 2003 Subject: [spambayes-dev] SpamBayes 1.0a6 Source Release Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AF01@its-xchg4.massey.ac.nz> Version 1.0a6 of the SpamBayes source is now available. Note: this is not a release of the binary installer for the Outlook plug-in. A separate release for the plug-in installer will follow at a later date. This release follows (reasonably) close on the heels of 1.0a5 and is primarily a reshuffling and tidy-up. All the scripts have been renamed, and all the cruft supporting old option names has been removed. This will almost certainly mean that you have to edit the way you use SpamBayes (to change the script names, at least). There are a few other minor improvements and bug fixes.
We recommend that all source code users upgrade to the 1.0a6 release, but do so at a time when they have a few minutes available to change over references to script names, and to check their configuration files are using the correct names. Note that the plan is for this to be the final alpha release in the 1.0 series, and that 1.0 is effectively feature frozen from this point. We will continue to fix any bugs that are reported, and should be able to release a very stable 1.0b1 and then final 1.0 release in the near future. Work will soon begin on the 1.1 series, which will (no doubt!) feature many improvements. For full details about what's new in this release, please see the release notes and/or changelog: Downloads are available from . =Tony Meyer, on behalf of the spambayes team. From ta-meyer at ihug.co.nz Fri Sep 19 19:20:03 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Fri Sep 19 19:20:11 2003 Subject: [spambayes-dev] Release_1_0 branch. Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AF02@its-xchg4.massey.ac.nz> As discussed earlier, I've created a cvs branch - 'release_1_0' - to move toward 1.0b1 and then 1.0. If I understand things rightly (going by Jeremy and Richie's comments) the main branch is now for 1.1 work, so is un-feature frozen ;). If people could check 1.0 bugfixes into the release_1_0 branch (and 1.1, as needed), that would be great. =Tony Meyer From ta-meyer at ihug.co.nz Sat Sep 20 01:04:16 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Sat Sep 20 01:04:21 2003 Subject: [spambayes-dev] Auto-response text Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AF07@its-xchg4.massey.ac.nz> When you get a chance, could Tim/Barry/Skip update the autoresponse text? I've checked in the change to reply.txt, so you just have to do whatever follows from that. (The change is just updating the Outlook & source version numbers). 
Thanks, Tony From anthony at interlink.com.au Sat Sep 20 02:58:35 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Sep 20 03:00:13 2003 Subject: [spambayes-dev] Fwd: SourceForge.net Service Update: CVS Message-ID: <200309200658.h8K6waaA016774@localhost.localdomain> An embedded message was scrubbed... From: "SourceForge.net Team" Subject: SourceForge.net Service Update: CVS Date: Fri, 19 Sep 2003 23:07:42 -0700 Size: 5937 Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20030920/18f4add6/attachment.mht From MMARTINEZ at CSREES.USDA.GOV Sat Sep 20 13:49:25 2003 From: MMARTINEZ at CSREES.USDA.GOV (Martinez, Michael) Date: Sat Sep 20 13:48:22 2003 Subject: [spambayes-dev] Are there plans for a daemonized or compiled version of Spambayes? Message-ID: <83EED274D3127740995A6B621B88F2E101274212@csrees-exchange.csrees.usda.gov> Hi Guys, I've been running Spambayes on our agency Linux smtp gateway for several months and am very happy with its classification of spam. My gateway is a qmail system and it pipes all incoming email through the hammiefilter prior to delivery. However, a performance problem arises when the gateway gets hit during peak hours with a lot of emails. What happens is the system slows down tremendously, in part due to the number of python instances that get forked in order to scan the emails. I was wondering: are there any plans to develop a lightweight, daemonized version of Spambayes? In the same vein, are there plans to port it to C or another compiled language? How difficult would this be? Any suggestions on how to lessen the impact on system resources under a heavy email load? Thanks, Michael Martinez Linux System Administrator ISTM/CSREES United States Department of Agriculture -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030920/efc4f0d5/attachment.html From seanl at chaosring.org Sat Sep 20 14:24:52 2003 From: seanl at chaosring.org (Sean R. Lynch) Date: Sat Sep 20 14:31:07 2003 Subject: [spambayes-dev] Network checks Message-ID: In attempting to develop an integrated mail system that the average person can use, I've come to the conclusion that bayesian filtering alone just isn't enough. The main problem is the training time. Bayesian filters work best when they are trained on the user's mail *and* the training set is accurate. When experimenting on my dad, I have found the training set that he developed to be far from what you and I would consider accurate; he considers stuff he's interested in to be non-spam regardless of how spammy it is, and non-spam that he's not interested in ends up in his spam folder. However, if something ends up in his spam quarantine, he will leave it there unless it's really something he's interested in, because of the extra effort to release it from quarantine. What this seems to indicate is that the best way to develop a good training set for my dad is to have a good filter to begin with. SpamAssassin seems like it would be reasonable, but if I'm gonna use SpamAssassin, why not use its built-in Bayesian filter? The main reason I won't is that I really want to use SpamAssassin's network checks, and IMHO it's bad netizenship to run them more than once on the same message, and enough messages go to multiple users on my server that I'd really like to run SA as a content filter. I think that Bayesian filters really need to include their training time in performance analyses, rather than just comparing their ultimate performance after being trained.
The "best" of the Bayesian filters seem to require the longest training times, and I don't really consider this to be a good thing, because "training time" really translates to both false positives and false negatives (an unsure is a false negative as far as I'm concerned). If IP addresses, email addresses (in the body), domains, and URLs could be shared among users of Bayesian filters, I think this would reduce training time significantly, because there are large numbers of each of them out there, but they have the potential to be the biggest spam clues. For relay IP addresses, I've been thinking of just keeping counts of spam and ham for each of them and using DNS TXT records to distribute this information. The counts would be submitted via a CGI or XMLRPC or something, and the DNS zone would be regenerated every hour. This would not be a blacklist and it wouldn't say anything technical or moral about the host listed, just that people marked this many non-spams and this many spams from this host. Email addresses, domains, and URLs are harder, because IMHO they can really only be used as spam clues if they're going to be shared. These could be done by comparing email addresses and URLs in the message to blacklists, and using the result as a feature for the Bayesian filter. This way, the spammer could include as many non-spam URLs and emails as they wanted without being able to tip the balance toward non-spam. The other things I was thinking of including are phone numbers and snail mail addresses, because these would cover a large number of the spams that don't have URLs or email addresses in the body. Almost all spams have *some* sort of contact information, unless they're chain letters, which can be filtered out by other means. 
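Sean's proposal turns network lookups into extra classifier features instead of a verdict. Here is a minimal sketch of that idea, under two assumptions: the blacklist and the relay counts are local dictionaries standing in for real network services (DNS TXT records, a DNSBL), and the names `BLOCKLIST`, `RELAY_COUNTS`, `network_tokens` and the `netcheck:` token prefix are hypothetical. None of them are part of SpamBayes or SpamAssassin.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-ins for network services: a domain blacklist, and the
# per-relay (ham, spam) counts that would be published via DNS TXT records.
BLOCKLIST = {"cheap-pills.example", "spamvertised.example"}
RELAY_COUNTS = {"192.0.2.7": (3, 250)}  # relay IP -> (ham count, spam count)

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")  # captures the domain


def network_tokens(body, relay_ip):
    """Yield synthetic tokens describing (simulated) network-check results.

    Only a *listed* domain produces a token, so padding a message with
    innocent URLs cannot push its score toward ham.
    """
    domains = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname
        if host:
            domains.add(host.lower())
    for domain in EMAIL_RE.findall(body):
        domains.add(domain.lower())
    if domains & BLOCKLIST:
        yield "netcheck:listed-domain"
    counts = RELAY_COUNTS.get(relay_ip)
    if counts is not None:
        ham, spam = counts
        # Bucket the spam ratio into 0..10 so it behaves like an ordinary token.
        ratio = spam / (ham + spam) if ham + spam else 0.0
        yield "netcheck:relay-spam-ratio:%d" % int(ratio * 10)


body = "Visit http://cheap-pills.example/buy or mail orders@cheap-pills.example today"
tokens = list(network_tokens(body, "192.0.2.7"))
print(tokens)
```

The tokens would be added to the ones the tokenizer already produces, so the classifier learns their weight from training like any other clue. How to feed them in, whether through extra headers as Sean suggests or directly, is not shown here.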
All of these checks could be integrated into SpamAssassin (does SA already check URLs and stuff in the body against blacklists?), but I think it might be better to use them to generate more features for the Bayesian filters to use for classification... some sort of script that just adds a bunch of keywords to the headers based on the result of network checks. This, combined with a pre-trained global database that only handles features that are missing from the user's own database (à la spamprobe), would be great for a commercial spam filtering engine that requires no training time to be decent, and becomes very good with only a little training. I'll post some code eventually, but it would be great to get some feedback on the idea before I start coding. I am thinking about doing the relay statistics service first since that would be fairly widely useful. From MMARTINEZ at CSREES.USDA.GOV Sat Sep 20 15:09:43 2003 From: MMARTINEZ at CSREES.USDA.GOV (Martinez, Michael) Date: Sat Sep 20 15:08:33 2003 Subject: [spambayes-dev] Network checks Message-ID: <83EED274D3127740995A6B621B88F2E1DF7D2E@csrees-exchange.csrees.usda.gov> That's cool and all ... I mean, you've done a good job writing this book, but me and my users are happy with Spambayes. I just want some help developing a lightweight version. Mike -------------------------- Sent from my BlackBerry Wireless Handheld From tim.one at comcast.net Sat Sep 20 18:35:35 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Sep 20 18:35:44 2003 Subject: [spambayes-dev] RE: [Spambayes] Are there plans for a daemonized or compiled version ofSpambayes? In-Reply-To: <83EED274D3127740995A6B621B88F2E101274212@csrees-exchange.csrees.usda.gov> Message-ID: [Martinez, Michael] > I've been running Spambayes on our agency Linux smtp gateway for > several months and very happy with its classification of spam. My > gateway is a qmail system and it pipes all incoming email through the > hammiefilter prior to delivery. Yup, running a distinct classifier for each email is a pretty crazy design for high-volume use. > However, a performance problem arises when the gateway gets hit during > peak hours with a lot of emails. What happens is the system slows down > tremendously, in part due to the number of python instances that get > forked in order to scan the emails. > > I was wondering: are there any plans to develop a lightweight, > daemonized version of Spambayes? The answer to that depends on you too: what are your plans? Python is a C program, and can be daemonized like any other. Note the project's pspam directory sets up a classifier backed by a ZODB database, which can be attached to via opening a ZEO connection. That would be a pleasant way to let multiple clients hook up at will to an always-running classifier. > In the same vein, are there plans to port it to C or another compiled > language? AFAICT, the most expensive part of running spambayes now is running Berkeley database lookups, and the Sleepycat bsddb implementation is already written in C. So profile before you presume to know what would help. Based on what I've measured, my interest in recoding any of the rest in C is nil. > How difficult would this be?
It would be extremely tedious. You don't escape the need for a database, for I/O, or for a variety of complex string-processing operations. The parts of the Python implementation that supply those to Python programmers are already coded in C, but are much easier to use from Python than from C. From tim.one at comcast.net Sat Sep 20 19:15:07 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Sep 20 19:15:11 2003 Subject: [spambayes-dev] An interesting email with interesting ramifications In-Reply-To: Message-ID: [Tim Stone] > This scored a solid unsure for me, which is completely > understandable... The pertinent text is: > > Aoccdrnig to a rscheearch sdtuy at Cmabrigde Uinervtisy, it deosn't > mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt > tihng is that the frist and lsat ltteer be at the rghit pclae. The > rset can be a total mses and you can sitll raed it wouthit porbelm. > Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, > but the wrod as a wlohe. What are the interesting ramifications? I don't expect to see spam start to take this up in a significant way (although I'm sure some will try for a while), because spam's goal is to sell, not just to be comprehensible. Some low-rent spam uses rampant misspelling already, without much success against us (the embedded URLs and the headers still paint them as spam). *Your* message scored like so for me:

    Combined Score: 0% (4.06812e-005)
    Internal ham score (*H*): 0.999935
    Internal spam score (*S*): 1.66456e-005

thanks to the hammy "Hey! Tim Stone sent me a msg via spambayes-dev!" clues. From tim.one at comcast.net Sat Sep 20 19:24:58 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Sep 20 19:25:01 2003 Subject: [spambayes-dev] Auto-response text In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AF07@its-xchg4.massey.ac.nz> Message-ID: [Tony] > When you get a chance, could Tim/Barry/Skip update the autoresponse > text?
> I've checked in the change to reply.txt, so you just have to do > whatever follows from that. (The change is just updating the Outlook > & source version numbers). Done. Thanks! I also updated the date (to "As of 2003-09-20 ..."). From tim.one at comcast.net Sat Sep 20 20:13:10 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Sep 20 20:13:14 2003 Subject: [spambayes-dev] Network checks In-Reply-To: Message-ID: [Sean R. Lynch] > In attempting to develop an integrated mail system that the average > person can use, I've come to the conclusion that bayesian filtering > alone just isn't enough. I doubt that the underlying classification technology is a factor. > The main problem is the training time. Bayesian filters work best when > they are trained on the user's mail *and* the training set is > accurate. That's true of any learning algorithm, of course -- I don't think there's anything specific to Bayesian filters in that (or to the non-Bayesian method spambayes uses!). When your decisions are based on what you've been taught, poor teaching is going to lead to poor decisions. > When experimenting on my dad, I have found the training set > that he developed to be far from what you and I would consider > accurate; I would agree that spambayes is most effective for computer geeks now. > he considers stuff he's interested in to be non-spam regardless of how > spammy it is, That's fine by us! spambayes has no built-in definitions of "ham" or "spam", and, to the contrary, is eager to *let* them mean whatever the end user wants them to mean. I would count it a failure if a message that interests your father got scored as spam by *his* classifier, regardless of what you think its classification should be. I don't care if he lives for what I'd call farm-porn spam -- if that's what he likes, that's ham to him. > and non-spam that he's not interested in ends up in his spam folder. As above.
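To make the point above concrete (a classifier's verdicts come only from what it was trained on), here is a minimal Python sketch of a Robinson-style smoothed per-token spam probability, the kind of calculation spambayes' classifier is modeled on. The function name and the default constants (unknown_word_strength=0.45, unknown_word_prob=0.5) are assumptions for illustration, not the project's API. It shows one token scoring hammy for one user and spammy for another purely because of their training.

```python
def token_spamprob(spamcount, hamcount, nspam, nham,
                   unknown_word_strength=0.45, unknown_word_prob=0.5):
    """Smoothed probability that a message containing this token is spam.

    spamcount / hamcount: trained spam / ham messages containing the token.
    nspam / nham: total spam / ham messages trained on.
    The two keyword defaults are assumed values, not read from SpamBayes.
    """
    # Fraction of each class that contained the token (0 if a class is empty).
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    if spamratio + hamratio == 0:
        # Never seen in training: fall back to the neutral prior.
        return unknown_word_prob
    raw = spamratio / (spamratio + hamratio)
    # Robinson smoothing: pull rarely seen tokens toward unknown_word_prob.
    n = spamcount + hamcount
    return ((unknown_word_strength * unknown_word_prob + n * raw)
            / (unknown_word_strength + n))


# The same token, trained on by two different people.
fathers_view = token_spamprob(spamcount=2, hamcount=40, nspam=100, nham=100)
geeks_view = token_spamprob(spamcount=30, hamcount=1, nspam=100, nham=100)
print(round(fathers_view, 3), round(geeks_view, 3))
```

With the father's training (the token in 40 of 100 hams and 2 of 100 spams) the token comes out at about 0.05, strongly hammy; with the other user's training (30 of 100 spams, 1 of 100 hams) the identical token comes out at about 0.96. Neither answer is "wrong"; each reflects its owner's teaching.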
> However, if something ends up in his spam quarantine, he will leave it > there unless it's really something he's interested in, because of the > extra effort to release it from quarantine. If he's training on messages he likes as ham, and on messages he doesn't like as spam, it sounds to me like he's got a darned good start. > What this seems to indicate is the best way to develop a good > training set for my dad So far you sound determined to overrule your father's judgment about what he does and doesn't like. In that case, spambayes may be the worst classifier available. > is to have a good filter to begin with. SpamAssassin seems like it > would be reasonable, but if I'm gonna use SpamAssassin, why not use > its built-in Bayesian filter? The main reason I won't is that I really > want to use SpamAssassin's network checks, and IMHO it's bad > netizenship to run them more than once on the same message, and > enough messages go to multiple users on my server that I'd really > like to run SA as a content filter. You could run SA as a rule-based content filter and disable its network checks. > I think that Bayesian filters really need to include their training > time in performance analyses, rather than just comparing their > ultimate performance after being trained. The "best" of the Bayesian > filters seem to require the longest training times, and I don't > really consider this to be a good thing, because "training time" > really translates to both false positives and false negatives (an > unsure is a false negative as far as I'm concerned). It's extremely dangerous to consider an unsure to be a false negative. They're unsure precisely because, based on the training a classifier has been given, the evidence in favor of ham is approximately equal to the evidence in favor of spam. It does appear to be the case that some people usually end up thinking an unsure is spam. This isn't universal, though!
For example, it varies from time to time in my classifier, and the past few weeks I've considered most unsures to be ham (of the ones I could make up my own mind about! I throw away about half my unsures untrained-on, because I have no idea what they're on about -- might be ham, might be spam, but they're too confusing regardless to be worth the effort of researching). Different people get very different mixtures of email. If you're convinced that all unsures are really spam, move your spam cutoff lower. If you move it far enough so that it equals your ham cutoff, you'll never see another unsure again. I recommend against it, but suit yourself. > If IP addresses, email addresses (in the body), domains, and URLs > could be shared among users of Bayesian filters, I think this would > reduce training time significantly, because there are large numbers > of each of them out there, but they have the potential to be the > biggest spam clues. Sharing can be helpful for people with a shared sense of what spam is. For example, all the email sent to tech mailing lists via python.org goes thru a spambayes classifier, and tens of thousands of mailing list recipients benefit from sharing what their mailing lists' classifiers have been taught about spam. This is appropriate, because most tech mailing lists have a very strong shared definition of spam (commercial messages of any type are considered spam, except for those highly specific to the mailing list topic (in which case the message necessarily contains lots of words hammy wrt the list topic)). This is a very easy form of sharing, of course, because it's confined to one classifier. Fancier schemes would require setting up distributed trust networks, etc. In the end, I bet that subsystem would dwarf the current spambayes codebase. IOW, lots of work. > For relay IP addresses, I've been thinking of just keeping counts of > spam and ham for each of them and using DNS TXT records to distribute > this information.
The counts would be submitted via a CGI or XMLRPC or > something, and the DNS zone would be regenerated every hour. This > would not be a blacklist and it wouldn't say anything technical or > moral about the host listed, just that people marked this many > non-spams and this many spams from this host. > > Email addresses, domains, and URLs are harder, because IMHO they can > really only be used as spam clues if they're going to be shared. I don't agree. For example, the .biz domain appearing in a URL is a very strong spam clue in my classifier, and for what should be obvious reasons. It could be *better* if such things were shared, but they're of real use in an individual classifier already. > These could be done by comparing email addresses and URLs in the > message to blacklists, and using the result as a feature for the > Bayesian filter. Or any learning algorithm. > This way, the spammer could include as many non-spam URLs and emails > as they wanted without being able to tip the balance toward non-spam. > > The other things I was thinking of including are phone numbers and > snail mail addresses, because these would cover a large number of the > spams that don't have URLs or email addresses in the body. Almost all > spams have *some* sort of contact information, unless they're chain > letters, which can be filtered out by other means. > > All of these checks could be integrated into SpamAssassin (does SA > already check URLs and stuff in the body against blacklists?), I believe it can, yes. > but I think it might be better to use them to generate more features > for the Bayesian filters to use for classification... Some clues are so strong that they're (IMO) better suited to rule-based systems. Ours is a preponderance-of-evidence system, where no clue on its own is strong enough to drive the final decision. But if I can determine "Korean character set and from an open relay", then it's certain to be spam for me. 
spambayes isn't well suited to exploiting such killer-strong criteria. > some sort of script that just adds a bunch of keywords to the headers > based on the result of network checks. This combined with a > pre-trained global database that only handles features that are > missing from the user's own database (ala spamprobe) would be great > for a commercial spam filtering engine that requires no training > time to be decent, and becomes very good with only a little training. You won't know that until you test an implementation in real life. Lots of people have lots of good-sounding arguments about what will and won't work. We did the work here of implementing and rigorously testing our ideas. Most of them failed in real life, BTW. > I'll post some code eventually, but it would be great to get some > feedback on the idea before I start coding. I am thinking about doing > the relay statistics service first since that would be fairly widely > useful. I do wish you luck! Win or lose, fighting spam has been a lot of fun for us here. From seanl at chaosring.org Sat Sep 20 22:23:26 2003 From: seanl at chaosring.org (Sean R. Lynch) Date: Sat Sep 20 22:23:35 2003 Subject: [spambayes-dev] RE: Network checks References: Message-ID: On Sat, 20 Sep 2003 20:13:10 -0400, Tim Peters wrote: > [Sean R. Lynch] > So far you sound determined to overrule your father's judgment about > what he does and doesn't like. In that case, spambayes may be the worst > classifier available. That's not the problem, and perhaps I wasn't explicit about what the real problem is. Learning filters work well because there are large differences between what computer geeks like us consider spam and ham, which you will see if you feed *your* spam and ham through a Kohonen network. There is a well-defined dividing line between them. However, if your ham looks a lot like spam and vice-versa, there is no such clean dividing line, and the thing just isn't going to be as good.
So what I actually want to do is give a suggestion to my father that something was sent by a spammer (i.e. stick it in his quarantine folder until he starts releasing that sort of mail from quarantine). >> is to have a good filter to begin with. SpamAssassin seems like it >> would be reasonable, but if I'm gonna use SpamAssassin, why not use its >> built-in Bayesian filter? The main reason I won't is that I really want >> to use SpamAssassin's network checks, and IMHO it's bad netizenship to >> run them more than once on the same message, and enough messages go to >> multiple users on my server that I'd really like to run SA as a content >> filter. > > You could run SA as a rule-based content filter and disable its network > checks. I could, but half the reason I want SA is for the network checks, because the network checks are something that are specifically not available to a learning filter right now. >> I think that Bayesian filters really need to include their training >> time in performance analyses, rather than just comparing their ultimate >> performance after being trained. The "best" of the Bayesian filters >> seem to require the longest training times, and I don't really consider >> this to be a good thing, because "training time" really translates to >> both false positives and false negatives (an unsure is a false negative >> as far as I'm concerned). > > It's extremely dangerous to consider an unsure to be a false negative. > They're unsure precisely because, based on the training a classifier has > been given, the evidence in favor of ham is approximately equal to the > evidence in favor of spam. It does appear to be the case that some > people usually end up thinking an unsure is spam. This isn't universal, > though! For example, it varies from time to time in my classifier, and > the past few weeks I've considered most unsures to be ham (of the ones I > could make up my own mind about! 
I throw away about half my unsures > untrained-on, because I have no idea what they're on about -- might be > ham, might be spam, but they're too confusing regardless to be worth the > effort of researching). Different people get very different mixtures of > email. I think I misspoke a bit here. What I meant was, classifying a spam as an unsure is a false negative. > If you're convinced that all unsures are really spam, move your spam > cutoff lower. If you move it far enough so that it equals your ham > cutoff, you'll never see another unsure again. I recommend against it, > but suit yourself. Actually, I like unsures, and one idea I've been thinking about is to fall back to the rule-based filter if the learning filter gives an unsure. >> If IP addresses, email addresses (in the body), domains, and URLs could >> be shared among users of Bayesian filters, I think this would reduce >> training time significantly, because there are large numbers of each of >> them out there, but they have the potential to be the biggest spam >> clues. > > Sharing can be helpful for people with a shared sense of what spam is. > For example, all the email sent to tech mailing lists via python.org > goes thru a spambayes classifier, and tens of thousands of mailing list > recipients benefit from sharing what their mailing lists' classifiers > have been taught about spam. This is appropriate, because most tech mailing > lists have a very strong shared definition of spam (commercial messages > of any type are considered spam, except for those highly specific to the > mailing list topic (in which case the message necessarily contains lots > of words hammy wrt the list topic)). > > This is a very easy form of sharing, of course, because it's confined to > one classifier. Fancier schemes would require setting up distributed > trust networks, etc. In the end, I bet that subsystem would dwarf the > current spambayes codebase. IOW, lots of work.
I was thinking more along the lines of how Razor and DCC currently work with message digests, but with IP addresses, and giving scores along the lines of DCC or Razor's confidence measure. Yes, this is lots of work, but we're making smarter spammers, and we can either stay ahead of them or keep playing catch-up. The training time already makes a learning filter hard for my dad to use. When it has to be continually trained, that's going to be worse. As I said before, training time should be considered as part of the performance metric. >> Email addresses, domains, and URLs are harder, because IMHO they can >> really only be used as spam clues if they're going to be shared. > > I don't agree. For example, the .biz domain appearing in a URL is a > very strong spam clue in my classifier, and for what should be obvious > reasons. It could be *better* if such things were shared, but they're of > real use in an individual classifier already. How about .com domains? Eventually .biz won't be useful as a clue any more than .com is. If SA supports checking these against blacklists, that's great, and it can be implemented in a preprocessor for a learning filter. >> but I think it might be better to use them to generate more features >> for the Bayesian filters to use for classification... > > Some clues are so strong that they're (IMO) better suited to rule-based > systems. Ours is a preponderance-of-evidence system, where no clue on > its own is strong enough to drive the final decision. But if I can > determine "Korean character set and from an open relay", then it's > certain to be spam for me. spambayes isn't well suited to exploiting > such killer-strong criteria. That's an interesting point. However, some blacklists are better suited to rule-based systems than others. I would not, for example, use the XBL in a rule-based system, but I might use it in a preprocessor for a learning filter. 
SBL and the Wirehub Permblock, on the other hand, seem quite reliable, so it makes sense to use them in a rule-based classifier. >> some sort of script that just adds a bunch of keywords to the headers >> based on the result of network checks. This combined with a >> pre-trained global database that only handles features that are missing >> from the user's own database (ala spamprobe) would be great for a >> commercial spam filtering engine that requires no training time to be >> decent, and becomes very good with only a little training. > > You won't know that until you test an implementation in real life. Lots > of people have lots of good-sounding arguments about what will and won't > work. We did the work here of implementing and rigorously testing our > ideas. Most of them failed in real life, BTW. Of course. I'm not telling you you should implement this because it will work, I'm saying here's my idea and I'd like feedback on it. I think what I really want is a good framework for combining a rule-based classifier with relatively stable rules (i.e. leaving out a lot of SA's body checks), a preprocessor that adds some tokens based on network checks, and a learning classifier. For example, it struck me as somewhat strange that SpamBayes bothers to de-obfuscate text that's broken up with ... when anything that uses such an approach is obviously spam. Likewise, something with an obfuscated URL is also almost certainly spam. At the same time, SB doesn't seem to bother using the existence of an obfuscated URL or text as a feature for classification. Basically it's kind of like SpamAssassin but *requiring* the learning classifier, and allowing the learning classifier to be per-user but having the rule-based classifier run per-message. Individual users shouldn't need to muck with the settings of the rule-based classifier or preprocessor because they should be quite generic.
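Sean's preprocessor idea (turn evidence such as an obfuscated URL into extra tokens that the learning classifier can weigh) could be sketched in Python roughly as follows. This is a hypothetical helper, not part of SpamBayes: the `synthetic:` token names and the particular tricks detected (user-info in front of the real host, percent-encoded hosts, dotted-quad or single-number hosts) are assumptions chosen for illustration.

```python
import re
from urllib.parse import urlsplit

# Rough URL matcher for plain-text bodies; good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
DOTTED_QUAD_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")
# A host written as one big decimal or hex number, e.g. 3627732462.
NUMERIC_HOST_RE = re.compile(r"^(?:0x[0-9a-f]+|\d+)$", re.IGNORECASE)


def obfuscation_tokens(text):
    """Return sorted synthetic tokens for URL obfuscation tricks in text."""
    tokens = set()
    for url in URL_RE.findall(text):
        try:
            parts = urlsplit(url)
        except ValueError:  # e.g. a malformed "[...]" host
            tokens.add("synthetic:url-unparseable")
            continue
        if "@" in parts.netloc:
            # "http://www.bank.example@evil/" really goes to "evil".
            tokens.add("synthetic:url-userinfo")
        if "%" in parts.netloc:
            tokens.add("synthetic:url-percent-host")
        host = parts.hostname or ""
        if DOTTED_QUAD_RE.match(host):
            tokens.add("synthetic:url-ip-host")
        elif NUMERIC_HOST_RE.match(host):
            tokens.add("synthetic:url-numeric-host")
    return sorted(tokens)


print(obfuscation_tokens("Log in at http://www.example-bank.com@3627732462/login"))
```

The returned tokens would simply be added to the message's ordinary tokens, so the classifier learns from training how spammy each trick is, rather than a rule hard-coding a verdict for it.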
I could use the "fall back to rule-based when unsure" approach, but I'd actually like to do the combination in a way that will build on the strengths of both filters and reduce both false negatives and false positives. Actually, if I can work with the raw scores of both, and get some sort of confidence measure from the learning classifier based on how much training it's received, I can change what I consider to be unsure over time (the range of unsures needs to widen as the learning classifier gets more training), and this might actually be a useful approach. Sorry about writing yet another novel, but I've been thinking about this for a long time :) From T.A.Meyer at massey.ac.nz Sun Sep 21 03:23:09 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Sep 21 03:23:32 2003 Subject: [spambayes-dev] RE: [Spambayes] Are there plans for a daemonized or compiled version of Spambayes? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130359936B@its-xchg4.massey.ac.nz> [Michael Martinez] > I was wondering: are there any plans to develop a lightweight, > daemonized version of Spambayes? [Tim] > Note the project's pspam directory sets up a classifier > backed by a ZODB database, which can be attached to via > opening a ZEO connection. That would be a pleasant way to > let multiple clients hook up at will to an always-running classifier. Note also that the stuff in the pspam directory hasn't been touched in a long time and will need a bit of updating. Nothing all that difficult, though. I'm pretty sure that you can use the sb_client.py and sb_xmlrpcserver.py scripts for this, though, if you're willing to forgo ZODB/ZEO. (Or able to add a ZODBClassifier class to storage.py). =Tony Meyer From T.A.Meyer at massey.ac.nz Sun Sep 21 04:11:23 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Sep 21 04:11:47 2003 Subject: [spambayes-dev] sourceforge logo in proxy pages?
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599370@its-xchg4.massey.ac.nz> > How evil would it be to add the source-forge logo on the > local "pages" pop3proxy et al use? Odd that no-one's answered this yet. I guess that means not all that evil! :) My only concern would be that at the moment you can do training/configuration while offline (for those that are using a dialup), and this would mean that the browser would try and get you to connect. I don't know how big a concern that would be, though. Whether it's something that sf would frown upon, I have no idea ;) =Tony Meyer From T.A.Meyer at massey.ac.nz Sun Sep 21 04:15:07 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Sep 21 04:15:24 2003 Subject: [spambayes-dev] sb_* in config files? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599371@its-xchg4.massey.ac.nz> [Tony] > Attached are the necessary diffs. [Skip] > I was thinking more along the lines of you sending me a note > once you'd checked things in. [...] > I think it's fine to just check in your changes and let us > know so we can test them. Somehow, I completely missed seeing this message until just now. I had figured that you ended up being too busy to get a chance, and a reply was sitting here all this time! Sorry I didn't drop the appropriate note in the end, although there was a period after I checked it in and before I built the release, when hopefully someone would have noticed a problem... > In any case, your first patch didn't apply correctly: [...] > probably because your diff has a mixture of line endings. I suppose that's because I'm using a windows line-ending source, but a cygwin diff program...I'll try and better figure that out for next time. =Tony Meyer From richie at entrian.com Sun Sep 21 05:10:19 2003 From: richie at entrian.com (Richie Hindle) Date: Sun Sep 21 05:10:36 2003 Subject: [spambayes-dev] sourceforge logo in proxy pages?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303599370@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1303599370@its-xchg4.massey.ac.nz> Message-ID: <sgqqmv4ukd33sim4rmb2sl1nh839ias1pr@4ax.com> [Mark] > How evil would it be to add the source-forge logo on the > local "pages" pop3proxy et al use? [Tony] > Odd that no-one's answered this yet. I guess that means not all that > evil! :) No - it means I didn't see it first time round. IMHO it's definitely evil. > My only concern would be that at the moment you can do > training/configuration while offline (for those that are using a dialup), > and this would mean that the browser would try and get you to connect. > I don't know how big a concern that would be, though. A huge concern! I don't want my modem to dial out whenever I touch the web UI. Nor do I want a broken image on all the pages. Plus it's a cheap trick. We're bigger than that. 8-) -- Richie Hindle richie@entrian.com From draconus at bigpond.com Sun Sep 21 05:13:10 2003 From: draconus at bigpond.com (Mal Thomas) Date: Sun Sep 21 05:13:18 2003 Subject: [spambayes-dev] Doco volunteer Message-ID: <000201c38020$9519ca30$7a6728cb@dadspc> I learn from the Dev Q in FAQ, that you may need the use of a doco writer from time to time. Funny that, I happen to have a little (too many years to want to remember; put it this way, do any of you remember the IBM129 ??) experience in Technical Writing. Am familiar with MS-Word, basic HTML. If there is anything you want done (provided it's not going to take 30 hours a week), give me a hoi. Mal Thomas Perth Western Australia Happy User of the Outlook plug-in --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.520 / Virus Database: 318 - Release Date: 18/09/2003 -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030921/d9bbd152/attachment.html From pje at telecommunity.com Sun Sep 21 12:33:27 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Sep 21 12:33:33 2003 Subject: [spambayes-dev] RE: Network checks In-Reply-To: <E1A16gb-0008HM-Bo@mail.python.org> Message-ID: <5.1.0.14.0.20030921122509.026255f0@mail.telecommunity.com> At 07:23 PM 9/20/03 -0700, "Sean R. Lynch" <seanl@chaosring.org> wrote: >I could, but half the reason I want SA is for the network checks, because >the network checks are something that are specifically not available to a >learning filter right now. If you don't mind running mail through both Spambayes and SA, note that you could configure SB to pay attention to the X-Spam-Status header that SpamAssassin inserts. That field contains a list of test names like NO_REAL_NAME and the like. If Spambayes tokenizes them, it will automatically learn which ones are hammy and which are spammy. Thus, you'll get the benefit of all of SpamAssassin's rules, with Spambayes' classifier. As for the daemonization, you might consider using ReadyExec to write a daemonized Spambayes. ReadyExec is a Python library and C helper program to do the equivalent of SA's spamc/spamd, but is generic for any Python program. Basically, to use ReadyExec, you write a "main" function that you'd like to be called for each invocation of the "client" program. It will see the client's sys.argv, stdin, stdout, and stderr. You run the readyexecd.py script, telling it a socket to listen on and the name of your "main" function for it to import. Then, to run the client, you run 'readyexec /path/to/socket whateverargs...'. Voila, instant daemonization. See http://readyexec.sourceforge.net/ for details. 
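Phillip's first suggestion can be sketched in Python as a function that pulls SpamAssassin's test names out of the X-Spam-Status header and turns them into tokens for a learning classifier. The header layout assumed here (`Yes, hits=... required=... tests=A,B version=...`, possibly folded across lines) follows SpamAssassin 2.x-style output and may differ between versions; the `sa-verdict:`/`sa-test:` prefixes are made up, and this does not use SpamBayes' own tokenizer interface.

```python
import re
from email import message_from_string

# "tests=" runs until the next lowercase "key=" field (e.g. " version=")
# or the end of the header; test names themselves are upper case.
TESTS_RE = re.compile(r"\btests=(.*?)(?=\s+[a-z_]+=|\Z)", re.DOTALL)


def spamassassin_tokens(raw_message):
    """Turn SpamAssassin's X-Spam-Status header into classifier tokens."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return []
    tokens = []
    # The overall verdict ("Yes"/"No") comes before the first comma.
    verdict = status.split(",", 1)[0].strip().lower()
    if verdict in ("yes", "no"):
        tokens.append("sa-verdict:" + verdict)
    match = TESTS_RE.search(status)
    if match:
        # Test names are comma separated and may be folded across lines.
        for name in re.split(r"[,\s]+", match.group(1)):
            if name:
                tokens.append("sa-test:" + name)
    return tokens


raw = ("X-Spam-Status: Yes, hits=7.2 required=5.0 tests=NO_REAL_NAME,\n"
       "\tFORGED_MUA_OUTLOOK version=2.55\n"
       "Subject: cheap stuff\n\nbody\n")
print(spamassassin_tokens(raw))
```

Because each rule becomes an ordinary token, the classifier decides from training how much NO_REAL_NAME or any other test actually means for a given user's mail, instead of using SpamAssassin's fixed scores.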
From mhammond at skippinet.com.au Sun Sep 21 18:19:19 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Sep 21 18:19:17 2003 Subject: [spambayes-dev] Doco volunteer In-Reply-To: <000201c38020$9519ca30$7a6728cb@dadspc> Message-ID: <0dbf01c3808e$675aa150$f502a8c0@eden> We are pretty much up for anything! An excellent place to start would be just to look at our existing documentation and see what you hate. As an experienced tech writer, I assume there will be a fair bit in that category! The web site itself could do with a little reorganization and some enhancements - for example, some documentation on how the user would use the tool before they need to download it. The "pop3proxy" and related tools have almost no documentation for them at the moment (but unfortunately it is a different application than Outlook, so you are probably not experienced with it). The Outlook docs probably need to look like they weren't written by the programmers. etc :) One thing though - we are more concerned with content than with style. We aren't that interested in changing the tools we use (which are basically HTML). If this still interests you, pick something that you would like to tackle, then send us another mail and we will point you in the right direction. Regards, Mark. -----Original Message----- From: spambayes-dev-bounces@python.org [mailto:spambayes-dev-bounces@python.org]On Behalf Of Mal Thomas Sent: Sunday, 21 September 2003 7:13 PM To: spambayes-dev@python.org Subject: [spambayes-dev] Doco volunteer I learn from the Dev Q in FAQ, that you may need the use of a doco writer from time to time. Funny that, I happen to have a little (too many years to want to remember; put it this way, do any of you remember the IBM129 ??) experience in Technical Writing. Am familiar with MS-Word, basic HTML. If there is anything you want done (provided it's not going to take 30 hours a week), give me a hoi.
Mal Thomas Perth Western Australia Happy User of the Outlook plug-in -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030922/9874901e/attachment.html From MMARTINEZ at CSREES.USDA.GOV Mon Sep 22 08:44:31 2003 From: MMARTINEZ at CSREES.USDA.GOV (Martinez, Michael) Date: Mon Sep 22 08:43:24 2003 Subject: [spambayes-dev] Server side instructions for qmail Message-ID: <83EED274D3127740995A6B621B88F2E101274213@csrees-exchange.csrees.usda.gov> I noticed that the FAQ contains "Postfix notes" on a server side solution: (<http://spambayes.sourceforge.net/faq.html#are-there-plans-to-develop-a-server-side-spambayes-solution>). I'd like to submit the following "Qmail notes": Spambayes is installed on our agency's smtp / MX gateway. This machine runs Redhat Linux 7.1, qmail 1.03, qmail-scanner 1.16, and hbedv's "Antivir". Incoming mail is accepted by tcpserver and handed off to qmail-scanner. Qmail-scanner runs the virus software ("antivir") and hands the message to qmail. Qmail accepts local delivery on all domain-bound email. This email is delivered to ~alias/.qmail-default. (This is a standard configuration for qmail). ~alias/.qmail-default pipes each email through Spambayes. The .qmail-default is set up as follows:
| /usr/local/spambayes/hammiefilter.py -d /usr/local/spambayes/.hammiedb | qmail-remote MSServer.csrees.usda.gov "$SENDER" $DEFAULT@csrees.usda.gov
The permissions for the /usr/local/spambayes directory are set with the following command:
chown -R qmailq.qmail /usr/local/spambayes
As shown above, there are two pipes. The first pipes it through Spambayes. The second pipes it through qmail's remote delivery mechanism, which delivers the email to our Exchange Server.
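The first pipe stage, together with the three-way ham/unsure/spam split that the Outlook rules key on, can be sketched in Python as follows. This is not hammiefilter.py: the header name X-Spambayes-Classification, the "verdict; score" value format, and the 0.20/0.90 cutoffs are assumptions, and the score is passed in rather than computed by a real classifier. Scores below ham_cutoff are ham and scores at or above spam_cutoff are spam, so if the two cutoffs are set equal, no score can land in the unsure band.

```python
from email import message_from_string

# Assumed values for illustration; SpamBayes reads its own cutoffs from its options.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90
HEADER = "X-Spambayes-Classification"  # assumed header name


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a 0.0-1.0 spam score onto ham / unsure / spam."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def tag_message(raw_message, score):
    """Return the message text with a classification header added."""
    msg = message_from_string(raw_message)
    del msg[HEADER]  # drop any copy that arrived from outside
    msg[HEADER] = "%s; %.2f" % (classify(score), score)
    return msg.as_string()


print(tag_message("Subject: hi\n\nbody\n", 0.97))
```

In a real pipeline the function would be fed sys.stdin.read() and its result written to sys.stdout, which is what lets the `|` chaining in .qmail-default hand the tagged message on to qmail-remote. Deleting any incoming copy of the header first keeps a sender from forging a "ham" tag that the Outlook rules would trust.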
Delivered emails are filtered on a per-user basis in Outlook by setting the Rules to detect the Spambayes tag in the message header. If the tag reads "Spambayes-Classification: spam" then the email is either deleted or placed in the user's Spam folder. If it reads "Spambayes-Classification: unsure" then it's placed in the user's Unsure folder. If it reads "Spambayes-Classification: ham" then nothing special is done - it is delivered to the user's Inbox as normal. The user is given the choice of whether to set up his rules or not. Training of Spambayes is done in the following manner. Our users are given my email address and are told that, if they like, they may send emails to me that they consider spam, or that end up being "mis-classified" by the system. I created two directories:
/usr/local/spambayes/training/spamdir
/usr/local/spambayes/training/hamdir
The emails sent to me by the users are retrieved from the qmail archive and placed into the appropriate directory. When I'm ready to do a training (which I do once or twice a month), I run the following commands:
1. I use a simple script to insert a blank From: line at the top of each email.
2. I use a simple script to remove the qmail-scanner header from the bottom of each email.
3. uuencoded attachments are removed.
4. cat /usr/local/spambayes/training/spamdir/* >> /usr/local/spambayes/training/spam
5. cat /usr/local/spambayes/training/hamdir/* >> /usr/local/spambayes/training/ham
6. /usr/local/spambayes/mboxtrain -d /usr/local/spambayes/.hammiedb -g /usr/local/spambayes/training/ham -s /usr/local/spambayes/training/spam
(Step #6 can be run without shutting down qmail.)
Most of the time, emails that are sent to me are clearly discernible as to whether they are spam or not. Occasionally there is an email that is borderline, or that one person considers spam but others don't. This is usually things like newsletter subscriptions or religious forums.
In this case, I follow my own rule that if there is at least one person in the agency who needs or wants to receive this type of email, and as long as it is non-offensive, work-related, or there are a lot of people in the agency who have an interest in the topic, then I will either train it as ham, or, if it's already being tagged ham, leave it. Examples of this are emails that discuss religious topics. There are a lot of people in this agency who are subscribed to religious discussion groups, so in my mind, it's good practice to make sure these messages are not tagged Spam. The above system works well on several levels. It's manageable because there's a central location for training and tagging spam (the smtp server). It's manageable also because our IT PC Support staff does not have to install Spambayes on each PC nor train all of our users on its use. If a user does not like the way our system tags the emails, he does not have to set up his Outlook rules. But, we've had a good response from the users who are using their Rules. They're willing to put up with one or two mis-classified emails in order to have 95% of their junk email not in their Inbox. Michael Martinez Linux System Administrator ISTM/CSREES United States Department of Agriculture -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030922/060daf32/attachment.html From MMARTINEZ at CSREES.USDA.GOV Mon Sep 22 08:57:25 2003 From: MMARTINEZ at CSREES.USDA.GOV (Martinez, Michael) Date: Mon Sep 22 08:56:16 2003 Subject: [spambayes-dev] RE: [Spambayes] Are there plans for a daemonized or compiled version of Spambayes? Message-ID: <83EED274D3127740995A6B621B88F2E101274214@csrees-exchange.csrees.usda.gov> Let's discuss this a little. I haven't looked in depth at the Spambayes code. But being the sysadmin I'm able to look at the processes running on the system.
It appears that, when an email is scanned, multiple python threads get forked. Presumably this is because "hammiefilter.py" runs other *.py scripts, or exec's multiple pythons. (True? Not true?) Assuming that's what's happening, I guess I was wondering if it would be beneficial, in the sense of being less demanding on system resources, to consolidate all the routines into a single python thread? Is this feasible and worthwhile? I'm harping on the multiple-thread issue, because, the thing that happens with high email volume is that the number of python processes grows exponentially. Another thing I'm thinking about doing to mitigate the impact on resources, is running the hammiefilter in ramdisk. Suggestions are welcome. Michael Martinez Linux System Administrator ISTM/CSREES United States Department of Agriculture -----Original Message----- From: Tim Peters [mailto:tim.one@comcast.net] Sent: Saturday, September 20, 2003 6:36 PM To: Martinez, Michael Cc: spambayes@python.org; spambayes-dev@python.org Subject: RE: [Spambayes] Are there plans for a daemonized or compiled versionofSpambayes? [Martinez, Michael] > I've been running Spambayes on our agency Linux smtp gateway for > several months and very happy with its classification of spam. My > gateway is a qmail system and it pipes all incoming email through the > hammiefilter prior to delivery. Yup, running a distinct classifier for each email is a pretty crazy design for high-volume use. > However, a performance problem arises when the gateway gets hit during > peak hours with a lot of emails. What happens is the system slows down > tremendously, in part due to the number of python instances that get > forked in order to scan the emails. > > I was wondering: are there any plans to develop a lightweight, > daemonized version of Spambayes? The answer to that depends on you too: what are your plans? Python is a C program, and can be daemonized like any other. 
Note the project's pspam directory sets up a classifier backed by a ZODB database, which can be attached to by opening a ZEO connection. That would be a pleasant way to let multiple clients hook up at will to an always-running classifier. > In the same vein, are there plans to port it to C or another compiled > language? AFAICT, the most expensive part of running spambayes now is running Berkeley database lookups, and the Sleepycat bsddb implementation is already written in C. So profile before you presume to know what would help. Based on what I've measured, my interest in recoding any of the rest in C is nil. > How difficult would this be? It would be extremely tedious. You don't escape the need for a database, for I/O, or for a variety of complex string-processing operations. The parts of the Python implementation that supply those to Python programmers are already coded in C, but are much easier to use from Python than from C. From skip at pobox.com Mon Sep 22 12:08:09 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Sep 22 12:08:28 2003 Subject: [spambayes-dev] RE: [Spambayes] Are there plans for a daemonized or compiled versionofSpambayes? In-Reply-To: <83EED274D3127740995A6B621B88F2E101274214@csrees-exchange.csrees.usda.gov> References: <83EED274D3127740995A6B621B88F2E101274214@csrees-exchange.csrees.usda.gov> Message-ID: <16239.7785.146033.172889@montanaro.dyndns.org> Michael> I haven't looked in depth at the Spambayes code. But being the Michael> sysadmin I'm able to look at the processes running on the Michael> system. It appears that, when an email is scanned, multiple Michael> python threads get forked. Presumably this is because Michael> "hammiefilter.py" runs other *.py scripts, or exec's multiple Michael> pythons. (True? Not true?) Not true I don't think.
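Tim's advice earlier in this thread to "profile before you presume to know what would help" can be followed with the standard library's `cProfile` and `pstats` modules. A minimal sketch: `score_message` below is a hypothetical stand-in for whatever scoring call you want to measure (it is not a SpamBayes function). In practice you would pass in your real classification routine and a sample of saved messages.

```python
import cProfile
import io
import pstats


def score_message(text):
    # Hypothetical stand-in for a real scoring call.
    words = text.lower().split()
    return sum(len(w) for w in words) / max(len(words), 1)


def profile_calls(func, inputs, top=10):
    """Run func over inputs under cProfile; return (results, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    results = [func(item) for item in inputs]
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return results, out.getvalue()
```

If most of the cumulative time in such a report sits in database lookups, as Tim reports from his own measurements, recoding the rest in C would buy little.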
Michael> Assuming that's what's happening, I guess I was wondering if it Michael> would be beneficial, in the sense of being less demanding on Michael> system resources, to consolidate all the routines into a single Michael> python thread? Is this feasible and worthwhile? I tried it quite a while ago, but didn't code the front-end client in C, just Python. One problem is that you substitute network overhead for startup overhead. Assuming you maintain the long-running process as a Python program, you can try a couple of things:

1. Write a front-end client in Python and use a very simple protocol to communicate with the server (maybe a byte count followed by the message). The server would either spit back the message augmented with the usual scoring headers or just the score information, relying on the client to embellish the message.

2. If (and only if) the above isn't fast enough, write the simplest front-end client you can in C to avoid Python startup overhead.

The first one will give you some idea what you're up against. Python's startup is probably the bottleneck, so I'm skeptical that the first option will gain you anything besides an architecture which is simple to experiment with. The Python-based server scores messages very quickly once the startup overhead is out of the way. Michael> Another thing I'm thinking about doing to mitigate the impact Michael> on resources, is running the hammiefilter in ramdisk. It probably won't buy you much, but it's a simple enough thing to try. Make sure you copy your database (pickle or bsddb file) to ramdisk as well. Skip From dave at dnh.sk.ca Mon Sep 22 12:31:13 2003 From: dave at dnh.sk.ca (Dave Hall) Date: Mon Sep 22 12:33:24 2003 Subject: [spambayes-dev] RE: [Spambayes] Are there plans for a daemonized or compiled versionofSpambayes?
In-Reply-To: <16239.7785.146033.172889@montanaro.dyndns.org> References: <83EED274D3127740995A6B621B88F2E101274214@csrees-exchange.csrees.usda.gov> <16239.7785.146033.172889@montanaro.dyndns.org> Message-ID: <20030922163113.GA19595@dnh.sk.ca> On Mon, Sep 22, 2003 at 11:08:09AM -0500, Skip Montanaro wrote: > > Michael> I haven't looked in depth at the Spambayes code. But being the > Michael> sysadmin I'm able to look at the processes running on the > Michael> system. It appears that, when an email is scanned, multiple > Michael> python threads get forked. Presumably this is because > Michael> "hammiefilter.py" runs other *.py scripts, or exec's multiple > Michael> pythons. (True? Not true?) > > Not true I don't think. I think what is being seen is related to the way qmail delivers mail. There will be a single qmail-local process spawned to handle each local delivery. The qmail-local process will then spawn processes as directed by the .qmail file. So there will be a python running for each and every message being delivered at any given time. > > Michael> Assuming that's what's happening, I guess I was wondering if it > Michael> would be beneficial, in the sense of being less demanding on > Michael> system resources, to consolidate all the routines into a single > Michael> python thread? Is this feasible and worthwhile? > > I tried it quite awhile ago, but didn't code the front-end client in C, just > Python. One problem is that you substitute network overhead for startup > overhead. Assuming you maintain the long-running process as a Python > program, you can try a couple things: Another problem with a qmail setup using .qmail files is the long-running process will need to handle multiple concurrent messages. > > 1 write a front-end client in Python and use a very simple protocol to > communicate with the server (maybe a byte count followed by the > message). 
The server would either spit back the message augmented with > the usual scoring headers or just the score information, relying on the > client to embellish the message. > > 2 If (and only if) the above isn't fast enough, write the simplest > front-end client you can in C to avoid Python startup overhead. > > The first one will give you some idea what you're up against. Python's > startup is probably the bottleneck, so I'm skeptical that the first option > will gain you anything besides an architecture which is simple to experiment > with. The Python-based server scores messages very quickly once the startup > overhead is out of the way. An alternative would be to create a queue for messages to be classified and have qmail deliver into the queue. The long-running processes can poll the queue and process all the messages in the queue at once with one python. This will of course add some delay by serializing delivery but it would certainly decrease the number of concurrent pythons and be less of a PITA than a service of some sort. > > Michael> Another thing I'm thinking about doing to mitigate the impact > Michael> on resources, is running the hammiefilter in ramdisk. > > It probably won't buy you much, but it's a simple enough thing to try. Make > sure you copy your database (pickle or bsddb file) to ramdisk as well. Agreed, this stuff is likely already in file cache when the server is busy.
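Dave's queue suggestion can be sketched as a single long-running Python process that loads the classifier once and then, on each pass, drains whatever messages are waiting. This is a minimal sketch under stated assumptions: the directory layout, the `classify` callback (which takes raw message bytes and returns the tagged message) and the polling interval are all hypothetical, and handing the tagged messages back to qmail for final delivery is left out. It also assumes the delivering side writes each file elsewhere and then renames it into the queue directory (as Maildir's tmp/ and new/ convention does), so the poller never sees a half-written message.

```python
import os
import time


def drain_queue(queue_dir, out_dir, classify):
    """Classify every message currently in queue_dir, in one batch.

    Each tagged message is written to out_dir under the same name and the
    original is removed from the queue. Returns the names processed.
    """
    done = []
    for name in sorted(os.listdir(queue_dir)):
        src = os.path.join(queue_dir, name)
        if not os.path.isfile(src):
            continue
        with open(src, "rb") as f:
            tagged = classify(f.read())
        # Write under a temporary name, then rename, so readers of out_dir
        # never see a partial file.
        tmp = os.path.join(out_dir, "." + name + ".tmp")
        with open(tmp, "wb") as f:
            f.write(tagged)
        os.replace(tmp, os.path.join(out_dir, name))
        os.remove(src)
        done.append(name)
    return done


def run_forever(queue_dir, out_dir, classify, interval=2.0):
    # One interpreter and one loaded classifier handle many messages.
    while True:
        if not drain_queue(queue_dir, out_dir, classify):
            time.sleep(interval)
```

The trade-off is the one Dave names: delivery is serialized through one process, adding some latency, but the number of concurrent Python interpreters drops to one.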
-- Dave =============================================================== | <- You must be smarter than this stick to ride the Internet -Mike Handler =============================================================== From Steve.Tolkin at FMR.COM Mon Sep 22 15:39:22 2003 From: Steve.Tolkin at FMR.COM (Tolkin, Steve) Date: Mon Sep 22 15:39:36 2003 Subject: [spambayes-dev] Is max token length really 12 Message-ID: <176FDD8DC56B4946946917ECEBA4DA55CD4388@MSGBOSCLA2WIN.DMN1.FMR.COM> Skipped content of type multipart/alternative From tim.one at comcast.net Mon Sep 22 15:52:48 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Sep 22 15:52:52 2003 Subject: [spambayes-dev] Is max token length really 12 In-Reply-To: <176FDD8DC56B4946946917ECEBA4DA55CD4388@MSGBOSCLA2WIN.DMN1.FMR.COM> Message-ID: <LNBBLJKPBEHFEDALKOLCIEGNGBAB.tim.one@comcast.net> [Tolkin, Steve] > But when I look at the "Show spam clues for the > current message" I did not see a token of length 12 "VIRMMK100NTS." Everything in the quotes is part of the token, and there are 13 characters there. > (The trailing period that is there because this is at the end of a > sentence.) Is this token missed perhaps because it was all capital > letters, or a mixture of letters and digits, or because it is > immediately followed by a period? The last. spambayes doesn't distinguish between kinds of characters, except to distinguish between whitespace (blank, tab, newline, carriage return) and non-whitespace (everything else). "VIRMMK100NTS." isn't ignored, but it generates a synthesized skip:v 10 summary token instead. > This token would be the strongest indication that email was spam. So instead some other token was <wink>. 
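Tim's answer describes how a whitespace-delimited word longer than 12 characters is replaced by a summary token. Here is a simplified sketch of that rule. It assumes the text is lowercased first (as the `skip:v 10` token suggests) and that words shorter than 3 characters are dropped. The real SpamBayes tokenizer has further special cases, so treat this as an illustration rather than its code.

```python
def tokenize_word(word, max_word_size=12):
    """Yield the token(s) for one whitespace-delimited word."""
    n = len(word)
    if 3 <= n <= max_word_size:
        yield word
    elif n > max_word_size:
        # Long words become "skip:<first char> <length rounded down to 10s>".
        yield "skip:%c %d" % (word[0], n // 10 * 10)
    # Words shorter than 3 characters produce no token in this sketch.


def tokenize_text(text, max_word_size=12):
    # Only whitespace separates words; punctuation stays part of the word.
    for word in text.lower().split():
        yield from tokenize_word(word, max_word_size)
```

With the trailing period, "VIRMMK100NTS." is 13 characters and yields `skip:v 10`. Without the period, the 12-character word would be kept as a token in its own right.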
From skip at pobox.com Mon Sep 22 15:54:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Sep 22 15:54:42 2003 Subject: [spambayes-dev] Is max token length really 12 In-Reply-To: <176FDD8DC56B4946946917ECEBA4DA55CD4388@MSGBOSCLA2WIN.DMN1.FMR.COM> References: <176FDD8DC56B4946946917ECEBA4DA55CD4388@MSGBOSCLA2WIN.DMN1.FMR.COM> Message-ID: <16239.21350.2133.481691@montanaro.dyndns.org> Steve> But when I look at the "Show spam clues for the current message" Steve> I did not see a token of length 12 "VIRMMK100NTS." (The trailing Steve> period that is there because this is at the end of a sentence.) Clues which score (by default) between 0.4 and 0.6 aren't displayed. I don't know if the Outlook plugin uses it, but you might try setting your minimum_prob_strength to 0.0 if you really want to see all clues. Wherever your ini file is, you should be able to add

[Classifier]
minimum_prob_strength: 0.0

A related option is max_discriminators. If your mail message has many (by default, more than 150) clues, only the most significant 150 will be considered when scoring messages. Skip From tim.one at comcast.net Mon Sep 22 16:04:04 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Sep 22 16:04:10 2003 Subject: [spambayes-dev] Is max token length really 12 In-Reply-To: <16239.21350.2133.481691@montanaro.dyndns.org> Message-ID: <LNBBLJKPBEHFEDALKOLCKEGPGBAB.tim.one@comcast.net> [Skip] > Clues which score (by default) between 0.4 and 0.6 aren't displayed. > I don't know if the Outlook plugin uses it, but you might try setting > your minimum_prob_strength to 0.0 if you really want to see all > clues. Wherever your ini file is, you should be able to add > > [Classifier] > minimum_prob_strength: 0.0 > > A related option is max_discriminators. If your mail message has > many (by default, more than 150) clues, only the most significant 150 > will be considered when scoring messages.
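The two options in Skip's explanation (quoted just above) can be pictured as a filter followed by a cut-off. A rough sketch, assuming each token already has a spam probability between 0 and 1: tokens whose probability lies within `minimum_prob_strength` of 0.5 are dropped, and only the `max_discriminators` strongest of the rest are kept. The exact comparison and tie-breaking in the SpamBayes classifier may differ, and the function name is hypothetical.

```python
def select_clues(token_probs, minimum_prob_strength=0.1, max_discriminators=150):
    """Return (token, prob) pairs used for scoring, strongest first.

    token_probs maps token -> spam probability in [0, 1].
    """
    scored = []
    for token, prob in token_probs.items():
        strength = abs(prob - 0.5)  # distance from "no evidence"
        if strength >= minimum_prob_strength:
            scored.append((strength, token, prob))
    # Strongest evidence first; ties broken by token text for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(token, prob) for _, token, prob in scored[:max_discriminators]]
```

With the defaults, a token at 0.55 is ignored. Setting `minimum_prob_strength: 0.0` in the `[Classifier]` section lets every token through to the `max_discriminators` cut-off.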
Skip, the Outlook addin shows all the tokens in a message, regardless of any option settings. We usually don't show the exhaustive list of tokens in msgs to this list because they're usually uninteresting. The Outlook addin also shows just the significant tokens, in a section distinct from that in which it shows all tokens. From vanhorn at whidbey.com Mon Sep 22 16:35:49 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Sep 22 16:36:35 2003 Subject: [spambayes-dev] Training options on Configuration Page References: <83EED274D3127740995A6B621B88F2E101274213@csrees-exchange.csrees.usda.gov> Message-ID: <3F6F5D25.9D63893@whidbey.com> Greetings: On the Configuration page of the proxy, there is this paragraph which I found to be unclear: Suppress caching of bulk ham: Where message caching is enabled, this option suppresses caching of messages which are classified as ham and marked as 'Precedence: bulk' or 'Precedence: list'. If you subscribe to a high-volume mailing list then your 'Review messages' page can be overwhelmed with list messages, making training a pain. Once you've trained Spambayes on enough list traffic, you can use this option to prevent that traffic showing up in 'Review messages'. It's clear now, but when I was setting up the upgrade from 1.0a4 to 1.0a5 I missed it, or at least one of its implications. I was really surprised that one of my filters was seeing a very low quantity of Ham, and was trending toward a 20:1 imbalance before I started manually Discarding all Spam on that system. Maybe the pattern at our house is unusual, but I suspect that for most users the majority of their ham will be from lists they've joined. 
I'd like to suggest that the following set of buttons replace the current "Cache messages" and "Suppress caching of bulk ham" section:

1A Train only on Unsure, hide Ham and Spam on Review page
1B Train only on Unsure, show Ham and Spam on Review page
1C 1B plus default to Discard for Ham
1D 1B plus default to Discard for Spam
2A Train on all messages except Unsure
2B Train on all messages except Unsure, hide List Ham
2C Train on all messages except Unsure and List Ham, hide List Ham

2C is the effect of the current system with both caching options turned on. It leads to huge spam imbalances, at least on the system my wife filters through. (Mine doesn't, because I have a huge volume of admin ham.) If I had that array of choices, I would recommend 2A as the default for a new database, 1A as the default once there were enough trained messages, and 1C and 1D for a short period to address any imbalance. The other options are only there to cover all possibilities, and might not have any real-world application. (1B is the current default with "Suppress caching of bulk ham" off.) Also, from the user's perspective, nobody cares at all about caching. To the user these are Training Options. I would recommend that the section be renamed, that the typical user options (the array above) go first, and that the details about the files that support the training process follow the actual controls over training. Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030922/9166b2a2/attachment-0001.html From gward at python.net Mon Sep 22 21:05:14 2003 From: gward at python.net (Greg Ward) Date: Mon Sep 22 21:05:20 2003 Subject: [spambayes-dev] RE: [Spambayes] Are there plans for a daemonized or compiled versionofSpambayes? In-Reply-To: <83EED274D3127740995A6B621B88F2E101274214@csrees-exchange.csrees.usda.gov> References: <83EED274D3127740995A6B621B88F2E101274214@csrees-exchange.csrees.usda.gov> Message-ID: <20030923010514.GA21245@cthulhu.gerg.ca> On 22 September 2003, Martinez, Michael said: > Another thing I'm thinking about doing to mitigate the impact on > resources, is running the hammiefilter in ramdisk. That's almost certain to be useless. Real operating systems (such as Linux) cache disk reads very nicely. I suspect Skip's suggestion of a lightweight front-end written in C is more likely to help. (OTOH, mail.python.org processes 30,000-50,000 messages per day, and a large percentage of those (40? 50? 60?) make it past the simple junk filters in Exim's config file and have to be scanned by spambayes. It's done with a Python interpreter embedded in Exim, so we save some of the overhead of starting up Python, but certainly not all. And mail.python.org is generally not heavily loaded. It sagged pretty badly under the SoBig onslaught a few weeks ago, but has been fairly healthy since then.) Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ "Question authority!" "Oh yeah? Says who?" From T.A.Meyer at massey.ac.nz Mon Sep 22 22:30:56 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 22 22:31:18 2003 Subject: [spambayes-dev] Is max token length really 12 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599A11@its-xchg4.massey.ac.nz> [Tim] > Skip, the Outlook addin shows all the tokens in a message, > regardless of any option settings. [...] 
The Outlook addin also > shows just the significant tokens, in a section distinct from that > in which it shows all tokens. FWIW, the current cvs web interface now does this too. The "Clues" link shows the significant tokens, and the "Tokens" link shows all of them. (In fact, the web interface goes one better <wink>, and shows the clues/score *as the message was originally scored*, if this is available, in addition to the current situation.) =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Sep 22 23:02:42 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 22 23:02:57 2003 Subject: [spambayes-dev] Training options on Configuration Page Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599A42@its-xchg4.massey.ac.nz> > On the Configuration page of the proxy, there is > this paragraph which I found to be unclear: > Suppress caching of bulk ham: [...] What was unclear? IOW, what did you *think* it meant? > I'd like to suggest that the following set of buttons > replace the current "Cache messages" and "Suppress caching > of bulk ham" section: > > 1A Train only on Unsure, hide Ham and Spam on Review page > 1B Train only on Unsure, show Ham and Spam on Review page > 1C 1B plus default to Discard for Ham > 1D 1B plus default to Discard for Spam > 2A Train on all messages except Unsure > 2B Train on all messages except Unsure, hide List Ham > 2C Train on all messages except Unsure and List Ham, hide List Ham One issue with this is that it means *major* changes to the way the configuration page works. What it currently does is present several groups of Option objects to the user, allowing them to change one or more of their values (and then save them to the file). Your suggestion means that it would have to have an additional layer above, where some options presented mean changes to multiple Option objects. I think that the wording above is unclear. You're *not* setting SpamBayes to do any training - the training is all (at the moment, and excluding the plug-in) done manually.
What you're changing is which messages are displayed in the review page. This would be more accurate:

1A Hide Ham and Spam, show Unsure
1B Show all
1C Show all, default to Discard for Ham
1D Show all, default to Discard for Spam
2A Show Ham and Spam, hide Unsure
2B Show Ham and Spam, hide Unsure, don't cache bulk-ham
2C Show Ham and Spam, hide Unsure, don't cache bulk-ham

You'll see that there is no difference between 2B and 2C. You can't train on messages that you don't display, so 2B can't be done (unless you use the 'find message' query). At the moment 1B is the default. You can't currently stop any category from displaying, so 1A and 2A-C are not possible. If options were added to do so, then I think:

In review page, show messages classified as: [x] Ham [x] Spam [x] Unsure

would be much clearer (and should be an advanced option). This means you retain control over what is displayed - you could, for example, have 'show ham and unsure, hide spam' (if the corpus was tilted towards spam, this would be a good choice). Presenting combinations of all the different options ends up looking very confusing when the number of combinations rises, and makes it hard to make a single change. There's also quite a difference between not showing the messages on the review page, and not caching them. Not caching them saves disk space, but means that you can't correct a misclassified message (you could if you cached, but didn't display, with the 'find message' query, or with the smtpproxy). > 2C is the effect of the current system with both caching > options turned on. It leads to huge spam imbalances, at least > on the system my wife filters through. Why, then, enable the option that is off by default? There's a reason that that's the default <wink>. > If I had that array of choices, I would recommend 2A > is the default for a new database, 1A as the default > once there were adequate trained messages, and 1C and > 1D would be used for a short period to address any imbalance.
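Tony's point that hiding a message on the review page is different from not caching it can be made concrete with a small model. In this hypothetical sketch (none of these names come from the SpamBayes code), the `show_*` flags mirror his proposed checkboxes and only decide what the review page lists. A message that was cached but hidden can still be found by id, while one that was never cached cannot.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewSettings:
    # Mirrors the proposed "show messages classified as" checkboxes.
    show_ham: bool = True
    show_spam: bool = True
    show_unsure: bool = True

    def shows(self, classification):
        return {
            "ham": self.show_ham,
            "spam": self.show_spam,
            "unsure": self.show_unsure,
        }[classification]


@dataclass
class MessageCache:
    messages: dict = field(default_factory=dict)  # id -> classification

    def add(self, msg_id, classification):
        self.messages[msg_id] = classification

    def review_page(self, settings):
        """Ids listed on the review page: cached AND shown."""
        return sorted(i for i, c in self.messages.items() if settings.shows(c))

    def find(self, msg_id):
        """Like a 'find message' query: sees hidden messages, not uncached ones."""
        return self.messages.get(msg_id)
```

Setting `show_spam=False` corresponds to Tony's 'show ham and unsure, hide spam' example, and the hidden spam stays correctable through `find`.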
As Tim would say, test it! There hasn't been enough testing to show what the 'ideal' training method is. For example, I would never recommend 2A, 2B, or 2C as training regimes - to me, it's always worth training on an unsure. I'm not sure I like changing options on a user's behalf, either. What do you do for advanced users, who know what they are doing? I'm sure they wouldn't like the options just changing themselves. *If* a training regime is ever identified as being 'the best', then I think the best move would be to set the defaults to match that, and if there are (as in your suggestion) extra things to do in certain circumstances, present a warning/suggestion to the user. (Like the plug-in warns the user if there is an imbalance). > Also, from the user perspective, nobody cares at all about caching. > To the user these are Training Options, I would recommend the section > be renamed, But doesn't that give the impression that training will be done, depending on what options you set? No training will be done unless you manually indicate a message to train. IMO, the solution to this would be to move the 3 'no_cache' options to the advanced page (perhaps also clarifying the wording, if suitable suggestions are made). If the user doesn't understand the options, then they shouldn't be setting them, just like the other advanced ones. Again IMO, I don't think that the web interface will ever be simple enough to use for a certain class of users. Right at one end are people who will never get that things have to be trained, and will have to use either a pre-trained database, or a non-training system. Beside them are a group who I think will only be able to manage drag'n'drop style training, or where clicking a button indicates that the currently selected message (in the mail client) is good/bad (people who manage to use the plug-in). 
Once it's finished (and once twisted is stable) the pop3dnd script will provide the former; the latter is very difficult without an actual integrated plug-in. Those that can manage a bit more can use the web interface - and those people are probably clued enough to understand that there is a difference between messages displayed to review, and messages that are trained on. =Tony Meyer From MMARTINEZ at CSREES.USDA.GOV Tue Sep 23 09:46:25 2003 From: MMARTINEZ at CSREES.USDA.GOV (Martinez, Michael) Date: Tue Sep 23 09:45:11 2003 Subject: [spambayes-dev] RE: [Spambayes] Are there plans for a daemonizedor compiled versionofSpambayes? Message-ID: <83EED274D3127740995A6B621B88F2E1DF7D40@csrees-exchange.csrees.usda.gov> > -----Original Message----- > From: Greg Ward [mailto:gward@python.net] > Sent: Monday, September 22, 2003 9:05 PM > To: spambayes-dev@python.org > Subject: Re: [spambayes-dev] RE: [Spambayes] Are there plans for a > daemonizedor compiled versionofSpambayes? > > > On 22 September 2003, Martinez, Michael said: > > Another thing I'm thinking about doing to mitigate the impact on > > resources, is running the hammiefilter in ramdisk. > > That's almost certain to be useless. Real operating systems (such as > Linux) cache disk reads very nicely. Yeah that's what I've been told. > > I suspect Skip's suggestion of a lightweight front-end written in C > is more likely to help. I'll look into it. > mail.python.org is generally not heavily loaded. It sagged pretty badly > under the SoBig onslaught a few weeks ago, but has been fairly healthy > since then.) My system sagged as well during that period ... slowed to a crawl in fact. It would take me ten minutes to log on, and five minutes for each shell command to execute. This is the reason I'm looking into reducing the load from Spambayes. Mike > > Greg > -- > Greg Ward <gward@python.net> http://www.gerg.ca/ > "Question authority!" "Oh yeah? Says who?" 
> > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev From skip at pobox.com Tue Sep 23 12:50:34 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 23 12:50:49 2003 Subject: [spambayes-dev] Re: [Spambayes] sb_xmlrpcserver.py problems... In-Reply-To: <20030922220227.GA19503@thermal> References: <20030922220227.GA19503@thermal> Message-ID: <16240.31194.562869.176739@montanaro.dyndns.org> Jeremy> Now I'm looking to setup SpamBayes on a larger scale (for an Jeremy> ISP). I see that the combination of sb_client.py and Jeremy> sb_xmlrpcserver.py are the "spamc/spamd" of SpamBayes. Jeremy> However, the docs are sb_xmlrpcserver.py are pretty sparse. True. Jeremy> I've looked around the CVS tree and searched on-line but haven't Jeremy> found much help yet. When I try running a server: Jeremy> /usr/local/bin/sb_xmlrpcserver.py -d 127.0.0.1:65000 Jeremy> I get: Jeremy> Traceback (most recent call last): Jeremy> File "/usr/local/bin/sb_xmlrpcserver.py", line 42, in ? Jeremy> DEFAULTDB = hammie.DEFAULTDB Jeremy> AttributeError: 'module' object has no attribute 'DEFAULTDB' It appears it's been quite a while since anyone actually used sb_xmlrpcserver.py. I took a stab at updating it to reflect current reality. The result is attached as a context diff against current CVS. It seems to work (that is, it doesn't generate a traceback) for me; however, no headers are inserted in the message returned to the client. Perhaps someone else will spot that problem.
Name: sb_xmlrpcserver.diff Type: application/octet-stream Size: 5326 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030923/6e8c8a9b/sb_xmlrpcserver.obj From skip at pobox.com Tue Sep 23 14:23:41 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 23 14:23:52 2003 Subject: [spambayes-dev] pop3proxy_service prob Message-ID: <16240.36781.948367.899996@montanaro.dyndns.org> I'm going to try and help a Windows guy here get started with pop3proxy running as a Windows service later today. I know nothing about Windows services though, so I thought I'd give it a whirl first. After installing SpamBayes, I executed c:\Python23\python.exe pop3proxy_service.py install then went to the Windows (this is on Win2k BTW) Services dialog, changed it from Manual to Automatic and then started it. I then visited http://localhost:8880/ and went to the configuration page. I wanted to change from database to pickle storage, so I clicked the advanced configuration button and then selected "No" for "Use database for storage". I then clicked the "Save Advanced Options" button, but got a 500 "Internal Server Error". I stopped the service then ran pop3proxy_service.py with the "debug" option and repeated the sequence. It gave me this output: Debugging service pop3proxy Loading database...Info 0x40001002 - The pop3proxy service has started as user 'Administrator', using config file 'C:\Documents and Settings\Administrator\Application Data\SpamBayes\Proxy\bayescustomize.ini'. User interface url is http://localhost:8880/ Loading database... error: uncaptured python exception, closing channel <spambayes.Dibbler._HTTPHandler connected at 0x12e7fd0> (socket.error:(10054, 'Connection reset by peer') [c:\python23\lib\asynchat.py|initiate_send|218] [c:\python23\lib\asyncore.py|send|334]) Is there a problem switching from database to pickle or have I encountered some other problem? 
I can see where you might have a problem switching from database to pickle without changing the name of the storage file. Unfortunately, you can't change the storage file name from the Advanced Configuration page. Skip From mhammond at skippinet.com.au Tue Sep 23 18:50:48 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Sep 23 18:50:45 2003 Subject: [spambayes-dev] Removing an item from the archive Message-ID: <11f601c38225$22729000$f502a8c0@eden> [I'm CCing Barry, as I guess he is the only one who knows!] I just received a mail from a person who posted to the mailing list, but was not aware it was a mailing list, so used a "sig" with personal details. Now that mail is in the archive, and he would like it removed. Is there any way we can do that? The 2 mails in question are http://mail.python.org/pipermail/spambayes/2003-July/005920.html and http://mail.python.org/pipermail/spambayes/2003-July/005921.html Mark. From barry at python.org Tue Sep 23 18:53:10 2003 From: barry at python.org (Barry Warsaw) Date: Tue Sep 23 18:53:15 2003 Subject: [spambayes-dev] Removing an item from the archive In-Reply-To: <11f601c38225$22729000$f502a8c0@eden> References: <11f601c38225$22729000$f502a8c0@eden> Message-ID: <1064357589.1958.7.camel@anthem> On Tue, 2003-09-23 at 18:50, Mark Hammond wrote: > [I'm CCing Barry, as I guess he is the only one who knows!] > > I just received a mail from a person who posted to the mailing list, but was > not aware it was a mailing list, so used a "sig" with personal details. Now > that mail is in the archive, and he would like it removed. > > Is there any way we can do that? > > The 2 mails in question are > http://mail.python.org/pipermail/spambayes/2003-July/005920.html and > http://mail.python.org/pipermail/spambayes/2003-July/005921.html Not easily. We have to do it by hand on the server and if the archives ever get regenerated, they'll show up again. 
I'll put it on the list, but I don't expect many cycles to deal with this any time soon. postmaster@python.org /might/ give you more luck. -Barry From T.A.Meyer at massey.ac.nz Tue Sep 23 19:04:01 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Sep 23 19:04:20 2003 Subject: [spambayes-dev] RE: [Spambayes] pop3proxy_tray.pyw Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599C86@its-xchg4.massey.ac.nz> > I think the OP is asking why the tray app doesn't update its > status when shutting down via the web interface. The answer > is because I didn't think about that. Ah, I see. I brought this up when Mark posted the test binaries including the tray app. IMO, it's not worth fixing this yet - it would be better to wait until the whole start/stop/prepare stuff is fully thought through and figured out (which was also discussed). (It'll probably have to end up being something like being able to register 'listeners' with the UI that are informed when certain things happen, like shutting down). =Tony Meyer From mhammond at skippinet.com.au Tue Sep 23 19:30:13 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Sep 23 19:31:28 2003 Subject: [spambayes-dev] RE: [Spambayes] pop3proxy_tray.pyw In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303599C86@its-xchg4.massey.ac.nz> Message-ID: <120501c3822a$d2480b40$f502a8c0@eden> I don't think it is that bad, or needs any listeners. I believe the patch in https://sourceforge.net/tracker/index.php?func=detail&aid=809008&group_id=61 702&atid=498105 should work correctly in this regard. Mark. > > I think the OP is asking why the tray app doesn't update its > > status when shutting down via the web interface. The answer > > is because I didn't think about that. > > Ah, I see. I brought this up when Mark posted the test binaries > including the tray app. 
IMO, it's not worth fixing this yet > - it would > be better to wait until the whole start/stop/prepare stuff is fully > thought through and figured out (which was also discussed). > > (It'll probably have to end up being something like being able to > register 'listeners' with the UI that are informed when certain things > happen, like shutting down). > > =Tony Meyer > > _______________________________________________ > spambayes-dev mailing list > spambayes-dev@python.org > http://mail.python.org/mailman/listinfo/spambayes-dev -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2164 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030924/35b85655/winmail.bin From tim.one at comcast.net Tue Sep 23 21:18:40 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Sep 23 21:18:43 2003 Subject: [spambayes-dev] Removing an item from the archive In-Reply-To: <11f601c38225$22729000$f502a8c0@eden> Message-ID: <LNBBLJKPBEHFEDALKOLCIEKNGBAB.tim.one@comcast.net> [Mark Hammond] > [I'm CCing Barry, as I guess he is the only one who knows!] > > I just received a mail from a person who posted to the mailing list, > but was not aware it was a mailing list, so used a "sig" with > personal details. Now that mail is in the archive, and he would like > it removed. > > Is there any way we can do that? > > The 2 mails in question are > http://mail.python.org/pipermail/spambayes/2003-July/005920.html and > http://mail.python.org/pipermail/spambayes/2003-July/005921.html Sorry, he's dead meat. It doesn't even matter whether he can talk Barry into hand-editing the python.org archives, because public mailing lists are archived all over the web. For example, see http://article.gmane.org/gmane.mail.spam.spambayes.general/5908 for another copy of the article at the first link above. 
OTOH, if he's looking for a hobby, it might be fun for him to track down all the copies that get made over the coming decades <wink>. tough-luck-tough-love-ly y'rs - tim From kennypitt at hotmail.com Wed Sep 24 00:32:25 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Sep 24 00:32:27 2003 Subject: [spambayes-dev] New Outlook Dialogs Problem In-Reply-To: <12d001c365da$29f8e840$f502a8c0@eden> References: <12d001c365da$29f8e840$f502a8c0@eden> Message-ID: <3F439F83.1070300@hotmail.com> Mark Hammond wrote: >>A fix was checked in for the exception caused by trying to enable >>filtering without enough training data, but I haven't heard >>any further >>public discussion of the second part about disabling the checkbox. I >>noticed that it is still not disabled in the latest dialog >>updates that >>Adam just checked in. >> >>Was it decided whether or not we want to do this? If anyone is >>interested, I will gladly update the patch that I submitted >>for this so >>that it works with Adam's new dialogs. > > > Yes, please do. I don't think we know for sure exactly what we want, but > will know it when we see it <wink>. OK, here it is. This is a very simple change to the FilterEnableProcessor in dialog_map.py. It disables the checkbox if GetDisabledReason() reports that filtering cannot be enabled, and it rechecks each time an option is changed so that the checkbox is reenabled as soon as appropriate training and option settings are done. 
-- Kenny Pitt
-------------- next part --------------
Index: dialog_map.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/dialog_map.py,v
retrieving revision 1.11
diff -u -r1.11 dialog_map.py
--- dialog_map.py 19 Aug 2003 01:33:51 -0000 1.11
+++ dialog_map.py 20 Aug 2003 16:11:49 -0000
@@ -63,6 +63,14 @@
         pass
 
 class FilterEnableProcessor(BoolButtonProcessor):
+    def OnOptionChanged(self, option):
+        self.Init()
+
+    def Init(self):
+        BoolButtonProcessor.Init(self)
+        reason = self.window.manager.GetDisabledReason()
+        win32gui.EnableWindow(self.GetControl(), reason is None)
+
     def UpdateValue_FromControl(self):
         check = win32gui.SendMessage(self.GetControl(), win32con.BM_GETCHECK)
         if check:
From T.A.Meyer at massey.ac.nz Wed Sep 24 01:52:37 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 24 01:54:07 2003 Subject: [spambayes-dev] pop3proxy_service prob Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599F22@its-xchg4.massey.ac.nz> > "Save Advanced Options" button, but got a 500 "Internal Server Error". Did the 500 page have a traceback on it? > Is there a problem switching from database to pickle > or have I encountered some other problem? I can see where > you might have a problem switching from database to pickle > without changing the name of the storage file. Unfortunately, > you can't change the storage file name from the Advanced Configuration page. I think this is the case.
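The failure discussed here comes from opening an existing database file as if it were a pickle. A "convert or delete" page of the kind proposed in this thread would first need to tell which kind of store is already on disk. The following is a minimal sketch in modern Python 3, not SpamBayes code. `detect_store_kind` and `storage_conflict` are hypothetical helpers. The sketch assumes the standard-library `dbm.whichdb` recognises the database format, and it treats anything that unpickles cleanly as a pickle.

```python
import dbm
import os
import pickle


def detect_store_kind(path):
    """Guess what is already stored at `path` (hypothetical helper).

    Returns None if nothing is there, "dbm" if dbm.whichdb() recognises a
    database, "pickle" if the file unpickles cleanly, else "unknown".
    """
    # whichdb() also looks for companion files such as path.dat/path.dir,
    # so ask it before checking whether `path` itself exists.
    if dbm.whichdb(path):  # a non-empty string names a usable dbm module
        return "dbm"
    if not os.path.exists(path):
        return None
    try:
        # Only do this with the user's own files: unpickling can run code.
        with open(path, "rb") as f:
            pickle.load(f)
    except Exception:  # e.g. EOFError, as in the traceback in this thread
        return "unknown"
    return "pickle"


def storage_conflict(path, want_db):
    """Return a warning if the configured type differs from what is on disk."""
    kind = detect_store_kind(path)
    wanted = "dbm" if want_db else "pickle"
    if kind is None or kind == wanted:
        return None
    return ("You have an existing %s store at %s, but the options ask for a "
            "%s. Convert it or delete it before switching." % (kind, path, wanted))
```

A UI could call `storage_conflict()` before reopening the store and show the returned text instead of failing with a 500 error.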
I tried it myself and I get this traceback (on the 500 page):

"""
500 Server error

Traceback (most recent call last):
  File "c:\spambayes_1_0\spambayes\Dibbler.py", line 453, in found_terminator
    getattr(plugin, name)(**params)
  File "c:\spambayes_1_0\spambayes\UserInterface.py", line 684, in onChangeopts
    self.reReadOptions()
  File "c:\spambayes_1_0\spambayes\ProxyUI.py", line 519, in reReadOptions
    state = self.state_recreator()
  File "C:\spambayes_1_0\scripts\sb_server.py", line 735, in _recreateState
    prepare(state)
  File "C:\spambayes_1_0\scripts\sb_server.py", line 751, in prepare
    state.createWorkers()
  File "C:\spambayes_1_0\scripts\sb_server.py", line 622, in createWorkers
    self.bayes = storage.open_storage(filename, self.useDB)
  File "c:\spambayes_1_0\spambayes\storage.py", line 677, in open_storage
    return klass(data_source_name)
  File "c:\spambayes_1_0\spambayes\storage.py", line 90, in __init__
    self.load()
  File "c:\spambayes_1_0\spambayes\storage.py", line 113, in load
    tempbayes = pickle.load(fp)
EOFError
"""

For the moment you could just change the option manually in the config file before you start up. As an actual fix, I suppose the friendliest way to handle this would be to present a page that said "You have an existing [pickle/dbm] database with x spam and x ham. Would you like to convert or delete this?" and takes the appropriate action given the response. =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 24 02:15:28 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 24 02:15:44 2003 Subject: [spambayes-dev] Server side instructions for qmail Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599F2D@its-xchg4.massey.ac.nz> > I'd like to submit the following "Qmail notes": [...] Thanks! Check out: <http://spambayes.sf.net/server_side.html> (Is it just me, or is spambayes.org down?)
=Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 24 02:23:57 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 24 02:24:09 2003 Subject: [spambayes-dev] RE: [Spambayes] pop3proxy_tray.pyw Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599F33@its-xchg4.massey.ac.nz> > I don't think it is that bad, or needs any listeners. I > believe the patch in > https://sourceforge.net/tracker/index.php?func=detail&aid=809008&group_id=61702&atid=498105 should work correctly in this regard. So it does :) Is there any reason that this can't be checked into the main branch now that 1.0a6 is out? =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 24 02:36:44 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 24 02:37:15 2003 Subject: [spambayes-dev] RE: [Spambayes] sb_xmlrpcserver.py problems... Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599F38@its-xchg4.massey.ac.nz> [Skip] > It appears it's been quite awhile since anyone actually used > sb_xmlrpcserver.py. I took a stab at updating it to reflect > current reality. The result is attached as a context diff > against current CVS. It seems to work (that is, it doesn't > generate a traceback) for me, however no headers are inserted > in the message returned to the client. Perhaps someone else > will spot that problem. The headers are inserted for me if I apply your diff, so I'm going to check it in. Jeremy: does it work for you with Skip's diff (note that the docstring still won't print properly; I'll fix that, too). =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 24 03:03:20 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 24 03:03:26 2003 Subject: [spambayes-dev] RE: [Spambayes] error on attempting to export data Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599F3D@its-xchg4.massey.ac.nz> > Anyway, I installed the python2.3 windows version. Then I > unzipped the spambayes into it's own directory. I then went > to that directory, and typed exactly what you did.
I get:
>
> C:\spambayes>c:\python23\python.exe setup.py install
>
> Traceback (most recent call last):
>   File "setup.py", line 27, in ?
>     from spambayes import __version__
> ImportError: No module named spambayes

Good grief, he's right! setup.py can't import from spambayes, can it? Won't this only work if they've already had spambayes installed, or if the archive is on the pythonpath? I've seen an error like this before, and figured that the user was doing something weird... Could someone confirm that this is indeed a bad thing, and correct it? I presume we just have the version explicitly written there... =Tony Meyer From richie at entrian.com Wed Sep 24 03:20:59 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Sep 24 03:21:14 2003 Subject: [spambayes-dev] RE: [Spambayes] error on attempting to export data In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303599F3D@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1303599F3D@its-xchg4.massey.ac.nz> Message-ID: <qbh2nvc3rug81padqtpkhdedgc1qdo6sgg@4ax.com> > setup.py can't import from spambayes, can it?

C:\temp> mkdir x
C:\temp> cd x
C:\temp\x> cat > __init__.py
version=1
C:\temp\x> cd ..
C:\temp> python -c "from x import version; print version"
1

Yes it can. 8-) -- Richie Hindle richie@entrian.com From T.A.Meyer at massey.ac.nz Wed Sep 24 05:52:32 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 24 05:52:45 2003 Subject: [spambayes-dev] RE: [Spambayes] error on attempting to export data Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1303599F4C@its-xchg4.massey.ac.nz> > > setup.py can't import from spambayes, can it? [example snipped] > Yes it can. 8-) Ok, I see that now. Oh well, it made sense in my head ;) So does anyone know why some people get this error? This isn't the first time I've seen it reported. Basically, they run "python.exe setup.py install" and get a "no such package as spambayes" error.
=Tony Meyer From tim at zope.com Wed Sep 24 12:01:04 2003 From: tim at zope.com (Tim Peters) Date: Wed Sep 24 12:09:08 2003 Subject: [spambayes-dev] RE: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <200309241550.h8OFojlx031706@localhost.localdomain> Message-ID: <BIEJKCLHCIOIHAGOKOLHGEPPHEAA.tim@zope.com> Adding spambayes-dev in case anyone there has more reliable info. [Anthony Baxter] > I haven't been following the spambayes lists too closely. Are there > concrete problems with bsddb that are cropping up, or just a general > wariness of it? > > If there _is_ a problem with bsddb, it needs to be addressed. Too > many things depend on it. Reports of database corruption are common in spambayes. At least one knowledgable tester reported his problems went away after moving to a recent Sleepycat release (4.1.25, IIRC). I'm not sure any reports have come from people using the Outlook client (I was happy using a dict, but switched all 3 of my Outlook classifiers to use Berkeley instead, *hoping* to provoke a problem -- but no luck yet). They seem to come from non-Outlook people using Berkeley for the message info database. Richie got a whittled down threaded test that fails on Windows and Linux, and there's already a (Python) bug report open on that; it's not thought to be relevant to how spambayes uses Berkeley, though. From skip at pobox.com Wed Sep 24 12:11:07 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 24 12:11:15 2003 Subject: [spambayes-dev] pop3proxy error Message-ID: <16241.49691.171848.826082@montanaro.dyndns.org> I helped a user here set up the POP3 proxy as a Windows service today. The environment is Python 2.3, the latest Win32all (1.57?), and the 1.0a6 source distribution. Things seem to be going fairly well, however we encountered two tracebacks. I'm not all that familiar with using it in that environment, so am at a loss to understand what they mean. 
The first error occurred when poking the Train button the first time:

Spambayes Web Interface: Home > Review

Training... Trained on 20 messages. Saving...

500 Server error

Traceback (most recent call last):
  File "C:\Program Files\spambayes-1.0a6\spambayes\Dibbler.py", line 453, in found_terminator
    getattr(plugin, name)(**params)
  File "C:\Program Files\spambayes-1.0a6\spambayes\ProxyUI.py", line 324, in onReview
    self._doSave()
  File "C:\Program Files\spambayes-1.0a6\spambayes\UserInterface.py", line 470, in _doSave
    classifier.store()
  File "C:\Program Files\spambayes-1.0a6\spambayes\storage.py", line 229, in store
    self._write_state_key()
  File "C:\Program Files\spambayes-1.0a6\spambayes\storage.py", line 233, in _write_state_key
    self.db[self.statekey] = (classifier.PICKLE_VERSION,
  File "c:\python23\lib\shelve.py", line 130, in __setitem__
    self.dict[key] = f.getvalue()
TypeError: object does not support item assignment

The second error occurred when poking the "Show Clues" link for a message after twiddling the advanced config to include the evidence header.
Spambayes Web Interface: Home > Review > Message clues

500 Server error

Traceback (most recent call last):
  File "C:\Program Files\spambayes-1.0a6\spambayes\Dibbler.py", line 453, in found_terminator
    getattr(plugin, name)(**params)
  File "C:\Program Files\spambayes-1.0a6\spambayes\ProxyUI.py", line 462, in onShowclues
    results = self._buildCluesTable(message, subject)
  File "C:\Program Files\spambayes-1.0a6\spambayes\UserInterface.py", line 269, in _buildCluesTable
    evidence=True)
  File "C:\Program Files\spambayes-1.0a6\spambayes\classifier.py", line 158, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "C:\Program Files\spambayes-1.0a6\spambayes\classifier.py", line 391, in _getclues
    record = self._wordinfoget(word)
  File "C:\Program Files\spambayes-1.0a6\spambayes\storage.py", line 250, in _wordinfoget
    r = self.db.get(word)
  File "c:\python23\lib\shelve.py", line 110, in get
    if self.dict.has_key(key):
AttributeError: 'int' object has no attribute 'has_key'

Skip From richie at entrian.com Wed Sep 24 14:23:35 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Sep 24 14:24:05 2003 Subject: [spambayes-dev] RE: [Spambayes] error on attempting to export data In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303599F4C@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1303599F4C@its-xchg4.massey.ac.nz> Message-ID: <k5o3nvc2r7ri27finjtqebc03u22618bau@4ax.com> [Tony] > So does anyone know why some people get this error? This isn't the > first time I've seen it reported. Basically, they run "python.exe > setup.py install" and get a "no such package as spambayes" error. The only cause I can think of is people managing to unzip the archive without restoring the directory structure - I think old versions of... Info-ZIP? PK-Zip? Can't remember... used to default to unpacking all the files into the current directory, and you needed to use a switch to restore the directory structure. This is going back a *very* long time though.
-- Richie Hindle richie@entrian.com From richie at entrian.com Wed Sep 24 14:25:33 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Sep 24 14:25:48 2003 Subject: [spambayes-dev] RE: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <BIEJKCLHCIOIHAGOKOLHGEPPHEAA.tim@zope.com> References: <200309241550.h8OFojlx031706@localhost.localdomain> <BIEJKCLHCIOIHAGOKOLHGEPPHEAA.tim@zope.com> Message-ID: <h3l3nvsoiv66mpbnihu3hpgdl104d04rnj@4ax.com> [Anthony] > I haven't been following the spambayes lists too closely. Are there > concrete problems with bsddb that are cropping up, or just a general > wariness of it? [Tim] > Reports of database corruption are common in spambayes. All these at least: 807217: http://sourceforge.net/tracker/?func=detail&atid=498103&aid=807217&group_id=61702 803901: http://sourceforge.net/tracker/?func=detail&atid=498104&aid=803901&group_id=61702 809291: http://sourceforge.net/tracker/?func=detail&atid=498104&aid=809291&group_id=61702 788051: http://sourceforge.net/tracker/?func=detail&atid=498103&aid=788051&group_id=61702 [Tim] > I'm not sure any reports have come from people using the Outlook client 807217 relates to the Outlook client, but the database in question is the 'messageinfo' database, not the main classifier database. IMHO, this bug is serious enough to prevent us releasing a beta until we can sort it out. 788051 says "sounds like it is a bug with bsddb that is fixed in db-4.1.25" - what version of bsddb does Python 2.3.1 for Windows ship with? 
-- Richie Hindle richie@entrian.com From skip at pobox.com Wed Sep 24 14:46:51 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Sep 24 14:56:59 2003 Subject: [spambayes-dev] Re: [Spambayes] Thanks In-Reply-To: <F83082F17FBC184683C43DFCA8CF43F09E4E2D@xcgmd063.md.essd.northgrum.com> References: <F83082F17FBC184683C43DFCA8CF43F09E4E2D@xcgmd063.md.essd.northgrum.com> Message-ID: <16241.59035.884403.72304@montanaro.dyndns.org> Tom> Hey guys, thanks for what appears to be a pretty good product. We're glad you like it. Tom> I understand you're doing it for free (so far, anyway), and I'd Tom> like to help out, some, but I don't know how. The most obvious place I can think of that needs help is the website. I helped a guy here at Northwestern get started today with the POP3 proxy on Windows connected to Eudora. It took me a while to get everything going and to find the appropriate info on twiddling the eudora.ini file (in the FAQ). In the process I think I encountered

+-----------------+                       +-------------+
| Outlook Express |      Internet or      |             |
|  (or similar)   | <-------------------> | POP3 server |
|                 |       Intranet        |             |
+-----------------+                       +-------------+

on two or three different pages (at least one of which was in the distribution). I don't want to go through the wailing and gnashing of teeth that seems to characterize the current efforts at redoing www.python.org, however I think we would do well to consider how the website could be improved. As a start, it seems we should add a link like "Install" in the left-hand margin. To someone wanting to install SpamBayes, it's not obvious which of Applications, Documentation, Frequently Asked Questions or one of the platform-specific pages is the best place to learn how to install it or troubleshoot an install. I like Wikis a lot as an easy way to work collaboratively. I'd be willing to set up a Wiki where people can scribble thoughts, structure and content.
Skip From theller at python.net Wed Sep 24 15:03:38 2003 From: theller at python.net (Thomas Heller) Date: Wed Sep 24 15:11:31 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go References: <200309241550.h8OFojlx031706@localhost.localdomain> <BIEJKCLHCIOIHAGOKOLHGEPPHEAA.tim@zope.com> <h3l3nvsoiv66mpbnihu3hpgdl104d04rnj@4ax.com> Message-ID: <65jiggad.fsf@python.net> Richie Hindle <richie@entrian.com> writes: > [Anthony] >> I haven't been following the spambayes lists too closely. Are there >> concrete problems with bsddb that are cropping up, or just a general >> wariness of it? > > [Tim] >> Reports of database corruption are common in spambayes. > > All these at least: > > 807217: > http://sourceforge.net/tracker/?func=detail&atid=498103&aid=807217&group_id=61702 > 803901: > http://sourceforge.net/tracker/?func=detail&atid=498104&aid=803901&group_id=61702 > 809291: > http://sourceforge.net/tracker/?func=detail&atid=498104&aid=809291&group_id=61702 > 788051: > http://sourceforge.net/tracker/?func=detail&atid=498103&aid=788051&group_id=61702 > > [Tim] >> I'm not sure any reports have come from people using the Outlook client > > 807217 relates to the Outlook client, but the database in question is the > 'messageinfo' database, not the main classifier database. > > IMHO, this bug is serious enough to prevent us releasing a beta until we > can sort it out. 788051 says "sounds like it is a bug with bsddb that is > fixed in db-4.1.25" - what version of bsddb does Python 2.3.1 for Windows > ship with? It's mentioned in the PCBuild/readme.txt file: Go to Sleepycat's download page: http://www.sleepycat.com/download/ and download version 4.1.25. The file name is db-4.1.25.NC.zip. XXX with or without strong cryptography? I picked "without". Unpack into dist\db-4.1.25 and that is what I used to build the installer. I ran the test-suite on several machines, with mixed results: win98 - always fails. win2k - most of the time it works, sometimes it fails. 
winXP - I have not seen any failures. Thomas From tim at zope.com Wed Sep 24 16:08:04 2003 From: tim at zope.com (Tim Peters) Date: Wed Sep 24 16:09:06 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <65jiggad.fsf@python.net> Message-ID: <LNBBLJKPBEHFEDALKOLCCEKNFDAB.tim@zope.com> [Thomas Heller, answering that Sleepycat 4.1.25 ships with Windows Pythons 2.3 and 2.3.1] > > I ran the test-suite on several machines, with mixed results: I assume you're only talking about the test_bsddb3 part of the Python test suite. That alone runs more than 200 tests, so "it fails" is exceptionally uninformative. > win98 - always fails. For me too on Win98SE, but its 4 specific failures don't appear relevant (here with 2.3.1 on Win98SE): ====================================================================== ERROR: test01_basics (bsddb.test.test_dbshelve.EnvBTreeShelveTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\PYTHON23\lib\bsddb\test\test_dbshelve.py", line 75, in test01_basics self.do_open() File "C:\PYTHON23\lib\bsddb\test\test_dbshelve.py", line 238, in do_open self.env.open(homeDir, self.envflags | db.DB_INIT_MPOOL | db.DB_CREATE) DBAgainError: (11, 'Resource temporarily unavailable -- unable to join the environment') ====================================================================== ERROR: test01_basics (bsddb.test.test_dbshelve.EnvHashShelveTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\PYTHON23\lib\bsddb\test\test_dbshelve.py", line 75, in test01_basics self.do_open() File "C:\PYTHON23\lib\bsddb\test\test_dbshelve.py", line 238, in do_open self.env.open(homeDir, self.envflags | db.DB_INIT_MPOOL | db.DB_CREATE) DBAgainError: (11, 'Resource temporarily unavailable -- unable to join the environment') ====================================================================== ERROR: 
test01_basics (bsddb.test.test_dbshelve.EnvThreadBTreeShelveTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\PYTHON23\lib\bsddb\test\test_dbshelve.py", line 75, in test01_basics self.do_open() File "C:\PYTHON23\lib\bsddb\test\test_dbshelve.py", line 238, in do_open self.env.open(homeDir, self.envflags | db.DB_INIT_MPOOL | db.DB_CREATE) DBAgainError: (11, 'Resource temporarily unavailable -- unable to join the environment') ====================================================================== ERROR: test01_basics (bsddb.test.test_dbshelve.EnvThreadHashShelveTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\PYTHON23\lib\bsddb\test\test_dbshelve.py", line 75, in test01_basics self.do_open() File "C:\PYTHON23\lib\bsddb\test\test_dbshelve.py", line 238, in do_open self.env.open(homeDir, self.envflags | db.DB_INIT_MPOOL | db.DB_CREATE) DBAgainError: (11, 'Resource temporarily unavailable -- unable to join the environment') ---------------------------------------------------------------------- Ran 216 tests in 229.150s FAILED (errors=4) > win2k - most of the time it works, sometimes it fails. > winXP - I have not seen any failures. If you run only the bsddb3 thread tests in a loop, I expect you'll see them fail eventually too (that's been reported on Win2K and Linux before). What nobody has reported yet when running test_bsddb3 are the kinds of "corruption" exceptions spambayes users report. From mhammond at skippinet.com.au Wed Sep 24 20:37:56 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Sep 24 20:38:25 2003 Subject: [spambayes-dev] RE: [Spambayes] error on attempting to export data In-Reply-To: <k5o3nvc2r7ri27finjtqebc03u22618bau@4ax.com> Message-ID: <14c101c382fd$55ec6b80$f502a8c0@eden> > [Tony] > > So does anyone know why some people get this error? 
This isn't the > > first time I've seen it reported. Basically, they run "python.exe > > setup.py install" and get a "no such package as spambayes" error. > > The only cause I can think of is people managing to unzip the archive > without restoring the directory structure - I think old versions of... > Info-ZIP? PK-Zip? Can't remember... used to default to > unpacking all the > files into the current directory, and you needed to use a switch to > restore the directory structure. This is going back a *very* > long time > though. WinZip has a check-box allowing you to keep the structure or not - and it defaults to how it was set last time. It also has an "auto overwrite" checkbox that works the same. The end result is that I have "done the wrong thing" accidently a number of times, and suspect that users could too - even without a warning being generated that they are constantly overwriting (eg) __init__.py. So yeah, I suspect that a "flat structure" is the reason. I guess setup.py could check this - eg, either "__init__.py" existing next to setup.py, or the lack of a spambayes directory would be good indications this has happened. These-users-need-the-binary <wink> ly, Mark. From T.A.Meyer at massey.ac.nz Wed Sep 24 20:39:21 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 24 20:42:07 2003 Subject: [spambayes-dev] pop3proxy error Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130359A16A@its-xchg4.massey.ac.nz> > Things seem to be going fairly well, however we encountered > two tracebacks. I'm not all that familiar with using it in > that environment, so am at a loss to understand what they > mean. The first error occurred when poking the Train button > the first time: [...] > TypeError: object does not support item assignment Sadly this is a bug with 1.0a6 (fixed in cvs). For some time when we recreate the 'state' object and reopened the database after saving the configuration, the UI kept using the old shelve object & database. 
One of the 'fixes' that crept into 1.0a6 was that we now (correctly) close the database before we reopen it. The UI, though, keeps trying to use that one, which is closed, and causes all sorts of troubles. There are two solutions AFAIK (apart from updating the code): use a pickle (I haven't checked, but I'm pretty sure that they are unaffected), or simply restart spambayes after making any changes to the config via the web interface. > The second error occurred when poking the "Show Clues" link > for a message after twiddling the advanced config to include > the evidence header. [...] > File "c:\python23\lib\shelve.py", line 110, in get > if self.dict.has_key(key): > > AttributeError: 'int' object has no attribute 'has_key' I suspect that this is the same problem; it certainly looks like it. =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Sep 24 20:46:33 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Sep 24 20:48:20 2003 Subject: [spambayes-dev] RE: [Spambayes] error on attempting to export data Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130359A171@its-xchg4.massey.ac.nz> [Mark] > So yeah, I suspect that a "flat structure" is the reason. I'm not sure if anyone else is following the spambayes thread, so this is what the user has just said: > I got it to work! I had to use the tarball, and unpack it to > the root of C. Then it worked as advertised. It doesn't like > it when you don't accept the default directories. This makes no sense to me ;). I'm willing to believe that Mark & Richie are right and that it was unpacked wrongly before. > These-users-need-the-binary <wink> ly, They sure do. 
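Mark's suggested check (an `__init__.py` sitting next to `setup.py`, or no `spambayes` directory at all) could look something like the following. This is a minimal sketch in modern Python 3. `looks_flattened` and `check_layout` are hypothetical helpers, not code from the real `setup.py`, and the wording of the exit message is an assumption.

```python
import os
import sys


def looks_flattened(setup_dir):
    """Return True if the source archive seems to have been unpacked flat."""
    stray_init = os.path.isfile(os.path.join(setup_dir, "__init__.py"))
    has_package = os.path.isdir(os.path.join(setup_dir, "spambayes"))
    return stray_init or not has_package


def check_layout(setup_dir):
    """Exit with a hint instead of a bare ImportError (hypothetical)."""
    if looks_flattened(setup_dir):
        sys.exit("The SpamBayes archive seems to have been unpacked without "
                 "its folder structure; please unpack it again, keeping "
                 "folder names, then re-run setup.py.")


# Near the top of setup.py, before "from spambayes import __version__":
# check_layout(os.path.dirname(os.path.abspath(__file__)))
```

Running the check before the `spambayes` import means users with a flattened archive see the hint rather than `ImportError: No module named spambayes`.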
=Tony Meyer From skip at pobox.com Thu Sep 25 08:53:54 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 25 08:54:24 2003 Subject: [spambayes-dev] pop3proxy error In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130359A16A@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130359A16A@its-xchg4.massey.ac.nz> Message-ID: <16242.58722.252545.598395@montanaro.dyndns.org> Tony> Sadly this is a bug with 1.0a6 (fixed in cvs). Which files? I can cvs up and bundle it up for my colleague's use, but it might be easier to just apply the patch to the relevant file(s) on his 'puter. >> AttributeError: 'int' object has no attribute 'has_key' Tony> I suspect that this is the same problem; it certainly looks like Tony> it. I'll take your word for it and shout if it doesn't go away. Thx, Skip From anssi.porttikivi at teleware.fi Thu Sep 25 09:05:30 2003 From: anssi.porttikivi at teleware.fi (Anssi Porttikivi) Date: Thu Sep 25 08:58:14 2003 Subject: [spambayes-dev] An unrelated idea: categorization / cluster analysis of text files for FAQ generating Message-ID: <B36C365832C90E47A37F4FFCDDEFC46D37FB@hkisrv08.tw.fi> Sorry to bother you, but I would like to know, if anyone here has any knowledge of technologies like the following idea: Could you automatically categorize a set of messages into an optimum number of cluster subsets, where messages inside a subset would be similar to each other, in bayesian filtering terms. If this could be done without a priori manually selecting the categories that the clusters subset are, this could be used for an automated "frequently asked questions" list manitenance. Automatic categorization of incoming mail without manually choosing any criteria beforehand would also be interesting. 
From kennypitt at hotmail.com Thu Sep 25 09:43:58 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Thu Sep 25 09:44:24 2003 Subject: [spambayes-dev] An unrelated idea: categorization / clusteranalysis of text files for FAQ generating In-Reply-To: <B36C365832C90E47A37F4FFCDDEFC46D37FB@hkisrv08.tw.fi> Message-ID: <Law11-OE54nG6CSZ2ii0000595f@hotmail.com> Anssi Porttikivi wrote: > Could you automatically categorize a set of messages into an optimum > number of cluster subsets, where messages inside a subset would be > similar to each other, in bayesian filtering terms. If this could be > done without a priori manually selecting the categories that the > clusters subset are, this could be used for an automated "frequently > asked questions" list manitenance. Automatic categorization of > incoming mail without manually choosing any criteria beforehand would > also be interesting. Sounds like you might want to look at POPFile, another open-source mail filter that implements at least some of the concepts you describe. It allows you to define as many "buckets" as you want for sorting your mail, and then you train it just like any other Bayesian-based filter as messages arrive. Check it out at http://popfile.sourceforge.net. They have SourceForge discussion forums that would probably be a good place to discuss your questions. -- Kenny Pitt From papaDoc at videotron.ca Thu Sep 25 10:06:03 2003 From: papaDoc at videotron.ca (papaDoc) Date: Thu Sep 25 10:11:26 2003 Subject: [spambayes-dev] patch for splitndir.py Message-ID: <3F72F64B.2050706@videotron.ca> Hi, This is a patch for splitndir.py When splitting the mbox we check if the header mailid_header_name is present then we give this name to the message file if not we set the name to be the counter number. This was/(will be) useful for me to retrieve a message that generate a exception in pop3proxy but the message was discarded in the cache directory. 
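The naming rule described here is: use the mail-id header when the message has one, otherwise fall back to the running counter. It can be sketched as a small standalone function in modern Python 3. `message_path` is a hypothetical helper, not code from `splitndirs.py`. The header name is passed in rather than read from SpamBayes' options. The character filtering is an added assumption, so that an odd header value cannot produce an unusable or path-escaping file name.

```python
import os
from email.message import Message


def message_path(outdir, msg, counter, header_name):
    """Pick the output file for one split message (hypothetical helper).

    Uses the value of `header_name` if the message has it, otherwise the
    running counter, mirroring the rule described in the patch.
    """
    msg_id = msg.get(header_name)
    if msg_id is not None:
        # Keep only characters that are safe in a file name on any platform.
        safe = "".join(c for c in str(msg_id) if c.isalnum() or c in "-_.")
        # Reject names made only of dots, such as "." or "..".
        if safe.strip("."):
            return os.path.join(outdir, safe)
    return os.path.join(outdir, str(counter))
```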
I will re-split my inbox and copy all the files into the cache directory to test which one caused the error. Remi P.S. The more I program in Python the more I like it ;-) diff -C 3 splitndirs.py splitndirs.py.orig *** splitndirs.py Thu Sep 25 09:59:37 2003 --- splitndirs.py.orig Thu Sep 25 09:58:58 2003 *************** *** 53,61 **** import glob from spambayes import mboxutils - from spambayes.Options import options - - from email.Header import Header try: True, False --- 53,58 ---- *************** *** 119,134 **** astext = str(msg) #assert astext.endswith('\n') counter += 1 ! try: ! mail_id = options["Headers", "mailid_header_name"] ! id_str = msg.get(mail_id) ! if id_str is None: ! msgfile = open('%s/%d' % (outdirs[i], counter), 'wb') ! else: ! msgfile = open('%s/%s' % (outdirs[i], id_str), 'wb') ! except: ! print "Counter = %d" % (counter) ! msgfile = open('%s/%d' % (outdirs[i], counter), 'wb') msgfile.write(astext) msgfile.close() if verbose: --- 116,122 ---- astext = str(msg) #assert astext.endswith('\n') counter += 1 ! msgfile = open('%s/%d' % (outdirs[i], counter), 'wb') msgfile.write(astext) msgfile.close() if verbose: From tim.one at comcast.net Thu Sep 25 10:15:41 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Sep 25 10:15:46 2003 Subject: [spambayes-dev] An unrelated idea: categorization / clusteranalysis of text files for FAQ generating In-Reply-To: <B36C365832C90E47A37F4FFCDDEFC46D37FB@hkisrv08.tw.fi> Message-ID: <LNBBLJKPBEHFEDALKOLCCEBFGCAB.tim.one@comcast.net> [Anssi Porttikivi] > Sorry to bother you, but I would like to know, if anyone here has any > knowledge of technologies like the following idea: > > Could you automatically categorize a set of messages into an optimum > number of cluster subsets, where messages inside a subset would be > similar to each other, in bayesian filtering terms.
If this could be > done without a priori manually selecting the categories that the > cluster subsets are, this could be used for an automated "frequently > asked questions" list maintenance. Automatic categorization of > incoming mail without manually choosing any criteria beforehand would > also be interesting. There is (of course) a large literature on cluster analysis. Here's a very readable intro: http://www.statsoftinc.com/textbook/stcluan.html Code up one of the 50 known methods, and see which works best <wink>. From skip at pobox.com Thu Sep 25 10:23:17 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 25 10:23:34 2003 Subject: [spambayes-dev] An unrelated idea: categorization / clusteranalysis of text files for FAQ generating In-Reply-To: <Law11-OE54nG6CSZ2ii0000595f@hotmail.com> References: <B36C365832C90E47A37F4FFCDDEFC46D37FB@hkisrv08.tw.fi> <Law11-OE54nG6CSZ2ii0000595f@hotmail.com> Message-ID: <16242.64085.482045.973288@montanaro.dyndns.org> >> Could you automatically categorize a set of messages into an optimum >> number of cluster subsets, where messages inside a subset would be >> similar to each other, in bayesian filtering terms. If this could be >> done without a priori manually selecting the categories that the >> cluster subsets are, this could be used for an automated "frequently >> asked questions" list maintenance. Automatic categorization of >> incoming mail without manually choosing any criteria beforehand would >> also be interesting. Kenny> Sounds like you might want to look at POPFile, another Kenny> open-source mail filter that implements at least some of the Kenny> concepts you describe. If you're interested in playing around with Python or n-way classification, you might take a look at contrib/nway.py in the SpamBayes distribution. There's a hopefully decent docstring at the top of the file which shows how to use it.
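Tim's suggestion can be tried cheaply. Below is a minimal sketch of one of the simplest methods, single-link clustering: each message is reduced to a set of lower-cased words (a crude stand-in, assumed here, for the SpamBayes tokenizer), any pair whose Jaccard similarity reaches a threshold is merged with union-find, and the number of clusters falls out of the threshold rather than being fixed in advance. This is not a Bayesian method and not how contrib/nway.py works; the function names and the 0.3 threshold are illustrative assumptions.

```python
import re
from itertools import combinations


def tokens(text):
    """Lower-cased word set; a crude stand-in for the SpamBayes tokenizer."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Share of words two token sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def single_link_clusters(texts, threshold=0.3):
    """Return lists of indices into texts, one list per cluster.

    Any two texts whose similarity reaches threshold end up in the same
    cluster (directly or through a chain of similar texts).
    """
    parent = list(range(len(texts)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [tokens(t) for t in texts]
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


questions = [
    "How do I train the Outlook plugin?",
    "how do I train the outlook plugin on old mail",
    "sb_server will not start the pop3 proxy",
    "pop3 proxy will not start from sb_server",
]
print(single_link_clusters(questions))  # [[0, 1], [2, 3]]
```

Raising the threshold splits groups further and lowering it merges them; picking the "optimum number" of clusters Anssi asks about would need an extra criterion on top, which this sketch does not attempt.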
Skip From skip at pobox.com Thu Sep 25 11:08:58 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 25 11:09:17 2003 Subject: [spambayes-dev] Re: [Spambayes] spam error... In-Reply-To: <1064500522.3994.42.camel@nils> References: <1064493374.3994.35.camel@nils> <16242.61015.518550.425615@montanaro.dyndns.org> <1064500522.3994.42.camel@nils> Message-ID: <16243.1290.36635.982387@montanaro.dyndns.org> Maybe it's time for 1.0a7. Many people seem to be running into this particular problem. >> * If you execute >> >> import whichdb, os >> from spambayes import Options >> dbfile = os.path.expanduser(Options.options["Storage", >> "persistent_storage_file"]) >> print whichdb.whichdb(dbfile) nils> I'm new to python and had to fight with newlines first, but it nils> says 'None' Hmmm... That means it can't tell what the type of the database file is. That's odd. nils> did I forgot to install something? I don't think so. >> * The last time you power cycled your computer did you shut down >> sb_server cleanly (that is, select "Save and shutdown" from your web >> browser)? nils> surely not, I can't! why? here: nils> 500 Server error nils> Traceback (most recent call last): ... nils> File "/usr/lib/python2.2/shelve.py", line 77, in __setitem__ nils> self.dict[key] = f.getvalue() nils> TypeError: object does not support item assignment This is a bug I think Tony Meyer fixed in the past few days - since 1.0a6 was released. If you can check out from CVS, I'd recommend it. I took a look but couldn't figure out what changes fixed what bugs. Tony's in New Zealand. I don't expect him to be back at the computer for eight hours or so unless he's a night owl. nils> beyond this, I still have my config problems, that sb_server.py nils> doesn't start the pop3 proxies automatically and that I have to nils> reconfigure them first... I'm not sure what this problem would be, though it's quite possible that since you have trouble saving your configuration, it never learns your settings. 
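Skip's check uses Python 2's whichdb module; in Python 3 the same function is dbm.whichdb. Its documented return values separate the cases: None if the file can't be opened (for example, because it doesn't exist yet), an empty string if the format can't be guessed, and otherwise the name of the dbm module that can open it. The sketch below wraps that in a small hypothetical helper, describe_db; the scratch files in the demo are illustrations, not SpamBayes's real database names.

```python
import dbm
import dbm.dumb
import os
import tempfile


def describe_db(path):
    """Explain what dbm.whichdb() reports for a database path."""
    path = os.path.expanduser(path)
    kind = dbm.whichdb(path)
    if kind is None:
        # whichdb could not open the file: usually it does not exist yet,
        # e.g. because the database was never saved.
        return "missing or unreadable"
    if kind == "":
        # The file exists, but its header matches no known dbm format.
        return "unrecognised format"
    return "dbm backend: " + kind


# Demo in a scratch directory.
scratch = tempfile.mkdtemp()
print(describe_db(os.path.join(scratch, "hammie.db")))  # missing or unreadable

empty = os.path.join(scratch, "empty.db")
open(empty, "wb").close()
print(describe_db(empty))                               # unrecognised format

dumb_path = os.path.join(scratch, "dumb")
db = dbm.dumb.open(dumb_path, "c")
db[b"token"] = b"counts"
db.close()
print(describe_db(dumb_path))                           # dbm backend: dbm.dumb
```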
Skip From skip at pobox.com Thu Sep 25 11:30:02 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 25 11:30:12 2003 Subject: [spambayes-dev] Need more training messages Message-ID: <16243.2554.280733.299370@montanaro.dyndns.org> A week ago I said I would take steps to implement an initial training database which could be distributed with SpamBayes (the training pickle will be distributed, not raw email messages). So far, I have 11 emails. I'd like to get around 20 or 30 of each. If you have a free minute or two can you rummage through your email and find me two hams and two spams? Forward them to me as attachments with subjects of either "Sample Ham" or "Sample Spam", as appropriate. While no emails will be distributed, you should still only send me ham messages which are not sensitive. Thx, Skip From spamdev at royalwebhosting.com Thu Sep 25 12:31:54 2003 From: spamdev at royalwebhosting.com (Sergio Baca) Date: Thu Sep 25 12:35:10 2003 Subject: [spambayes-dev] Exception Message-ID: <179126908422.20030925193154@royalwebhosting.com> Hello, Spambayes is great program, with some exception :) Sorry it's a real exception: X-Spambayes-Exception: exceptions.UnicodeDecodeError('ascii' codec can't decode byte 0xcc in position 1: ordinal not in range(128)) in append() at C:\Python23\lib\email\Header.py line 272: ustr = unicode(s, incodec, errors) Lately many e-mails generate such exception, all of them are spam. Can it be somehow excluded, for example before transmitting that string to filter entire e-mail for such undecodable bytes? 
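Skip says this exception comes from an already-fixed bug, so the snippet below is not that fix. It is only a sketch of the general technique Sergio asks about, written for Python 3, where unicode(s, codec) became bytes.decode(codec). The idea is to decode with the declared charset and fall back to latin-1 instead of raising. The names to_text and header_text are hypothetical, and latin-1 as the fallback is a design choice: errors="replace" would also avoid the crash, but it would discard the original byte values.

```python
from email.header import decode_header


def to_text(raw, charset="ascii"):
    """Decode bytes without ever raising UnicodeDecodeError.

    Try the declared charset first; if the bytes don't fit it, or the
    charset name is unknown, fall back to latin-1, which maps every
    byte value to a character, so 8-bit spam headers still tokenize.
    """
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return raw.decode("latin-1")


def header_text(value):
    """Flatten an RFC 2047 header value to text using to_text()."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            parts.append(to_text(chunk, charset or "ascii"))
        else:
            parts.append(chunk)  # no encoded words: already text
    return "".join(parts)


print(header_text("=?utf-8?b?Y2Fmw6k=?="))   # café
print(header_text("=?us-ascii?q?=CCabc?="))  # Ìabc (mislabelled charset)
```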
Thank you for such a great program :) ----------------------------------------- Best regards, Sergio, CEO mailto:spamdev@royalwebhosting.com http://www.royalwebhosting.com Royal Web Hosting - Royal Quality Hosting From skip at pobox.com Thu Sep 25 12:44:52 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Sep 25 12:45:04 2003 Subject: [spambayes-dev] Exception In-Reply-To: <179126908422.20030925193154@royalwebhosting.com> References: <179126908422.20030925193154@royalwebhosting.com> Message-ID: <16243.7044.144275.546974@montanaro.dyndns.org> Sergio> Spambayes is great program, with some exception :) Sorry it's a Sergio> real exception: Sergio> X-Spambayes-Exception: exceptions.UnicodeDecodeError('ascii' codec can't Sergio> decode byte 0xcc in position 1: ordinal not in range(128)) in Sergio> append() at C:\Python23\lib\email\Header.py line 272: ustr = Sergio> unicode(s, incodec, errors) Sergio> Lately many e-mails generate such exception, all of them are Sergio> spam. Can it be somehow excluded, for example before Sergio> transmitting that string to filter entire e-mail for such Sergio> undecodable bytes? This is, I think, a known and solved problem. What version of SpamBayes are you using? The latest version of the Outlook plugin is 008.1. The latest version of the source distribution is 1.0a6. Skip From T.A.Meyer at massey.ac.nz Fri Sep 26 00:20:24 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 26 00:20:41 2003 Subject: [spambayes-dev] Re: [Spambayes] spam error... Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13036D9E33@its-xchg4.massey.ac.nz> > Maybe it's time for 1.0a7. Many people seem to be running > into this particular problem. I've wondered that too. What does everyone else think? [which db returns None] > Hmmm... That means it can't tell what the type of the > database file is. That's odd. I suspect it means that the file doesn't exist yet. 
=Tony Meyer From ta-meyer at ihug.co.nz Fri Sep 26 00:22:24 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Fri Sep 26 00:22:29 2003 Subject: [spambayes-dev] spambayes.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AF63@its-xchg4.massey.ac.nz> Did spambayes.org die after all? I haven't been able to connect to it for a few days now (although spambayes.sf.net works fine). If anyone looks into this, they could also try and figure out why spambayes.org/downloads/Version.cfg redirects to spambayes.org//downloads/Version.cfg or whatever it is that causes the 008 Outlook "check for new version" to fail... =Tony Meyer From vanhorn at whidbey.com Fri Sep 26 00:31:50 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Fri Sep 26 00:32:20 2003 Subject: [spambayes-dev] Re: [Spambayes] spam error... References: <1ED4ECF91CDED24C8D012BCF2B034F13036D9E33@its-xchg4.massey.ac.nz> Message-ID: <3F73C136.646DB0A3@whidbey.com> I'm for the new release, there have just been too many bugs fixed in 1.0a6 for me to be interested in facing both the name changes and the first round of problem, but I'd jump on 1.0a7 at this point. Van "Meyer, Tony" wrote: > > Maybe it's time for 1.0a7. Many people seem to be running > > into this particular problem. > > I've wondered that too. What does everyone else think? > > -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030925/e5b58e3d/attachment.html From T.A.Meyer at massey.ac.nz Fri Sep 26 00:37:38 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 26 00:37:49 2003 Subject: [spambayes-dev] Re: [Spambayes] spam error... Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13036D9E48@its-xchg4.massey.ac.nz> > I'm for the new release, there have just been too many > bugs fixed in 1.0a6 for me to be interested in facing both > the name changes and the first round of problem, but I'd > jump on 1.0a7 at this point. Too many? By my count (excluding documentation changes) there has been one. I suppose that could be too many, but it's a pretty generous (or miserly, depending on your point of view) limit. Or have I missed bugfixes somewhere? (Tell me if I have, since I'll otherwise neglect to include them in the changelog). =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Sep 26 00:46:17 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 26 00:46:31 2003 Subject: [spambayes-dev] pop3proxy error Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13036D9E4E@its-xchg4.massey.ac.nz> > Which files? I can cvs up and bundle it up for my > colleague's use, but it might be easier to just apply the > patch to the relevant file(s) on his 'puter. sb_server.py, ImapUI.py, ProxyUI.py, ServerUI.py, UserInterface.py, and storage.py. However, you can skip ImapUI unless he is using imapfilter, and skip ServerUI unless he is using pop3dnd and skip storage.py, because the change just makes the problem clearer when it does happen (and it no longer should!). So, for all intents and purposes, the files are: sb_server.py, ProxyUI.py, and UserInterface.py. > I'll take your word for it and shout if it doesn't go away. Thanks :) =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Sep 26 00:59:07 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 26 00:59:11 2003 Subject: [spambayes-dev] Re: [Spambayes] spam error... 
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13036D9E5A@its-xchg4.massey.ac.nz> [Skip] > Maybe it's time for 1.0a7. Many people seem to be running > into this particular problem. [Tony] > I've wondered that too. What does everyone else think? [Tony, after thinking some more] OTOH, if we (ok, so I primarily mean Mark) were able to get a first release out of the binary installer that has both the Outlook plug-in and sb_server/pop3proxy_tray/pop3proxy_service, that would at least solve this problem for Windows users. =Tony Meyer From vanhorn at whidbey.com Fri Sep 26 01:40:36 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Fri Sep 26 01:41:32 2003 Subject: [spambayes-dev] Re: [Spambayes] spam error... References: <1ED4ECF91CDED24C8D012BCF2B034F13036D9E48@its-xchg4.massey.ac.nz> Message-ID: <3F73D154.F0D17253@whidbey.com> I wasn't keeping track. Maybe it's only been one bug with slightly different descriptions? I can be easily confused sometimes. Van "Meyer, Tony" wrote: > > I'm for the new release, there have just been too many > > bugs fixed in 1.0a6 for me to be interested in facing both > > the name changes and the first round of problem, but I'd > > jump on 1.0a7 at this point. > > Too many? By my count (excluding documentation changes) there has been > one. I suppose that could be too many, but it's a pretty generous (or > miserly, depending on your point of view) limit. > > Or have I missed bugfixes somewhere? (Tell me if I have, since I'll > otherwise neglect to include them in the changelog). > > =Tony Meyer -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From T.A.Meyer at massey.ac.nz Fri Sep 26 01:59:24 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Sep 26 01:59:34 2003 Subject: [spambayes-dev] Re: [Spambayes] spam error... Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13036D9E73@its-xchg4.massey.ac.nz> > I wasn't keeping track. Maybe it's only been one bug with > slightly different descriptions? I have seen two descriptions (for example the two Skip gave) that fit. > I can be easily confused sometimes. I know that feeling <wink>. Even though it's only (so far!) one bug (I think), it is one that many people are likely to strike, especially when first setting up. =Tony Meyer From nino at medias.cnes.fr Fri Sep 26 05:33:39 2003 From: nino at medias.cnes.fr (Fernando NIÑO) Date: Fri Sep 26 05:41:26 2003 Subject: [spambayes-dev] RedHat control script Message-ID: <200309260931.h8Q9VwA13566@cnes.fr> Hello, I added a small piece of functionality to sb_server.py so that it catches a termination signal and closes cleanly (just a couple of lines). I also wrote an init.d script that uses chkconfig (Red Hat only) to start and stop the proxy on reboot/shutdown. This may be useful to others. Cheers, and keep up the great job! -------------- next part -------------- #!/bin/bash # # spambayes: Starts the spam filter as a pop3 proxy # # Version: @(#) /etc/init.d/spambayes 1.0 # # chkconfig: - 95 21 # description: This shell script takes care of starting and stopping \ # spambayes pop3 proxy # processname: sb_server.py # # Source function library. . /etc/init.d/functions SBPROXY=/opt/bin/sb_server.py SBLOG=/var/log/spam.log SBDIR=/opt/sb_data [ -x $SBPROXY ] || exit 0 RETVAL=0 start () { date >> $SBLOG echo -n "Starting SpamBayes POP3 proxy: " if [ !
-d $SBDIR ] ; then echo "Directory $SBDIR not present" >> $SBLOG RETVAL=1 else cd $SBDIR ($SBPROXY >> $SBLOG 2>&1) & RETVAL=$? fi action "" [ $RETVAL = 0 ] return $RETVAL } stop () { # stop daemon date >> $SBLOG echo -n "Stopping SpamBayes POP3 proxy: " killproc $SBPROXY 1 RETVAL=$? echo [ $RETVAL = 0 ] return $RETVAL } restart () { stop start RETVAL=$? return $RETVAL } # See how we were called. case "$1" in start) start ;; stop) stop ;; status) status $SBPROXY RETVAL=$? ;; restart) restart ;; *) echo "Usage: $0 {start|stop|restart|status}" RETVAL=1 esac exit $RETVAL -------------- next part -------------- #!/usr/bin/env python """The primary server for SpamBayes. Currently serves the web interface, and any configured POP3 and SMTP proxies. The POP3 proxy works with classifier.py, and adds a simple X-Spambayes-Classification header (ham/spam/unsure) to each incoming email. You point the proxy at your POP3 server, and configure your email client to collect mail from the proxy then filter on the added header. Usage: sb_server.py [options] [<server> [<port>]] <server> is the name of your real POP3 server <port> is the port number of your real POP3 server, which defaults to 110. options: -h : Displays this help message. -d FILE : use the named DBM database file -D FILE : use the named Pickle database file -l port : proxy listens on this port number (default 110) -u port : User interface listens on this port number (default 8880; Browse http://localhost:8880/) -b : Launch a web browser showing the user interface. All command line arguments and switches take their default values from the [pop3proxy] and [html_ui] sections of bayescustomize.ini. For safety, and to help debugging, the whole POP3 conversation is written out to _pop3proxy.log for each run, if options["globals", "verbose"] is True. To make rebuilding the database easier, uploaded messages are appended to _pop3proxyham.mbox and _pop3proxyspam.mbox.
""" # This module is part of the spambayes project, which is Copyright 2002 # The Python Software Foundation and is covered by the Python Software # Foundation license. __author__ = "Richie Hindle <richie@entrian.com>" __credits__ = "Tim Peters, Neale Pickett, Tim Stone, all the Spambayes folk." try: True, False except NameError: # Maintain compatibility with Python 2.2 True, False = 1, 0 todo = """ Web training interface: User interface improvements: o Once the pieces are on separate pages, make the paste box bigger. o Deployment: Windows executable? atlaxwin and ctypes? Or just webbrowser? o Save the stats (num classified, etc.) between sessions. o "Reload database" button. New features: o Online manual. o Links to project homepage, mailing list, etc. o List of words with stats (it would have to be paged!) a la SpamSieve. Code quality: o Cope with the email client timing out and closing the connection. Info: o Slightly-wordy index page; intro paragraph for each page. o In both stats and training results, report nham and nspam - warn if they're very different (for some value of 'very'). o "Links" section (on homepage?) to project homepage, mailing list, etc. Gimmicks: o Classify a web page given a URL. o Graphs. Of something. Who cares what? o NNTP proxy. o Zoe...! """ import os, sys, re, errno, getopt, time, traceback, socket, cStringIO from thread import start_new_thread from email.Header import Header import spambayes.message from spambayes import Dibbler from spambayes import storage from spambayes.FileCorpus import FileCorpus, ExpiryFileCorpus from spambayes.FileCorpus import FileMessageFactory, GzipFileMessageFactory from spambayes.Options import options from spambayes.UserInterface import UserInterfaceServer from spambayes.ProxyUI import ProxyUserInterface from spambayes.Version import get_version_string # Increase the stack size on MacOS X. 
Stolen from Lib/test/regrtest.py if sys.platform == 'darwin': try: import resource except ImportError: pass else: soft, hard = resource.getrlimit(resource.RLIMIT_STACK) newsoft = min(hard, max(soft, 1024*2048)) resource.setrlimit(resource.RLIMIT_STACK, (newsoft, hard)) # number to add to STAT length for each msg to fudge for spambayes headers HEADER_SIZE_FUDGE_FACTOR = 512 class ServerLineReader(Dibbler.BrighterAsyncChat): """An async socket that reads lines from a remote server and simply calls a callback with the data. The BayesProxy object can't connect to the real POP3 server and talk to it synchronously, because that would block the process.""" lineCallback = None def __init__(self, serverName, serverPort, lineCallback): Dibbler.BrighterAsyncChat.__init__(self) self.lineCallback = lineCallback self.request = '' self.set_terminator('\r\n') self.create_socket(socket.AF_INET, socket.SOCK_STREAM) try: self.connect((serverName, serverPort)) except socket.error, e: error = "Can't connect to %s:%d: %s" % (serverName, serverPort, e) print >>sys.stderr, error self.lineCallback('-ERR %s\r\n' % error) self.lineCallback('') # "The socket's been closed." self.close() def collect_incoming_data(self, data): self.request = self.request + data def found_terminator(self): self.lineCallback(self.request + '\r\n') self.request = '' def handle_close(self): self.lineCallback('') self.close() class POP3ProxyBase(Dibbler.BrighterAsyncChat): """An async dispatcher that understands POP3 and proxies to a POP3 server, calling `self.onTransaction(request, response)` for each transaction. Responses are not un-byte-stuffed before reaching self.onTransaction() (they probably should be for a totally generic POP3ProxyBase class, but BayesProxy doesn't need it and it would mean re-stuffing them afterwards). self.onTransaction() should return the response to pass back to the email client - the response can be the verbatim response or a processed version of it. 
The special command 'KILL' kills it (passing a 'QUIT' command to the server). """ def __init__(self, clientSocket, serverName, serverPort): Dibbler.BrighterAsyncChat.__init__(self, clientSocket) self.request = '' self.response = '' self.set_terminator('\r\n') self.command = '' # The POP3 command being processed... self.args = [] # ...and its arguments self.isClosing = False # Has the server closed the socket? self.seenAllHeaders = False # For the current RETR or TOP self.startTime = 0 # (ditto) self.serverSocket = ServerLineReader(serverName, serverPort, self.onServerLine) def onTransaction(self, command, args, response): """Override this. Takes the raw request and the response, and returns the (possibly processed) response to pass back to the email client. """ raise NotImplementedError def onServerLine(self, line): """A line of response has been received from the POP3 server.""" isFirstLine = not self.response self.response = self.response + line # Is this the line that terminates a set of headers? self.seenAllHeaders = self.seenAllHeaders or line in ['\r\n', '\n'] # Has the server closed its end of the socket? if not line: self.isClosing = True # If we're not processing a command, just echo the response. if not self.command: self.push(self.response) self.response = '' # Time out after 30 seconds for message-retrieval commands if # all the headers are down. The rest of the message will proxy # straight through. if self.command in ['TOP', 'RETR'] and \ self.seenAllHeaders and time.time() > self.startTime + 30: self.onResponse() self.response = '' # If that's a complete response, handle it. elif not self.isMultiline() or line == '.\r\n' or \ (isFirstLine and line.startswith('-ERR')): self.onResponse() self.response = '' def isMultiline(self): """Returns True if the request should get a multiline response (assuming the response is positive).
""" if self.command in ['USER', 'PASS', 'APOP', 'QUIT', 'STAT', 'DELE', 'NOOP', 'RSET', 'KILL']: return False elif self.command in ['RETR', 'TOP', 'CAPA']: return True elif self.command in ['LIST', 'UIDL']: return len(self.args) == 0 else: # Assume that an unknown command will get a single-line # response. This should work for errors and for POP-AUTH, # and is harmless even for multiline responses - the first # line will be passed to onTransaction and ignored, then the # rest will be proxied straight through. return False def collect_incoming_data(self, data): """Asynchat override.""" self.request = self.request + data def found_terminator(self): """Asynchat override.""" verb = self.request.strip().upper() if verb == 'KILL': self.socket.shutdown(2) self.close() raise SystemExit elif verb == 'CRASH': # For testing x = 0 y = 1/x self.serverSocket.push(self.request + '\r\n') if self.request.strip() == '': # Someone just hit the Enter key. self.command = '' self.args = [] else: # A proper command. splitCommand = self.request.strip().split() self.command = splitCommand[0].upper() self.args = splitCommand[1:] self.startTime = time.time() self.request = '' def onResponse(self): # We don't support pipelining, so if the command is CAPA and the # response includes PIPELINING, hack out that line of the response. if self.command == 'CAPA': pipelineRE = r'(?im)^PIPELINING[^\n]*\n' self.response = re.sub(pipelineRE, '', self.response) # Pass the request and the raw response to the subclass and # send back the cooked response. if self.response: cooked = self.onTransaction(self.command, self.args, self.response) self.push(cooked) # If onServerLine() decided that the server has closed its # socket, close this one when the response has been sent. if self.isClosing: self.close_when_done() # Reset. 
self.command = '' self.args = [] self.isClosing = False self.seenAllHeaders = False class BayesProxyListener(Dibbler.Listener): """Listens for incoming email client connections and spins off BayesProxy objects to serve them. """ def __init__(self, serverName, serverPort, proxyPort): proxyArgs = (serverName, serverPort) Dibbler.Listener.__init__(self, proxyPort, BayesProxy, proxyArgs) print 'Listener on port %s is proxying %s:%d' % \ (_addressPortStr(proxyPort), serverName, serverPort) class BayesProxy(POP3ProxyBase): """Proxies between an email client and a POP3 server, inserting judgement headers. It acts on the following POP3 commands: o STAT: o Adds the size of all the judgement headers to the maildrop size. o LIST: o With no message number: adds the size of a judgement header to the message size for each message in the scan listing. o With a message number: adds the size of a judgement header to the message size. o RETR: o Adds the judgement header based on the raw headers and body of the message. o TOP: o Adds the judgement header based on the raw headers and as much of the body as the TOP command retrieves. This can mean that the header might have a different value for different calls to TOP, or for calls to TOP vs. calls to RETR. I'm assuming that the email client will either not make multiple calls, or will cope with the headers being different. o USER: o Does no processing based on the USER command itself, but expires any old messages in the three caches.
""" def __init__(self, clientSocket, serverName, serverPort): POP3ProxyBase.__init__(self, clientSocket, serverName, serverPort) self.handlers = {'STAT': self.onStat, 'LIST': self.onList, 'RETR': self.onRetr, 'TOP': self.onTop, 'USER': self.onUser} state.totalSessions += 1 state.activeSessions += 1 self.isClosed = False def send(self, data): """Logs the data to the log file.""" if options["globals", "verbose"]: state.logFile.write(data) state.logFile.flush() try: return POP3ProxyBase.send(self, data) except socket.error: # The email client has closed the connection - 40tude Dialog # does this immediately after issuing a QUIT command, # without waiting for the response. self.close() def recv(self, size): """Logs the data to the log file.""" data = POP3ProxyBase.recv(self, size) if options["globals", "verbose"]: state.logFile.write(data) state.logFile.flush() return data def close(self): # This can be called multiple times by async. if not self.isClosed: self.isClosed = True state.activeSessions -= 1 POP3ProxyBase.close(self) def onTransaction(self, command, args, response): """Takes the raw request and response, and returns the (possibly processed) response to pass back to the email client. """ handler = self.handlers.get(command, self.onUnknown) return handler(command, args, response) def onStat(self, command, args, response): """Adds the size of all the judgement headers to the maildrop size.""" match = re.search(r'^\+OK\s+(\d+)\s+(\d+)(.*)\r\n', response) if match: count = int(match.group(1)) size = int(match.group(2)) + HEADER_SIZE_FUDGE_FACTOR * count return '+OK %d %d%s\r\n' % (count, size, match.group(3)) else: return response def onList(self, command, args, response): """Adds the size of an judgement header to the message size(s).""" if response.count('\r\n') > 1: # Multiline: all lines but the first contain a message size. 
lines = response.split('\r\n') outputLines = [lines[0]] for line in lines[1:]: match = re.search(r'^(\d+)\s+(\d+)', line) if match: number = int(match.group(1)) size = int(match.group(2)) + HEADER_SIZE_FUDGE_FACTOR line = "%d %d" % (number, size) outputLines.append(line) return '\r\n'.join(outputLines) else: # Single line. match = re.search(r'^\+OK\s+(\d+)\s+(\d+)(.*)\r\n', response) if match: messageNumber = match.group(1) size = int(match.group(2)) + HEADER_SIZE_FUDGE_FACTOR trailer = match.group(3) return "+OK %s %s%s\r\n" % (messageNumber, size, trailer) else: return response def onRetr(self, command, args, response): """Adds the judgement header based on the raw headers and body of the message.""" # Use '\n\r?\n' to detect the end of the headers in case of # broken emails that don't use the proper line separators. if re.search(r'\n\r?\n', response): # Remove the trailing .\r\n before passing to the email parser. # Thanks to Scott Schlesier for this fix. terminatingDotPresent = (response[-4:] == '\n.\r\n') if terminatingDotPresent: response = response[:-3] # Break off the first line, which will be '+OK'. ok, messageText = response.split('\n', 1) try: msg = spambayes.message.SBHeaderMessage() msg.setPayload(messageText) msg.setId(state.getNewMessageName()) # Now find the spam disposition and add the header. (prob, clues) = state.bayes.spamprob(msg.asTokens(),\ evidence=True) msg.addSBHeaders(prob, clues) # Check for "RETR" or "TOP N 99999999" - fetchmail without # the 'fetchall' option uses the latter to retrieve messages. if (command == 'RETR' or (command == 'TOP' and len(args) == 2 and args[1] == '99999999')): cls = msg.GetClassification() if cls == options["Headers", "header_ham_string"]: state.numHams += 1 elif cls == options["Headers", "header_spam_string"]: state.numSpams += 1 else: state.numUnsure += 1 # Suppress caching of "Precedence: bulk" or # "Precedence: list" ham if the options say so. 
isSuppressedBulkHam = \ (cls == options["Headers", "header_ham_string"] and options["Storage", "no_cache_bulk_ham"] and msg.get('precedence') in ['bulk', 'list']) # Suppress large messages if the options say so. size_limit = options["Storage", "no_cache_large_messages"] isTooBig = size_limit > 0 and \ len(messageText) > size_limit # Cache the message. Don't pollute the cache with test # messages or suppressed bulk ham. if (not state.isTest and options["Storage", "cache_messages"] and not isSuppressedBulkHam and not isTooBig): # Write the message into the Unknown cache. message = state.unknownCorpus.makeMessage(msg.getId()) message.setSubstance(msg.as_string()) state.unknownCorpus.addMessage(message) # We'll return the message with the headers added. We take # all the headers from the SBHeaderMessage, but take the body # directly from the POP3 conversation, because the # SBHeaderMessage might have "fixed" a partial message by # appending a closing boundary separator. Remember we can # be dealing with a partial message here because of the timeout # code in onServerLine. headers = [] for name, value in msg.items(): header = "%s: %s" % (name, value) headers.append(re.sub(r'\r?\n', '\r\n', header)) body = re.split(r'\n\r?\n', messageText, 1)[1] messageText = "\r\n".join(headers) + "\r\n\r\n" + body except: # Something nasty happened while parsing or classifying - # report the exception in a hand-appended header and recover. # This is one case where an unqualified 'except' is OK, 'cos # anything's better than destroying people's email... stream = cStringIO.StringIO() traceback.print_exc(None, stream) details = stream.getvalue() # Build the header. This will strip leading whitespace from # the lines, so we add a leading dot to maintain indentation.
                detailLines = details.strip().split('\n')
                dottedDetails = '\n.'.join(detailLines)
                headerName = 'X-Spambayes-Exception'
                header = Header(dottedDetails, header_name=headerName)

                # Insert the header, converting email.Header's '\n' line
                # breaks to POP3's '\r\n'.
                headers, body = re.split(r'\n\r?\n', messageText, 1)
                header = re.sub(r'\r?\n', '\r\n', str(header))
                headers += "\n%s: %s\r\n\r\n" % (headerName, header)
                messageText = headers + body

                # Print the exception and a traceback.
                print >>sys.stderr, details

            # Restore the +OK and the POP3 .\r\n terminator if there was one.
            retval = ok + "\n" + messageText
            if terminatingDotPresent:
                retval += '.\r\n'
            return retval

        else:
            # Must be an error response.
            return response

    def onTop(self, command, args, response):
        """Adds the judgement header based on the raw headers and as much
        of the body as the TOP command retrieves."""
        # Easy (but see the caveat in BayesProxy.__doc__).
        return self.onRetr(command, args, response)

    def onUser(self, command, args, response):
        """Spins off three separate threads that expire any old messages
        in the three caches, but does not do any processing of the USER
        command itself."""
        start_new_thread(state.spamCorpus.removeExpiredMessages, ())
        start_new_thread(state.hamCorpus.removeExpiredMessages, ())
        start_new_thread(state.unknownCorpus.removeExpiredMessages, ())
        return response

    def onUnknown(self, command, args, response):
        """Default handler; returns the server's response verbatim."""
        return response


# This keeps the global state of the module - the command-line options,
# statistics like how many mails have been classified, the handle of the
# log file, the Classifier and FileCorpus objects, and so on.
class State:
    def __init__(self):
        """Initialises the State object that holds the state of the app.
        The default settings are read from Options.py and bayescustomize.ini
        and are then overridden by the command-line processing code in the
        __main__ code below."""
        # Open the log file.
        if options["globals", "verbose"]:
            self.logFile = open('_pop3proxy.log', 'wb', 0)

        self.servers = []
        self.proxyPorts = []
        if options["pop3proxy", "remote_servers"]:
            for server in options["pop3proxy", "remote_servers"]:
                server = server.strip()
                if server.find(':') > -1:
                    server, port = server.split(':', 1)
                else:
                    port = '110'
                self.servers.append((server, int(port)))

        if options["pop3proxy", "listen_ports"]:
            splitPorts = options["pop3proxy", "listen_ports"]
            self.proxyPorts = map(_addressAndPort, splitPorts)

        if len(self.servers) != len(self.proxyPorts):
            print "pop3proxy_servers & pop3proxy_ports are different lengths!"
            sys.exit()

        # Load up the other settings from Options.py / bayescustomize.ini
        self.useDB = options["Storage", "persistent_use_database"]
        self.uiPort = options["html_ui", "port"]
        self.launchUI = options["html_ui", "launch_browser"]
        self.gzipCache = options["Storage", "cache_use_gzip"]
        self.cacheExpiryDays = options["Storage", "cache_expiry_days"]
        self.runTestServer = False
        self.isTest = False

        # Set up the statistics.
        self.totalSessions = 0
        self.activeSessions = 0
        self.numSpams = 0
        self.numHams = 0
        self.numUnsure = 0

        # Unique names for cached messages - see `getNewMessageName()` below.
        self.lastBaseMessageName = ''
        self.uniquifier = 2

    def buildServerStrings(self):
        """After the server details have been set up, this creates string
        versions of the details, for display in the Status panel."""
        serverStrings = ["%s:%s" % (s, p) for s, p in self.servers]
        self.serversString = ', '.join(serverStrings)
        self.proxyPortsString = ', '.join(map(_addressPortStr, self.proxyPorts))

    def createWorkers(self):
        """Using the options that were initialised in __init__ and then
        possibly overridden by the driver code, create the Bayes object,
        the Corpuses, the Trainers and so on."""
        print "Loading database...",
        if self.isTest:
            self.useDB = True
            options["Storage", "persistent_storage_file"] = \
                        '_pop3proxy_test.pickle'   # This is never saved.
        filename = options["Storage", "persistent_storage_file"]
        filename = os.path.expanduser(filename)
        self.bayes = storage.open_storage(filename, self.useDB)

        # Don't set up the caches and training objects when running the self-test,
        # so as not to clutter the filesystem.
        if not self.isTest:
            def ensureDir(dirname):
                try:
                    os.mkdir(dirname)
                except OSError, e:
                    if e.errno != errno.EEXIST:
                        raise

            # Create/open the Corpuses. Use small cache sizes to avoid hogging
            # lots of memory.
            map(ensureDir, [options["Storage", "spam_cache"],
                            options["Storage", "ham_cache"],
                            options["Storage", "unknown_cache"]])
            if self.gzipCache:
                factory = GzipFileMessageFactory()
            else:
                factory = FileMessageFactory()
            age = options["Storage", "cache_expiry_days"]*24*60*60
            self.spamCorpus = ExpiryFileCorpus(age, factory,
                                               options["Storage", "spam_cache"],
                                               '[0123456789\-]*',
                                               cacheSize=20)
            self.hamCorpus = ExpiryFileCorpus(age, factory,
                                              options["Storage", "ham_cache"],
                                              '[0123456789\-]*',
                                              cacheSize=20)
            self.unknownCorpus = ExpiryFileCorpus(age, factory,
                                                  options["Storage", "unknown_cache"],
                                                  '[0123456789\-]*',
                                                  cacheSize=20)

            # Given that (hopefully) users will get to the stage
            # where they do not need to do any more regular training to
            # be satisfied with spambayes' performance, we expire old
            # messages from not only the trained corpora, but the unknown
            # as well.
            self.spamCorpus.removeExpiredMessages()
            self.hamCorpus.removeExpiredMessages()
            self.unknownCorpus.removeExpiredMessages()

            # Create the Trainers.
            self.spamTrainer = storage.SpamTrainer(self.bayes)
            self.hamTrainer = storage.HamTrainer(self.bayes)
            self.spamCorpus.addObserver(self.spamTrainer)
            self.hamCorpus.addObserver(self.hamTrainer)

    def getNewMessageName(self):
        # The message name is the time it arrived, with a uniquifier
        # appended if two arrive within one clock tick of each other.
        messageName = "%10.10d" % long(time.time())
        if messageName == self.lastBaseMessageName:
            messageName = "%s-%d" % (messageName, self.uniquifier)
            self.uniquifier += 1
        else:
            self.lastBaseMessageName = messageName
            self.uniquifier = 2
        return messageName


# Option-parsing helper functions
def _addressAndPort(s):
    """Decode a string representing a port to bind to, with optional address."""
    s = s.strip()
    if ':' in s:
        addr, port = s.split(':')
        return addr, int(port)
    else:
        return '', int(s)

def _addressPortStr((addr, port)):
    """Encode a string representing a port to bind to, with optional address."""
    if not addr:
        return str(port)
    else:
        return '%s:%d' % (addr, port)


state = State()
proxyListeners = []
def _createProxies(servers, proxyPorts):
    """Create BayesProxyListeners for all the given servers."""
    for (server, serverPort), proxyPort in zip(servers, proxyPorts):
        listener = BayesProxyListener(server, serverPort, proxyPort)
        proxyListeners.append(listener)

def _recreateState():
    global state

    # Close the existing listeners and create new ones. This won't
    # affect any running proxies - once a listener has created a proxy,
    # that proxy is then independent of it.
    for proxy in proxyListeners:
        proxy.close()
    del proxyListeners[:]

    # Close the database (if there is one); we should anyway, and gdbm
    # complains if we try to reopen it without closing it first.
    if hasattr(state, "bayes"):
        state.bayes.store()
        state.bayes.close()

    state = State()
    prepare(state)
    _createProxies(state.servers, state.proxyPorts)

    return state

def main(servers, proxyPorts, uiPort, launchUI):
    """Runs the proxy forever or until a 'KILL' command is received or
    someone hits Ctrl+Break."""
    _createProxies(servers, proxyPorts)
    httpServer = UserInterfaceServer(uiPort)
    proxyUI = ProxyUserInterface(state, _recreateState)
    httpServer.register(proxyUI)
    Dibbler.run(launchBrowser=launchUI)

def prepare(state):
    # Do whatever we've been asked to do...
    state.createWorkers()

    # Launch any SMTP proxies. Note that if the user hasn't specified any
    # SMTP proxy information in their configuration, then nothing will
    # happen.
    import sb_smtpproxy
    servers, proxyPorts = sb_smtpproxy.LoadServerInfo()
    proxyListeners.extend(sb_smtpproxy.CreateProxies(servers, proxyPorts,
                                                     state))

    # setup info for the web interface
    state.buildServerStrings()

def start(state):
    # kick everything off
    main(state.servers, state.proxyPorts, state.uiPort, state.launchUI)

def stop(state):
    # Shutdown as though through the web UI. This will save the DB, allow
    # any open proxy connections to complete, etc.
    from urllib import urlopen, urlencode
    urlopen('http://localhost:%d/save' % state.uiPort,
            urlencode({'how': 'Save & shutdown'})).read()


# ===================================================================
# __main__ driver.
# ===================================================================

def run():
    # Read the arguments.
    try:
        opts, args = getopt.getopt(sys.argv[1:], 'hbpsd:D:l:u:')
    except getopt.error, msg:
        print >>sys.stderr, str(msg) + '\n\n' + __doc__
        sys.exit()

    runSelfTest = False
    for opt, arg in opts:
        if opt == '-h':
            print >>sys.stderr, __doc__
            sys.exit()
        elif opt == '-b':
            state.launchUI = True
        elif opt == '-d':   # dbm file
            state.useDB = True
            options["Storage", "persistent_storage_file"] = arg
        elif opt == '-D':   # pickle file
            state.useDB = False
            options["Storage", "persistent_storage_file"] = arg
        elif opt == '-p':   # dead option
            print >>sys.stderr, "-p option is no longer supported, use -D\n"
            print >>sys.stderr, __doc__
            sys.exit()
        elif opt == '-l':
            state.proxyPorts = [_addressAndPort(arg)]
        elif opt == '-u':
            state.uiPort = int(arg)

    # Let the user know what they are using...
    print get_version_string("POP3 Proxy")
    print "and engine %s.\n" % (get_version_string(),)

    if 0 <= len(args) <= 2:
        # Normal usage, with optional server name and port number.
        if len(args) == 1:
            state.servers = [(args[0], 110)]
        elif len(args) == 2:
            state.servers = [(args[0], int(args[1]))]

        # Default to listening on port 110 for command-line-specified servers.
        if len(args) > 0 and state.proxyPorts == []:
            state.proxyPorts = [('', 110)]

        prepare(state=state)
        start(state=state)

    else:
        print >>sys.stderr, __doc__

#
# Medias modification
#
import signal

def onSignal(signum, stackframe):
    print 'SpamBayes server medias: got signal ', signum
    if (signum == 1):
        try:
            stop(state=state)
        except socket.error:
            pass
#
# End of Medias modification
#

if __name__ == '__main__':
    signal.signal(1, onSignal)   # Medias modification
    run()
-------------- next part --------------
--
-----------------------------------------------------------------------
Fernando NIÑO                     CNES - BPi 2102
Medias-France/IRD                 18, Av. Edouard Belin
Tel: 05.61.27.40.74               31401 Toulouse Cedex 04

From kennypitt at hotmail.com Fri Sep 26 08:51:05 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Fri Sep 26 08:51:32 2003
Subject: [spambayes-dev] spambayes.org
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AF63@its-xchg4.massey.ac.nz>
Message-ID: <Law11-OE70gAjoJpmh900006881@hotmail.com>

Tony Meyer wrote:
> Did spambayes.org die after all? I haven't been able to connect to
> it for a few days now (although spambayes.sf.net works fine).
>
> If anyone looks into this, they could also try and figure out why
> spambayes.org/downloads/Version.cfg redirects to
> spambayes.org//downloads/Version.cfg or whatever it is that causes
> the 008 Outlook "check for new version" to fail...
>

spambayes.org seems to be working fine this morning.

The reason Version.cfg redirects to "//downloads/Version.cfg" is that this
is what the server reports as the new location of the file. Don't know what
code is generating the redirect header, though.
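
The doubled slash Kenny describes sits in the path part of the redirect target (`Location: http://spambayes.sourceforge.net//downloads/Version.cfg`). As a rough illustration of what a version checker could test for, here is a small Python 3 sketch; `has_doubled_slash` and `collapse_slashes` are hypothetical helper names, not part of SpamBayes, and the sketch does not touch the server that actually emits the header.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def has_doubled_slash(location):
    """True if the path part of a redirect target contains '//'.

    urlsplit() peels off the 'http://' prefix and the host first, so only
    slashes inside the path are examined.
    """
    return "//" in urlsplit(location).path


def collapse_slashes(location):
    """Return the URL with every run of '/' in its path reduced to one."""
    parts = urlsplit(location)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


bad = "http://spambayes.sourceforge.net//downloads/Version.cfg"
print(has_doubled_slash(bad))   # True
print(collapse_slashes(bad))    # http://spambayes.sourceforge.net/downloads/Version.cfg
```

Collapsing slashes on the client would only paper over the problem; the redirect itself has to be corrected in the web server's configuration.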

Here are the raw HTTP response headers returned from a request to
http://spambayes.org/downloads/Version.cfg:

    HTTP/1.1 302 Found
    Date: Fri, 26 Sep 2003 12:46:02 GMT
    Server: Apache/1.3.23 (Unix) Debian GNU/Linux
    Location: http://spambayes.sourceforge.net//downloads/Version.cfg
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

--
Kenny Pitt

From anssi.porttikivi at teleware.fi Fri Sep 26 10:27:32 2003
From: anssi.porttikivi at teleware.fi (Anssi Porttikivi)
Date: Fri Sep 26 10:20:15 2003
Subject: [spambayes-dev] spambayes.org
Message-ID: <B36C365832C90E47A37F4FFCDDEFC46D37FC@hkisrv08.tw.fi>

Any ideas for getting SpamBayes to install into Outlook (Office 2000 Premium)
on top of CrossOver Office on RedHat 9? I tried it, but the SpamBayes install
wizard had confused buttons and a mostly blank window. The "Cancel" button got
me out of it, leaving me without SpamBayes.

BTW, could it theoretically be possible to run SpamBayes on the Exchange
server side? Of course we would still want personal configuration data! Now
I have to set up a separate installation for every workstation I use...

From skip at pobox.com Fri Sep 26 10:33:45 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Sep 26 10:33:57 2003
Subject: [spambayes-dev] spambayes.org
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AF63@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F130212AF63@its-xchg4.massey.ac.nz>
Message-ID: <16244.20041.174211.419308@montanaro.dyndns.org>

Tony> Did spambayes.org die after all? I haven't been able to connect
Tony> to it for a few days now (although spambayes.sf.net works fine).

Works for me.

Tony> If anyone looks into this, they could also try and figure out why
Tony> spambayes.org/downloads/Version.cfg redirects to
Tony> spambayes.org//downloads/Version.cfg or whatever it is that causes
Tony> the 008 Outlook "check for new version" to fail...

That's an Apache configuration issue on www.spambayes.org (aka
redir-mail-3.gandi.net at the moment):

    % telnet spambayes.org 80
    Trying 62.80.122.198...
    telnet: connect to address 62.80.122.198: Connection refused
    Trying 80.67.173.5...
    Connected to redir-mail-3.gandi.net.
    Escape character is '^]'.
    GET /Version.cfg HTTP/1.0
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ What I asked for
    Host: www.spambayes.org

    HTTP/1.1 302 Found
    Date: Fri, 26 Sep 2003 14:32:03 GMT
    Server: Apache/1.3.23 (Unix) Debian GNU/Linux
    Location: http://spambayes.sourceforge.net//Version.cfg
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ What it redirected to
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML><HEAD>
    <TITLE>302 Found</TITLE>
    </HEAD><BODY>
    <H1>Found</H1>
    The document has moved <A HREF="http://spambayes.sourceforge.net//Version.cfg">here</A>.<P>
    <HR>
    <ADDRESS>Apache/1.3.23 Server at redir-www-telehouse2.gandi.net Port 80</ADDRESS>
    </BODY></HTML>
    Connection closed by foreign host.

Skip

From skip at pobox.com Fri Sep 26 10:37:58 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Sep 26 10:38:15 2003
Subject: [spambayes-dev] Re: [Spambayes] spam error...
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13036D9E73@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F13036D9E73@its-xchg4.massey.ac.nz>
Message-ID: <16244.20294.129690.622375@montanaro.dyndns.org>

Van> I wasn't keeping track. Maybe it's only been one bug with slightly
Van> different descriptions?

Tony> I have seen two descriptions (for example the two Skip gave) that
Tony> fit.

There's also the problem that the setup.py script tries to import
__version__ from the spambayes package. I think we concluded that it was
probably pilot error when operating WinZip and not an error in SpamBayes
proper, but I'd like to get confirmation of that.
It would also be nice if setup.py could sniff that problem out before trying
that import (or catch the ImportError and then sniff around a bit).

Skip

From skip at pobox.com Fri Sep 26 12:06:40 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Sep 26 12:06:52 2003
Subject: [spambayes-dev] artificially tweaking the spam/ham to deal with n-way scoring
Message-ID: <16244.25616.279288.220168@montanaro.dyndns.org>

I'm horsing around with my n-way script a bit and have written a small
training script to help maintain the many training databases. Here's how it
works. Suppose I have four input mailboxes: python, personal, cars and
music, each with a different number of messages (450, 50, 1400, and 50
messages, respectively). I run sb_mboxtrain.py over each one, calling all
messages "good", e.g.:

    sb_mboxtrain.py -d python-ham.db -g python
    sb_mboxtrain.py -d personal-ham.db -g personal
    sb_mboxtrain.py -d cars-ham.db -g cars
    sb_mboxtrain.py -d music-ham.db -g music

I then create python.db by initializing it with the contents of
python-ham.db, but swap all the counts (treating python-ham.db as all
"spam"), then merge in the other three unchanged. When I'm finished, I have
these figures in the four databases:

    db             spam    ham
    python.db       450   1500
    personal.db      50   1900
    cars.db        1400    550
    music.db         50   1900

My nway script then scores messages against those four databases. As a
result of the way I'm building the databases, I can have very extreme
ratios. Again, in real life I want to cluster with more mailboxes (I have 15
at the moment). I haven't thought of a good way to truly balance the scores
while still running sb_mboxtrain only once against each mbox file. I'm
thinking I should fudge things.
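
The arithmetic behind the table above, together with the ham:spam clamp (0.5 to 2.0) that Skip goes on to propose, can be sketched in a few lines of Python 3. This works on message totals only: `nway_counts` and `clamp_scales` are hypothetical names, nothing here reads or writes real SpamBayes databases, and applying the returned factor to the per-token counts as well is an assumption.

```python
def nway_counts(mailbox_sizes, target):
    """Return (spam, ham) message totals for the database built for `target`.

    The target mailbox's all-"good" training is swapped so it counts as
    spam; every other mailbox is merged in unchanged as ham.
    """
    spam = mailbox_sizes[target]
    ham = sum(n for name, n in mailbox_sizes.items() if name != target)
    return spam, ham


def clamp_scales(spam, ham, lo=0.5, hi=2.0):
    """Return (spam_scale, ham_scale) that shrink only the larger count,
    just enough to bring ham/spam into [lo, hi]."""
    ratio = ham / spam
    if ratio > hi:
        return 1.0, hi * spam / ham      # too much ham: scale ham down
    if ratio < lo:
        return ham / (lo * spam), 1.0    # too much spam: scale spam down
    return 1.0, 1.0                      # already balanced enough


sizes = {"python": 450, "personal": 50, "cars": 1400, "music": 50}
for name in sizes:
    spam, ham = nway_counts(sizes, name)
    print(name, spam, ham, clamp_scales(spam, ham))
```

For python.db this yields a ham scale of 900/1500 = 0.6, matching Skip's example; for cars.db, where spam dominates, it is the spam side that shrinks (from 1400 to 1100).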

When I merge the multiple smaller databases into a single database, I was
thinking I would try scaling the larger values by enough to satisfy this
relationship:

    0.5 <= ham count/spam count <= 2.0

Using the python database as an example, I would scale all ham counts by
900/1500, giving a 2-to-1 ham:spam ratio. Is that roughly what the now
defunct imbalance ratio did?

Skip

From greg at electricrain.com Fri Sep 26 19:28:40 2003
From: greg at electricrain.com (Gregory P. Smith)
Date: Fri Sep 26 19:28:44 2003
Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHGEPPHEAA.tim@zope.com>
References: <200309241550.h8OFojlx031706@localhost.localdomain> <BIEJKCLHCIOIHAGOKOLHGEPPHEAA.tim@zope.com>
Message-ID: <20030926232840.GB17491@zot.electricrain.com>

On Wed, Sep 24, 2003 at 12:01:04PM -0400, Tim Peters wrote:
> [Anthony Baxter]
> > I haven't been following the spambayes lists too closely. Are there
> > concrete problems with bsddb that are cropping up, or just a general
> > wariness of it?
> >
> > If there _is_ a problem with bsddb, it needs to be addressed. Too
> > many things depend on it.
>
> Reports of database corruption are common in spambayes. At least one
> knowledgeable tester reported his problems went away after moving to a recent
> Sleepycat release (4.1.25, IIRC).
...
> They seem to come from non-Outlook people
> using Berkeley for the message info database. Richie got a whittled down
> threaded test that fails on Windows and Linux, and there's already a
> (Python) bug report open on that; it's not thought to be relevant to how
> spambayes uses Berkeley, though.

It's been my impression that the sporadic bsddb testsuite failures are
BerkeleyDB related rather than anything the python module can be responsible
for (other than the previously mentioned needed improvements in cleaning up
the on disk "temporary" test environment before launching tests).
For anyone reporting berkeleydb issues, it's important to find out the
version of BerkeleyDB they're using and on what platform.

When I find time, I want to take the simple test that Richie created and try
it in C; if it still fails there (it should) on the 4.2.xx release candidate,
I'll submit it to Sleepycat. The full BerkeleyDB library is unfortunately
complicated because it supports so much more than most people need.

-g

From skip at pobox.com Fri Sep 26 22:11:22 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Sep 26 23:11:42 2003
Subject: [spambayes-dev] dump db version in db['saved state']?
Message-ID: <16244.61898.688879.91423@montanaro.dyndns.org>

I've been messing around a lot manipulating training databases. I always
wind up special-casing db["saved state"] because it's a three-element tuple
(version, spamcount, hamcount), while all the token keys are two-element
tuples.

Has the version ever changed? Do we expect it to change? If not, can we get
rid of it? That way, anything which loops over a database won't need to
worry about the 'saved state' key, at least not nearly as often.

Skip

From anthony at interlink.com.au Sat Sep 27 01:16:10 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sat Sep 27 01:18:12 2003
Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go
In-Reply-To: <20030926232840.GB17491@zot.electricrain.com>
Message-ID: <200309270516.h8R5GAT7007511@localhost.localdomain>

>>> "Gregory P. Smith" wrote
> It's been my impression that the sporadic bsddb testsuite failures
> are BerkeleyDB related rather than anything the python module can be
> responsible for (other than the previously mentioned needed improvements
> in cleaning up the on disk "temporary" test environment before launching
> tests). For anyone reporting berkeleydb issues, it's important to find
> out the version of BerkeleyDB they're using and on what platform.
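
Greg's checklist for bug reports (BerkeleyDB version plus platform) is easy to gather from Python. The sketch below is an illustration under stated assumptions, not a SpamBayes or Python tool: `bsddb_report` is a hypothetical name, and it assumes the `db` module from the external `bsddb3` package (or Python 2's bundled `bsddb`) exposes `version()` returning the linked library's (major, minor, patch), as pybsddb does.

```python
import platform
import sys


def bsddb_report():
    """Collect the details worth quoting in a BerkeleyDB bug report."""
    try:
        from bsddb3 import db            # external pybsddb package
    except ImportError:
        try:
            from bsddb import db         # Python 2's bundled wrapper
        except ImportError:
            db = None                    # no BerkeleyDB bindings available
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # (major, minor, patch) of the linked BerkeleyDB library, if any
        "berkeleydb": db.version() if db is not None else None,
    }


if __name__ == "__main__":
    for key, value in sorted(bsddb_report().items()):
        print("%s: %s" % (key, value))
```

A `None` for "berkeleydb" simply means no bindings were importable in that interpreter, which is itself worth mentioning in a report.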

I seem to recall some concern that using bsddb without a DBEnv being
created could be causing problems - if this is the case, should we
consider putting a deprecation warning into the code for people doing this?

Anthony
--
Anthony Baxter <anthony@interlink.com.au>
It's never too late to have a happy childhood.

From rob at hooft.net Sat Sep 27 02:06:15 2003
From: rob at hooft.net (Rob Hooft)
Date: Sat Sep 27 02:06:14 2003
Subject: [spambayes-dev] dump db version in db['saved state']?
In-Reply-To: <16244.61898.688879.91423@montanaro.dyndns.org>
References: <16244.61898.688879.91423@montanaro.dyndns.org>
Message-ID: <3F7528D7.4030007@hooft.net>

Skip Montanaro wrote:
> I've been messing around a lot manipulating training databases. I always
> wind up special-casing db["saved state"] because it's a three-element tuple
> (version, spamcount, hamcount), while all the token keys are two-element
> tuples.
>
> Has the version ever changed? Do we expect it to change? If not, can we
> get rid of it? That way, anything which loops over a database won't need to
> worry about the 'saved state' key, at least not nearly as often.

We can change the format, just get rid of the version number and then
increment it..... :-)

And now for something completely different: I just checked, and the "double
slash" in the gandi redirect for spambayes.org is gone now, since I
reconfigured it yesterday.

Rob

--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/

From martin at v.loewis.de Sat Sep 27 02:58:13 2003
From: martin at v.loewis.de (Martin v. Löwis)
Date: Sat Sep 27 02:58:39 2003
Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go
In-Reply-To: <200309270516.h8R5GAT7007511@localhost.localdomain>
References: <200309270516.h8R5GAT7007511@localhost.localdomain>
Message-ID: <m3pthmbtve.fsf@mira.informatik.hu-berlin.de>

Anthony Baxter <anthony@interlink.com.au> writes:

> I seem to recall some concern that using bsddb without a DBEnv being
> created could be causing problems - if this is the case, should we
> consider putting a deprecation warning into the code for people
> doing this?

Not until we have determined with certainty that this *is* the case.

Regards,
Martin

From skip at pobox.com Sat Sep 27 14:11:07 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Sep 27 14:12:30 2003
Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go
In-Reply-To: <m3pthmbtve.fsf@mira.informatik.hu-berlin.de>
References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de>
Message-ID: <16245.53947.33728.86658@montanaro.dyndns.org>

Martin> Anthony Baxter <anthony@interlink.com.au> writes:
>> I seem to recall some concern that using bsddb without a DBEnv being
>> created could be causing problems - if this is the case, should we
>> consider putting a deprecation warning into the code for people doing
>> this?

Martin> Not until we have determined with certainty that this *is* the
Martin> case.

According to info I got from Sleepycat, a DBEnv is required for using in a
multi-threaded environment. Since the legacy bsddb module API is widely used
and use of threading has increased in the past couple years, I think we need
to figure out how to solve that problem.
This is just a wild-ass guess (I think I posted something like it before),
but maybe all that's needed is to extend the bsddb.(bt|hash|rn)open functions
to accept a dbenv arg, define a module-level default environment in
bsddb/__init__.py which is used as the dbenv arg if the caller doesn't
provide one. The __init__.py code would look like this:

    _env = db.DBEnv()

    def hashopen(file, flag='c', mode=0666, pgsize=None, ffactor=None,
                 nelem=None, cachesize=None, lorder=None, hflags=0,
                 dbenv=_env):
        flags = _checkflag(flag)
        d = db.DB(dbenv)
        ...

    def btopen(file, flag='c', mode=0666, btflags=0, cachesize=None,
               maxkeypage=None, minkeypage=None, pgsize=None, lorder=None,
               dbenv=_env):
        flags = _checkflag(flag)
        d = db.DB(dbenv)
        ...

    def rnopen(file, flag='c', mode=0666, rnflags=0, cachesize=None,
               pgsize=None, lorder=None, rlen=None, delim=None, source=None,
               pad=None, dbenv=_env):
        flags = _checkflag(flag)
        d = db.DB(dbenv)
        ...

My reading of the bsddb3 docs at <http://pybsddb.sourceforge.net/bsddb3.html>
suggests that should be sufficient (though certain args may need to be passed
to the DBEnv() call). See <http://python.org/sf/775414>.

Attached is a modified version of the hammer.py script which does not seem to
fail for me either on Windows, run from IDLE (Python 2.3, BDB 4.1.6), or on
Mac OS X (Python CVS, BDB 4.2.1). The original script failed for me on
Windows but not on Mac OS X. Can some other people for whom the original
script fails please try it? (I also attached it to bug #775414.)

Skip
-------------- next part --------------
A non-text attachment was scrubbed...
Name: studly_hammer.py
Type: application/octet-stream
Size: 1818 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030927/b92fcc07/studly_hammer.obj

From libove at felines.org Sat Sep 27 14:19:14 2003
From: libove at felines.org (Jay Libove)
Date: Sat Sep 27 14:18:48 2003
Subject: [spambayes-dev] suggestion: add option to delete messages on clicking "delete as spam"
Message-ID: <F22493275DA6AF41937802596110548D0FF18C@reset4.felines.org>

Presently (0.81), clicking "delete as spam" simply moves a message to the
known-to-be-spam folder (and trains). I'd like to request an enhancement, so
that clicking "delete as spam" will still train, but will actually delete the
message rather than moving it to the known-spam folder (which I then have to
go into to delete it anyway).

Thanks!

-Jay Libove, CISSP
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030927/cb10effa/attachment.html

From martin at v.loewis.de Sat Sep 27 16:39:32 2003
From: martin at v.loewis.de (Martin v. Löwis)
Date: Sat Sep 27 16:39:57 2003
Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go
In-Reply-To: <16245.53947.33728.86658@montanaro.dyndns.org>
References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org>
Message-ID: <3F75F584.8040207@v.loewis.de>

Skip Montanaro wrote:
> According to info I got from Sleepycat, a DBEnv is required for using in a
> multi-threaded environment. Since the legacy bsddb module API is widely
> used and use of threading has increased in the past couple years, I think we
> need to figure out how to solve that problem.
This is just a wild-ass guess > (I think I posted something like it before), but maybe all that's needed is > to extend the bsddb.(bt|hash|rn)open functions to accept a dbenv arg, define > a module-level default environment in bsddb/__init__.py which is used as the > dbenv arg if the caller doesn't provide one. The __init__.py code would > look like this: > > _env = db.DBEnv() I think we should define a set of flags for the "common case". In general, multiple threads may write to the same database, as might multiple instances of the application. IOW, we might want to create the environment with DB_INIT_CDB|DB_INIT_MPOOL|DB_THREAD. Applications that don't want to suffer from the possible serialization of CDB would need to use their own environment. Regards, Martin From greg at electricrain.com Sat Sep 27 16:51:47 2003 From: greg at electricrain.com (Gregory P. Smith) Date: Sat Sep 27 16:51:58 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <3F75F584.8040207@v.loewis.de> References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org> <3F75F584.8040207@v.loewis.de> Message-ID: <20030927205147.GH17491@zot.electricrain.com> > >According to info I got from Sleepycat, a DBEnv is required for using in a > >multi-threaded environment. Since the legacy bsddb module API is widely > >used and use of threading has increased in the past couple years, I think > >we > >need to figure out how to solve that problem. This is just a wild-ass > >guess > >(I think I posted something like it before), but maybe all that's needed is > >to extend the bsddb.(bt|hash|rn)open functions to accept a dbenv arg, > >define > >a module-level default environment in bsddb/__init__.py which is used as > >the > >dbenv arg if the caller doesn't provide one. 
> >The __init__.py code would
> >look like this:
> >
> >    _env = db.DBEnv()
>
> I think we should define a set of flags for the "common case". In
> general, multiple threads may write to the same database, as might
> multiple instances of the application. IOW, we might want to create
> the environment with DB_INIT_CDB|DB_INIT_MPOOL|DB_THREAD.

It is worth noting that using a DBEnv in this manner will create a bunch
of auxiliary DBEnv related files on the disk. That has the potential of
confusing people who expect only the database file. It also means that
separate unrelated databases cannot exist in the same directory without
being part of the same DBEnv (which BerkeleyDB multiprocess access should
handle just fine; but it might not be what people expect).

> Applications that don't want to suffer from the possible
> serialization of CDB would need to use their own environment.

An alternative would be to say that applications that want to use bsddb
with threading need to use the *real* BerkeleyDB API rather than the
ancient compatibility interface. (it'd be easy to check that bsddb
doesn't get used by multiple threads, raising an exception if it does)

greg

From martin at v.loewis.de Sat Sep 27 17:17:46 2003
From: martin at v.loewis.de (Martin v. Löwis)
Date: Sat Sep 27 17:18:05 2003
Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go
In-Reply-To: <20030927205147.GH17491@zot.electricrain.com>
References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org> <3F75F584.8040207@v.loewis.de> <20030927205147.GH17491@zot.electricrain.com>
Message-ID: <3F75FE7A.6070800@v.loewis.de>

Gregory P. Smith wrote:

> It is worth noting that using a DBEnv in this manner will create a bunch
> of auxiliary DBEnv related files on the disk. That has the potential of
> confusing people who expect only the database file.
> It also means that
> separate unrelated databases cannot exist in the same directory without
> being part of the same DBEnv (which BerkeleyDB multiprocess access should
> handle just fine; but it might not be what people expect).

I see. I could accept that it is confusing; would it also be backwards
compatible (i.e. would BerkeleyDB create those files on demand, and would
old Python installation still be able to read the database even if those
files were around)?

> An alternative would be to say that applications that want to use bsddb
> with threading need to use the *real* BerkeleyDB API rather than the
> ancient compatibility interface. (it'd be easy to check that bsddb
> doesn't get used by multiple threads, raising an exception if it does)

But is it also easy to detect that multiple applications try to use
the same database?

Regards,
Martin

From greg at electricrain.com Sat Sep 27 17:33:32 2003
From: greg at electricrain.com (Gregory P. Smith)
Date: Sat Sep 27 17:33:36 2003
Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go
In-Reply-To: <3F75FE7A.6070800@v.loewis.de>
References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org> <3F75F584.8040207@v.loewis.de> <20030927205147.GH17491@zot.electricrain.com> <3F75FE7A.6070800@v.loewis.de>
Message-ID: <20030927213332.GI17491@zot.electricrain.com>

On Sat, Sep 27, 2003 at 11:17:46PM +0200, "Martin v. Löwis" wrote:
> Gregory P. Smith wrote:
>
> >It is worth noting that using a DBEnv in this manner will create a bunch
> >of auxiliary DBEnv related files on the disk. That has the potential of
> >confusing people who expect only the database file. It also means that
> >separate unrelated databases cannot exist in the same directory without
> >being part of the same DBEnv (which BerkeleyDB multiprocess access should
> >handle just fine; but it might not be what people expect).
>
> I see.
I could accept that it is confusing; would it also be backwards > compatible (i.e. would BerkeleyDB create those files on demand, and > would an old Python installation still be able to read the database even > if those files were around)? It would not. It's already not backwards compatible. That's what the bsddb185 module is for. The BerkeleyDB file format changed between 3.1, 3.2 and 4.0 even. BerkeleyDB can upgrade from an older format to the current one (using the DB.upgrade method). > >An alternative would be to say that applications that want to use bsddb > >with threading need to use the *real* BerkeleyDB API rather than the > >ancient compatibility interface. (it'd be easy to check that bsddb > >doesn't get used by multiple threads, raising an exception if it does) > > But is it also easy to detect that multiple applications try to use > the same database? No. But it is easy to document. The old bsddb module never allowed it either. I only suggest detecting multithreaded access because it is possible to do. Anyone who wants multiprocess access should use the real DBEnv+DB interface and know what they're doing. -g From skip at pobox.com Sat Sep 27 18:07:57 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Sep 27 18:08:17 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <3F75F584.8040207@v.loewis.de> References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org> <3F75F584.8040207@v.loewis.de> Message-ID: <16246.2621.472392.671642@montanaro.dyndns.org> Martin> Applications that don't want to suffer from the possible Martin> serialization of CDB would need to use their own environment. I've given that a little more thought.
Instead of encumbering the factory functions which implement the old API with an optional dbenv argument, I think if people want to provide their own environment it's reasonable to expect them to use the new bsddb3 API. All we should do with the old API is make it work in a multi-threaded environment if possible. Skip From skip at pobox.com Sat Sep 27 18:09:37 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Sep 27 18:09:56 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <20030927205147.GH17491@zot.electricrain.com> References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org> <3F75F584.8040207@v.loewis.de> <20030927205147.GH17491@zot.electricrain.com> Message-ID: <16246.2721.881433.68068@montanaro.dyndns.org> Greg> An alternative would be to say that applications that want to use Greg> bsddb with threading need to use the *real* BerkeleyDB API rather Greg> than the ancient compatibility interface. (it'd be easy to check Greg> that bsddb doesn't get used by multiple threads, raising an Greg> exception if it does) Well, perhaps a warning. I doubt you can tell if the programmer has provided his own locks around db accesses. Skip From greg at electricrain.com Sat Sep 27 19:09:07 2003 From: greg at electricrain.com (Gregory P. 
Smith) Date: Sat Sep 27 19:09:18 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <16246.2621.472392.671642@montanaro.dyndns.org> References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org> <3F75F584.8040207@v.loewis.de> <16246.2621.472392.671642@montanaro.dyndns.org> Message-ID: <20030927230907.GJ17491@zot.electricrain.com> On Sat, Sep 27, 2003 at 05:07:57PM -0500, Skip Montanaro wrote: > > Martin> Applications that don't want to suffer from the possible > Martin> serialization of CDB would need to use their own environment. > > I've given that a little more thought. Instead of encumbering the factory > functions which implement the old API with an optional dbenv argument, I > think if people want to provide their own environment it's reasonable to > expect them to use the new bsddb3 API. All we should do with the old API is > make it work in a multi-threaded environment if possible. > > Skip Agreed, that sounds like a good option. In my previous email I had forgotten about the DB_PRIVATE flag to DBEnv objects; that prevents them from writing extra DBEnv files to the filesystem for use when multi-process access is not needed. I just committed the small change needed to bsddb/__init__.py for it to use a DBEnv allowing multithreaded access. The original hammer.py from bug 775414 has been running for 15 minutes without problems on my alpha with BerkeleyDB 4.1.25. -g From tim.one at comcast.net Sat Sep 27 22:37:45 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Sep 27 22:37:52 2003 Subject: [spambayes-dev] dump db version in db['saved state']? In-Reply-To: <16244.61898.688879.91423@montanaro.dyndns.org> Message-ID: <LNBBLJKPBEHFEDALKOLCOEPEGCAB.tim.one@comcast.net> [Skip] > I've been messing around a lot manipulating training databases. 
I > always wind up special-casing db["saved state"] because it's a > three-element tuple (version, spamcount, hamcount), while all the > token keys are two-element tuples. > > Has the version ever changed? Sure -- the original WordInfo saved gads more info (like last access time). The *intent* of the version number has always been ignored, though: it's there so that future code can reliably recognize that a database it's reading is an old one, and auto-convert it to the current version. Instead, code so far has simply barfed if it sees a version number it doesn't like. > Do we expect it to change? It's good defensive practice to expect change. > If not, can we get rid of it? That way, anything which loops over a > database won't need to worry about the 'saved state' key, at least > not nearly as often. How freakin' lazy can you get <wink>? A database should record its own version number. There's no particular reason it got saved in a 3-tuple mixed in with other stuff, though. If you think life is going to get clarified if the database stored only 2-tuples, then, for example, bump the version number again, store a 2-tuple containing the version number, and a distinct 2-tuple storing the #s of ham and spam trained on. From T.A.Meyer at massey.ac.nz Sat Sep 27 23:16:19 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Sep 27 23:16:43 2003 Subject: [spambayes-dev] spambayes.org Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13036D9EA4@its-xchg4.massey.ac.nz> > Any ideas for making SpamBayes to install to Outlook Office > 2000 Premium on top of CrossOver Office on RedHat 9? I imagine that this will happen as soon as someone capable of doing the development to make it work has CrossOver Office on RedHat 9 :) In the meantime, you can always use the POP3 proxy or IMAP filter with it. Not quite the same experience, but the results are the same. > BTW, could it theoretically be possible to run SpamBayes on > the Exchange server side?
Of course we would still want > personal configurations data! Now I have to set up a separate > installation for all workstations I use... Start by reading <http://spambayes.sourceforge.net/server_side.html> - there are comments there from people that have successfully used spambayes server side (including one that is with Exchange). Neither has individual training data, AFAICT, but you could probably figure out a way to do that. =Tony Meyer From ta-meyer at ihug.co.nz Sat Sep 27 23:21:10 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Sat Sep 27 23:21:16 2003 Subject: [spambayes-dev] spambayes.org In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130364AF37@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AF78@its-xchg4.massey.ac.nz> [Me] > Did spambayes.org die after all? I haven't been able to connect > to it for a few days now (although spambayes.sf.net works fine). [Skip, Rob and Kenny assert that it works for them] Looks like it's just me then, I suppose. Odd. A tracert dies at gandi-gw.cdg4.fr.mfnx.net [62.4.77.238]. The "gandi" bit was in addresses that Skip and Rob mentioned, so it looks like it's getting close. Presumably something that will sort itself out, then. Thanks for confirming that it's me, though :) =Tony Meyer From barry at python.org Sun Sep 28 00:42:30 2003 From: barry at python.org (Barry Warsaw) Date: Sun Sep 28 00:42:44 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <16246.2621.472392.671642@montanaro.dyndns.org> References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org> <3F75F584.8040207@v.loewis.de> <16246.2621.472392.671642@montanaro.dyndns.org> Message-ID: <1064724149.31604.83.camel@anthem> On Sat, 2003-09-27 at 18:07, Skip Montanaro wrote: > Martin> Applications that don't want to suffer from the possible > Martin> serialization of CDB would need to use their own environment. 
> > I've given that a little more thought. Instead of encumbering the factory > functions which implement the old API with an optional dbenv argument, I > think if people want to provide their own environment it's reasonable to > expect them to use the new bsddb3 API. All we should do with the old API is > make it work in a multi-threaded environment if possible. +1 -Barry From anthony at interlink.com.au Sun Sep 28 01:03:39 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Sep 28 01:06:02 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <1064724149.31604.83.camel@anthem> Message-ID: <200309280503.h8S53e3P005335@localhost.localdomain> >>> Barry Warsaw wrote > On Sat, 2003-09-27 at 18:07, Skip Montanaro wrote: > > I've given that a little more thought. Instead of encumbering the factory > > functions which implement the old API with an optional dbenv argument, I > > think if people want to provide their own environment it's reasonable to > > expect them to use the new bsddb3 API. All we should do with the old API is > > make it work in a multi-threaded environment if possible. > > +1 > -Barry I'm +1 on this as well, but with the caveat that if we _can't_ make it work reliably in a MT environment, we should either decline the opportunity to mangle the user's data, or at _least_ issue a warning. Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood. 
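Greg's suggestion that the legacy bsddb interface detect use from more than one thread, softened to a warning as Skip and Anthony propose, can be sketched as a thin dict-like wrapper. This is a minimal illustration under stated assumptions, not code from the bsddb module: the class name ThreadCheckedDB and its strict flag are hypothetical, and a plain dict stands in for the real database handle.

```python
import threading
import warnings


class ThreadCheckedDB:
    """Hypothetical dict-like wrapper that notices cross-thread use.

    The backend can be anything supporting item get/set/delete; a plain
    dict stands in here for a legacy-API database handle.
    """

    def __init__(self, backend, strict=False):
        self._backend = backend
        self._owner = threading.get_ident()  # thread that opened the db
        self._strict = strict                # raise instead of warning

    def _check_thread(self):
        current = threading.get_ident()
        if current != self._owner:
            msg = ("database opened in thread %d used from thread %d; "
                   "use the full DBEnv/DB API for multithreaded access"
                   % (self._owner, current))
            if self._strict:
                raise RuntimeError(msg)
            # stacklevel=3 points the warning at the caller's db[...] line
            warnings.warn(msg, RuntimeWarning, stacklevel=3)

    def __getitem__(self, key):
        self._check_thread()
        return self._backend[key]

    def __setitem__(self, key, value):
        self._check_thread()
        self._backend[key] = value

    def __delitem__(self, key):
        self._check_thread()
        del self._backend[key]
```

Because, as Skip points out, a wrapper like this cannot see whether the caller already serializes access with its own lock, the sketch only warns by default and raises when strict=True.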
From martin at v.loewis.de Sun Sep 28 15:18:08 2003 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun Sep 28 15:18:36 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <20030927213332.GI17491@zot.electricrain.com> References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org> <3F75F584.8040207@v.loewis.de> <20030927205147.GH17491@zot.electricrain.com> <3F75FE7A.6070800@v.loewis.de> <20030927213332.GI17491@zot.electricrain.com> Message-ID: <3F7733F0.6060803@v.loewis.de> Gregory P. Smith wrote: >>I see. I could accept that it is confusing; would it also be backwards >>compatible (i.e. would BerkeleyDB create those files on demand, and >>would an old Python installation still be able to read the database even >>if those files were around)? > > > It would not. It's already not backwards compatible. That's what the > bsddb185 module is for. I should be more specific: If CDB is activated by default in 2.3.1, would that be compatible with files created in 2.3.0 (which are not 1.85 files, but some 4.x files)? >>But is it also easy to detect that multiple applications try to use >>the same database? > > > No. But it is easy to document. The old bsddb module never allowed > it either. I doubt many users are aware of that restriction (I, myself, wasn't).
Regards, Martin From martin at v.loewis.de Sun Sep 28 15:34:00 2003 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun Sep 28 15:34:25 2003 Subject: [spambayes-dev] Re: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <16246.2621.472392.671642@montanaro.dyndns.org> References: <200309270516.h8R5GAT7007511@localhost.localdomain> <m3pthmbtve.fsf@mira.informatik.hu-berlin.de> <16245.53947.33728.86658@montanaro.dyndns.org> <3F75F584.8040207@v.loewis.de> <16246.2621.472392.671642@montanaro.dyndns.org> Message-ID: <3F7737A8.8010400@v.loewis.de> Skip Montanaro wrote: > I've given that a little more thought. Instead of encumbering the factory > functions which implement the old API with an optional dbenv argument, I > think if people want to provide their own environment it's reasonable to > expect them to use the new bsddb3 API. All we should do with the old API is > make it work in a multi-threaded environment if possible. But wouldn't that precisely involve creating environments? If you merely propose that the environment is not a parameter - +1. Regards, Martin From tim.one at comcast.net Sun Sep 28 21:38:44 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Sep 28 21:38:56 2003 Subject: [spambayes-dev] RE: [Python-Dev] 2.3.1 is (almost) a go In-Reply-To: <16245.53947.33728.86658@montanaro.dyndns.org> Message-ID: <LNBBLJKPBEHFEDALKOLCCECOGDAB.tim.one@comcast.net> [Skip Montanaro] > ... > Attached is a modified version of the hammer.py script which seems to > not fail for me on either Windows run from IDLE (Python 2.3, BDB > 4.1.6) or Mac OS X (Python CVS, BDB 4.2.1). The original script > failed for me on Windows but not Mac OS X. Can some other people for > whom the original script fails please try it? (I also attached it to > bug #775414.) On Win98SE with current Python 2.3.1, it doesn't fail, but it never seemed to finish for me either. Staring at WinTop showed that the Python process stopped accumulating cycles.
Can't be killed with Ctrl+C (no visible effect). Can be killed with Ctrl+Break. Dumping

    print "%s %s" % (thread.get_ident(), i)

at the top of the hammer loop showed that the threads get through several hundred iterations, then all printing stops. Attaching to a debug-build Python from the debugger when a freeze occurs isn't terribly illuminating. One thread's stack shows

    _BSDDB_D! __db_win32_mutex_lock + 134 bytes
    _BSDDB_D! __lock_get + 2264 bytes
    _BSDDB_D! __lock_get + 197 bytes
    _BSDDB_D! __ham_get_meta + 120 bytes
    _BSDDB_D! __ham_c_dup + 4201 bytes
    _BSDDB_D! __db_c_put + 2544 bytes
    _BSDDB_D! __db_put + 507 bytes
    _DB_put(DBObject * 0x016cff88, __db_txn * 0x016d0000, __db_dbt * 0x016cc000, __db_dbt * 0x50d751fe, int 0) line 562 + 35 bytes

The main thread's stack shows

    _BSDDB_D! __db_win32_mutex_lock + 134 bytes
    _BSDDB_D! __lock_get + 2264 bytes
    _BSDDB_D! __lock_get + 197 bytes
    _BSDDB_D! __db_lget + 365 bytes
    _BSDDB_D! __ham_lock_bucket + 105 bytes
    _BSDDB_D! __ham_get_cpage + 195 bytes
    _BSDDB_D! __ham_item_next + 25 bytes
    _BSDDB_D! __ham_call_hash + 2479 bytes
    _BSDDB_D! __ham_c_dup + 4307 bytes
    _BSDDB_D! __db_c_put + 2544 bytes
    _BSDDB_D! __db_put + 507 bytes
    _DB_put(DBObject * 0x008fe2e8, __db_txn * 0x00000000, __db_dbt * 0x0062f230, __db_dbt * 0x0062f248, int 0) line 562 + 35 bytes
    DB_ass_sub(DBObject * 0x008fe2e8, _object * 0x00b83178, _object * 0x00b83370) line 2330 + 23 bytes
    PyObject_SetItem(_object * 0x008fe2e8, _object * 0x00b83178, _object * 0x00b83370) line 123 + 18 bytes
    eval_frame(_frame * 0x00984948) line 1448 + 17 bytes
    ...

The other threads are somewhere in the OS kernel and don't have useful tracebacks. This varies from run to run, but all threads with a useful stack are always stuck at the same place in __db_win32_mutex_lock. All in all, looks like it's simply deadlocked. Running the original hammer.py under current CVS Python freezes in the same way now.
I added this info to the bug report: http://www.python.org/sf/775414 From tim.one at comcast.net Sun Sep 28 22:32:04 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Sep 28 22:32:11 2003 Subject: [spambayes-dev] artificially tweaking the spam/ham to deal withn-way scoring In-Reply-To: <16244.25616.279288.220168@montanaro.dyndns.org> Message-ID: <LNBBLJKPBEHFEDALKOLCOEDBGDAB.tim.one@comcast.net> [Skip] > I'm horsing around with my n-way script a bit and have written a small > training script to help maintain the many training databases. Here's > how it works. Suppose I have four input mailboxes: python, personal, > cars and music, each with a different number of messages (450, 50, > 1400, and 50 messages, respectively). I run sb_mboxtrain.py over > each one calling all messages "good", e.g.: > > sb_mboxtrain.py -d python-ham.db -g python > sb_mboxtrain.py -d personal-ham.db -g personal > sb_mboxtrain.py -d cars-ham.db -g cars > sb_mboxtrain.py -d music-ham.db -g music > > I then create python.db by initializing it with the contents of > python-ham.db, but swap all the counts (treating python-ham.db as all > "spam"), then merge in the other three unchanged. When I'm finished, > I have these figures in the four databases: > > db spam ham > python.db 450 1500 > personal.db 50 1900 > cars.db 1400 550 > music.db 50 1900 > > My nway script then scores messages against those four databases. > > As a result of the way I'm building the databases, I can have very > extreme ratios. Again, in real life I want to cluster with more > mailboxes (I have 15 at the moment). I haven't thought of a good way > to truly balance the scores and only run sb_mboxtrain once against > each mbox file. I'm thinking I should fudge things. 
When I merge > the multiple smaller databases into a single database I was thinking > I would try scaling the larger values by enough to satisfy this > relationship: > > 0.5 <= ham count/spam count <= 2.0 > > Using the python database as an example, I would scale all ham counts > by 900/1500, giving a 2-to-1 ham:spam ratio. > > Is that roughly what the now defunct imbalance ratio did? Yup, but your way isn't as extreme. In all cases the spamprob we use is a weighted average of 0.5 and a by-counting spamprob guess. The imbalance adjustment just reduced the weight on the by-counting spamprob guess, to what it would have been if we had an equal number of ham and spam msgs. Your adjustment doesn't change the by-counting spamprob guess either (well, it does, but due to quantization error: multiplying a count by 900/1500 = 0.6 usually won't yield an exact integer, and you'll have to lose the info in the fractional part to store the result as an integer). Your adjustment also reduces the weight, but not as drastically. From ta-meyer at ihug.co.nz Mon Sep 29 03:34:42 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Mon Sep 29 03:34:48 2003 Subject: [spambayes-dev] Web Interface Help Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AF96@its-xchg4.massey.ac.nz> I've checked in a basic structure for some 'help' pages for the web interface. Basically there's a link in the footer which will display either the standard help page, or a page specific to the one on which the user clicked. There's a rudimentary standard page now, plus a different one for the review page, just as an example, but pretty much all the help needs to be written yet (or borrowed from other docs). There's a 'help' gif that goes with it, but it doesn't really match the other ones. If whoever made those (Richie?)
could make a matching help one, that would be great :) Anyway, the point is, if anyone would like to write something to go in the help page for a particular part of the interface, please feel free :) At the moment the onHelp function in UserInterface.py serves up the correct help text given the topic, but this should be moved out (into a set of .ht files?) at some point. Comments & criticism welcome as usual, of course ;) =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Sep 29 03:38:54 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Sep 29 03:39:00 2003 Subject: [spambayes-dev] Re: [Spambayes] spam error... Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13036DA354@its-xchg4.massey.ac.nz> > There's also the problem that the setup.py script tries to > import __version__ from the spambayes package. I think we > concluded that it was probably pilot error when operating > WinZip and not an error in SpamBayes proper, but I'd like to > get confirmation of that. It would also be nice if setup.py > could sniff that problem out before trying that import (or > catching the ImportError and then sniff around a bit). Would this work? Is there anywhere else we could look for it?

"""
try:
    from spambayes.Version import __version__
except ImportError:
    # File should be there, maybe the user expanded it all
    # into one directory?
    try:
        from Version import __version__
    except ImportError:
        print "Cannot find Version.py - something appears to " \
              "have gone wrong unpacking the archive. Please " \
              "try again, ensuring that you expand the complete " \
              "archive, and retain the directory structure."
        sys.exit()
"""

=Tony Meyer From kennypitt at hotmail.com Mon Sep 29 09:49:11 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Sep 29 09:49:41 2003 Subject: [spambayes-dev] spambayes.org In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AF78@its-xchg4.massey.ac.nz> Message-ID: <Law11-OE39Ji7v5Bv6o000087de@hotmail.com> Tony Meyer wrote: > [Me] >> Did spambayes.org die after all?
I haven't been able to connect >> to it for a few days now (although spambayes.sf.net works fine). > > [Skip, Rob and Kenny assert that it works for them] > > Looks like it's just me then, I suppose. Odd. A tracert dies at > gandi-gw.cdg4.fr.mfnx.net [62.4.77.238]. The "gandi" bit was in > addresses that Skip and Rob mentioned, so it looks like it's getting > close. Presumably something that will sort itself out, then. > > Thanks for confirming that it's me, though :) Close indeed! My tracert indicates that it should be the next hop, which reports as redir-mail-3.gandi.net. That server shares the same IP as spambayes.org, so I assume it must be the server where the domain is hosted. Very odd that you would die after getting that close, while others of us don't seem to be having any trouble. -- Kenny Pitt From anthony at interlink.com.au Mon Sep 29 09:53:02 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Sep 29 09:55:21 2003 Subject: [spambayes-dev] spambayes.org In-Reply-To: <Law11-OE39Ji7v5Bv6o000087de@hotmail.com> Message-ID: <200309291353.h8TDr3WZ002221@localhost.localdomain> >>> "Kenny Pitt" wrote > Close indeed! My tracert indicates that it should be the next hop, > which reports as redir-mail-3.gandi.net. That server shares the same IP > as spambayes.org, so I assume it must be the server where the domain is > hosted. Very odd that you would die after getting that close, while > others of us don't seem to be having any trouble. Note that thanks to the Nachi worm and its ilk, a number of ISPs have been putting in all sorts of wacky filtering of ICMP traffic. Some versions of traceroute use ICMP packets, so it's quite possible that you're just seeing collateral damage from attempts to block/slow worms. That's assuming the problem is now _just_ ICMP, not the site itself...
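Tim's reply in the n-way scoring thread above describes each token's spamprob as a weighted average of 0.5 and a by-counting guess. The sketch below makes that concrete under stated assumptions: it uses the familiar Robinson-style form (s*x + n*p)/(s + n), with x = 0.5 and s = 0.45 as illustrative values, and the function names are hypothetical rather than the SpamBayes classifier API. It shows why Skip's scaling of the ham counts by 900/1500 leaves the guess unchanged apart from rounding while shrinking its weight.

```python
def by_counting_prob(spamcount, hamcount, nspam, nham):
    """By-counting spam guess for one token.

    Compares the fraction of spam messages containing the token with the
    fraction of ham messages containing it. Assumes the token was seen at
    least once, so the denominator is non-zero.
    """
    spamratio = spamcount / nspam
    hamratio = hamcount / nham
    return spamratio / (spamratio + hamratio)


def spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Weighted average of x and the by-counting guess.

    x gets weight s and the guess gets weight n, the number of training
    messages the token appeared in. The defaults are illustrative values.
    """
    n = spamcount + hamcount
    p = by_counting_prob(spamcount, hamcount, nspam, nham)
    return (s * x + n * p) / (s + n)


def scale_count(count, old_total, new_total):
    """Rescale a per-token count, rounding because the store holds integers."""
    return round(count * new_total / old_total)
```

For a token seen in 10 of 450 spam and 30 of 1500 ham, scaling the ham side to 900 turns 30 into exactly 18, so the by-counting guess is unchanged, but n drops from 40 to 28 and the final spamprob moves slightly toward 0.5. A ham count of 7, by contrast, becomes 4.2 and has to be rounded to 4, which is the quantization error Tim mentions.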
From adam.walker at rbwconsulting.com Mon Sep 29 13:06:07 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Mon Sep 29 13:06:18 2003 Subject: [spambayes-dev] Web Interface Help In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AF96@its-xchg4.massey.ac.nz> Message-ID: <20030929170614.D14FC13E241@sack.dreamhost.com> I remember looking at a psp file I found in that directory a long time ago. The icons were just "wingding" font or some similar symbol font. > -----Original Message----- > There's a 'help' gif that goes with it, but it doesn't really match the > other ones. If whoever made those (Richie?) could make a matching help > one, > that would be great :) From popiel at wolfskeep.com Mon Sep 29 13:25:25 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Mon Sep 29 13:25:28 2003 Subject: [spambayes-dev] Slightly new tactic Message-ID: <20030929172525.830EB2DE7F@cashew.wolfskeep.com> I just noticed a new variation on an old spammer tactic in my unsure folder: in HTML mail, text colors which almost but not quite match the background color. I know that matching the background exactly is an old tactic, but this spam seemed to have a random adjustment to the font color, by plus or minus 3 quanta on the blue scale. The mail still showed up as high unsure, and I don't think we bother with trying to determine which text in a multipart or HTML message is actually visible; however, if we ever do try to figure that out, then we may want to put some wiggle room on the color-matching. - Alex From tim.one at comcast.net Mon Sep 29 13:53:26 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Sep 29 13:53:36 2003 Subject: [spambayes-dev] Slightly new tactic In-Reply-To: <20030929172525.830EB2DE7F@cashew.wolfskeep.com> Message-ID: <LNBBLJKPBEHFEDALKOLCOEHIGDAB.tim.one@comcast.net> [T. 
Alexander Popiel] > I just noticed a new variation on an old spammer tactic > in my unsure folder: in HTML mail, text colors which > almost but not quite match the background color. I know > that matching the background exactly is an old tactic, > but this spam seemed to have a random adjustment to the > font color, by plus or minus 3 quanta on the blue scale. > > The mail still showed up as high unsure, and I don't > think we bother with trying to determine which text in > a multipart or HTML message is actually visible; We do when it's easy. For example, we strip out HTML tags, the contents of HTML comments, and the contents of <style> sections. We don't do anything with color now, not even record when a color attribute is present. Matching foreground against background colors would require real parsing to model the structure of the document, and we don't do anything like that now. To do a good job of it, we'd have to parse style sheets too. > however, if we ever do try to figure that out, then we may want > to put some wiggle room on the color-matching. Yup. From ta-meyer at ihug.co.nz Tue Sep 30 00:56:05 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Sep 30 00:56:13 2003 Subject: [spambayes-dev] Logging Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2976@its-xchg4.massey.ac.nz> A while back I mentioned that it would be nice to have various levels of logging rather than just the True/False globals:verbose setting that there is now. Skip mentioned that if we're going to do that then we might as well use the logging module, although it would break 2.2 compatibility. Mark already has some logging code in both the Outlook plug-in and pop3proxy_service/tray for when they are run as binaries/a service. Would anyone be opposed if we did start using the logging module in places? How incompatible is it with 2.2? 
I've used it with 2.2.3 (but only logging to stdout, to a file, and to a rotating file), but not earlier, and we currently support 2.2 (or so we say ;). (Obviously for the 2.2.3 compatibility, there needs to be a compatLogging module like the compatSets one we already have.) =Tony Meyer From skip at pobox.com Tue Sep 30 09:30:26 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 30 09:30:38 2003 Subject: [spambayes-dev] Logging In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2976@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2976@its-xchg4.massey.ac.nz> Message-ID: <16249.34162.262904.627817@montanaro.dyndns.org> Tony> (Obviously for the 2.2.3 compatibility, there needs to be a Tony> compatLogging module like the compatSets one we already have.) Is that necessarily the case? The sets module uses itertools, so it's difficult to backport it to 2.2.x as-is. I'm not so sure about the logging module. Maybe it works out-of-the-box on 2.2.x, so we could just deliver it (or a very slightly modified version of it) with SpamBayes and check in setup.py whether or not to install it. Skip From Remi.Ricard at simlog.com Tue Sep 30 10:18:48 2003 From: Remi.Ricard at simlog.com (Remi Ricard) Date: Tue Sep 30 10:17:29 2003 Subject: [spambayes-dev] About Bug 814322 Message-ID: <3F7990C8.4010001@simlog.com> Hi, Tony added a patch for the bug 814322 This is his comment: > Basically the review page would die if one of the cached messages was moved from > the cache directory by someone other than spambayes (for example a virus > protection program). Instead, we ignore the problem and just don't present that message > for review. I don't use a virus protection program or anything else and I see the error message often. 
Is it possible that this error is caused by the fact that spambayes is not able to separate the mime part of a message (to generate files 123456 and 123456-1), so when training there is an error since the file 123456-1 (the mime part) is not found?? Remi papaDoc@videotron.ca From popiel at wolfskeep.com Tue Sep 30 12:53:47 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Tue Sep 30 12:53:52 2003 Subject: [spambayes-dev] Logging In-Reply-To: Message from "Tony Meyer" <ta-meyer@ihug.co.nz> of "Tue, 30 Sep 2003 16:56:05 +1200." <1ED4ECF91CDED24C8D012BCF2B034F13026F2976@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2976@its-xchg4.massey.ac.nz> Message-ID: <20030930165347.ABF072DDC3@cashew.wolfskeep.com> In message: <1ED4ECF91CDED24C8D012BCF2B034F13026F2976@its-xchg4.massey.ac.nz> "Tony Meyer" <ta-meyer@ihug.co.nz> writes: >A while back I mentioned that it would be nice to have various levels of >logging rather than just the True/False globals:verbose setting that >there is now. Skip mentioned that if we're going to do that then we >might as well use the logging module, although it would break 2.2 >compatibility. Mark already has some logging code in both the Outlook >plug-in and pop3proxy_service/tray for when they are run as binaries/a >service. > >Would anyone be opposed if we did start using the logging module in >places? How incompatible is it with 2.2? I've used it with 2.2.3 (but >only logging to stdout, to a file, and to a rotating file), but not >earlier, and we currently support 2.2 (or so we say ;). > >(Obviously for the 2.2.3 compatibility, there needs to be a >compatLogging module like the compatSets one we already have.) > >=Tony Meyer It should be noted that 2.2.1 is the most recent python packaged for Debian stable (which at least some of your users (me! me!) run). Are there some simple tests that I can do to determine how much logging support exists there?
- Alex From skip at pobox.com Tue Sep 30 13:00:04 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 30 13:00:17 2003 Subject: [spambayes-dev] Logging In-Reply-To: <20030930165347.ABF072DDC3@cashew.wolfskeep.com> References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2976@its-xchg4.massey.ac.nz> <20030930165347.ABF072DDC3@cashew.wolfskeep.com> Message-ID: <16249.46740.636638.581709@montanaro.dyndns.org> Alex> It should be noted that 2.2.1 is the most recent python packaged Alex> for Debian stable (which at least some of your users (me! me!) Alex> run). Are there some simple tests that I can do to determine how Alex> much logging support exists there? I suspect import logging will fail. You might try grabbing the 2.3 logging module and tests from CVS and see if they run okay. If so, I'd be all for including the logging module with the SpamBayes release. Skip From anthony at interlink.com.au Tue Sep 30 13:13:36 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Sep 30 13:15:33 2003 Subject: [spambayes-dev] Logging In-Reply-To: <20030930165347.ABF072DDC3@cashew.wolfskeep.com> Message-ID: <200309301713.h8UHDajn012637@localhost.localdomain> >>> "T. Alexander Popiel" wrote > It should be noted that 2.2.1 is the most recent python packaged for > Debian stable (which at least some of your users (me! me!) run). > Are there some simple tests that I can do to determine how much > logging support exists there? python -c "import logging" Anthony From popiel at wolfskeep.com Tue Sep 30 13:33:35 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Tue Sep 30 13:33:38 2003 Subject: [spambayes-dev] Logging In-Reply-To: Message from Skip Montanaro <skip@pobox.com> of "Tue, 30 Sep 2003 12:00:04 CDT." 
<16249.46740.636638.581709@montanaro.dyndns.org> References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2976@its-xchg4.massey.ac.nz> <20030930165347.ABF072DDC3@cashew.wolfskeep.com> <16249.46740.636638.581709@montanaro.dyndns.org> Message-ID: <20030930173335.9526F2DDC3@cashew.wolfskeep.com> In message: <16249.46740.636638.581709@montanaro.dyndns.org> Skip Montanaro <skip@pobox.com> writes: > > Alex> It should be noted that 2.2.1 is the most recent python packaged > Alex> for Debian stable (which at least some of your users (me! me!) > Alex> run). Are there some simple tests that I can do to determine how > Alex> much logging support exists there? > >I suspect > > import logging > >will fail. Aye, this breaks. >You might try grabbing the 2.3 logging module and tests from CVS >and see if they run okay. If so, I'd be all for including the logging >module with the SpamBayes release. I'll try this... - Alex From skip at pobox.com Tue Sep 30 16:12:22 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Sep 30 16:13:02 2003 Subject: [spambayes-dev] Re: [Spambayes] Error from pop3proxy In-Reply-To: <000301c3878c$cf2cd960$6401a8c0@home.co.uk> References: <000301c3878c$cf2cd960$6401a8c0@home.co.uk> Message-ID: <16249.58278.674451.186755@montanaro.dyndns.org> Alan> I got this error from pop3proxy running on my W2K server :- Alan> error: uncaptured python exception, closing channel Alan> <sb_smtpproxy.BayesSMTPProxy connected 192.168.1.100:1571 at Alan> 0x1833aa8> (bsddb._db.DBRunRecoveryError:(-3098 2, Alan> 'DB_RUNRECOVERY: Fatal error, run database recovery -- fatal Alan> region error detected; run recovery') A number of people have seen this. It appears to be related to incorrect locking of the database in multi-threaded applications. If you're not a programmer, that may not make sense to you. The simplest way to work around that may be to switch to the Pickle form of the database. However, before attempting that switch, let us know what version of SpamBayes you are running.
Some bugs in the configuration saving code were fixed only recently. Tony's the expert in this area. For the spambayes-dev folks: If we're not properly mediating access to the on-disk version of the DBDictClassifier, we're probably not mediating access to the in-memory PickledClassifier either. Maybe all we need to do is add a little bit of locking to the Classifier class or its subclasses. Skip From alan at mullen.demon.co.uk Tue Sep 30 16:17:35 2003 From: alan at mullen.demon.co.uk (Alan Campbell) Date: Tue Sep 30 16:17:42 2003 Subject: [spambayes-dev] RE: [Spambayes] Error from pop3proxy In-Reply-To: <16249.58278.674451.186755@montanaro.dyndns.org> Message-ID: <000001c3878f$e30255c0$6401a8c0@home.co.uk> Thanks for the quick reply. It's OK, I am a programmer, so I do understand what multi-threading is. I am using version 1.06a, which I downloaded a couple of weeks ago. I will try using a pickle. Great product. I am very impressed with it so far. --- mailto:alan@mullen.demon.co.uk http://www.mullen.demon.co.uk/
From tim.one at comcast.net Tue Sep 30 20:41:23 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Sep 30 20:41:43 2003 Subject: [spambayes-dev] RE: [Spambayes] Error from pop3proxy In-Reply-To: <16249.58278.674451.186755@montanaro.dyndns.org> Message-ID: <LNBBLJKPBEHFEDALKOLCKECHGEAB.tim.one@comcast.net> [Skip] > ... > For the spambayes-dev folks: If we're not properly mediating access > to the on-disk version of the DBDictClassifier, we're probably not > mediating access to the in-memory PickledClassifier either. Maybe > all we need to do is add a little bit of locking to the Classifier > class or its subclasses. We don't do any locking in storage.py, mainly because that wasn't intended to be thread-safe. Before trying to make it thread-safe, someone has to identify the specific use cases in which the API has to support concurrent access. I really don't know what they may be, and low-level locking can be very expensive. If it amounts to no more than making training mutually exclusive with scoring, then some gross locks at a higher level would be a lot cheaper. But, so far, nobody has identified a specific sequence of actions leading to DBRunRecoveryError. I'll speculate about one possible problem with Berkeley: if it isn't shut down cleanly, DBRunRecoveryError may well be an *expected* exception when you next start it, and running recovery at such times would then be a normal part of using Berkeley.
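If Tim's guess is right, recovery simply becomes part of opening the store after an unclean shutdown. Berkeley DB's own mechanism for this is opening the environment with recovery requested (the DB_RECOVER flag); the sketch below shows only the control flow, and RecoveryNeeded, open_store, and the two callbacks are hypothetical stand-ins, not bsddb or SpamBayes APIs.

```python
class RecoveryNeeded(Exception):
    """Hypothetical stand-in for an error such as DBRunRecoveryError."""


def open_store(open_fn, recover_fn, max_attempts=2):
    """Open a store, running recovery and retrying if the open demands it.

    open_fn() returns a handle or raises RecoveryNeeded; recover_fn()
    repairs the on-disk state. Gives up after max_attempts opens.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return open_fn()
        except RecoveryNeeded:
            if attempt == max_attempts:
                raise  # recovery did not help; let the caller decide
            recover_fn()


# Simulate a store left dirty by an unclean shutdown.
state = {"dirty": True, "recoveries": 0}


def fake_open():
    if state["dirty"]:
        raise RecoveryNeeded("fatal region error detected; run recovery")
    return "db-handle"


def fake_recover():
    state["recoveries"] += 1
    state["dirty"] = False


handle = open_store(fake_open, fake_recover)
```

Capping the retries means a genuinely corrupt store surfaces the error instead of looping on recovery forever.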
Until we know what's triggering DBRunRecoveryError, I'm just as inclined to believe it can't be fixed without incorporating recovery as I am to believe it's due to a thread race.
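Tim's cheaper alternative, one coarse lock at a higher level that makes training and scoring mutually exclusive, could look roughly like this. The learn/spamprob method names echo SpamBayes' Classifier, but ToyClassifier and LockedClassifier are hypothetical, and the toy score is a plain average of per-token ratios, not SpamBayes' chi-squared combining.

```python
import threading


class ToyClassifier:
    """Stand-in classifier that is NOT thread-safe on its own."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0
        self.spamcounts = {}
        self.hamcounts = {}

    def learn(self, tokens, is_spam):
        counts = self.spamcounts if is_spam else self.hamcounts
        for tok in set(tokens):
            counts[tok] = counts.get(tok, 0) + 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def spamprob(self, tokens):
        # Crude average of per-token spam ratios; illustrative only.
        probs = []
        for tok in set(tokens):
            s = self.spamcounts.get(tok, 0) / max(self.nspam, 1)
            h = self.hamcounts.get(tok, 0) / max(self.nham, 1)
            if s + h:
                probs.append(s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5


class LockedClassifier:
    """One coarse lock: training and scoring never overlap."""

    def __init__(self, classifier):
        self._classifier = classifier
        self._lock = threading.Lock()

    def learn(self, tokens, is_spam):
        with self._lock:
            self._classifier.learn(tokens, is_spam)

    def spamprob(self, tokens):
        with self._lock:
            return self._classifier.spamprob(tokens)


bayes = LockedClassifier(ToyClassifier())
bayes.learn(["cheap", "pills"], True)
bayes.learn(["meeting", "agenda"], False)
```

This serializes every call, which is the cheap, coarse trade-off Tim describes; a reader/writer lock would let scoring calls run concurrently, at the cost of more code.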