From sethg at GoodmanAssociates.com Fri Dec 3 01:08:10 2004
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Fri Dec 3 01:08:10 2004
Subject: [spambayes-dev] require subscription?
Message-ID:

I wonder if anyone else is willing to reconsider the decision to operate
this list without requiring subscribers to register? Spambayes does keep
the spam on the list out of my inbox, but unfortunately, virtually all of
it winds up in Unsure, so I have to go through it all anyway. Despite
training on all of these unsures, some spam, some ham, Spambayes still
puts quite a few posts in the Unsure folder.

In addition, this tends to skew my training set because the Spambayes
lists are the only two lists I'm on that don't require registration. All
of my other list traffic is taken care of by Outlook rules and doesn't
need to go through Spambayes. This makes for a much more focused training
set, since Spambayes list traffic looks very little like my normal
correspondence.

I don't feel that the Mailman subscribe/confirmation process would really
scare anybody away, as virtually all lists do this, and for exactly the
same reason. Why give spammers this free multiplier? I'm sure the other
list participants would understand and be happy for the spam reduction.
At the very least, there is no excuse for having Spambayes-DEV not
require registration, but I would argue for requiring it for both lists.

--
Seth Goodman

From tameyer at ihug.co.nz Fri Dec 3 01:34:30 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Fri Dec 3 01:35:08 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To:
Message-ID:

> I wonder if anyone else is willing to reconsider the decision
> to operate this list without requiring subscribers to
> register?

Requiring registration for spambayes@python.org would break the ability
to automatically submit a problem report that sb_server offers (and
Outlook may offer in the future).
I don't have the data, but I suspect that a great many of the people
that report bugs on spambayes@python.org are not subscribers, and they
are the people that need help (and many of them would probably find
subscribing difficult, given what else they find difficult), and we need
to know where the bugs are to try and fix them.

In addition, there are different levels of response to a non-member
posting: accept (as now), hold, reject, discard. Are you advocating hold
or reject? If it's hold, then that means that someone (the way it's set
up now, Tim, TimS, or Barry) needs to go through the held messages and
approve/reject them. I doubt any of those people have the time or
inclination to do that. I could find the time, but it would be time that
I would otherwise spend answering questions, which I think is more
worthwhile.

All of this is in FAQ 5.5, BTW:

> I don't feel that the Mailman subscribe/confirmation process
> would really scare anybody away,

I do.

> as virtually all lists do
> this, and for exactly the same reason. Why give spammers
> this free multiplier? I'm sure the other list participants
> would understand and be happy for the spam reduction. At the
> very least, there is no excuse for having Spambayes-DEV not
> require registration, but I would argue for requiring it for
> both lists.

I wouldn't care if spambayes-dev required membership in order to post,
but I'm definitely -1 for spambayes@python.org.

OTOH, does spambayes-dev even get any spam? I don't recall any offhand
(and tend not to bother filtering any of the spambayes lists).

=Tony.Meyer

From sethg at GoodmanAssociates.com Fri Dec 3 08:32:13 2004
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Fri Dec 3 08:32:15 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To:
Message-ID:

> From: Tony Meyer
> Sent: Thursday, December 02, 2004 6:35 PM
>
>
> > I wonder if anyone else is willing to reconsider the decision
> > to operate this list without requiring subscribers to
> > register?
>
> Requiring registration for spambayes@python.org would break the ability to
> automatically submit a problem report that sb_server offers (and
> Outlook may offer in the future).

Automatic posts to the list make it unwise to require subscription. I
didn't realize you had that feature as I use the Outlook plug-in.

<...>

>
> In addition, there are different levels of response to a
> non-member posting: accept (as now), hold, reject, discard. Are you
> advocating hold or reject?

Reject with message containing subscribe link, but that would irritate
someone who is trying to do their first automatic problem report while
they are having the problem.

<...>

> > I don't feel that the Mailman subscribe/confirmation process
> > would really scare anybody away,
>
> I do.

Your list, your rules. I think I'll follow your advice and stop
filtering the lists.

--
Seth Goodman

From barry at python.org Fri Dec 3 16:09:56 2004
From: barry at python.org (Barry Warsaw)
Date: Fri Dec 3 16:09:58 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To:
References:
Message-ID: <1102086596.9174.67.camel@presto.wooz.org>

On Fri, 2004-12-03 at 02:32, Seth Goodman wrote:

> Automatic posts to the list make it unwise to require subscription. I
> didn't realize you had that feature as I use the Outlook plug-in.

You can whitelist the auto-posting address though.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041203/6bf7c325/attachment.pgp

From skip at pobox.com Fri Dec 3 16:51:16 2004
From: skip at pobox.com (Skip Montanaro)
Date: Fri Dec 3 16:50:25 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To:
References:
Message-ID: <16816.35700.50598.735456@montanaro.dyndns.org>

Tony> Requiring registration for spambayes@python.org would break the
Tony> ability to automatically submit a problem report that sb_server
Tony> offers (and Outlook may offer in the future). I don't have the
Tony> data, but I suspect that a great many of the people that report
Tony> bugs on spambayes@python.org are not subscribers, and they are the
Tony> people that need help (and many of them would probably find
Tony> subscribing difficult, given what else they find difficult),
Tony> and we need to know where the bugs are to try and fix them.

Agreed. I find it irritating as hell that if I have the
once-in-a-while problem with some tool, I have to go through the
subscribe/confirm dance before I can ask "I upgraded foo on my bar
platform, now it doesn't work, why?".

Tony> In addition, there are different levels of response to a
Tony> non-member posting: accept (as now), hold, reject, discard. Are
Tony> you advocating hold or reject? If it's hold, then that means that
Tony> someone (the way it's set up now, Tim, TimS, or Barry) needs to go
Tony> through the held messages and approve/reject them. I doubt any of
Tony> those people have the time or inclination to do that.

I do it for several other lists. It's not an overwhelming job (I have a
mailman front-end script I use to condense the review page), but it
would be much more difficult for spambayes since the list isn't
filtered.

>> I don't feel that the Mailman subscribe/confirmation process would
>> really scare anybody away,

Tony> I do.

"scare" is maybe not the right verb. I think most people are too busy to
bother subscribing just to ask a question.

Skip

From sethg at GoodmanAssociates.com Fri Dec 3 16:54:57 2004
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Fri Dec 3 16:54:59 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To: <1102086596.9174.67.camel@presto.wooz.org>
Message-ID:

> From: Barry Warsaw
> Sent: Friday, December 03, 2004 9:10 AM
>
>
> On Fri, 2004-12-03 at 02:32, Seth Goodman wrote:
>
> > Automatic posts to the list make it unwise to require subscription. I
> > didn't realize you had that feature as I use the Outlook plug-in.
>
> You can whitelist the auto-posting address though.

OK, then they'll spam that address, though it will take a while to get
discovered.

--
Seth Goodman

From tim.peters at gmail.com Fri Dec 3 17:23:50 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Fri Dec 3 17:23:52 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To:
References:
Message-ID: <1f7befae04120308231d4b63a3@mail.gmail.com>

[Tony Meyer]
> ...
> I wouldn't care if spambayes-dev required membership in order to
> post,

Me neither.

> but I'm definitely -1 for spambayes@python.org.

Me too. The spambayes user list gets messages from people whose
technical knowledge approaches 0 (and, sometimes, appears to approach 0
from the negative side). It's fine to require subscriptions on tech
lists, but the users list isn't a tech list.

> OTOH, does spambayes-dev even get any spam? I don't recall
> any offhand (and tend not to bother filtering any of the spambayes
> lists).

I don't know -- which at least implies that the spam rate on
spambayes-dev is too low for either of us to have noticed it.

From barry at python.org Fri Dec 3 17:25:44 2004
From: barry at python.org (Barry Warsaw)
Date: Fri Dec 3 17:25:54 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To: <16816.35700.50598.735456@montanaro.dyndns.org>
References: <16816.35700.50598.735456@montanaro.dyndns.org>
Message-ID: <1102091143.340.23.camel@geddy.wooz.org>

On Fri, 2004-12-03 at 10:51, Skip Montanaro wrote:

> "scare" is maybe not the right verb. I think most people are too busy to
> bother subscribing just to ask a question.

BTW, I tend to agree. I sure wish Mailman did a mail-back confirmation
for non-member postings. I also wish I knew someone who could do
something about that.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041203/757a5484/attachment.pgp

From tim.peters at gmail.com Fri Dec 3 17:50:48 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Fri Dec 3 17:50:51 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To: <1102091143.340.23.camel@geddy.wooz.org>
References: <16816.35700.50598.735456@montanaro.dyndns.org> <1102091143.340.23.camel@geddy.wooz.org>
Message-ID: <1f7befae041203085011ce00f3@mail.gmail.com>

[Barry Warsaw]
> BTW, I tend to agree. I sure wish Mailman did a mail-back
> confirmation for non-member postings. I also wish I knew
> someone who could do something about that.

I've been thinking about that, and have come to believe that the
difficulty is mostly due to your over-complicating the issue.

All we really need to do is change Mailman to send every message it
receives to barry@python.org, instead of bothering list owners and
subscribers with them. Then you can personally review each one, and
approve, reject, or discard, as appropriate. The only approved sender
address on all lists would be hard-coded to barry@python.org, and you
would resend the messages you deemed worthy of subscriber attention.
Since you already have GnuPG configured, it will be easy for Mailman to
confirm that you really sent such messages.

Even better, users could send their messages to barry@python.org to
begin with! That will make Mailman even simpler. It will also save users
considerable time and embarrassment. For example, sometimes I really
can't tell whether a segfault in Zope is due to a Python bug, a Zope
bug, or a bug in platform-supplied software. It's embarrassing to post
about it to what turns out to be a wrong list, and that wastes a lot of
people's time. Of course barry@python.org could figure out the right
list to send it to, making life easier for everyone.

Even better, despite that, e.g., I do want to subscribe to various Zope
lists, I don't want to see a lot of the messages they carry. If I could
subscribe to barry@python.org instead, barry@python.org could figure out
which messages to send me, based on feedback I generously supply to
barry@python.org about messages barry@python.org sent me that I didn't
really want to see.

In the end, barry@python.org will be the only address anyone needs to
know, for list subscriptions, and for sending messages to lists. Mailman
can be thrown away entirely, and presto! All the problems people have
with Mailman will go away along with it.

Best of all, we could start doing it today! In fact, I think I'll send
this message to barry@python.org, to kick it off.

From barry at python.org Fri Dec 3 18:55:40 2004
From: barry at python.org (Barry Warsaw)
Date: Fri Dec 3 18:55:45 2004
Subject: [Fwd: Re: [spambayes-dev] require subscription?]
Message-ID: <1102096540.333.39.camel@geddy.wooz.org>

This is an automated message. I am on permanent vacation and will
probably not read your message. However, this auto-posting address now
uses Spambayes which is a virtually flawless system for separating
legitimate email (called "ham") from spam.
Spambayes has determined that this message is:

SPAM (100% confidence)

Thus it has been REJECTED

If you believe that this classification is in error, do not reply to
this address, since you are probably wrong, and your message will be
discarded. Instead, you can email Tim Peters since he is solely and
personally responsible for any and all misclassifications. Tim has
agreed to indemnify all users of Spambayes and to immediately and
diligently correct all incorrect determinations of a message's
spamminess. He swears he will not sleep until this error is fixed.

-------------- next part --------------
An embedded message was scrubbed...
From: Tim Peters
Subject: Re: [spambayes-dev] require subscription?
Date: Fri, 3 Dec 2004 11:50:48 -0500
Size: 4606
Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20041203/52bda58b/attachment.mht

From skip at pobox.com Fri Dec 3 22:08:35 2004
From: skip at pobox.com (Skip Montanaro)
Date: Fri Dec 3 22:07:32 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To: <1102091143.340.23.camel@geddy.wooz.org>
References: <16816.35700.50598.735456@montanaro.dyndns.org> <1102091143.340.23.camel@geddy.wooz.org>
Message-ID: <16816.54739.177425.140284@montanaro.dyndns.org>

Barry> I sure wish Mailman did a mail-back confirmation for non-member
Barry> postings. I also wish I knew someone who could do something
Barry> about that.

Don't mail-back confirmations just help overload the net? Maybe I
misunderstand what the term means.

Skip

From barry at python.org Fri Dec 3 22:09:54 2004
From: barry at python.org (Barry Warsaw)
Date: Fri Dec 3 22:09:57 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To: <16816.54739.177425.140284@montanaro.dyndns.org>
References: <16816.35700.50598.735456@montanaro.dyndns.org> <1102091143.340.23.camel@geddy.wooz.org> <16816.54739.177425.140284@montanaro.dyndns.org>
Message-ID: <1102108194.340.72.camel@geddy.wooz.org>

On Fri, 2004-12-03 at 16:08, Skip Montanaro wrote:

> Barry> I sure wish Mailman did a mail-back confirmation for non-member
> Barry> postings. I also wish I knew someone who could do something
> Barry> about that.
>
> Don't mail-back confirmations just help overload the net? Maybe I
> misunderstand what the term means.

E.g. gmane.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 307 bytes
Desc: This is a digitally signed message part
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041203/c424ef7b/attachment.pgp

From popiel at wolfskeep.com Fri Dec 3 23:10:54 2004
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Fri Dec 3 23:11:08 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To: Message from "Seth Goodman" of "Fri, 03 Dec 2004 09:54:57 CST."
References:
Message-ID: <20041203221054.042742DFF6@cashew.wolfskeep.com>

In message: "Seth Goodman" writes:
>> From: Barry Warsaw
>> Sent: Friday, December 03, 2004 9:10 AM
>>
>>
>> On Fri, 2004-12-03 at 02:32, Seth Goodman wrote:
>>
>> > Automatic posts to the list make it unwise to require subscription. I
>> > didn't realize you had that feature as I use the Outlook plug-in.
>>
>> You can whitelist the auto-posting address though.
>
>OK, then they'll spam that address, though it will take a while to get
>discovered.

Y'know, considering who we are, I'm surprised that nobody's suggested
adding a basic (content) pattern matcher for whitelisting in Mailman.
After all, spam is very unlikely to look like one of our automated bug
report forms.
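
[A minimal sketch of the content-pattern idea, not anything Mailman or
SpamBayes actually ships. The marker strings are hypothetical, since the
real layout of the automated problem report is not shown in this thread,
and the function is a standalone check rather than a Mailman hook.]

```python
import re

# Hypothetical markers: assumed field labels for an automated problem
# report. Replace them with whatever the real report form contains.
REPORT_MARKERS = [
    re.compile(r"^SpamBayes Version:", re.MULTILINE),
    re.compile(r"^Python Version:", re.MULTILINE),
    re.compile(r"^Operating System:", re.MULTILINE),
]


def looks_like_bug_report(body, required=len(REPORT_MARKERS)):
    """Return True if at least `required` markers appear in the body."""
    hits = sum(1 for pattern in REPORT_MARKERS if pattern.search(body))
    return hits >= required
```

[Requiring every marker by default keeps the check strict; a non-member
post would be accepted only if it matches the whole form, and anything
else would fall through to the list's normal non-member policy.]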
- Alex

From kennypitt at hotmail.com Fri Dec 3 23:12:36 2004
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Fri Dec 3 23:13:04 2004
Subject: [spambayes-dev] Enhanced Outlook statistics display
Message-ID:

I've done some work on displaying more detailed statistics in the
Outlook plugin. Attached is a screenshot of my current statistics.
You'll notice that I've also added some accuracy stats to the stats that
have already been discussed. These are based on the "spam batting
average" concept recently proposed by John Graham-Cumming (www.jgc.org)
wherein false positives are measured as a percentage of the number of
ham messages instead of as a percentage of the total.

If no one has any issues with this format, I'll go ahead and check in
the changes next week.

--
Kenny Pitt

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SpamBayes-OutlookStatistics.png
Type: image/png
Size: 13329 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041203/766b820b/SpamBayes-OutlookStatistics.png

From anthony at interlink.com.au Sat Dec 4 16:42:32 2004
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sat Dec 4 16:42:55 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To: <1f7befae041203085011ce00f3@mail.gmail.com>
References: <1102091143.340.23.camel@geddy.wooz.org> <1f7befae041203085011ce00f3@mail.gmail.com>
Message-ID: <200412050242.33527.anthony@interlink.com.au>

On Saturday 04 December 2004 03:50, Tim Peters wrote:
> In the end, barry@python.org will be the only address anyone needs to
> know, for list subscriptions, and for sending messages to lists.
> Mailman can be thrown away entirely, and presto! All the problems
> people have with Mailman will go away along with it.

The problem is that I hear the internals of barry@python.org are a
nightmare to maintain. Hardly anyone wants to touch them.
And the project that forked from barry@python.org in an attempt to make
a more maintainable version is still several years from being mature
enough to be useful.

From barry at python.org Sun Dec 5 04:55:34 2004
From: barry at python.org (Barry Warsaw)
Date: Sun Dec 5 04:55:40 2004
Subject: [spambayes-dev] Re: spambayes setup.py,1.30,1.31
In-Reply-To:
References:
Message-ID: <41B286B6.8090901@python.org>

Tony Meyer wrote:
> Update of /cvsroot/spambayes/spambayes
> In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv29241
>
> Modified Files:
> setup.py
> Log Message:
> Update PyPI details and an error message.
>
> For convenience, when doing an sdist, get the script to print out an MD5 checksum
> and the size of the created archive(s) for us. If only I could figure a way to get
> Inno to do this, too ;)
>
> Index: setup.py
> ===================================================================
> RCS file: /cvsroot/spambayes/spambayes/setup.py,v
> retrieving revision 1.30
> retrieving revision 1.31
> diff -C2 -d -r1.30 -r1.31

This revision broke "python setup.py install" -- checked with both
Python 2.3 and 2.4:

% python setup.py install
running install
running build
running build_py
running build_scripts
running install_lib
running install_scripts
Traceback (most recent call last):
  File "setup.py", line 131, in ?
    classifiers = [
  File "/usr/local/lib/python2.4/distutils/core.py", line 149, in setup
    dist.run_commands()
  File "/usr/local/lib/python2.4/distutils/dist.py", line 946, in run_commands
    self.run_command(cmd)
  File "/usr/local/lib/python2.4/distutils/dist.py", line 966, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python2.4/distutils/command/install.py", line 505, in run
    self.run_command(cmd_name)
  File "/usr/local/lib/python2.4/distutils/cmd.py", line 333, in run_command
    self.distribution.run_command(command)
  File "/usr/local/lib/python2.4/distutils/dist.py", line 966, in run_command
    cmd_obj.run()
  File "setup.py", line 74, in run
    return parent.run(self)
TypeError: unbound method run() must be called with sdist instance as first argument (got install_scripts instance instead)

Reverting to setup.py 1.30 fixes the problem.

-Barry

From tameyer at ihug.co.nz Mon Dec 6 04:05:41 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Mon Dec 6 04:07:09 2004
Subject: [spambayes-dev] Re: spambayes setup.py,1.30,1.31
In-Reply-To:
Message-ID:

> This revision broke "python setup.py install" -- checked with both
> Python 2.3 and 2.4:

Sorry, my bad. v1.32 should fix this (when copy & pasting code I reused
a global variable name which was not the thing to do...).

=Tony.Meyer

From pjf at thinkage.ca Mon Dec 6 19:07:17 2004
From: pjf at thinkage.ca (Peter Fraser)
Date: Mon Dec 6 19:23:10 2004
Subject: [spambayes-dev] Small Company Problem
Message-ID: <887691AAF56A5C4A803F6850D35C089D3A7992@cerveau.thoughts.thinkage.ca>

I work in a small company; as a result we do not send very many emails
to each other. And we have had email addresses for many, many years.

To add interest to the spam, the spammers often use employees' names
and email addresses. The net result is that spambayes decides that these
words are bad!

So short emails sent to employees in the company end up being classed as
Spam or Maybe Spam. Training doesn't help. The true spam always
overwhelms.
What I think would help would be the ability to define ignore words.
That way, defining an employee's name as an ignore word would allow
spambayes to classify a message based on the rest of the context.

From nas at arctrix.com Mon Dec 6 21:38:18 2004
From: nas at arctrix.com (Neil Schemenauer)
Date: Mon Dec 6 21:38:22 2004
Subject: [spambayes-dev] Small Company Problem
In-Reply-To: <887691AAF56A5C4A803F6850D35C089D3A7992@cerveau.thoughts.thinkage.ca>
References: <887691AAF56A5C4A803F6850D35C089D3A7992@cerveau.thoughts.thinkage.ca>
Message-ID: <20041206203818.GA31280@mems-exchange.org>

On Mon, Dec 06, 2004 at 01:07:17PM -0500, Peter Fraser wrote:
> To add interest to the spam, the spammers often use
> employees' names and email addresses. The net result
> is that spambayes decides that these words are bad!
>
> So short emails sent to employees in the company
> end up being classed as Spam or Maybe Spam. Training
> doesn't help. The true spam always overwhelms.

Hi Peter,

Those words should appear in both ham and spam messages and therefore
should have a neutral score. It sounds like you may have a ham/spam
imbalance in your training set (e.g. more spam messages). For best
results, the number of spam messages in the training set should be equal
to the number of ham messages.

Neil

From johan.vandecasteele at minoc.com Mon Dec 6 14:57:21 2004
From: johan.vandecasteele at minoc.com (Johan Vandecasteele | Minoc)
Date: Mon Dec 6 23:32:38 2004
Subject: [spambayes-dev] URGENT: covermount cd-rom Clickx Magazine
Message-ID: <200412061354.iB6DsAbZ032130@outmx012.isp.belgacom.be>

Hello,

I'm Johan Vandecasteele, marketing manager of Belgium's leading computer
consumer magazines Clickx Magazine and PC Magazine.

We are including a free covermount cd-rom with our Clickx Magazine
(January 11 2005), and we would like to include freeware anti-spam
software SpamBayes on that cd.

Can you please tell us if we can include the software on our cd-rom.
Please visit our website www.clickxmagazine.be for information on our
magazine. The download link to our mediakit (Dutch or French only -
sorry) is http://sales.minoc.com/newsletter/Mediakit Minoc Business Press 2004 FR.pdf
- it will give you some idea of our company and portfolio.

Hope to hear from you soon,

Kind regards,
Johan Vandecasteele

------------------------------------------
Johan Vandecasteele
marketingmanager Clickx Magazine / PC Magazine / Smart Business Strategies
johan.vandecasteele@minoc.com
Tel: +32 14/44.20.61
Fax: +32 14/44.20.66

http://www.clickxmagazine.be/
http://www.pcmagazine.be/
http://www.zdnet.be/
http://www.gamespot.be/

Minoc Business Press NV
Everdongenlaan 15 bus 1
B - 2300 Turnhout

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041206/09f2fcf6/attachment.html

From tameyer at ihug.co.nz Tue Dec 7 01:47:17 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 7 01:47:58 2004
Subject: [spambayes-dev] URGENT: covermount cd-rom Clickx Magazine
In-Reply-To:
Message-ID:

> Can you please tell us if we can include the software on
> our cd-rom.

Yes. For more details see the LICENSE.txt file included with the
distribution. You're probably after part #2:

"""
2. Subject to the terms and conditions of this License Agreement, PSF
hereby grants Licensee a nonexclusive, royalty-free, world-wide license
to reproduce, analyze, test, perform and/or display publicly, prepare
derivative works, distribute, and otherwise use the Software alone or in
any derivative version, provided, however, that PSF's License Agreement
and PSF's notice of copyright, i.e., "Copyright (c) 2002-2004 Python
Software Foundation; All Rights Reserved" are retained the Software
alone or in any derivative version prepared by Licensee.
"""

=Tony.Meyer

From tameyer at ihug.co.nz Tue Dec 7 02:25:34 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 7 02:26:09 2004
Subject: [spambayes-dev] Enhanced Outlook statistics display
In-Reply-To:
Message-ID:

> I've done some work on displaying more detailed statistics in
> the Outlook plugin. Attached is a screenshot of my current
> statistics. You'll notice that I've also added some accuracy
> stats to the stats that have already been discussed. These
> are based on the "spam batting average" concept recently
> proposed by John Graham-Cumming (www.jgc.org) wherein false
> positives are measured as a percentage of the number of ham
> messages instead of as a percentage of the total.

As long as we stay away from actually expressing it as a "batting
average" and avoid baseball terminology. Those numbers are just
completely confusing to me, and I suspect most non-Americans.

> If no one has any issues with this format, I'll go ahead and
> check in the changes next week.

Sorry I didn't get to this earlier, but I was +1 on checking them in
anyway - looks good!

=Tony.Meyer

From tameyer at ihug.co.nz Tue Dec 7 02:36:26 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Dec 7 02:37:10 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To:
Message-ID:

[Skip, regarding moderation]
> I do it for several other lists. It's not an overwhelming
> job (I have a mailman front-end script I use to condense the
> review page), but it would be much more difficult for
> spambayes since the list isn't filtered.

I suppose I would be willing to moderate (non-member postings to)
spambayes@python.org for a while to see how it went, but I wouldn't want
to be the only one doing so. (I could then see a situation where I came
in in the (NZ) morning, approved various posts and then answered them,
rather than them going through in the (NZ) night and someone else
answering them before I even read them).
So if anyone else (someone not in Australia/NZ would be best, probably)
here feels that moderation is worth a go, speak up and we can see what
the results are. (The consensus is definitely against requiring
subscription, though, it appears).

[Seth Goodman]
> Automatic posts to the list make it unwise to require
> subscription. I didn't realize you had that feature as I use
> the Outlook plug-in.

There's some code floating round in the plug-in somewhere that's
designed to be used for that, but it's just never been added. It's
possible that it'll make it into 1.1 - it does increase the chance that
important details (and the log) are included in the report.

> Your list, your rules.

Well, our list, our rules, for some suitably defined "our".

> I think I'll follow your advice and stop filtering the lists.

It wasn't so much advice as just what I do, but I suspect that it will
work reasonably well. There isn't so much spam (IMO) that using the good
old delete key is onerous. See above, however.

=Tony.Meyer

From sethg at GoodmanAssociates.com Tue Dec 7 03:04:18 2004
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Tue Dec 7 03:04:27 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To:
Message-ID:

> From: Tony Meyer [mailto:tameyer@ihug.co.nz]
> Sent: Monday, December 06, 2004 7:36 PM

<...>

> I suppose I would be willing to moderate (non-member postings to)
> spambayes@python.org for a while to see how it went, but I
> wouldn't want to be the only one doing so. (I could then see a
> situation where I came in in the (NZ) morning, approved various
> posts and then answered them, rather than them going through in
> the (NZ) night and someone else answering them before I even
> read them).
>
> So if anyone else (someone not in Australia/NZ would be best,
> probably) here feels that moderation is worth a go, speak up and
> we can see what the results are.

I'd be willing to help out. I'm in the central U.S.
-0600 time zone, so I don't know how that works out for you.

--
Seth Goodman

From tim.peters at gmail.com Tue Dec 7 03:05:06 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Tue Dec 7 03:16:45 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To:
References:
Message-ID: <1f7befae04120618057618e4e2@mail.gmail.com>

[Tony Meyer]
> ...
> I suppose I would be willing to moderate (non-member postings to)
> spambayes@python.org for a while to see how it went, but I
> wouldn't want to be the only one doing so.

I don't want you doing it at all, Tony. It's a minor but endless time
sink, and your other contributions to this project are far more
valuable.

The only way moderation works on lists with non-trivial traffic is to
automatically reject non-member postings, and we're both opposed to
that. Else it just pisses everyone off, as no matter how sincere people
are going into it, the *reality* of having messages waiting for review,
all day long, all night long, 365.2425 days per year, every year,
eventually wears them out.

If a great hue & cry arises demanding moderation on the user list, I'll
do it. But I don't see sufficient crying yet, let alone sufficient
hueing.

From skip at pobox.com Tue Dec 7 03:52:59 2004
From: skip at pobox.com (Skip Montanaro)
Date: Tue Dec 7 03:53:23 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To:
References:
Message-ID: <16821.6923.625999.456305@montanaro.dyndns.org>

Tony> I suppose I would be willing to moderate (non-member postings to)
Tony> spambayes@python.org for a while to see how it went, but I
Tony> wouldn't want to be the only one doing so.

I'll help out. Any potential European moderators?
You might want to take a look at the mmfold.py script at

    http://manatee.mojam.com/~skip/python/

I normally drive it with a small shell function:

    mmcheck ()
    {
        for url in http://mail.python.org/mailman/admindb/python-mode \
            http://mail.python.org/mailman/admindb/pydotorg \
            http://mail.python.org/mailman/admindb/python-dev \
            http://mail.python.org/mailman/admindb/python-help \
            http://mail.python.org/mailman/admindb/pythonmac-sig \
            http://manatee.mojam.com/mailman/admindb/csv ;
        do
            python ~/tmp/mmfold.py $url;
        done
    }

and just run that every couple days to process whatever needs attention.

Skip

From tim.peters at gmail.com Tue Dec 7 04:38:54 2004
From: tim.peters at gmail.com (Tim Peters)
Date: Tue Dec 7 04:48:36 2004
Subject: [spambayes-dev] require subscription?
In-Reply-To: <16821.6923.625999.456305@montanaro.dyndns.org>
References: <16821.6923.625999.456305@montanaro.dyndns.org>
Message-ID: <1f7befae041206193837f203cb@mail.gmail.com>

[Skip Montanaro]
> I'll help out. Any potential European moderators?
>
> You might want to take a look at the mmfold.py script at
>
> http://manatee.mojam.com/~skip/python/
>
> I normally drive it with a small shell function:
>
> mmcheck ()
> {
> for url in http://mail.python.org/mailman/admindb/python-mode \
> http://mail.python.org/mailman/admindb/pydotorg \
> http://mail.python.org/mailman/admindb/python-dev \
> http://mail.python.org/mailman/admindb/python-help \
> http://mail.python.org/mailman/admindb/pythonmac-sig \
> http://manatee.mojam.com/mailman/admindb/csv ;
> do
> python ~/tmp/mmfold.py $url;
> done
> }
>
> and just run that every couple days to process whatever needs
> attention.

Skip, AFAIK, none of those lists are configured to hold all non-member
posts for review. Right? If we hold all non-member posts on the
spambayes user list, "every couple days" doesn't cut it -- people often
post questions there when they feel it's an emergency.
The only list I moderate with a hold-all-non-member-postings policy is the PSF board mailing list, and I try to respond to holds instantly there. This interferes with sleep , but it's important. Luckily, non-members rarely have legitimate reasons to post to that list, but uncounted thousands of non-members have legit reasons to post to the SB user list. From skip at pobox.com Tue Dec 7 05:01:39 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Dec 7 05:01:49 2004 Subject: [spambayes-dev] require subscription? In-Reply-To: <1f7befae041206193837f203cb@mail.gmail.com> References: <16821.6923.625999.456305@montanaro.dyndns.org> <1f7befae041206193837f203cb@mail.gmail.com> Message-ID: <16821.11043.163281.527734@montanaro.dyndns.org> >> ... http://mail.python.org/mailman/admindb/python-mode \ >> http://mail.python.org/mailman/admindb/pydotorg \ >> http://mail.python.org/mailman/admindb/python-dev \ >> http://mail.python.org/mailman/admindb/python-help \ >> http://mail.python.org/mailman/admindb/pythonmac-sig \ >> http://manatee.mojam.com/mailman/admindb/csv ; Tim> Skip, AFAIK, none of those lists are configured to hold all Tim> non-member posts for review. Right? Actually, all but python-help currently hold non-member posts for review. It's generally not too bad if there are two or more people moderating a list. The only two of the above that get a lot of crap are python-help and pythonmac-sig. Are we going to put spambayes itself in front of spambayes@python.org? That will trim much of the cruft. Tim> ... uncounted thousands of non-members have legit reasons to post Tim> to the SB user list. I think they'll have to learn a little patience, just like the rest of us. If that's not deemed reasonable, I vote for the status quo. Skip From nas at arctrix.com Tue Dec 7 06:52:51 2004 From: nas at arctrix.com (Neil Schemenauer) Date: Tue Dec 7 06:52:55 2004 Subject: [spambayes-dev] require subscription?
In-Reply-To: <1f7befae041206193837f203cb@mail.gmail.com> References: <16821.6923.625999.456305@montanaro.dyndns.org> <1f7befae041206193837f203cb@mail.gmail.com> Message-ID: <20041207055251.GA32723@mems-exchange.org> On Mon, Dec 06, 2004 at 10:38:54PM -0500, Tim Peters wrote: > Skip, AFAIK, none of those lists are configured to hold all non-member > posts for review. RIght? If we hold all non-member posts on the > spambayes user list, "every couple days" doesn't cut it -- people > often post questions there when they feel it's an emergency. I guess motivation for requiring subscriptions is to stop spam. Does spambayes-dev really get enough spam to justify that change? I don't think it does. However, my calibration may be off since I've seen the massive amount of junk mail that mail.python.org is rejecting every minute. Neil From kenny.pitt at gmail.com Tue Dec 7 16:22:08 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Tue Dec 7 16:22:12 2004 Subject: [spambayes-dev] Enhanced Outlook statistics display In-Reply-To: References: Message-ID: <2a052b99041207072263cff73d@mail.gmail.com> Tony Meyer wrote: > As long as we stay away from actually expressing it as a "batting average" > and avoid baseball terminology. Those numbers are just completely confusing > to me, and I suspect most non-Americans. I agree. I think sticking with percentages is the way to go. I suppose it's a reasonable analogy once it's explained, but it isn't going to be obvious even to a baseball fanatic just from looking at the numbers. The main thing I took from the proposal was expressing accuracy separately for ham and spam instead of taking each as a percentage of the total messages. Even if 99% of messages have been classified correctly, it makes a big difference whether the remaining 1% represents spam that made it through to the inbox vs. ham that was removed from the inbox by mistake. I'm still a little unsure (ok, pun intended, couldn't resist ) how to treat unsures in this. 
Currently I'm showing the primary accuracy results based on the number of messages that SpamBayes classified as either ham or spam, with a separate percentage showing the additional messages that were classified as unsure. Another option I considered was measuring the percentage of messages removed from the inbox. It seems that ham and spam are somewhat asymmetric with regards to unsures. I suspect that most people are ok with spam being classified as unsure as long as it isn't left in their inbox, but they would prefer not to see a ham message removed from the inbox even if it is only moved to the unsure bin. Any thoughts/suggestions/preferences? -- Kenny Pitt From tmokros at neo.rr.com Wed Dec 8 21:50:30 2004 From: tmokros at neo.rr.com (Todd Mokros) Date: Wed Dec 8 22:02:04 2004 Subject: [spambayes-dev] Support for X-Spambayes-Trained headers in sb_imapfilter Message-ID: I use sb_filter from procmail to filter and autotrain incoming messages which end up on a cyrus imap server. I then use sb_imapfilter for retraining mistakes. The problem here is that sb_filter uses the X-Spambayes-Trained header to indicate how a message was trained, whereas sb_imapfilter only looks at the messageinfo db. I've modified sb_imapfilter to use the X-Spambayes-Trained header to determine if untraining is required, and update it accordingly. Is there interest in these changes? If so, I can clean it up for submission, adding an option to choose between the two methods of tracking message training. -- Todd Mokros From Mike at GastonFamily.org Wed Dec 8 22:52:34 2004 From: Mike at GastonFamily.org (Mike Gaston) Date: Wed Dec 8 22:52:38 2004 Subject: [spambayes-dev] Question about re-training SpamBayes Message-ID: <0I8F00D33BFN2D@mta1.srv.hcvlny.cv.net> I didn't see this question listed in the FAQs: I need to re-train SpamBayes. I've saved all the spam it's detected over the past year, so it's no problem giving it plenty of examples of spam. 
My "ham" messages (the good stuff) is all located in my Deleted folder, which I never empty. But SpamBayes won't let me choose my Deleted folder as a source of training. Why, and how can I get around this? Thanks. Mike Gaston Somerville NJ Mike@GastonFamily.org -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041208/1a5beede/attachment.html From kennypitt at hotmail.com Wed Dec 8 23:29:02 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Dec 8 23:30:05 2004 Subject: [spambayes-dev] Question about re-training SpamBayes In-Reply-To: <0I8F00D33BFN2D@mta1.srv.hcvlny.cv.net> Message-ID: "Deleted Items" is a special folder in Outlook and sometimes produces strange results when used in conjunction with SpamBayes. To prevent this, we specifically prevent the user from selecting "Deleted Items" in SpamBayes. To work around it, just create a normal folder for good messages and move all your good mail from "Deleted Items" into that folder. A couple of notes about training, though. We generally recommend that you not train SpamBayes on a large amount of existing mail. SpamBayes learns quickly, and keeping your training data small usually means that SpamBayes can adapt to new types of mail more quickly. I would recommend choosing at most about 10 messages of each type, putting those into separate "Training - Spam" and "Training - Good" folders and then training on those. When I retrain, I don't give it any initial training at all. I just delete the database files from the data folder and let SpamBayes start from scratch. And for any initial training that you do, try to make sure that you have an approximately equal number of good and spam messages. 
-- Kenny Pitt _____ From: spambayes-dev-bounces@python.org [mailto:spambayes-dev-bounces@python.org] On Behalf Of Mike Gaston Sent: Wednesday, December 08, 2004 4:53 PM To: spambayes-dev@python.org Subject: [spambayes-dev] Question about re-training SpamBayes I didn't see this question listed in the FAQs: I need to re-train SpamBayes. I've saved all the spam it's detected over the past year, so it's no problem giving it plenty of examples of spam. My "ham" messages (the good stuff) is all located in my Deleted folder, which I never empty. But SpamBayes won't let me choose my Deleted folder as a source of training. Why, and how can I get around this? Thanks. Mike Gaston Somerville NJ Mike@GastonFamily.org -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041208/47179c46/attachment.htm From tameyer at ihug.co.nz Thu Dec 9 00:17:00 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 9 00:17:38 2004 Subject: [spambayes-dev] Support for X-Spambayes-Trained headers insb_imapfilter In-Reply-To: Message-ID: > I use sb_filter from procmail to filter and autotrain > incoming messages which end up on a cyrus imap server. > I then use sb_imapfilter for retraining mistakes. The > problem here is that sb_filter uses the > X-Spambayes-Trained header to indicate how a message was > trained, whereas sb_imapfilter only looks at the > messageinfo db. I've modified sb_imapfilter to use the > X-Spambayes-Trained header to determine if > untraining is required, and update it accordingly. > > Is there interest in these changes? If so, I can clean it up for > submission, adding an option to choose between the two > methods of tracking message training. I would (personally) have preferred an optional change that let sb_filter add appropriate entries to the messageinfo database, but yes I think we'd be interested. 
Please make sure that the diff is a context diff against current CVS (1.1's sb_imapfilter is very different to 1.0.x's). It's up to you how you submit the patch, but I would like it to work like this: * Don't add yet another option (there are so many already!) * Have it fall back to looking for the X-Spambayes-Trained header if it can't find an entry in the messageinfo db. * Don't add the header (avoiding rewriting the messages is nice with IMAP), just update the messageinfo db. This should work as long as nothing else needs access to that header (sb_filter doesn't ever read it, right?) This would appear in 1.1, most probably. =Tony.Meyer From tameyer at ihug.co.nz Thu Dec 9 05:01:31 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 9 05:02:06 2004 Subject: [spambayes-dev] Enhanced Outlook statistics display In-Reply-To: Message-ID: > I'm still a little unsure (ok, pun intended, couldn't resist > ) how to treat unsures in this. Currently I'm showing > the primary accuracy results based on the number of messages > that SpamBayes classified as either ham or spam, with a > separate percentage showing the additional messages that were > classified as unsure. This works for me. For filters like SpamBayes that do h/s/u rather than just h/s I can't see any way to meaningfully compare results unless some weighting is selected (which makes it reasonably arbitrary). I think we just have to have four stats. I've wondered about putting up the spam "cost" as calculated by the various testtools scripts (by default 10*fp+fn+0.2*unsure). It does give a single figure for accuracy that takes into consideration how bad fp's are and works with unsures - and you could use it with filters that don't have an unsure category. The weights are adjustable via options, although few people would.
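[Editorial sketch] The weighted "cost" figure described above (10*fp+fn+0.2*unsure by default) can be illustrated roughly as follows. This is a minimal sketch, not the actual testtools code: the function name spam_cost and its parameter names are assumptions made for illustration, and only the default weights come from the message.

```python
def spam_cost(fp, fn, unsure, fp_weight=10.0, fn_weight=1.0, unsure_weight=0.2):
    """Linear cost of a test run, in hypothetical dollars.

    fp, fn and unsure are raw message counts. The default weights mirror
    the 10*fp + fn + 0.2*unsure figure quoted in the thread; the function
    and parameter names are illustrative assumptions, not SpamBayes API.
    """
    return fp_weight * fp + fn_weight * fn + unsure_weight * unsure


if __name__ == "__main__":
    # Two false positives, fifteen false negatives, forty unsures.
    print("default weights:", spam_cost(2, 15, 40))
    # A user who hates false positives and ignores unsures entirely.
    print("fp-averse weights:",
          spam_cost(2, 15, 40, fp_weight=100.0, fn_weight=0.1, unsure_weight=0.0))
```

A filter with no unsure category can simply pass unsure=0, which is what makes the single figure usable across filters, provided the weights are agreed in advance.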
(As an aside: if I had the time for a little research project (which I won't for at least another year), it would be interesting to examine how people do actually weigh fp/fn/unsures (eg *10 is probably low for fp), and then there would be a justifiable way to give a single number measure. I'm sure a nice little paper could be written on this. If anyone else reading this is keen on doing the research, let me know and maybe I do have time ). > Another option I considered was measuring the percentage of > messages removed from the inbox. It seems that ham and spam > are somewhat asymmetric with regards to unsures. I suspect > that most people are ok with spam being classified as unsure > as long as it isn't left in their inbox, but they would > prefer not to see a ham message removed from the inbox even > if it is only moved to the unsure bin. I'm not brave enough to predict what people think, but I know that I'm fine with ham going to unsures. I don't really care what the mix is there, as long as it's reasonably small (~2% is ok with me). It does seem (from the spambayes@python.org feedback) that unsure boxes do tend to be mostly spam, though. > Any thoughts/suggestions/preferences? I'm fine with the stats that we have now (what I would like, and might get to at some point, is to centralise the stats code somewhat so that we don't have to keep updating both the web interface code and Outlook separately). What do you think about the stats that are requested in the tracker? Another thought I had was that we could fit a "Reset Statistics" button on the Statistics panel (all it would have to do is delete the pickle and reset the session stats). People might want to collect (eg) monthly stats, or stats after an initial training period, and that would make it easier for them. I hate mucking about with the dialogs - you want to do this? 
;) =Tony.Meyer From kennypitt at hotmail.com Thu Dec 9 14:38:27 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Thu Dec 9 14:39:08 2004 Subject: [spambayes-dev] Enhanced Outlook statistics display In-Reply-To: Message-ID: Tony Meyer wrote: > I've wondered about putting up the spam "cost" as calculated by the > various testtools scripts (by default 10*fp+fn+0.2*unsure). It does > give a single figure for accuracy that takes into consideration how > bad fp's are and works with unsures - and you could use it with > filters that don't have an unsure category. The weights are > adjustable via options, although few people would. That's another possibility, although it would probably be more difficult to compare against other spam filters (especially if anyone did adjust the weights). John's main point in his "batting average" article was that a single accuracy score makes it difficult to see the difference between filters that reduce false positives by letting through a lot of spam vs. filters that kill almost all of the spam at the expense of increased false positives. By reporting the scores separately, the user can make the tradeoff based on what is more important to them. > I'm fine with the stats that we have now (what I would like, and > might get to at some point, is to centralise the stats code somewhat > so that we don't have to keep updating both the web interface code > and Outlook separately). That would be good, but difficult currently because they take entirely different approaches. The Outlook addin totals up the stats as it goes, while sb_server recalculates them by iterating through the data in the messageinfo database. Maybe the changes you made to utilize the same messageinfo database for Outlook will allow us to calculate the Outlook stats the same way. At the very least, though, we could probably create a function that takes the raw counts as a parameter and returns the formatting dictionary with the complete set of statistics.
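[Editorial sketch] The shared helper suggested above, one that takes raw counts and returns a dictionary of statistics for either front end to format, might look something like the following. Every function, parameter and key name here is a hypothetical chosen for illustration; none of it is taken from the SpamBayes source. It reports accuracy separately for ham and spam, as discussed earlier in the thread.

```python
def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def build_stats(ham_ok, ham_fp, ham_unsure, spam_ok, spam_fn, spam_unsure):
    """Turn raw per-class counts into a dictionary of display statistics.

    Arguments are message counts: ham classified as ham, ham classified as
    spam (false positives), ham left unsure, and likewise for spam.
    All names are illustrative assumptions, not existing SpamBayes code.
    """
    total_ham = ham_ok + ham_fp + ham_unsure
    total_spam = spam_ok + spam_fn + spam_unsure
    return {
        "total_ham": total_ham,
        "total_spam": total_spam,
        # Each percentage is relative to its own class, not to all mail.
        "ham_correct_pct": percent(ham_ok, total_ham),
        "ham_fp_pct": percent(ham_fp, total_ham),
        "ham_unsure_pct": percent(ham_unsure, total_ham),
        "spam_correct_pct": percent(spam_ok, total_spam),
        "spam_fn_pct": percent(spam_fn, total_spam),
        "spam_unsure_pct": percent(spam_unsure, total_spam),
    }


if __name__ == "__main__":
    stats = build_stats(90, 1, 9, 180, 4, 16)
    for key in sorted(stats):
        print(key, stats[key])
```

Keeping the counting (running totals in Outlook, a database scan in sb_server) separate from this calculation is the design point: both front ends could feed the same function regardless of how they obtain the counts.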
I'll look into that as a first step. > What do you think about the stats that are requested in the tracker? Are you referring to RFE #765924 regarding breaking down the stats by hour/day/week, etc? That seems like a lot of work for a questionable value, especially since we would probably have to store a bit more data in messageinfo to allow it. > Another thought I had was that we could fit a "Reset Statistics" > button on the Statistics panel (all it would have to do is delete the > pickle and reset the session stats). People might want to collect > (eg) monthly stats, or stats after an initial training period, and > that would make it easier for them. I hate mucking about with the > dialogs - you want to do this? ;) Should be easy enough, I'll take a look. It would probably be nice to save the date when the statistics were last reset, as well. I haven't done much with pickles. Is that something that could be easily added to the stats file? -- Kenny Pitt From tmokros at neo.rr.com Thu Dec 9 19:27:54 2004 From: tmokros at neo.rr.com (Todd Mokros) Date: Thu Dec 9 19:28:19 2004 Subject: [spambayes-dev] Support for X-Spambayes-Trained headers in sb_imapfilter In-Reply-To: References: Message-ID: <1102616875.30310.46.camel@localhost> On Thu, 2004-12-09 at 12:17 +1300, Tony Meyer wrote: > I would (personally) have preferred an optional change that let sb_filter > add appropriate entries to the messageinfo database, but yes I think we'd be > interested. Please make sure that the diff is a context diff against > current CVS (1.1's sb_imapfilter is very different to 1.0.x's). Now that you brought it up, updating sb_filter to use the messageinfo db looks like the better solution. Unconditionally using the Spambayes-Trained header as a fallback will prevent training a new database on messages trained on an old database.
To handle this case, my original changes added the -f (force) flag to sb_imapfilter, which worked as it does in sb_mboxtrain, ignoring any previous training results. If useful I can submit that as well. I've submitted the patch for the header fallback with your issues addressed. It also fixes a small bug where the spambayes headers would be lost if a message was untrained. I'll look at implementing use of the messageinfo db in sb_filter, which would solve a number of issues. If I go forward with it, I think it will require an option (or at least a cmdline arg) to choose the header or messageinfo db method, because at least sb_mboxtrain uses the header method for retraining. Any thoughts? Patch URL (wordwrap will probably mess with it): https://sourceforge.net/tracker/index.php?func=detail&aid=1082344&group_id=61702&atid=498105 -- Todd Mokros From T.A.Meyer at massey.ac.nz Thu Dec 9 22:49:30 2004 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Dec 9 22:51:12 2004 Subject: [spambayes-dev] Enhanced Outlook statistics display Message-ID: [Tony talking about the weighted cost calculation] > That's another possibility, although it would > probably be more difficult to compare against other > spam filters (especially if anyone did adjust the > weights). Yes - if it was to be used to compare then the weights would have to be agreed on in advance. > John's main point in his "batting average" article was that a > single accuracy score makes it difficult to see the difference > between filters that reduce false positives by letting through > a lot of spam vs. filters that kill almost all of the spam at > the expense of increased false positives. By reporting the scores > separately, the user can make the tradeoff based on what is more > important to them. The cost does this as long as the weights are correct for that user, though. e.g.
if I *hated* fp's, didn't care at all about unsures, and hardly cared about fn's, I could have weights of (eg) 100.0, 0.0 and 0.1 (respectively) and the score would reflect what was important to me. Kinda like John's method of dividing the two numbers into each other, but better. If the user (or reviewer, or whatever) is able to understand having two (or four!) numbers, then that's better, though. Comparing filters is hard for many other reasons, anyway (training regime, mail stream, etc) [consolidating stats code] > That would be good, but difficult currently because > they take entirely different approaches. The Outlook > addin totals up the stats as it goes, > while sb_server recalculates them by iterating through > the data in the messageinfo database. I had forgotten about some of this, although I was thinking about a higher level consolidation taking the raw counts, as you suggest. > Maybe the changes you made to utilize the same > messageinfo database for Outlook will allow us to > calculate the Outlook stats the same way. That's an interesting idea. It would save us having the separate database. I've wondered (since I wrote the web interface method) whether it would get really slow as the db increases in size, since it iterates through the whole thing each time the stats are generated. I should have a play around and see if that is going to be a problem or not (if so, maybe some sort of middle ground between the methods can be found that both systems can use). >> What do you think about the stats that are requested in the tracker? > Are you refering to RFE #765924 regarding breaking down the stats by > hour/day/week, etc? That seems like a lot of work for a questionable > value, especially since we would probably have to store a bit more > data in messageinfo to allow it. Sorry, that was rather vague. Yes, I did mean that RFE. Those were my thoughts too. 
Maybe a little script that just printed out the current stats would be sufficient - if someone really wanted daily/whatever stats, they could just set up some utility to call that script at the appropriate interval. The number classified would say how much mail was received in that period, and you could probably extract the rest from it. Without any more demand, though, I'm inclined to leave it. [Reset stats button] > Should be easy enough, I'll take a look. Thanks :) > It would probably be nice to save the date when the > statistics were last reset, as well. Good idea. > I haven't done much with pickles. Is that something > that could be easily added to the stats file? From memory (I don't have access to the code from where I am at the moment), the pickle is just a dict that gets saved. So you could just add another value ('stats["RESET_DATE"] = date' or something) and it would get saved. However, I had forgotten until reading your message about the differences between how the web interface and Outlook go about it. If it is now possible for the plugin to use the messageinfo db, then maybe we don't need the stats pickle any more. We could store a classified_date (and trained_date?) in the messageinfo db easily enough, and then only pull the data we want (adding a 'current stats starting point' value too, I guess). I'll think about this and have a look at how quickly the db is going to increase in size (it's already going to be larger than the old version). =Tony.Meyer From T.A.Meyer at massey.ac.nz Thu Dec 9 22:57:00 2004 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Dec 9 22:57:09 2004 Subject: [spambayes-dev] Enhanced Outlook statistics display Message-ID: [Tony Meyer] > I've wondered about putting up the spam "cost" as calculated by the > various testtools scripts (by default 10*fp+fn+0.2*unsure). One other thought - I suppose there maybe ought to be four weights - fp, fn, ham-unsure and spam-unsure.
Maybe people care more about ham ending up in the unsure folder than they do about spam going there, or vice-versa. One more item for my hypothetical research project, anyway . =Tony.Meyer From kennypitt at hotmail.com Mon Dec 13 20:16:32 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Dec 13 20:17:05 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: Message-ID: Tony Meyer wrote: >> When I look in my Junk mail folder those empty spam still have a spam >> probability of below 50%, partly caused by those "message-id:invalid" >> headers (sorry I didn't pick that up sooner). I looked at some >> Exchange mails in my inbox and all those have invalid message-id. So >> just like from:none, all of my internal Exchange mails have >> "message-id:invalid". > > Interesting - I wouldn't have thought they would have any message-id. > Could you pick a random Exchange (good) mail, and send me a copy of > the message id header for that message? Maybe there's an Exchange > format for the things that we can leverage. I get the same message-id behavior from our Exchange server. Here's the complete set of headers from a recent mail as SpamBayes sees them in Show Clues. """ X-Exchange-Message: true Subject: A recent Exchange message From: Joe Smith To: All Employees X-Exchange-Delivery-Time: Mon, 13 Dec 2004 13:52:22 -0500 """ This was taken using latest CVS. Names have been changed to protect the innocent, but otherwise the headers are completely intact. Notice that there is no message id header of any sort, and that the From and To fields do not use Internet standard address format. 
The following tokens were included among the clues, and are typical for most if not all of my Exchange mail: """ token spamprob #ham #spam 'message-id:invalid' 0.214766 19 9 'x-mailer:none' 0.622068 88 258 'from:no real name:2**0' 0.642539 29 93 """ Maybe there's a property in the Outlook message object somewhere that we need to retrieve and add to the headers when we reconstruct the message? -- Kenny Pitt From tameyer at ihug.co.nz Tue Dec 14 08:17:30 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 14 08:18:06 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: Message-ID: > I get the same message-id behavior from our Exchange server. Ah - so do I, I realise now. I was looking just in the headers that SpamBayes shows in "Show Clues", but I see I get the invalid token even without one showing there. Looking at the tokenizing code, if there is no message-id header, then the 'invalid' token is generated. (I suppose the thought is that not having the header is not a valid case). > Notice that there is no message id header of any sort, and that > the From and To fields do not use Internet standard address format. > The following tokens were included among the clues, and are typical for > most if not all of my Exchange mail: > > """ > token spamprob #ham #spam > 'message-id:invalid' 0.214766 19 9 > 'x-mailer:none' 0.622068 88 258 > 'from:no real name:2**0' 0.642539 29 93 > """ > > Maybe there's a property in the Outlook message object > somewhere that we need to retrieve and add to the headers when we > reconstruct the message? Maybe we ought to be making an attempt to generate headers for all those in the safe_headers option (or, alternatively, changing the default value for safe_headers for Outlook users. 
The headers that we could probably generate include "date", "from" (we could be smarter about how it is presented), "importance", "in-reply-to" (?), "message-id", "organization" (?), "received" (maybe too much effort), "reply-to", "to" (smarter), and "user-agent". We could generate "x-mailer", which is tokenized separately, too. None of this is hard - it's just a case of running Outlook2000/sandbox/dumpprops.py on one of these messages, looking up the appropriate property names, and then modifying the function to get & format the appropriate data. I guess (but do not know) that getting a few extra properties as well as the ones we already get would not significantly affect the time that was required. However, there is the question of whether this will help or hinder. At the moment, we get a whole bunch of "I'm an Exchange message" tokens, which I suspect for most people are significant ham clues. If we replace those with more data, maybe it'll be harder to nail Exchange messages (I would guess not, but stupid beats smart, etc). We could add an (experimental?) Outlook option "synthesised_exchange_headers", which lists headers (like those above) to try and synthesise (the current situation being "to,from,subject"). That way at least users could relatively easily change the situation (e.g. revert back to 1.0.x behaviour). (Retraining would probably be necessary to have much effect, though). I'll try and find time to whip up something like this and run some test scripts with it (although the ratio of Exchange mail will have a big influence on results, I imagine) and see what happens. Probably not until the end of the week, or the start of next one, though. At least, since it's Outlook, if we make the situation worse, Tim will probably notice and yell at us .
=Tony.Meyer From kenny.pitt at gmail.com Wed Dec 15 21:23:49 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Wed Dec 15 21:23:53 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: References: Message-ID: <2a052b9904121512235df15878@mail.gmail.com> Tony Meyer wrote: > Maybe we ought to be making an attempt to generate headers for all those in > the safe_headers option (or, alternatively, changing the default value for > safe_headers for Outlook users. The headers that we could probably generate > include "date", "from" (we could be smarter about how it is presented), > "importance", "in-reply-to" (?), "message-id", "organization" (?), > "received" (maybe too much effort), "reply-to", "to" (smarter), and > "user-agent". We could generate "x-mailer", which is tokenized separately, > too. I found the property for the Message-ID and added it to the headers. What you get is the same message-id that an external user would see if your Exchange server sends the message on via SMTP. I'll try to have a look at some of the others when I have more time, unless you beat me to it. I see a PR_IMPORTANCE property that can probably be used for importance if we figure out what its integer values translate to. For date, there are two properties to choose from, PR_CREATION_TIME and PR_CLIENT_SUBMIT_TIME. Both appear to be UTC times, but that could just be the way dump_props is showing them. For x-mailer, the only thing I see is what appears to be the Outlook version number of the sender. 
-- Kenny Pitt From tameyer at ihug.co.nz Wed Dec 15 23:34:04 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 15 23:36:26 2004 Subject: [spambayes-dev] Enhanced Outlook statistics display In-Reply-To: Message-ID: Further to earlier discussion about calculating cost figures, and in case other people are interested and not aware of it, JGC's latest newsletter mentions this paper: And this page on the SpamAssassin wiki: I haven't had a chance to read through it in depth, but the "total cost ratio" appears to be more-or-less the same thing as the cost values that the spambayes testtools scripts produce (with the addition of an unsure weight). Maybe Tim knew of this and it's deliberate, but, in any case, it is interesting to see that it has been used elsewhere. (I wish I had found this when I was writing my CEAS paper earlier in the year). =Tony.Meyer From tameyer at ihug.co.nz Thu Dec 16 04:29:02 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 16 04:29:40 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: Message-ID: > I found the property for the Message-ID and added it to the headers. > What you get is the same message-id that an external user would see if > your Exchange server sends the message on via SMTP. Great :) > I'll try to have a look at some of the others when I have more time, > unless you beat me to it. Well, if you're going to make it a race... . I've checked in some changes. > I see a PR_IMPORTANCE property that > can probably be used for importance if we figure out what its integer > values translate to. I looked them up, and have done that. > For date, there are two properties to choose > from, PR_CREATION_TIME and PR_CLIENT_SUBMIT_TIME. Both appear to be > UTC times, but that could just be the way dump_props is showing them. I chose SUBMIT_TIME, as I gather that's the time it was submitted for sending, which is probably the best choice, I think. 
The formatting was like what I had for X-Exchange-Delivery-Time (which I have reformatted into a Received header). I also did organisation - however, this is another one of those unnamed (and seemingly undocumented) properties, so I'm not 100% that it's correct. For one, I can't set an "organisation" value for my exchange account, only POP/IMAP accounts. However, if I send myself mail, I get the organisation value from another account tacked on to the message, so it seems that's what it is. Could you try this one out (if possible) and see what you get? > For x-mailer, the only thing I see is what appears to be the Outlook > version number of the sender. There are two, I think, eg I have: 0x7E8EFFE2 : '10.0' 0x7E8FFFFD : 104712 The '10.0' is a pretty version, and the other one is what's in the "About" dialog box (as 10.4712....). These would be enough, and I added the code, but for some reason Outlook chokes if I try to get either of these properties (but no other ones that I can find). I'm not sure why that is. You can look into it if you like <0.5 wink>. 
=Tony.Meyer From tim.peters at gmail.com Thu Dec 16 07:00:01 2004 From: tim.peters at gmail.com (Tim Peters) Date: Thu Dec 16 07:00:13 2004 Subject: [spambayes-dev] Enhanced Outlook statistics display In-Reply-To: References: Message-ID: <1f7befae041215220059a7ff7a@mail.gmail.com> [Tony Meyer] > Further to earlier discussion about calculating cost figures, and in > case other people are interested and not aware of it, JGC's latest > newsletter mentions this paper: > > I'm sure we mentioned that paper here in the early days; and note that Gary Robinson's oft-noted site has referred to it too approximately forever, although via a different link: http://arxiv.org/abs/cs.CL/0006013 > And this page on the SpamAssassin wiki: > > > > I haven't had a chance to read through it in depth, but the "total > cost ratio" appears to be more-or-less the same thing as the cost > values that the spambayes testtools scripts produce (with the > addition of an unsure weight). > > Maybe Tim knew of this and it's deliberate, I knew the paper, and the choice to model costs in SpamBayes testing in terms of hypothetical dollars charged to instances of different kinds of errors was deliberate, but there's really no connection between those. "Dollars and cents" models are simply intuitively appealing to people regardless of statistical background, and I didn't want the volunteer testers on this project to feel put out by a measure that seemed esoteric. I also didn't give a rip about publishing results, so didn't feel compelled to use measures with "lambdas" or "betas" just for academic brownie points . > but, in any case, it is interesting to see that it has been used > elsewhere. If you're going to provide a single figure of merit, there are constraints pushing in this direction. The choice of a linear model is convenient and arguably a good first-order (literally) approximation to a realistic cost model. > (I wish I had found this when I was writing my CEAS paper earlier > in the year). 
Then you should have asked . From tameyer at ihug.co.nz Thu Dec 16 08:43:07 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 16 08:43:48 2004 Subject: [spambayes-dev] Enhanced Outlook statistics display In-Reply-To: Message-ID: [Tim Peters] > I'm sure we mentioned that paper here in the early days; I'll believe you. It may have been prior to the separate mailing list (I only read that far back), or maybe I just don't recall it - when I did read the archives, I wasn't really that interested in reading stuff like that. > and note that > Gary Robinson's oft-noted site has referred to it too approximately > forever, although via a different link: > > http://arxiv.org/abs/cs.CL/0006013 Yes, but *after* the maths, when everyone has stopped reading . > I knew the paper, and the choice to model costs in SpamBayes testing > in terms of hypothetical dollars charged to instances of different > kinds of errors was deliberate, but there's really no connection > between those. "Dollars and cents" models are simply intuitively > appealing to people regardless of statistical background, and I didn't > want the volunteer testers on this project to feel put out by a > measure that seemed esoteric. Since I've got you talking , what was the basis behind the $10,$1,0.20c choices? Just numbers that seemed right, or something more concrete? > If you're going to provide a single figure of merit, there are > constraints pushing in this direction. The choice of a linear model > is convenient and arguably a good first-order (literally) > approximation to a realistic cost model. I think the people asking for more statistics (in the GUI) are probably after a single figure - something to wave in front of people that ask about the accuracy. Straight lines are easier to draw, too, if we ever do provide the requested graphical representation <0.5 wink>. [Tony Meyer] >> (I wish I had found this when I was writing my CEAS paper earlier >> in the year). 
[Tim Peters] > Then you should have asked . Well, it didn't really make that much difference, but it would have saved me trying to explain the idea. That's what I get for (co)writing a paper in an area I have very little background in without the luxury of time to do better background reading. If I do wade outside my 'proper' research area again, I'll at least be better prepared (and maybe I will ask :). =Tony.Meyer From kenny.pitt at gmail.com Thu Dec 16 16:55:39 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Thu Dec 16 16:55:43 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: Message-ID: <41c1affc.59952156.3140.0aeb@smtp.gmail.com> Tony Meyer wrote: > I also did organisation - however, this is another one of those > unnamed (and seemingly undocumented) properties, so I'm not 100% that > it's correct. For one, I can't set an "organisation" value for my > exchange account, only POP/IMAP accounts. However, if I send myself > mail, I get the organisation value from another account tacked on to > the message, so it seems that's what it is. Could you try this one > out (if possible) and see what you get? I checked several of my messages with dump_props, and I can't find a property id 0x1037001E in any of my Exchange messages (even those that arrived from outside SMTP senders). I sent myself a message from a POP3 account that had the Organization set and then did a dump_props. The only place in the properties that the organization name I set showed up at all was in the PR_TRANSPORT_MESSAGE_HEADERS_A property. (Just FYI, the e-mail headers were defined by us crazy Americans , so the official header name is spelled "Organization" with a z) I also noticed a couple of small problems with the formatting of the e-mail addresses. From: KennyPitt@invalid (Kenny Pitt) According to RFC 822, this is technically correct because the "(Kenny Pitt)" part should be ignored as a comment. 
I believe the more common format, however, should be: From: Kenny Pitt Or even safer since we don't know what characters might appear in the display name: From: "Kenny Pitt" For the To header, it just happened that I sent the message to two recipients so that I could compare the Exchange results to the POP3 results. The original content of the To field in Outlook was "kennypitt@hotpop.com; Kenny Pitt". Here is what I got out in my spam clues: To: kennypitthotpop.comKennyPitt@invalid (kennypitt@hotpop.com; Kenny Pitt) The correct format, I believe, would be: To: kennypitt@hotpop.com; "Kenny Pitt" Should be a simple matter of splitting the addresses on the ";" character. I'm going to go take a shot at this and see what I get. -- Kenny Pitt From sethg at GoodmanAssociates.com Thu Dec 16 22:05:44 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Thu Dec 16 22:05:53 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messagesproblem In-Reply-To: <41c1affc.59952156.3140.0aeb@smtp.gmail.com> Message-ID: > From: Kenny Pitt > Sent: Thursday, December 16, 2004 9:56 AM <...> > The correct format, I believe, would be: > > To: kennypitt@hotpop.com; "Kenny Pitt" > > Should be a simple matter of splitting the addresses on the ";" character. > I'm going to go take a shot at this and see what I get. This is acceptable, and typical for Outlook, but it does involve some legacy constructs which have been deprecated in RFC2822. RFC2822 updates and replaces RFC822 for all practical purposes and is a better reference to use. It does list which "obsolete" address formats must be accepted. In particular, the use of the first address without angle brackets is deprecated, though recognized as a legacy format that must be accepted. Current practice is to include all addresses in angle brackets and unless that causes problems in Outlook, that would be preferable. The second problem is the use of a semi-colon to separate addresses. 
This is now supposed to be a comma, though the obsolete semi-colon delimiter of RFC822 is explicitly supported. My copy of Outlook2000 contains a check box to accept commas as address delimiters, which is the default setting, but it still produces semicolons for display. I think it would be prudent to accept either delimiter, in case MS ever gets a gram of clue and drops the deprecated format. This will also position you to more easily integrate with non-MS MUA's, hopefully open-source ones as they become popular enough. Microsoft never did give a rat's posterior about IETF standards and often uses them as a marketing tool to "differentiate" their products (translation: intentionally create interoperability problems). Another general question on standards compliance: does Spambayes support the Resent-*: series of headers? These are neither generated nor displayed by Outlook, since Microsoft apparently never considered RFC2822 relevant. However, many other MUA's use the remailing syntax of that standard, which uses those headers. Though they are defined as trace headers and in that sense are optional, they are required in order to use the remailing semantics of RFC2822 section 3.6.6. An example of this is Pine's bounce function. The fact that MS completely ignores those headers in their MUA's has created a huge problem for those of us who are involved in message authentication standards efforts. When used, those headers do contain important information, and as authentication becomes more common, they will become more important. My suggestion is that, of that whole series of headers, the ones that would be of interest to Spambayes are: Resent-From: Resent-Sender: Resent-To: Resent-cc: Below is the relevant text from RFC2822. Some tokens are only defined in other sections and there are two that are worth describing here. "Phrase" is a quoted string, an atom or an obsolete format consisting of a combination of words including "." and CFWS.
CFWS is "commented folding white space" that encompasses folding white space and comments, where comments are parenthesis-delimited strings. This is relevant to the way you described Outlook presenting some addresses and is the most serious difference from the standards. Even in RFC822, comments were permitted but expressly ignored in address strings, so Microsoft's practice is completely broken. RFC2822 specifically says that comments SHOULD NOT be included in address fields, as legacy implementations sometimes interpret the comments. Apparently, we are now a legacy application because MS has forced us to interpret the content of comments in order to get the correct address-list from their broken MUA. Buggers. " 3.4. Address Specification Addresses occur in several message header fields to indicate senders and recipients of messages. An address may either be an individual mailbox, or a group of mailboxes. address = mailbox / group mailbox = name-addr / addr-spec name-addr = [display-name] angle-addr angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr group = display-name ":" [mailbox-list / CFWS] ";" [CFWS] display-name = phrase mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list address-list = (address *("," address)) / obs-addr-list A mailbox receives mail. It is a conceptual entity which does not necessarily pertain to file storage. For example, some sites may choose to print mail on a printer and deliver the output to the addressee's desk. Normally, a mailbox is comprised of two parts: (1) an optional display name that indicates the name of the recipient (which could be a person or a system) that could be displayed to the user of a mail application, and (2) an addr-spec address enclosed in angle brackets ("<" and ">"). There is also an alternate simple form of a mailbox where the addr-spec address appears alone, without the recipient's name or the angle brackets. The Internet addr-spec address is described in section 3.4.1. 
Note: Some legacy implementations used the simple form where the addr-spec appears without the angle brackets, but included the name of the recipient in parentheses as a comment following the addr-spec. Since the meaning of the information in a comment is unspecified, implementations SHOULD use the full name-addr form of the mailbox, instead of the legacy form, to specify the display name associated with a mailbox. Also, because some legacy implementations interpret the comment, comments generally SHOULD NOT be used in address fields to avoid confusing such implementations. When it is desirable to treat several mailboxes as a single unit (i.e., in a distribution list), the group construct can be used. The group construct allows the sender to indicate a named group of recipients. This is done by giving a display name for the group, followed by a colon, followed by a comma separated list of any number of mailboxes (including zero and one), and ending with a semicolon. Because the list of mailboxes can be empty, using the group construct is also a simple way to communicate to recipients that the message was sent to one or more named sets of recipients, without actually providing the individual mailbox address for each of those recipients. " -- Seth Goodman From kenny.pitt at gmail.com Thu Dec 16 23:21:15 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Thu Dec 16 23:21:17 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messagesproblem In-Reply-To: Message-ID: <41c20a5b.77b15e12.62a3.101a@smtp.gmail.com> Seth Goodman wrote: >> From: Kenny Pitt >> Sent: Thursday, December 16, 2004 9:56 AM > > <...> > >> The correct format, I believe, would be: >> >> To: kennypitt@hotpop.com; "Kenny Pitt" >> >> Should be a simple matter of splitting the addresses on the ";" >> character. I'm going to go take a shot at this and see what I get. 
> > This is acceptable, and typical for Outlook, but it does involve some > legacy constructs which have been deprecated in RFC2822. RFC2822 > updates and replaces RFC822 for all practical purposes and is a > better reference to use. It does list which "obsolete" address > formats must be accepted. In particular, the use of the first > address without angle brackets is deprecated, though recognized as a > legacy format that must be accepted. Current practice is to include > all addresses in angle brackets and unless that causes problems in > Outlook, that would be preferable. The second problem is the use of > a semi-colon to separate addresses. This is now supposed to be a > comma, though the obsolete semi-colon delimiter of RFC822 is > explicitly supported. My copy of Outlook2000 contains a check box to > accept commas as address delimiters, which is the default setting, > but it still produces semicolons for display. I think it would be > prudent to accept either delimiter, in case MS ever gets a gram of > clue and drops the deprecated format. This will also position you to > more easily integrate with non-MS MUA's, hopefully open-source ones > as they become popular enough. Microsoft never did give a rat's > posterior about IETF standards and often uses them as a marketing > tool to "differentiate" their products (translation: intentionally > create interoperability problems). That's all well and good, and I definitely appreciate the additional info, but we're not really concerned with interoperability here so all this may be overkill. The sole purpose here is to provide SpamBayes with something that it can recognize and produce reasonable tokens from when asked to process an Exchange message received from another local Exchange user. As long as we know that Microsoft always uses ';' as the delimiter in the Exchange Display Name field then I would prefer to stick to that delimiter to keep the code concise.
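For readers following the delimiter discussion, here is a minimal sketch of the "accept either delimiter" alternative, using only the standard library. The helper name `parse_recipients` is hypothetical and this is not the code in msgstore.py; the sample input assumes the angle-bracket form for the simulated local address.

```python
from email.utils import getaddresses


def parse_recipients(field):
    """Return (display_name, address) pairs from a recipient string.

    Hypothetical helper for illustration only; accepts ';' or ','.
    """
    # RFC 2822 separates addresses with ','; Outlook displays ';'.
    # Normalising ';' to ',' lets the stdlib parser handle both.
    # Caveat: a ';' inside a quoted display name is rewritten too.
    normalised = field.replace(";", ",")
    return getaddresses([normalised])


sample = 'kennypitt@hotpop.com; "Kenny Pitt" <KennyPitt@invalid>'
for name, addr in parse_recipients(sample):
    print(repr(name), repr(addr))
```

Sticking to ';' only, as preferred above, would simply mean dropping the normalisation step and splitting on ';' directly; the trade-off is conciseness against tolerance of comma-separated input.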
The reason there are no brackets around the stand-alone Internet address is because that was how Outlook handed it to me. The Python e-mail package already contains all the necessary parsing logic to figure out what format the address is in. There didn't seem to be much point in doing extra parsing of the address to figure out if I needed to convert it to a different format that would then produce exactly the same set of SpamBayes tokens. > Another general question on standards compliance is does Spambayes > support the Resent-*: series of headers? These are neither generated > nor displayed by Outlook, since Microsoft apparently never considered > RFC2822 relevant. However, many other MUA's use the remailing syntax > of that standard, which uses those headers. Though they are defined > as trace headers and in that sense are optional, they are required in > order to use the remailing semantics of RFC2822 section 3.6.6. An > example of this is Pine's bounce function. The fact that MS > completely ignores those headers in their MUA's has created a huge > problem for those of us who are involved in message authentication > standards efforts. When used, those headers do contain important > information, and as authentication becomes more common, they will > become more important. My suggestion is that, of that whole series > of headers, the ones that would be of interest to Spambayes are: > > Resent-From: > Resent-Sender: > Resent-To: > Resent-cc: There are some differences between what non-Outlook versions of SpamBayes such as sb_server, sb_filter, and sb_imapfilter will see and what the Outlook addin will see because of the way Outlook destroys the original structure of the message. However, one thing that *is* preserved is the original headers of a message received via SMTP, so these headers should be included if they were part of the original message. By default, SpamBayes ignores these headers. 
There are options that you can tweak in the config file if you want them processed, though. I believe the Tokenizer:safe_headers option is where you would do this, but I've never used it myself so I'm not 100% certain. > Below is the relevant text from RFC2822. Some tokens are only > defined in other sections and there are two that are worth describing > here. "Phrase" is a quoted string, an atom or an obsolete format > consisting of a combination of words including "." and CFWS. CFWS is > "commented folding white space" that encompasses folding white space > and comments, where comments are parenthesis-delimited strings. This > is relevant to the way you described Outlook presenting some > addresses and is the most serious difference from the standards. > > Even in RFC822, comments were permitted but expressly ignored in > address strings, so Microsoft's practice is completely broken. > RFC2822 specifically says that comments SHOULD NOT be included in > address fields, as legacy implementations sometimes interpret the > comments. Apparently, we are now a legacy application because MS has > forced us to interpret the content of comments in order to get the > correct address-list from their broken MUA. Buggers. I don't remember mentioning anything about comments presented by Outlook in the address string. The comments came from Tony's first pass at simulating the address headers for an Exchange e-mail address, which is typically just a real name without an RFC 822 or 2822 compatible e-mail address. That has now been changed to use the real name along with a simulated RFC 822 (and hopefully 2822) compliant local address using the standard <> brackets. 
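As a rough illustration of building a name plus bracketed-address header value for an Exchange sender that has only a display name, the sketch below uses the standard library's `formataddr`. The function `simulated_address`, the rule for deriving the local part, and the "invalid" domain default are assumptions made for this example, not the checked-in code.

```python
import re
from email.utils import formataddr, parseaddr


def simulated_address(display_name, domain="invalid"):
    """Build 'Name <local@domain>' from a bare display name.

    Hypothetical: the local part is the name with every character
    other than ASCII letters and digits removed.
    """
    local = re.sub(r"[^A-Za-z0-9]", "", display_name)
    # formataddr quotes the display name only when it contains
    # characters that are special in address headers (',', ';', ...).
    return formataddr((display_name, "%s@%s" % (local, domain)))


print(simulated_address("Kenny Pitt"))
print(simulated_address("Pitt, Kenny"))
# The result parses back into the same (name, address) pair.
print(parseaddr(simulated_address("Pitt, Kenny")))
```

Letting `formataddr` decide when to quote covers the "we don't know what characters might appear in the display name" concern without quoting every name unconditionally.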
-- Kenny Pitt From sethg at GoodmanAssociates.com Fri Dec 17 00:29:12 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Fri Dec 17 00:29:21 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messagesproblem In-Reply-To: <41c20a5b.77b15e12.62a3.101a@smtp.gmail.com> Message-ID: > From: Kenny Pitt > Sent: Thursday, December 16, 2004 4:21 PM <...> > That's all well and good, and I definitely appreciate the additional info, > but we're not really concerned with interoperability here so all this may > be overkill. The sole purpose here is to provide SpamBayes with > something that it can recognize and produce reasonable tokens from when > asked to process an Exchange message received from another local Exchange > user. If that's the goal, no argument. Microsoft will hopefully not control the MUA market forever, so I was just looking to the future. I am hoping that either OpenOffice will one day produce an Outlook look-alike or that Thunderbird will mature into a fully featured product competitive with Outlook. The Outlook plug-in is such a time-saver that it alone will keep me on Outlook for a long time to come :) <...> > > My suggestion is that, of that whole series > > of headers, the ones that would be of interest to Spambayes are: > > > > Resent-From: > > Resent-Sender: > > Resent-To: > > Resent-cc: > > There are some differences between what non-Outlook versions of SpamBayes > such as sb_server, sb_filter, and sb_imapfilter will see and what the > Outlook addin will see because of the way Outlook destroys the original > structure of the message. However, one thing that *is* preserved is the > original headers of a message received via SMTP, so these headers > should be included if they were part of the original message. > > By default, SpamBayes ignores these headers. There are options that you > can tweak in the config file if you want them processed, though. 
I > believe the Tokenizer:safe_headers option is where you would do this, > but I've never used it myself so I'm not 100% certain. I looked at file:///c:/Program%20Files/SpamBayes/docs/outlook/docs/configuration.html and it doesn't give any Tokenizer options at all, though they obviously exist. The directions also state that there are no experimental options in this release. Where else would I look to find a description of the supported configuration options? One of my email accounts also has a special header for Brightmail detected spam that would be helpful to tokenize. This is not the Brightmail tracker header itself, but one that my ISP adds. This header is always the same text and is as follows: X-TDS-Spam: Potential Spam Is there any support for tokenizing special headers like this? Could I manually add this to the list of safe headers to tokenize with that option? <...> > I don't remember mentioning anything about comments presented by > Outlook in the address string. The comments came from Tony's first pass > at simulating the address headers for an Exchange e-mail address, Sorry if I misinterpreted this. I thought that Outlook passed you an address string that contained some of the information between parentheses. -- Seth Goodman From tameyer at ihug.co.nz Fri Dec 17 02:31:50 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Fri Dec 17 02:32:29 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: Message-ID: Just so it's clear what's being discussed here - the changes we are making/proposing only affect three areas: 1. The tokenization of the message. 2. The display of the message in the 'Show Clues' message. 3. The message text if the export.py script is used. Specifically, the changes don't affect any MUA at all (unless you import the exported mail into another mailer, but then you ought to be using Outlook's export facility).
These changes are intended to give more information to the tokenizer and to the sort+group.py script - as long as the tokenizer can parse them, the rest is (practically, if not aesthetically) unimportant. [Seth Goodman] > The second problem is the use of a semi-colon to separate addresses. > This is now supposed to be a comma, though the obsolete semi-colon > delimiter of RFC822 is explicitly supported. Drat, I hadn't realised this. That means our notate_to option is currently wrong, which actually matters. I'll fix that. I agree with Kenny regarding msgstore.py, though. [Seth Goodman] > This will also position you to more easily integrate > with non-MS MUA's, hopefully open-source ones as they become > popular enough. This code is unlikely to be usable for anything outside Exchange, so it shouldn't really matter. [Seth Goodman] >> Another general question on standards compliance is does >> Spambayes support the Resent-*: series of headers? These are >> neither generated nor displayed by Outlook, since Microsoft >> apparently never considered RFC2822 relevant. [Kenny Pitt] > By default, SpamBayes ignores these headers. There are > options that you can tweak in the config file if you want > them processed, though. I believe the Tokenizer:safe_headers > option is where you would do this, but I've never used it > myself so I'm not 100% certain. If you add the headers to Tokenizer:safe_headers then a token will be generated that includes a count of how many times that header appears in the message (something like 'Header:Resent-From:1') if the count is > 0 (and even then, if Tokenizer:record_header_absence is True). It doesn't do any tokenization of the actual value of the header, though. However, the value for the four headers you suggested are all address lists, right (i.e. in the same form as a "To:" header). 
In that case, they could be added to Tokenizer:address_headers which would mean that you'd get tokens like 'Resent-From:none', 'Resent-From:invalid', and 'Resent-From:addr:ta-meyer@ihug.co.nz'. (You could add them to both if you want all the tokens). Note that these won't already be in your database, so unless you retrain they'll be of no use until some training on messages that yield them is done. [Seth Goodman] > I looked at > file:///c:/Program%20Files/SpamBayes/docs/outlook/docs/configuration.html > and it doesn't give any Tokenizer options at all, though they obviously > exist. The directions also state that there are no experimental options > in this release. Where else would I look to find a description of the > supported configuration options? The Outlook plug-in has two sets of configuration data (one's in the '{profile name}.ini' file, the other is in the 'default_bayes_customize.ini' file). One is for things that only apply to Outlook (folder ids, timer settings, etc), and the other is for options that are shared with the rest of SpamBayes. That file's all about the Outlook-only settings (including Outlook-only experimental options, like the timer values once were) and doesn't mention the SpamBayes-specific options at all. Before 1.1/1.0.2 I'll update the documentation so that if Outlook users do want to play around with these (the vast majority won't) then the information is there. I ought to have done this before asking users to give the experimental options a go, but it slipped my mind (it's very simple for users of the web interface). For the moment, what you need to do is add the options to a file called 'default_bayes_customize.ini' in your Outlook plug-in's data directory (it may or may not already exist). The format is: ''' [Section name] option_name:option_value ''' The log will tell you if you set an option with an invalid value. 
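To make the file layout just described concrete, here is a small sketch that parses a sample fragment in the same "[Section name]" / "option_name:option_value" shape. SpamBayes reads this file with its own options code; the standard `configparser` module is used here only to show the layout, and the two options chosen are simply ones named in this thread.

```python
import configparser

# Sample contents for default_bayes_customize.ini (illustrative only).
SAMPLE = """\
[Tokenizer]
basic_header_tokenize:True
record_header_absence:True
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)  # ':' is accepted as a key/value delimiter

for section in parser.sections():
    for name, value in parser.items(section):
        print("%s:%s = %s" % (section, name, value))
```

Whether a given value is valid is decided by SpamBayes itself, which is why checking the log after editing the file is still the step to rely on.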
To figure out what's available, if you have Python installed, you can use the instructions in FAQ 4.12: If not, then the best available at the moment is to read the source of Options.py, which ought to be reasonably readable. The 1.0.1 version is here: [Seth Goodman] > One of my email accounts also has a special header for Brightmail > detected spam that would be helpful to tokenize. This is not the > Brightmail tracker header itself, but one that my ISP adds. Yes, I get one of these too - "X-IHUG-iSpy". It's probably some sort of Brightmail option. [Seth Goodman] > This header is always the same text and is as follows: > > X-TDS-Spam: Potential Spam > > Is there any support for tokenizing special headers like this? Could I > manually add this to the list of safe headers to tokenize with that > option? Yes, however this will only be of use if the header doesn't appear in some mail, otherwise the token will be 'Header:X-TDS-Spam:1' for every message, and be no use at all. My "X-IHUG-iSpy" header appears even if the message isn't thought to be spam (then the value is "Doesn't appear to be Spam"), so this would be the case for me. If in your case the header isn't present for such mail, then this would work. To generate tokens with the content, too, you'll need to either write specific code for it (like the Habeas(tm) headers, for example), or use Tokenizer:basic_header_tokenize (off by default). If that option is enabled, then all headers generate tokens in the form "header:value", unless the header is listed in the Tokenizer:basic_header_skip option. =Tony.Meyer From tameyer at ihug.co.nz Fri Dec 17 02:35:17 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Fri Dec 17 02:35:56 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: Message-ID: > I checked several of my messages with dump_props, and I can't find a > property id 0x1037001E in any of my Exchange messages (even those that > arrived from outside SMTP senders). 
I sent myself a message > from a POP3 account that had the Organization set and then did a > dump_props. The only place in the properties that the organization name > I set showed up at all was in the PR_TRANSPORT_MESSAGE_HEADERS_A property. I think I'll wipe that one then. I get my actual organisation in lots of the properties after a "/O=" string, but that doesn't appear to be any sort of standard either. So organisation/organization is out. > (Just FYI, the e-mail headers were defined by us crazy Americans , > so the official header name is spelled "Organization" with a z) Oops. Sorry, I should have thought about that, but just typed it in as it ought to be spelt . Thanks for the fix. > I also noticed a couple of small problems with the formatting > of the e-mail addresses. Thanks for this - looks good. IAC, it generates appropriate tokens, and that's what counts. =Tony.Meyer From tameyer at ihug.co.nz Fri Dec 17 02:36:01 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Fri Dec 17 02:36:35 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: Message-ID: > I checked several of my messages with dump_props, and I can't find a > property id 0x1037001E in any of my Exchange messages (even those that > arrived from outside SMTP senders). BTW, did you have the same version info in there as me with the same key? If so, were you able to retrieve it? =Tony.Meyer From kenny.pitt at gmail.com Fri Dec 17 16:00:22 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Fri Dec 17 16:00:26 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: Message-ID: <41c2f487.14b8f82e.40e1.136a@smtp.gmail.com> Tony Meyer wrote: > BTW, did you have the same version info in there as me with the same > key? If so, were you able to retrieve it? I have the version info, but not under the same property id numbers.
Yours from previous message: 0x7E8EFFE2 : '10.0' 0x7E8FFFFD : 104712 Mine: 0x7FF4FFE2 : '11.0' 0x7FF5FFFD : 116359 Looks like this property id isn't a fixed value. Maybe it depends on the Exchange Server version (I'm not sure which version of Exchange we're running). Maybe we should just use a fixed "X-Mailer: Microsoft Exchange" and forget the version number. I suspect that someone else on my local Exchange server is no more likely to send me spam if they use Outlook 2003 than if they use Outlook 2000 . -- Kenny Pitt From sethg at GoodmanAssociates.com Sat Dec 18 06:15:22 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Sat Dec 18 06:15:27 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messagesproblem In-Reply-To: Message-ID: > From: Tony Meyer > Sent: Thursday, December 16, 2004 7:32 PM <...> > [Seth Goodman] > >> Another general question on standards compliance is does > >> Spambayes support the Resent-*: series of headers? These are > >> neither generated nor displayed by Outlook, since Microsoft > >> apparently never considered RFC2822 relevant. > > [Kenny Pitt] > > By default, SpamBayes ignores these headers. There are > > options that you can tweak in the config file if you want > > them processed, though. I believe the Tokenizer:safe_headers > > option is where you would do this, but I've never used it > > myself so I'm not 100% certain. > > If you add the headers to Tokenizer:safe_headers then a token will be > generated that includes a count of how many times that header appears > in the message (something like 'Header:Resent-From:1') if the count > is > 0 (and even then, if Tokenizer:record_header_absence is True). > It doesn't do any tokenization of the actual value of the header, > though. This probably has little value. Most messages don't currently contain Resent-*: headers, so the occurrence is not currently valuable.
Depending on which authentication schemes eventually come into use, the occurrence of these may become a spam indicator as it is a back door around some of the currently proposed authentication schemes, such as Yahoo DomainKeys (sad but true). > > However, the value for the four headers you suggested are all > address lists, right (i.e. in the same form as a "To:" header). Exactly. > In that case, they could be added to Tokenizer:address_headers > which would mean that you'd get tokens like 'Resent-From:none', > 'Resent-From:invalid', and 'Resent-From:addr:ta-meyer@ihug.co.nz'. This works. <...> > If not, then the best available at the moment is to read the source of > Options.py, which ought to be reasonably readable. The 1.0.1 version is > here: > > http://cvs.sf.net/viewcvs.py/spambayes/spambayes/spambayes/Options.py?rev=1. 107.4.1&view=markup Thanks. Perfectly readable. [Seth Goodman] > > One of my email accounts also has a special header for Brightmail > > detected spam that would be helpful to tokenize. This is not the > > Brightmail tracker header itself, but one that my ISP adds. > > Yes, I get one of these too - "X-IHUG-iSpy". It's probably some sort of > Brightmail option. [Seth Goodman] > > This header is always the same text and is as follows: > > > > X-TDS-Spam: Potential Spam > > > > Is there any support for tokenizing special headers like this? Could I > > manually add this to the list of safe headers to tokenize with that > > option? > > Yes, however this will only be of use if the header doesn't appear in some > mail, otherwise the token will be 'Header:X-TDS-Spam:1' for every message, > and be no use at all. My "X-IHUG-iSpy" header appears even if the message > isn't thought to be spam (then the value is "Doesn't appear to be Spam"), so > this would be the case for me. If in your case the header isn't present for > such mail, then this would work. This would work for me. 
> > To generate tokens with the content, too, you'll need to either write > specific code for it (like the Habeas(tm) headers, for example), or use > Tokenizer:basic_header_tokenize (off by default). If that option is > enabled, then all headers generate tokens in the form "header:value", unless > the header is listed in the Tokenizer:basic_header_skip option. This would work for your header. As long as I have to modify the source code to change the default list for one or more options, I may as well do something that is useful to others. My suspicion is that a number of users have ISP's that tag spam with programs other than SpamAssassin, so some facility to do this through the configuration file might be useful. One idea is an additional Tokenizer option called special_header_present. The user would list the specific text of the header after the option. The option would cause the tokenizer to generate a token with the count of the header noted. For example, I would have a single entry, since my ISP only puts the header in if it thinks the message is spam: [Tokenizer] special_header_present:"X-TDS-Spam: Potential Spam" Tony would have at least two entries, since the header is always there and the content indicates if the ISP thinks it is spam: [Tokenizer] special_header_present:"X-IHUG-iSpy: Spam" special_header_present:"X-IHUG-iSpy: Doesn't appear to be Spam" Another possibility would be to have an option for special_header_content and produce a single token for the string that appears in the header only if the header is present. This option would probably work better for inboxes that collect from multiple accounts as there would not be any zero-count tokens to skew the scores. Does anyone else think this is useful or is it a waste of time? 
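To make the proposal a little more concrete, here is a minimal sketch of the "token only when the header and text are present" variant. Everything in it is hypothetical: special_header_present is only a proposed option, and the function name special_header_tokens and the 'special_header:' token prefix are invented for illustration and are not part of SpamBayes.

```python
from email import message_from_string


def special_header_tokens(raw_message, special_headers):
    """Yield a token for each configured 'Name: value' entry that matches.

    special_headers is a list of strings such as
    "X-TDS-Spam: Potential Spam" (the format of the proposed option).
    Hypothetical sketch, not SpamBayes code.
    """
    msg = message_from_string(raw_message)
    for entry in special_headers:
        name, _, wanted = entry.partition(":")
        name, wanted = name.strip(), wanted.strip()
        values = [value.strip() for value in msg.get_all(name, [])]
        # No token at all is produced when the header or text is absent,
        # so there are no zero-count tokens to skew the scores.
        if wanted in values:
            yield "special_header:%s:%s" % (name, wanted)


tagged = "From: a@example.com\nX-TDS-Spam: Potential Spam\n\nbody\n"
untagged = "From: a@example.com\n\nbody\n"
option = ["X-TDS-Spam: Potential Spam"]
tagged_tokens = list(special_header_tokens(tagged, option))
untagged_tokens = list(special_header_tokens(untagged, option))
```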
-- Seth Goodman From tameyer at ihug.co.nz Mon Dec 20 02:28:17 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 20 02:28:53 2004 Subject: [spambayes-dev] RE: [Spambayes] Translation to Spanish In-Reply-To: Message-ID: > Hello I'm not a programmer but I can help to translate > the software to spanish; I have many people asking for > a good AntiSpam Outlook plug-in ... > > Someone can help me to mantain a Spanish version for this > great program ?? One of the things we're trying to have done for the 1.1 version is to make SpamBayes easily translatable, and include a couple of languages in the release. Hernan Foffani has done much of this work, and the CVS version of SpamBayes is reasonably ready (particularly Outlook) for translation. I believe he has done some of the work for Spanish (that's es_ES, yes?). He would be better able to explain what can be done to help at this point. (Well, I believe all of the documentation needs to be done - although I should really finish tidying up the English version first). Anyway, please feel free to discuss this on spambayes-dev :) =Tony.Meyer From tameyer at ihug.co.nz Mon Dec 20 02:51:20 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Dec 20 02:51:55 2004 Subject: [spambayes-dev] RE: [Spambayes] Re: Training empty messages problem In-Reply-To: Message-ID: > I have the version info, but not under the same property id numbers. [...] > Looks like this property id isn't a fixed value. [...] > Maybe we should just use a fixed "X-Mailer: Microsoft > Exchange" and forget the version number. +1. =Tony.Meyer From adrian at apsistemas.info Mon Dec 20 21:01:22 2004 From: adrian at apsistemas.info (Adrian Perello Marin) Date: Mon Dec 20 21:01:35 2004 Subject: [spambayes-dev] Translation to Spanish Message-ID: <004901c4e6ce$af013580$4501a8c0@samsungx10> Hello I'm not a programmer but I can help to translate the software to spanish; I have many people asking for a good AntiSpam Outlook plug-in ... 
Someone can help me to maintain a Spanish version for this great program ?? Regards. From mgerber at leitwerk.de Tue Dec 21 14:26:08 2004 From: mgerber at leitwerk.de (Mike Gerber) Date: Tue Dec 21 14:26:39 2004 Subject: [spambayes-dev] Translation to Spanish In-Reply-To: <004901c4e6ce$af013580$4501a8c0@samsungx10> References: <004901c4e6ce$af013580$4501a8c0@samsungx10> Message-ID: <20041221132608.GB13875@nin.lan.rwsr-xr-x.de> Hi, > Hello I'm not a programmer but I can help to translate the software to > spanish; I have many people asking for a good AntiSpam Outlook plug-in ... > Someone can help me to maintain a Spanish version for this great program ?? And we would like to contribute support for German. I had a (brief) look at the source code of the plugin and it seems there's much/all of the text hardcoded. So the first would be to introduce gettext or a similar mechanism (haven't looked at Python i18n yet) - I could help doing that, what do the developers think? Cheers, Mike -- ------------------------------------------------------------------ Mike Gerber Management Internet/Security Development LEITWERK GmbH http://www.leitwerk.de Im Ettenbach 13a Fon: +49 7805 918 0 77767 Appenweier Fax: +49 7805 918 200 ------------------------------------------------------------------ From richie at entrian.com Tue Dec 21 21:39:10 2004 From: richie at entrian.com (Richie Hindle) Date: Tue Dec 21 21:39:18 2004 Subject: [spambayes-dev] Re: [Spambayes] WHICH?
Magazine PRE-PUBLICATION CHECK: Anti-virus software In-Reply-To: References: Message-ID: Hello Cecilia, Thanks for your email to spambayes@python.org regarding SpamBayes. > Dear Marketing and PR teams SpamBayes is an Open Source project, developed by volunteers. It has no marketing or PR teams. (I'm one of the developers.) Because it is Open Source, you are free to use it, distribute it, review it, comment on it, modify it, and do pretty much anything with it apart from claiming ownership of it or suing its developers. The licensing terms are here: http://cvs.sourceforge.net/viewcvs.py/*checkout*/spambayes/spambayes/LICENSE.txt?rev=1.4 > RE: WHICH? Magazine PRE-PUBLICATION CHECK: Anti-virus software SpamBayes is an anti-spam tool, not an anti-virus tool. It is of some use defending against email-borne viruses, but it's not designed as an anti-virus tool. > I am sending you a set of checking sheets for Which? Magazine. These are in an Excel file, which I don't want to open precisely because of the possibility of viruses... Could you send a plain-text version? > It would also be helpful if you would complete, sign and return this > information to us no later than Tuesday 4 January 2005. Because SpamBayes is developed by volunteers, who are distributed around the world, it has no central management and all communication is electronic. If it's vitally important that you have a signature, I can sign on behalf of the development team (if no-one has any objections) but we prefer to work over email. As I explained above, that shouldn't present any problems. Thanks for your interest, -- Richie Hindle richie@entrian.com From tameyer at ihug.co.nz Tue Dec 21 23:15:46 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 21 23:16:24 2004 Subject: [spambayes-dev] RE: [Spambayes] WHICH? Magazine PRE-PUBLICATION CHECK: Anti-virus software In-Reply-To: Message-ID: > These are in an Excel file, which I don't want to open precisely because > of the possibility of viruses...
Could you send a plain-text version? I opened it (somewhere it couldn't do any damage) and it seems ok, although it includes macros, which I disabled since they are the most risk. The contents (filled in) are at the end of this message. > Because SpamBayes is developed by volunteers, who are distributed around > the world, it has no central management and all communication is > electronic. If it's vitally important that you have a signature, I can > sign on behalf of the development team (if no-one has any objections) FWIW, I have none. > but we prefer to work over email. As I explained above, that shouldn't > present any problems. The instructions in the Excel file said that the file could be emailed back, so I presume that the signature is not that important. If it is, you can replace my name (below) with yours (Richie) if you want :) =Tony.Meyer --- Excel file contents follow --- Anti virus software Pre-publication Check Which? Limited Correct at 16 December 2004 Please check the information supplied carefully as it could be used for publication. Company / Product / Brand Name SpamBayes Requirements: Please check and amend the details below 1 Product specification Name SpamBayes Version number 1.0 Release CD No DVD No Internet download Yes Win Yes Linux Yes Mac Yes Phone Support None Extra costs None PLEASE COMPLETE BELOW BEFORE RETURNING Changes Do you plan any changes to the above in the next 6 months? (in confidence, not for publication) Yes If YES, please give brief details of what the changes will be? A new version (both a bugfix release and a new minor release) will be released, but not other than that. When they will come into effect? January/March, probably. When we can contact you for details? You can contact spambayes@python.org any time you like. Pre-publication check completed by I confirm the information provided in this questionnaire is correct as at 16 December 2004. 
Name Tony Meyer Position Developer Department N/A Direct telephone N/A Direct Fax N/A Email spambayes@python.org Address N/A Please return this pre-publication check to: Cecilia.Desouza@which.co.uk By 04 January 2005. If you cannot meet this deadline, please phone Cecilia De Souza on 020 7770 7683 MANY THANKS FOR YOUR HELP From richie at entrian.com Wed Dec 22 00:21:29 2004 From: richie at entrian.com (Richie Hindle) Date: Wed Dec 22 00:21:38 2004 Subject: [spambayes-dev] Re: [Spambayes] WHICH? Magazine PRE-PUBLICATION CHECK: Anti-virussoftware In-Reply-To: References: Message-ID: <1pbhs0tgg8pv3soq7fvqml7121rkr97qth@4ax.com> [Tony] > I opened it (somewhere it couldn't do any damage) and it seems ok Thanks, Tony. > you can replace my name (below) with yours (Richie) if you want :) I don't think that's necessary. 8-) Did you send the completed spreadsheet back to Cecilia? I doubt anyone would object to your doing so, having seen the contents of it. -- Richie Hindle richie@entrian.com From kennypitt at hotmail.com Tue Dec 21 16:05:33 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Dec 22 02:32:05 2004 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/scripts sb_server.py, 1.32, 1.33 In-Reply-To: Message-ID: Tony Meyer wrote: > Kenny's fix worked if there was more than +OK, but not if there was > just +OK. Fix the fix so that both cases work. Oops, sorry, that was my limited Python experience showing through. I assumed that split() would just return None for the second value if there was no separator. Thanks for covering me! BTW, any idea what might have happened to the stats page in sb_server? 
I'm looking for the problem right now, but every time I try to view stats I get a "500 Server error" with the following traceback: """ Traceback (most recent call last): File "C:\src\python\spambayes\spambayes\Dibbler.py", line 461, in found_terminator getattr(plugin, name)(**params) File "C:\src\python\spambayes\spambayes\UserInterface.py", line 915, in onStats s = Stats.Stats() File "C:\src\python\spambayes\spambayes\Stats.py", line 50, in __init__ self.CalculateStats() File "C:\src\python\spambayes\spambayes\Stats.py", line 72, in CalculateStats msginfoDB._getState(m) AttributeError: 'MessageInfoDB' object has no attribute '_getState' """ I just upgraded to Python 2.4 Final yesterday, so don't know if that might have something to do with it. -- Kenny Pitt From tameyer at ihug.co.nz Wed Dec 22 02:36:15 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 22 02:36:50 2004 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/scriptssb_server.py, 1.32, 1.33 In-Reply-To: Message-ID: Odd. I just got this message, but I presume it's out of date now? > Oops, sorry, that was my limited Python experience showing > through. I assumed that split() would just return None for > the second value if there was no separator. Thanks for > covering me! Since you were covering my bug, it's only fair . > BTW, any idea what might have happened to the stats page in > sb_server? I'm looking for the problem right now, but every > time I try to view stats I get a "500 Server error" with the > following traceback: [...] > AttributeError: 'MessageInfoDB' object has no attribute > '_getState' """ This ought to work now - but (as above) I think this is what you checked a fix in for some time back now, anyway. =Tony.Meyer From tameyer at ihug.co.nz Wed Dec 22 03:01:54 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 22 03:02:31 2004 Subject: [spambayes-dev] Translation to Spanish In-Reply-To: Message-ID: > And we would like to contribute support for German. 
I had a > (brief) look at the source code of the plugin and it seems > there's much/all of the text hardcoded. > > So the first would be to introduce gettext or a similiar mechanism > (haven't looked at Python i18n yet) - I could help doing > that, what do the developers think? I had hoped that Hatuka Nezumi would have responded to the earlier message, but I haven't heard anything from him for a while (busy, perhaps). He is leading the i18n process for SpamBayes (I'm helping and doing the checking in). Just about all the work adding gettext (etc) to the code is done. If you want to look at it/modify it you'll need to get it from CVS (no i18n is in 1.0.1, it'll appear in 1.1). Both the Outlook plug-in and the web interface have been set up. To translate the majority of the web interface, you should be able to simply get hold of the CVS copy of spambayes\spambayes\resources\ui.html and provide a translation of that. (There are a few things not in there, like the statistics data, but we'll get to those as necessary). To translate the majority of the Outlook dialogs, you should be able to use the gettext tools. If you're familiar with them, that should be simple - if not, I think I can make a template file, which is what needs to be translated. There are again a few items that this will not include, but they will also be got to at some point. There's also the documentation - but please say if you are interested in translating this, as I'll hurry up and update it for the 1.1 release so that it changes as little as possible after translation. Finally there is figuring out a release process (language packs, translated installers, etc), but that's a while off yet. =Tony.Meyer From tameyer at ihug.co.nz Wed Dec 22 03:14:09 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 22 03:14:45 2004 Subject: [spambayes-dev] RE: [Spambayes] WHICH? Magazine PRE-PUBLICATION CHECK:Anti-virussoftware In-Reply-To: Message-ID: [Richie] > Thanks, Tony. 
The filename had "spam results" in it, so I thought maybe it was some sort of analysis/test results, so was curious...(it wasn't, at all). > I don't think that's necessary. 8-) Did you send the > completed spreadsheet back to Cecilia? I doubt anyone would > object to your doing so, having seen the contents of it. I will do, then. =Tony.Meyer From tameyer at ihug.co.nz Wed Dec 22 04:55:51 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 22 04:56:27 2004 Subject: [spambayes-dev] Version Data In-Reply-To: Message-ID: [Kenny, way back in February] > My proposal is that all apps share the source release version > as their primary version number. The shared release version > would consist of the following parts: > > * A major/minor version number (1.0) > * A release number that would increment for each release. If > alpha9 is our ninth release then the release number would be 9, > but it would increment to 10 for beta1. The release number could > be reset to 1 when we move on to 1.1 development (although it > wouldn't have to be). The purpose of this is to give us a three-part > version number major.minor.release that is always increasing. > * A string representation of the version ("1.0a9" or "1.0b1"). The > binary major.minor.release version would be used for version check > comparisons, but this string representation is what would be > visible to the user. > * A release date ("Feb 2004") This all sounds pretty good to me - I think we might as well make the bits all separate, e.g: { 'major': 1, 'minor': 1, 'release': 0, 'prerelease': 1, # i.e. first alpha release 'date': 'January 2005', 'download_page': 'http://spambayes.sourceforge.net/windows.html', 'release_notes_page': 'http://whatever', 'short_string': spambayes.__version__, 'long_string': "SpamBayes version %(major)d.%(minor)d.%(release)d.%(prerelease)d (%(date)s)", } (short_string and long_string could just be functions, of course). 
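A minimal sketch of the "separate bits plus a formatting function" idea follows. The keys and the format string are taken from the dictionary above; the names VERSION and long_string as a module-level dict and function are assumptions for illustration, and the URL and short_string entries are left out to keep the example self-contained.

```python
# Version components kept as separate fields, as suggested above.
VERSION = {
    "major": 1,
    "minor": 1,
    "release": 0,
    "prerelease": 1,  # i.e. first alpha release
    "date": "January 2005",
}


def long_string(info=VERSION):
    """Format the user-visible long version string from the components."""
    return ("SpamBayes version %(major)d.%(minor)d.%(release)d."
            "%(prerelease)d (%(date)s)" % info)


text = long_string()  # "SpamBayes version 1.1.0.1 (January 2005)"
```

Making long_string a function rather than a stored string means it cannot drift out of step with the numeric fields.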
> In addition to the shared version info, the engine and each > application would have a separate "revision number" that we would > increment during development to track changes specific to that app. I've completely gone off the idea of having anything separate for the different apps. One release for them all is much simpler. > OK, I think that's more than enough to fuel the fire for now, > so have at it! Must have been plenty of fire to keep it simmering this long. =Tony.Meyer From hernan at orgmf.com.ar Wed Dec 22 12:32:34 2004 From: hernan at orgmf.com.ar (Hernán Martínez Foffani) Date: Wed Dec 22 12:33:22 2004 Subject: [spambayes-dev] Translation to Spanish Message-ID: >> And we would like to contribute support for German. I had a >> (brief) look at the source code of the plugin and it seems >> there's much/all of the text hardcoded. >> >> So the first would be to introduce gettext or a similar mechanism >> (haven't looked at Python i18n yet) - I could help doing >> that, what do the developers think? > > I had hoped that Hatuka Nezumi would have responded to the earlier > message, but I haven't heard anything from him for a while (busy, > perhaps). He is leading the i18n process for SpamBayes (I'm helping > and doing the checking in). I guess that's me, Hernán Foffani. Yes, I'm very busy these days and will be out of town for 3 weeks soon. I'm very sorry. > Just about all the work adding gettext (etc) to the code is done. If > you want to look at it/modify it you'll need to get it from CVS (no > i18n is in > 1.0.1, it'll appear in 1.1). Both the Outlook plug-in and the web > interface have been set up. Did you get to add gettext to the spambayes directory (Options.py and like)? > To translate the majority of the Outlook dialogs, you should be able > to use the gettext tools. If you're familiar with them, that should > be simple - if not, I think I can make a template file, which is what > needs to be translated.
There are again a few items that this will > not include, but they will also be got to at some point. Ideally the dialogs should be translated using a dialog editor like Visual Studio but gettext tools could be used as an alternative if yours truly get away from his compromise. -H. From kenny.pitt at gmail.com Wed Dec 22 15:45:03 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Wed Dec 22 15:45:42 2004 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/Outlook2000 filter.py, 1.43, 1.44 manager.py, 1.104, 1.105 msgstore.py, 1.97, 1.98 In-Reply-To: Message-ID: <41c98870.29ddc686.0534.021c@smtp.gmail.com> Tony Meyer wrote: > Update of /cvsroot/spambayes/spambayes/Outlook2000 > In directory > sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11996/Outlook2000 > > Modified Files: > filter.py manager.py msgstore.py > Log Message: > It makes life much simpler if the classification strings match the > non-Outlook ones. > > Thresholds are 0-100, cutoffs are 0.0-1.0 - need to convert between > them, or everything is spam! > > Index: filter.py > =================================> RCS file: /cvsroot/spambayes/spambayes/Outlook2000/filter.py,v > retrieving revision 1.43 > retrieving revision 1.44 > diff -C2 -d -r1.43 -r1.44 > *** filter.py 22 Dec 2004 00:30:26 -0000 1.43 > --- filter.py 22 Dec 2004 01:22:00 -0000 1.44 > *************** > *** 17,29 **** > disposition > attr_prefix > ! msg.c > elif prob_perc >> disposition > attr_prefix > ! msg.c > else: > disposition > attr_prefix > ! msg.c > > ms > --- 17,30 ---- > disposition > attr_prefix > ! msg.c > elif prob_perc >> config.unsure_threshold: disposition > attr_prefix > ! msg.c > else: > disposition > attr_prefix > ! msg.c > ! mgr.classifier_data.message_db.store_msg(msg) > > ms I believe the non-Outlook versions store fixed values of 's', 'h', or 'u' in the msg.c field. 
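The two points raised here (percentage thresholds versus 0.0-1.0 probabilities, and storing a fixed one-letter code rather than anything derived from configurable strings) can be sketched as below. This is an illustration only: the function name classify_code and the default threshold values are assumptions, not the Outlook plug-in's actual code or defaults.

```python
def classify_code(prob, spam_threshold=90.0, unsure_threshold=15.0):
    """Map a probability in 0.0-1.0 to a fixed code: 's', 'u' or 'h'.

    Thresholds are percentages (0-100), so the probability is scaled
    before comparing. The default values here are illustrative only.
    """
    prob_perc = prob * 100.0
    if prob_perc >= spam_threshold:
        return "s"
    elif prob_perc >= unsure_threshold:
        return "u"
    return "h"


codes = [classify_code(p) for p in (0.95, 0.5, 0.01)]
```

Because the returned codes are hard-wired, changing a display string such as "***SPAM***" in the configuration cannot alter what gets stored.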
See the RememberClassification function in spambayes.Message which contains the comment: """ # this must store state independent of options settings, as they # may change, which would really screw this database up """ Using only the first character of the configured header strings could be especially bad if the user configured the strings to something like "***SPAM***", "***GOOD***", and "***UNSURE***". -- Kenny Pitt From kenny.pitt at gmail.com Wed Dec 22 16:44:15 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Wed Dec 22 16:44:20 2004 Subject: [spambayes-dev] RE: Version Data In-Reply-To: Message-ID: <41c99651.26d454bb.0534.024e@smtp.gmail.com> Tony Meyer wrote: > [Kenny, way back in February] >> My proposal is that all apps share the source release version >> as their primary version number. The shared release version >> would consist of the following parts: >> >> * A major/minor version number (1.0) >> * A release number that would increment for each release. If >> alpha9 is our ninth release then the release number would be 9, >> but it would increment to 10 for beta1. The release number could >> be reset to 1 when we move on to 1.1 development (although it >> wouldn't have to be). The purpose of this is to give us a three-part >> version number major.minor.release that is always increasing. >> * A string representation of the version ("1.0a9" or "1.0b1"). The >> binary major.minor.release version would be used for version check >> comparisons, but this string representation is what would be >> visible to the user. >> * A release date ("Feb 2004") > > This all sounds pretty good to me - I think we might as well make the > bits all separate, e.g: > > { > 'major': 1, > 'minor': 1, > 'release': 0, > 'prerelease': 1, # i.e. 
first alpha release > 'date': 'January 2005', > 'download_page': 'http://spambayes.sourceforge.net/windows.html', > 'release_notes_page': 'http://whatever', > 'short_string': spambayes.__version__, > 'long_string': "SpamBayes version > %(major)d.%(minor)d.%(release)d.%(prerelease)d (%(date)s)", > } > > (short_string and long_string could just be functions, of course). > >> In addition to the shared version info, the engine and each >> application would have a separate "revision number" that we would >> increment during development to track changes specific to that app. > > I've completely gone off the idea of having anything separate for the > different apps. One release for them all is much simpler. As long as we have only one version, I think a lot of this could be stored once in the main "spambayes" package __init__.py file and eliminate all the duplication. We could put the following values into __init__.py: """ __version__ = "1.1a1" __date__ = "January 2005" """ distutils has a StrictVersion class which parses a version string similar to what we use with versions of the form "1.0", "1.0.1", "1.1a1", etc. It also provides a compare function for comparing two version numbers, which would be great to replace the float version number stuff we're currently using for update checks. It doesn't support the "1.0rc2" release candidate format, but that shouldn't take more than a couple of minutes to add. For consistency, I think it would be good to provide the separate components of the version in the format of the sys.version_info tuple. This could be incorporated into the Version class along with the function to format the long form of the version. From what I understand, we're free to put any custom metadata we want in the __init__.py file, so we could even include the download and release notes URLs there.
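The kind of parsing and comparison described above can be sketched without distutils. The following is a stand-in written for illustration, not StrictVersion itself and not the eventual SpamBayes code: the names parse_version and sort_key and the regular expression are assumptions, chosen so that "1.0", "1.0.1", "1.1a1" and "1.0rc2" are all accepted.

```python
import re

# major.minor, optional .patch, optional prerelease tag (a, b or rc) + number
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:(a|b|rc)(\d+))?$")


def parse_version(text):
    """Split a version string into ((major, minor, patch), prerelease)."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("invalid version string: %r" % (text,))
    major, minor, patch, tag, number = match.groups()
    version = (int(major), int(minor), int(patch or 0))
    prerelease = (tag, int(number)) if tag else None
    return version, prerelease


def sort_key(text):
    """Return a tuple that orders prereleases before the final release."""
    version, prerelease = parse_version(text)
    if prerelease is None:
        return version + (1, "", 0)
    # 'a' < 'b' < 'rc' already holds as a plain string comparison.
    return version + (0,) + prerelease


ordered = sorted(["1.1", "1.0.1", "1.1a1", "1.0rc2", "1.0", "1.0b1"],
                 key=sort_key)
```

Comparing keys rather than floats avoids the problem that a float cannot represent a three-part or prerelease version at all.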
-- Kenny Pitt From tameyer at ihug.co.nz Thu Dec 23 02:08:43 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 23 02:09:20 2004 Subject: [spambayes-dev] Translation to Spanish In-Reply-To: Message-ID: > I guess that's me, Hernán Foffani. Many apologies. I wasn't paying enough attention and grabbed the name of an email with i18n in the subject, but it was the wrong one. > Yes, I'm very busy these > days and will be out of town for 3 weeks soon. I'm very sorry. It's not a problem at all! We appreciate any help, and all of us are too busy to work on this at some point. > Did you get to add gettext to the spambayes directory > (Options.py and like)? No. I can do this next, then. =Tony.Meyer From tameyer at ihug.co.nz Thu Dec 23 02:37:36 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 23 02:38:12 2004 Subject: [spambayes-dev] RE: Version Data In-Reply-To: Message-ID: > As long as we have only one version, I think a lot of this > could be stored once in the main "spambayes" package > __init__.py file and eliminate all the duplication. +0. > distutils has a StrictVersion class which parses a version > string similar to what we use with versions of the form > "1.0", "1.0.1", "1.1a1", etc. It also provides a compare > function for comparing two version numbers, which would be > great to replace the float version number stuff we're > currently using for update checks. I wasn't aware of that class - I agree, it seems like a perfect choice. > It doesn't support the > "1.0rc2" release candidate format, but that shouldn't take > more than a couple of minutes to add. We could switch to just "c" (which is nice, since it's then 'a', 'b', 'c') and we'd then match the Python distribution itself. In that case, this addition might even be worth submitting as a distutils patch. (Although from a quick look, it's a case of adding a single character ('c') to a regular expression...)
> For consistency, I think it would be good to provide the > separate components of the version in the format of the > sys.version_info tuple. This could be incorporated into the > Version class along with the function to format the long form > of the version. That should be simple, too. It's just .version + .prerelease to get (eg) (1,1,0,'a',1), and then a simple conversion of 'a'->'alpha', etc. > From what I understand, we're free to put any custom metadata > we want in the __init__.py file, so we could even include the > download and release notes URLs there. There's also the 'check for new version' stuff. I guess that could also move to __init__.py - I gather that just means that importing any of the spambayes.* modules will import that as well, right? As long as it doesn't actually do any work on import, it doesn't seem like that would hurt. Assuming no-one else pipes up with objections, would you like to make the changes, or shall I? =Tony.Meyer From tameyer at ihug.co.nz Thu Dec 23 03:02:20 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 23 03:02:55 2004 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/Outlook2000filter.py, 1.43, 1.44 manager.py, 1.104, 1.105 msgstore.py, 1.97, 1.98 In-Reply-To: Message-ID: > I believe the non-Outlook versions store fixed values of 's', > 'h', or 'u' in the msg.c field. [...] > Using only the first character of the configured header > strings could be especially bad if the user configured the strings > to something like "***SPAM***", "***GOOD***", and "***UNSURE***". You'd have to be a right idiot to change those values if you used the Outlook plug-in, since they have absolutely no effect (other than this breakage). 
However, against the chance of an idiot somewhere in the world doing exactly that and complaining to us about it, I've checked in a fix for this; thanks :) =Tony.Meyer From tameyer at ihug.co.nz Thu Dec 23 06:25:26 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 23 06:25:58 2004 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/Outlook2000/dialogs/resourcesdialogs.h, 1.22, 1.23 dialogs.rc, 1.48, 1.49 In-Reply-To: Message-ID: [...] > Modified Files: > dialogs.h dialogs.rc > Log Message: > Add a button on the Statistics tab to reset the Outlook > statistics. Also display the date when the statistics were last reset. This looks good :) I've just made the whole dialog larger: (a) it seems to work, but are the changes ok? I did some of it in VC++ and the rest by hand. I hate those things :) (b) is this too large? I'm open to reverting this change. The extra room does offer a number of advantages: * It's always bugged me a little that the general tab never really had enough room if you filtered more than a couple of folders (or if they just had long names). * The 'ham folder' can be set via the GUI now. This does get requested relatively often, and is a bugger to set manually, so is nice to have. * The stats all fit now (my cost ones didn't if there were any fp/fn's), plus we can squeeze one or two more in, or a button of some type (export? show graphs?) in the future. * It's becoming apparent that training is quite important in getting good results, so I suspect there will be some sort of extra training option soon, and the training tab now has room for that. What do Outlook people think? 
I'm away until the 27th, so if everyone hates it you'll have to put up with it until then or revert it yourself :) Merry Christmas (or whatever holiday greeting you prefer) to all :) =Tony.Meyer From josh at joshholtzman.com Thu Dec 23 08:38:58 2004 From: josh at joshholtzman.com (Josh Holtzman) Date: Thu Dec 23 09:04:39 2004 Subject: [spambayes-dev] Multiple IMAP accounts Message-ID: <20041223080438.BED0D1E4003@bag.python.org> According to the documentation, if I need to connect to multiple IMAP servers through SpamBayes, I should "Please let the mailing list know if you are in this situation so that we can consider coming up with a better solution." I'm in that situation, so I'm going to stick with Outlook's crappy junk mail controls for now and keep checking to see when multiple IMAP servers will be easily supported. (I have 2 IMAP accounts because I want to keep my work and my old consulting company mail separate. a forward just won't do.) Thanks, and keep up the great work! Josh Holtzman -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20041222/28b2952d/attachment.htm From kenny.pitt at gmail.com Thu Dec 23 15:20:01 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Thu Dec 23 15:20:04 2004 Subject: [spambayes-dev] RE: Version Data In-Reply-To: Message-ID: <41cad412.121de6c5.1924.108c@smtp.gmail.com> Tony Meyer wrote: >> It doesn't support the >> "1.0rc2" release candidate format, but that shouldn't take >> more than a couple of minutes to add. > > We could switch to just "c" (which is nice, since it's then 'a', 'b', > 'c') and we'd then match the Python distribution itself. In that > case, this addition might even be worth submitting as a distutils > patch. (Although from a quick look, it's a case of adding a single > character ('c') to a regular expression...) 
Not necessary, it was a simple change to the regex to look for either the 'a' or 'b' characters or the 'rc' string. >> For consistency, I think it would be good to provide the >> separate components of the version in the format of the >> sys.version_info tuple. This could be incorporated into the >> Version class along with the function to format the long form >> of the version. > > That should be simple, too. It's just .version + .prerelease to get > (eg) (1,1,0,'a',1), and then a simple conversion of 'a'->'alpha', etc. Yep, easy enough. My SBVersion class now stores the version in a "version_info" field just like sys.version_info. The nice thing about this is that the defined strings for the release level are strictly increasing alphabetically from earliest "alpha" prerelease up to the "final" stable release. This allows versions to be compared with a straight comparison of the version_info tuples. >> From what I understand, we're free to put any custom metadata >> we want in the __init__.py file, so we could even include the >> download and release notes URLs there. > > There's also the 'check for new version' stuff. I guess that could > also move to __init__.py - I gather that just means that importing > any of the spambayes.* modules will import that as well, right? As > long as it doesn't actually do any work on import, it doesn't seem > like that would hurt. On further inspection, the "Download Page" value is only used from the version.cfg, not from the local version. The comment was that this allows us to change the download page from the server side, which seems like a good idea so I think we should just leave it there. > Assuming no-one else pipes up with objections, would you like to make > the changes, or shall I? Almost done, so I guess I'll handle it. I just need to figure out what to generate in the version.cfg for the "Outlook" and "POP3 Proxy" sections so that existing 1.0.x versions will be able to update properly when 1.1 is released. 
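[Editorial sketch, not part of the original message.] The version_info idea discussed above can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions: the function name parse_version, the regular expression, and the tag-to-level mapping are hypothetical and are not taken from the actual SBVersion class. The only things borrowed from the thread are the 'a' / 'b' / 'rc' tags and the sys.version_info-style tuple, whose release-level strings ("alpha", "beta", "candidate", "final") happen to sort alphabetically in release order.

```python
import re

# Hypothetical pattern (an assumption, not the SpamBayes regex):
# accepts "1.1", "1.0.1", "1.1a1", "1.0b2" and "1.0rc2".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:(a|b|rc)(\d+))?$")

# Short tags mapped to the release-level names used by sys.version_info.
_LEVELS = {"a": "alpha", "b": "beta", "rc": "candidate"}


def parse_version(text):
    """Return a (major, minor, micro, releaselevel, serial) tuple."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (text,))
    major, minor, micro, tag, serial = match.groups()
    if tag is None:
        # No prerelease suffix means a stable release.
        level, serial = "final", 0
    else:
        level, serial = _LEVELS[tag], int(serial)
    # A missing micro component ("1.1") is treated as 0.
    return (int(major), int(minor), int(micro or 0), level, serial)


if __name__ == "__main__":
    for candidate in ("1.0rc2", "1.0", "1.0.1", "1.1a1"):
        print(candidate, parse_version(candidate))
```

Because the tuples compare element by element, a prerelease of a given version sorts below the final release of that same version, while any 1.1 prerelease still sorts above every 1.0.x release.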
Speaking of upgrades: now that we have a 1.0 "final" out there, are we only going to notify users of new "final" releases or do we need to add a way for them to check for new pre-release versions as well? -- Kenny Pitt From kenny.pitt at gmail.com Thu Dec 23 18:21:34 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Thu Dec 23 18:21:38 2004 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/Outlook2000/dialogs/resourcesdialogs.h, 1.22, 1.23 dialogs.rc, 1.48, 1.49 In-Reply-To: Message-ID: <41cafe9f.7d0728eb.4f1b.065f@smtp.gmail.com> Tony Meyer wrote: > I've just made the whole dialog larger: > > (a) it seems to work, but are the changes ok? I did some of it in > VC++ and the rest by hand. I hate those things :) It loads fine here and all the tabs look good, so that's really all that matters, right? I did tweak one value in the .H file that Visual Studio uses when allocating new control ids. > (b) is this too large? I'm open to reverting this change. The old dialog was the maximum height that would fit on a 640x480 display. I doubt anyone is still using 640x480 resolution, especially since Outlook would be an incredibly tight squeeze at that res anyway, so I think the new size should be fine. -- Kenny Pitt From chanckowiak at socataaircraft.com Mon Dec 27 22:15:46 2004 From: chanckowiak at socataaircraft.com (Christophe Hanckowiak) Date: Mon Dec 27 22:38:08 2004 Subject: [spambayes-dev] will it work ? Message-ID: <200412271515546.SM01876@CHANCKOWIAK> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/jpeg Size: 6214 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20041227/38899a65/attachment.jpe From tameyer at ihug.co.nz Tue Dec 28 23:44:31 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Dec 28 23:45:48 2004 Subject: [spambayes-dev] Multiple IMAP accounts In-Reply-To: Message-ID: > According to the documentation, if I need to connect > to multiple IMAP servers through SpamBayes, I should > "Please let the mailing list know if you are in this > situation so that we can consider coming up with a > better solution." > > I'm in that situation, so I'm going to stick with > Outlook's crappy junk mail controls for now and keep > checking to see when multiple IMAP servers will be > easily supported. Are you wanting to use multiple IMAP accounts with the Outlook plug-in? If so, then that should work fine already. The quote above refers only to sb_imapfilter, which is used if you do not use Outlook. If you're wanting sb_imapfilter to work with multiple IMAP accounts, please let us know, and we'll try and get this done for 1.1 (it shouldn't be that difficult). =Tony.Meyer From tameyer at ihug.co.nz Wed Dec 29 04:06:59 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 29 04:07:35 2004 Subject: [spambayes-dev] RE: Version Data In-Reply-To: Message-ID: > I just need > to figure out what to generate in the version.cfg for the > "Outlook" and "POP3 Proxy" sections so that existing 1.0.x > versions will be able to update properly when 1.1 is released. Good point - although we could always change the download page value so that 1.0.x looks in a different place than 1.1.x. I gather you've got it going anyway, though, so it'll work with the same file. > Speaking of upgrades: now that we have a 1.0 "final" out > there, are we only going to notify users of new "final" > releases or do we need to add a way for them to check for new > pre-release versions as well? 
Good question :) I presume this will do the 'wrong' thing at the moment, since 1.1a1 is technically a higher version than 1.0.1. I think the most sensible thing would be to only suggest upgrading when there is a newer final version out. spambayes-announce and the website will cover prereleases coming out, which should be enough (since they may not actually work well anyway). Is that a simple change to the code? An alternative would be to enhance the 'new version' dialog so that if I'm running 1.0.1 and I check for a new version I get something like: """ You are using the most recent version available (SpamBayes 1.0.1). """ """ You are using the most recent version available (SpamBayes 1.0.1). There is also a prerelease (testing) version available (SpamBayes 1.1a1). If you would like to help test this version, you may download and install this. """ """ There is a new version available (SpamBayes 1.0.2). We suggest that you upgrade when possible. Alternatively, there is a prerelease (testing) version available (SpamBayes 1.1a1). If you would like to help test this version, you may download and install this. """ """ There is a new version available (SpamBayes 1.1). We suggest that you upgrade when possible. """ I presume that once there is a 1.1 final, we'll no longer be doing any 1.0.x bugfix releases. =Tony.Meyer From tameyer at ihug.co.nz Wed Dec 29 06:58:06 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Dec 29 06:58:43 2004 Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/Outlook2000/dialogs/resourcesdialogs.h, 1.22, 1.23 dialogs.rc, 1.48, 1.49 In-Reply-To: Message-ID: > > (b) is this too large? I'm open to reverting this change. > > The old dialog was the maximum height that would fit on a > 640x480 display. I doubt anyone is still using 640x480 > resolution, especially since Outlook would be an incredibly > tight squeeze at that res anyway, so I think the new size > should be fine. Hmmm. I'd forgotten how small 640x480 was. 
It's pretty huge at 800x600, too. I suppose that if no-one else speaks up here, we can just leave it like this, and see what the response to the alpha is. 1.0.2 will still work with 640x480, of course, so it would just mean that they couldn't gain the many new features in 1.1. =Tony.Meyer From josh at joshholtzman.com Wed Dec 29 07:58:31 2004 From: josh at joshholtzman.com (Josh Holtzman) Date: Wed Dec 29 07:58:53 2004 Subject: [spambayes-dev] Multiple IMAP accounts In-Reply-To: Message-ID: <20041229065852.ACFEC1E4003@bag.python.org> I see -- I should have read the documentation more carefully (or less carefully... just running installation would have worked perfectly!). I now have the Outlook plugin filtering both of my IMAP accounts. Thanks so much, Josh -----Original Message----- From: Tony Meyer [mailto:tameyer@ihug.co.nz] Sent: Tuesday, December 28, 2004 2:45 PM To: 'Josh Holtzman'; spambayes-dev@python.org Subject: RE: [spambayes-dev] Multiple IMAP accounts > According to the documentation, if I need to connect > to multiple IMAP servers through SpamBayes, I should > "Please let the mailing list know if you are in this > situation so that we can consider coming up with a > better solution." > > I'm in that situation, so I'm going to stick with > Outlook's crappy junk mail controls for now and keep > checking to see when multiple IMAP servers will be > easily supported. Are you wanting to use multiple IMAP accounts with the Outlook plug-in? If so, then that should work fine already. The quote above refers only to sb_imapfilter, which is used if you do not use Outlook. If you're wanting sb_imapfilter to work with multiple IMAP accounts, please let us know, and we'll try and get this done for 1.1 (it shouldn't be that difficult). 
=Tony.Meyer From tameyer at ihug.co.nz Thu Dec 30 08:08:01 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Thu Dec 30 08:08:38 2004 Subject: [spambayes-dev] RE: [Spambayes] Translation offer In-Reply-To: Message-ID: > I would be happy to participate in doing the french > translation, if this has not been done before. I can > translate text from the program itself and web pages as well. That would be fantastic! It's looking like we'll manage to have Spanish and German (and English) translations with the 1.1 version, so adding French would be great. The best thing would be if you could subscribe to the spambayes-dev mailing list for the moment: It's a very low-volume list for discussion about the development of SpamBayes. There is some discussion about the translation effort at the moment (the archives will have the messages) - you can just ignore the other messages. The status at the moment is that most of the work required to be done to the code to let translation be a reasonably simple process has been done. If you're in a hurry to get started, then to translate the web interface most of what you need to do is translate the ui.html file (get the CVS copy, e.g. from ). To translate the Outlook plug-in, you mostly need to translate the dialogs.rc/dialogs.h files with VC++ or similar (although there are other methods). If you're not in a hurry, I'll try and do some more work into getting this sorted next week (NZ is pretty much on holiday this week) and writing up a "how to translate" guide. Feel free to remind me if I haven't said anything by Wednesday... Thanks again for the offer, we really appreciate it! =Tony.Meyer -- Please always include the list (spambayes@python.org) in your replies (reply-all), and please don't send me personal mail about SpamBayes. http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this. 
From kenny.pitt at gmail.com Fri Dec 31 03:17:48 2004 From: kenny.pitt at gmail.com (Kenny Pitt) Date: Fri Dec 31 03:19:57 2004 Subject: [spambayes-dev] Re: Version Data In-Reply-To: References: Message-ID: <2a052b990412301817408af76@mail.gmail.com> Tony Meyer wrote: > > I just need > > to figure out what to generate in the version.cfg for the > > "Outlook" and "POP3 Proxy" sections so that existing 1.0.x > > versions will be able to update properly when 1.1 is released. > > Good point - although we could always change the download page value so that > 1.0.x looks in a different place than 1.1.x. I gather you've got it going > anyway, though, so it'll work with the same file. Well, sort of. The idea is that 1.1 and later versions will only look at the [SpamBayes] section in Version.cfg, and will ignore the [Outlook] and [POP3 Proxy] sections which will be used for upgrading 1.0.x versions. However, I didn't like having it report 0.3 as the latest version, so for now it's still looking at the Outlook section. The [SpamBayes] section in Version.cfg is never looked at by anything in the 1.0.x versions, so we could probably go ahead and change that value to 1.0.1 (or to 1.1a0) if we want. If we make that change, then I can go ahead and update the code to look only at that. > > Speaking of upgrades: now that we have a 1.0 "final" out > > there, are we only going to notify users of new "final" > > releases or do we need to add a way for them to check for new > > pre-release versions as well? > > Good question :) I presume this will do the 'wrong' thing at the moment, > since 1.1a1 is technically a higher version than 1.0.1. Yes, that's what it would do if we were to put 1.1a1 in the Version.cfg file. Alternatively, we could only update the Version.cfg file with the latest final release. I know I updated the message in pop3proxy_tray so that it indicates when you are running a newer version than what is in Version.cfg. 
I'll have to check if that is the case for Outlook or not. > I think the most sensible thing would be to only suggest upgrading when > there is a newer final version out. spambayes-announce and the website will > cover prereleases coming out, which should be enough (since they may not > actually work well anyway). Is that a simple change to the code? As above, the easiest solution if we want only the final versions is to never put a pre-release version number in Version.cfg. However, there are probably a fair number of people that would also want to know about the latest pre-release versions. The best thing here would probably be to store two different version numbers in Version.cfg: one for latest final version and one for latest pre-release version. Then we could show both the latest final version and the latest pre-release version in the update dialog, or we could have an advanced option that you could edit into your INI file to indicate whether you want to see pre-release versions or only final versions. -- Kenny Pitt From lililili26 at sina.com Fri Dec 31 06:59:49 2004 From: lililili26 at sina.com (lidengdeng) Date: Fri Dec 31 07:01:54 2004 Subject: [spambayes-dev] Evaluating a training corpus Message-ID: <20041231060153.BC3BC1E4003@bag.python.org> Hello Greg Ward, I have seen the message you posted at http://mail.python.org/pipermail/spambayes-dev/2003-June/000122.html, and I think you have solved those problems already. I'm a Chinese postgraduate at a university in China, and my thesis is on building a large-scale email corpus. So I really need your help: can you send me some data or papers about email corpora, and on how to evaluate a training corpus? Are there many rules in the process of constructing the corpus? I'm sorry for bothering you, and I would greatly appreciate it if you could reply to me. Best wishes and happy new year! lidengdeng lililili26@sina.com 2004-12-31